From cff1b4f39ca50dd7feb40133d57bff34f64556c1 Mon Sep 17 00:00:00 2001
From: Gammerdinger
Date: Fri, 7 Jun 2024 02:21:48 -0400
Subject: [PATCH] Rought draft of Oncoprint creation

---
 lessons/13_oncoprint_creation.md | 85 +++++++++++++++++++++++++++++++-
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/lessons/13_oncoprint_creation.md b/lessons/13_oncoprint_creation.md
index dedc515..8bb64fe 100644
--- a/lessons/13_oncoprint_creation.md
+++ b/lessons/13_oncoprint_creation.md
@@ -1,4 +1,10 @@
-# Oncoprint Creation
+---
+title: "Oncoprint Creation"
+author: "Will Gammerdinger, Meeta Mistry"
+date: "June 6, 2024"
+---
+
+Approximate time: 15 minutes
 ## Learning Objectives
@@ -7,17 +13,51 @@
 ## Oncoprints
+

+ +

+
+An Oncoprint is a visual summary of the genomic alterations for a set of genes across a set of samples. Oncoprints can visualize alterations in:
+
+- Copy Number
+- Insertions/deletions
+- SNPs
+- And more
+
+There are several tools for creating Oncoprints, such as [oncoplot within maftools](https://bioconductor.org/packages/devel/bioc/vignettes/maftools/inst/doc/oncoplots.html), but in this workshop, we are going to use the [Oncoprinter resource on cBioPortal](https://www.cbioportal.org/oncoprinter).
 ## Wrangling our data for an Oncoprint
+The first step in using the Oncoprinter for our data is to format the data in the form that the Oncoprinter will recognize. The Oncoprinter is expecting our input to have four columns:
+
+- Name of the sample
+- Gene Symbol
+- Description of the alteration event
+  - If it is a mutation, then a description of the amino acid change
+  - If it is a copy number variant, then whether it is an amplification or a deletion
+- Classification of the variant
+  - `MISSENSE` for missense mutations
+  - `INFRAME` for inframe mutations
+  - `TRUNC` for truncation mutations (frameshift mutations and stop codon gains)
+  - `PROMOTER` for promoter mutations
+  - `OTHER` for any other kind of mutations
+
+In our code below we are just going to focus on missense, frameshift, stop codon gain and inframe mutations. However, for your data, you could be interested in other types of mutational events, so please modify the code as needed. We are going to move into our scripts directory and create a bash script to wrangle our VCF into the format required by Oncoprinter:
+
+> Note: There is a great resource from the Human Genome Variation Society describing much of the nomenclature used to describe variants [here](https://www.hgvs.org/mutnomen/recs-prot.html).
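
As a rough illustration of the four-column layout described above, the sketch below writes a few made-up rows and checks that each one has four fields and one of the five classification labels listed in this lesson. The gene and alteration values, the file name and the `check_oncoprint_rows` helper are all hypothetical, and the use of tabs as the delimiter is an assumption, so adjust it to match your own output:

```shell
#!/bin/bash
# Illustrative sketch only: every row below is a made-up placeholder, not real output.
demo_dir=$(mktemp -d)
demo_file="$demo_dir/example.oncoprint.txt"

# Columns (assumed tab-separated): sample, gene symbol, alteration, classification
printf 'syn3\tTP53\tR175H\tMISSENSE\n'   > "$demo_file"
printf 'syn3\tBRCA2\tK1691fs\tTRUNC\n'  >> "$demo_file"
printf 'syn3\tEGFR\tE746del\tINFRAME\n' >> "$demo_file"

# Hypothetical helper: report rows that do not have exactly four columns or
# whose fourth column is not one of the labels listed in the lesson.
# Returns a non-zero exit status if any row fails a check.
check_oncoprint_rows() {
  awk -F'\t' '
    NF != 4 {
      printf "line %d: expected 4 columns, found %d\n", NR, NF
      bad = 1
      next
    }
    $4 !~ /^(MISSENSE|INFRAME|TRUNC|PROMOTER|OTHER)$/ {
      printf "line %d: unrecognized classification %s\n", NR, $4
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

check_oncoprint_rows "$demo_file" && echo "All rows look well-formed"
```

A check like this is optional, but it can catch SnpEff effects that were never mapped to an Oncoprinter label before the text is pasted into the web form.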
+
 ```
 cd ~/variant_calling/scripts/
 vim VCF_to_oncoprint.sh
 ```
+We can copy and paste this code into our bash script:
+
 ```
 #!/bin/bash
 # This script was written by the Training Team at the Harvard Chan Bioinformatics Core on June 6th, 2024 as part of training materials for the Introduction to Variant Analysis workshop.
+# This is a working sample of code you might want to consider when developing an Oncoprint for use in cBioPortal's Oncoprinter.
+# You may need to alter this code for your needs.
 # USAGE: sh VCF_to_oncoprint.sh
 # Assign variable for input and output
@@ -67,11 +107,54 @@ java -jar $SNPEFF/SnpSift.jar filter \
 sed 's/missense_variant.*/MISSENSE/g' > $OUTPUT_FILE
 ```
+There are four main parts to the above code:
+
+- Extracting the gene, amino acid alteration and SnpEff effect from the VCF file like we have previously practiced using SnpSift
+- Adding a sample name column
+- Altering the 3-letter amino acid abbreviations to single-letter abbreviations (this is optional)
+- Changing the SnpEff effects into the format that Oncoprinter recognizes
+
+In order to run the code, we will need to execute:
+
 ```
 sh VCF_to_oncoprint.sh /n/scratch/users/${USER:0:1}/${USER}/variant_calling/vcf_files/mutect2_syn3_normal_syn3_tumor_hg38-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf syn3
 ```
+We can now inspect our new Oncoprinter formatted text file with:
+
+```
+less /n/scratch/users/${USER:0:1}/${USER}/variant_calling/vcf_files/mutect2_syn3_normal_syn3_tumor_hg38-pass-filt-LCR.pedigree_header.snpeff.dbSNP.oncoprint.txt
+```
+
+Once you have done this for a given sample, you would use the `cat` command to combine all of your samples together into a single text file. However, we will just be working with a single sample in this example since we don't have access to other samples.
+
+During this process, you would also likely subset the output to your genes of interest. In order to simulate this, we are going to grab the first handful of genes in the output to copy and paste into the Oncoprinter input field:
+
+

+ +
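
The combining and subsetting steps described above could be sketched as follows for a project with several samples. The per-sample files, their contents, the gene list and all file names here are made up for illustration, and the sketch assumes tab-separated files with the gene symbol in the second column:

```shell
#!/bin/bash
# Illustrative sketch only: sample names, genes and alterations are placeholders.
work_dir=$(mktemp -d)

# Two hypothetical per-sample files in the four-column Oncoprinter layout
printf 'sample_A\tTP53\tR175H\tMISSENSE\nsample_A\tTTN\tA100T\tMISSENSE\n' \
  > "$work_dir/sample_A.oncoprint.txt"
printf 'sample_B\tKRAS\tG12D\tMISSENSE\nsample_B\tTP53\tR213*\tTRUNC\n' \
  > "$work_dir/sample_B.oncoprint.txt"

# Combine every per-sample file into a single file with cat
cat "$work_dir"/*.oncoprint.txt > "$work_dir/all_samples.txt"

# Hypothetical list of genes of interest, one gene symbol per line
printf 'TP53\nKRAS\n' > "$work_dir/genes_of_interest.txt"

# Keep only the rows whose second column (gene symbol) is in the gene list:
# the first file fills the lookup table, the second file is filtered against it
awk -F'\t' 'NR == FNR { keep[$1]; next } $2 in keep' \
  "$work_dir/genes_of_interest.txt" "$work_dir/all_samples.txt" \
  > "$work_dir/subset.txt"

cat "$work_dir/subset.txt"
```

Matching on the whole gene column, rather than using a plain `grep`, avoids accidentally keeping genes whose symbols merely contain one of the genes of interest.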

+
+Once you have placed the input into the Oncoprinter's input field, you can scroll to the bottom of the page and click the `Submit` button. This will generate an Oncoprint for you.
+
+

+ +

+
+In order to export the Oncoprint, you can click on the `Download` dropdown and select the file format that you'd like to use:
+
+

+ +

+
+Once we have downloaded it, we can inspect it on our computer. It should look like:
+
+

+ +

+
+We have now made our first Oncoprint! If you would like to see what an Oncoprint might look like with more samples, you can return to the input page and click on `Load example data` and repeat the process. This example data has more samples than our single-sample dataset and has a more robust set of annotations, so it can give you a sense of the types of annotations that you can include in your Oncoprint and how to properly format those annotations.
+[Back to Schedule](../schedule/README.md)
 ***