Rought draft of Oncoprint creation
Gammerdinger authored Jun 7, 2024
1 parent 341ef0a commit cff1b4f
Showing 1 changed file with 84 additions and 1 deletion.
85 changes: 84 additions & 1 deletion lessons/13_oncoprint_creation.md
@@ -1,4 +1,10 @@
# Oncoprint Creation
---
title: "Oncoprint Creation"
author: "Will Gammerdinger, Meeta Mistry"
date: "June 6, 2024"
---

Approximate time: 15 minutes

## Learning Objectives

@@ -7,17 +13,51 @@
## Oncoprints


<p align="center">
<img src="../img/cbioportal_logo.png" width="500">
</p>

An Oncoprint is a visual summary of the genomic alterations for a set of genes across a set of samples. Oncoprints can visualize alterations in:

- Copy Number
- Insertions/deletions
- SNPs
- And more

There are several tools for creating Oncoprints, such as [oncoplot within maftools](https://bioconductor.org/packages/devel/bioc/vignettes/maftools/inst/doc/oncoplots.html), but in this workshop we are going to use the [Oncoprinter resource on cBioPortal](https://www.cbioportal.org/oncoprinter).

## Wrangling our data for an Oncoprint

The first step in using the Oncoprinter is to format our data in a form that the Oncoprinter will recognize. The Oncoprinter expects our input to have four columns:

- Name of the sample
- Gene Symbol
- Description of the alteration event
    - If it is a mutation, then a description of the amino acid change
    - If it is a copy number variant, then whether it is an amplification or a deletion
- Classification of the variant
    - `MISSENSE` for missense mutations
    - `INFRAME` for inframe mutations
    - `TRUNC` for truncation mutations (frameshift mutations and stop codon gains)
    - `PROMOTER` for promoter mutations
    - `OTHER` for any other kind of mutation
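
To make the four-column layout concrete, here is a small sketch that prints a few rows in that shape. The rows are made up for illustration (they are not taken from the workshop's syn3 data), the function name `example_rows` is hypothetical, and tab-delimited output is an assumption of this sketch.

```bash
#!/bin/bash
# Illustrative only: these rows are invented, not output from the workshop data.
# Columns: sample name, gene symbol, alteration description, variant class.
example_rows() {
  # printf reuses the format string for each group of four arguments
  printf '%s\t%s\t%s\t%s\n' \
    syn3 TP53 R175H MISSENSE \
    syn3 BRCA1 Q1756fs TRUNC \
    syn3 KRAS G12_G13insA INFRAME
}

example_rows
```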

In the code below, we will focus only on missense, frameshift, stop codon gain, and inframe mutations. For your own data, you may be interested in other types of mutational events, so please modify the code as needed. We are going to move into our scripts directory and create a bash script to wrangle our VCF into the format required by Oncoprinter:

> Note: There is a great resource from the Human Genome Variation Society describing much of the nomenclature used to describe variants [here](https://www.hgvs.org/mutnomen/recs-prot.html).

```
cd ~/variant_calling/scripts/
vim VCF_to_oncoprint.sh
```

We can copy and paste this code into our bash script:

```
#!/bin/bash
# This script was written by the Training Team at the Harvard Chan Bioinformatics Core on June 6th, 2024 as part of training materials for the Introduction to Variant Analysis workshop.
# This is a working sample of code you might want to consider when developing an Oncoprint for use in cBioPortal's Oncoprinter.
# You may need to alter this code for your needs.
# USAGE: sh VCF_to_oncoprint.sh <INPUT_VCF_FILE> <SAMPLE_NAME>
# Assign variable for input and output
@@ -67,11 +107,54 @@ java -jar $SNPEFF/SnpSift.jar filter \
sed 's/missense_variant.*/MISSENSE/g' > $OUTPUT_FILE
```

There are four main parts to the above code:

- Extracting the gene, amino acid alteration, and SnpEff effect from the VCF file using SnpSift, as we have previously practiced
- Adding a sample name column
- Altering the 3-letter amino acid abbreviations to single-letter abbreviations (this is optional)
- Changing the SnpEff effects into the format that Oncoprinter recognizes
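
The last three steps in the list above can be sketched on their own, independent of SnpSift. This is a minimal illustration and not the script's own code: the function name `to_oncoprint` is hypothetical, and the input layout (tab-delimited gene, HGVS protein change, SnpEff effect) is an assumption about what the extraction step produces.

```bash
#!/bin/bash
# Hypothetical sketch, NOT the workshop script itself.
# stdin:  tab-delimited lines of gene, HGVS.p change, SnpEff effect (assumed layout)
# stdout: sample name, gene, alteration, Oncoprinter class
to_oncoprint() {
  awk -v sample="$1" '
    BEGIN {
      FS = OFS = "\t"
      # Each three-letter amino acid code is followed by its one-letter code
      n = split("Ala A Arg R Asn N Asp D Cys C Gln Q Glu E Gly G His H Ile I Leu L Lys K Met M Phe F Pro P Ser S Thr T Trp W Tyr Y Val V Ter *", aa, " ")
    }
    {
      sub(/^p\./, "", $2)                       # drop the HGVS "p." prefix
      for (i = 1; i < n; i += 2)                # three-letter to one-letter codes
        gsub(aa[i], aa[i + 1], $2)
      # Map the SnpEff effect onto an Oncoprinter class
      if ($3 ~ /frameshift_variant|stop_gained/) cls = "TRUNC"
      else if ($3 ~ /inframe/)                   cls = "INFRAME"
      else if ($3 ~ /missense_variant/)          cls = "MISSENSE"
      else                                       cls = "OTHER"
      print sample, $1, $2, cls                 # prepend the sample name column
    }'
}
```

For example, `printf 'TP53\tp.Arg175His\tmissense_variant\n' | to_oncoprint syn3` prints the tab-delimited row `syn3 TP53 R175H MISSENSE`. Truncating effects are checked first so that a combined annotation such as `frameshift_variant&stop_gained` is classified as `TRUNC`.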

In order to run the code, we will need to execute:

```
sh VCF_to_oncoprint.sh /n/scratch/users/${USER:0:1}/${USER}/variant_calling/vcf_files/mutect2_syn3_normal_syn3_tumor_hg38-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf syn3
```

We can now inspect our new Oncoprinter-formatted text file with:

```
less /n/scratch/users/${USER:0:1}/${USER}/variant_calling/vcf_files/mutect2_syn3_normal_syn3_tumor_hg38-pass-filt-LCR.pedigree_header.snpeff.dbSNP.oncoprint.txt
```
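
Beyond paging through the file, a quick tally of the variant classes can serve as a sanity check. The sketch below assumes the output file is tab-delimited with the class in the fourth column; the function name `count_by_class` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: count how many rows fall into each variant class.
# Assumes a tab-delimited file with the class in column 4.
count_by_class() {
  awk -F'\t' '{ n[$4]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
}
```

Calling `count_by_class` on the `.oncoprint.txt` file would print one line per class (for example `MISSENSE` or `TRUNC`) followed by its count. An unexpected class name in this tally would suggest that a SnpEff effect was not translated.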

Once you have done this for a given sample, you would use the `cat` command to combine all of your samples together into a single text file. However, we will just be working with a single sample in this example since we don't have access to other samples.
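
If you did have several samples, combining them could look like the sketch below. The function name `combine_oncoprints` and any file names you pass to it are hypothetical. It assumes that each per-sample file already carries its own sample name in the first column.

```bash
#!/bin/bash
# Hypothetical helper: concatenate per-sample Oncoprinter files into one file.
# Usage: combine_oncoprints <OUTPUT_FILE> <SAMPLE_FILE> [<SAMPLE_FILE> ...]
combine_oncoprints() {
  out="$1"
  shift
  # The remaining arguments are the per-sample files, written out in order
  cat "$@" > "$out"
}
```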

During this process, you would also likely subset the output to your genes of interest. To simulate this, we are going to grab the first handful of genes in the output to copy and paste into the Oncoprinter input field:

<p align="center">
<img src="../img/Oncoprinter_input.png" width="600">
</p>
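
For a real analysis, subsetting to genes of interest can be scripted rather than done by hand. The sketch below is one possible approach, assuming a tab-delimited file with the gene symbol in the second column and a plain-text list with one gene symbol per line; the function name `subset_genes` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: keep only rows whose gene symbol (column 2) is in a list.
# $1:    file with one gene symbol per line (must not be empty)
# stdin: tab-delimited Oncoprinter rows
subset_genes() {
  # First pass (the gene list) fills the lookup table; second pass (stdin) filters
  awk -F'\t' 'NR == FNR { keep[$1]; next } $2 in keep' "$1" -
}
```

Usage would look like `subset_genes my_genes.txt < sample.oncoprint.txt`, where both file names are placeholders.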

Once you have placed the input into the Oncoprinter's input field, you can scroll to the bottom of the page and click the `Submit` button. This will generate an Oncoprint for you.

<p align="center">
<img src="../img/Oncoprinter_output.png" width="600">
</p>

In order to export the Oncoprint, you can click on the `Download` dropdown and select the file format that you'd like to use:

<p align="center">
<img src="../img/Oncoprinter_save_output.png" width="600">
</p>

Once we have downloaded it, we can inspect it on our computer. It should look like:

<p align="center">
<img src="../img/Oncoprint_final.png" width="600">
</p>

We have now made our first Oncoprint! If you would like to see what an Oncoprint might look like with more samples, you can return to the input page, click on `Load example data`, and repeat the process. This example data has more samples than our single-sample dataset and a richer set of annotations, so it can give you a sense of the types of annotations that you can include in your Oncoprint and how to format them properly.

[Back to Schedule](../schedule/README.md)

***
