Detection of microproteins in fusion-positive rhabdomyosarcoma

This repository holds all the scripts I used to characterise the fusion-positive (FP) specific transcriptome and translatome, together with the downstream analyses. The goal was to find genes driven by the fusion protein that arises when either PAX3 or PAX7 is fused with FOXO1. I used a combination of in-house RNA-seq from the Maxima, data from the St Jude pediatric hospital and tumor organoid models generated by the Drost lab.

All tools were run on our Utrecht HPC in Singularity containers built from the Docker images listed below. Images without an explicit registry location were pulled from Docker Hub.

Tool name	Version	Docker link
MultiQC	1.11	nanozoo/multiqc:1.11--9dfdee6
Cutadapt	3.4	quay.io/biocontainers/cutadapt:3.4--py37h73a75cf_1
FastQC	0.11.9	staphb/fastqc:0.11.9
TrimGalore	0.6.6	vanheeschlab/trimgalore:0.6.6
STAR	2.7.8a	mateongenaert/star:2.7.8a
samtools	1.12	staphb/samtools:1.12
stringtie	2.1.5	bschiffthaler/stringtie:2.1.5
gffcompare	0.12.6	quay.io/biocontainers/gffcompare:0.12.6--h4ac6f70_2
gffread	0.12.6	bschiffthaler/gffread:0.12.6
salmon	1.8.0	combinelab/salmon:1.8.0
howarewestrandedhere	1.0.1	vanheeschlab/howarewestrandedhere:1.0.1
bowtie2	2.4.2	quay.io/biocontainers/bowtie2:2.4.2--py37he8e2a3f_2
ORFquant	R v4.1.2	vanheeschlab/orfquant:4.1.2a
bedtools	2.31.0	pegi3s/bedtools:latest
MACS2	2.2.7.1	fooliu/macs2:version-2.2.7.1
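
As a minimal sketch of how one of the images above might be invoked through Singularity, the Python snippet below builds a `singularity exec docker://...` command line. The helper name `singularity_command`, the `/hpc` bind path and the example FastQC call are illustrative assumptions and are not taken from the repository scripts, which may use pre-built image files instead.

```python
import shlex

# Subset of the image table above (tool name -> Docker image reference).
IMAGES = {
    "fastqc": "staphb/fastqc:0.11.9",
    "samtools": "staphb/samtools:1.12",
    "salmon": "combinelab/salmon:1.8.0",
}


def singularity_command(tool, args, bind=None):
    """Return the argument list for running `tool` inside its container.

    Hypothetical helper for illustration only.
    """
    cmd = ["singularity", "exec"]
    if bind:
        # Make a host directory visible inside the container.
        cmd += ["--bind", bind]
    cmd.append("docker://" + IMAGES[tool])
    cmd += list(args)
    return cmd


if __name__ == "__main__":
    cmd = singularity_command("fastqc", ["fastqc", "--version"], bind="/hpc")
    # Print a copy-pasteable command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` without shell quoting issues.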

When I started working on the project in earnest, I created rms_analysis as a catch-all for most of the files I generated, so that they sit in one tidy location. For the next person to take over, I have documented which files are pulled from where, and I describe the code sub-directories below.

01: RNA-seq

The first step was to find new transcripts and genes in the tumor RNA-seq. After assembling the transcriptome, we quantified the expression of the RMS transcriptome and compared it against various cohorts, including GTEx, EVO-DEVO from the Kaessmann lab and other in-house tumor cohorts. We then defined RMS-specific genes using the thresholds and filters set in the quantification part.

annotation
- The previous annotation that was generated for the project was not compatible with the containerised R version. These scripts generate a new custom annotation package for this analysis.
Correlations
- Create gene-gene correlations to see which genes are similarly expressed.
Figures
- Markdown to generate various figures (heatmap, dot plots, volcano plots).
QC
- Scripts which visualise various QC parameters.
quant_all_cohorts
- Combine salmon quant files into a single R object for downstream analysis.
quantification
- Code to run Salmon quant for samples.
rnaseq_pipeline
- Pipeline used to analyse the RNA-seq samples from a previous folder.
starfusion
- Small script to run a containerised version of STAR-Fusion to check the fusion status of the samples.
transcriptome_characterisation
- Small script to visualise certain aspects of the transcriptome.
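
The quant_all_cohorts step combines per-sample Salmon output into one object. The repository does this in R; the Python sketch below only illustrates the idea, assuming one directory per sample that is named after the sample and contains Salmon's standard `quant.sf` file. The function names are hypothetical.

```python
import csv
from pathlib import Path


def read_tpm(quant_file):
    """Read one Salmon quant.sf file into {transcript name: TPM}."""
    with open(quant_file, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"]: float(row["TPM"]) for row in reader}


def combine_quant(sample_dirs):
    """Merge per-sample quant.sf files into {transcript: {sample: TPM}}.

    Assumes each directory is named after its sample and holds quant.sf.
    """
    matrix = {}
    for sample_dir in map(Path, sample_dirs):
        for name, tpm in read_tpm(sample_dir / "quant.sf").items():
            matrix.setdefault(name, {})[sample_dir.name] = tpm
    return matrix
```

A call such as `combine_quant(["salmon/sample_A", "salmon/sample_B"])` (paths are placeholders) would return a nested dictionary keyed first by transcript and then by sample.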

02: Ribo-seq

With the transcriptome generated in the previous section, I looked for new open reading frames (ORFs) in ribo-seq data from both patient tissue and tumor organoid models, using both PRICE and ORFquant. The ORF calls from the two tools were harmonised, and the expression of each ORF in every sample was quantified using its in-frame P-sites. After applying specific filters, I ended up with a list of FP-RMS translated (non-canonical) ORFs for further investigation.

orfquant_merged
- Sub-step of the pipeline to merge all P-sites for ORF calling using ORFquant.
riboseq_pipeline
- The pipeline used to process the ribo-seq samples for ORF calling.
QC
- Scripts which visualise various QC parameters.
price_pipeline
- Sub-step of the pipeline that adds a STAR alignment step so that the output of our normal ribo-seq pipeline can be used for PRICE.
orf_annotation
- Code to re-annotate ORFs based on sequence and coordinate overlap with Ensembl / UniProt annotated protein sequences.
orf_selection
- Markdown which specifies under which parameters I have selected ORFs for follow-up studies.
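
Quantifying an ORF by its in-frame P-sites can be illustrated with a small Python sketch. It assumes a simplified setting that is not necessarily how the pipeline works: the ORF and the P-site positions are given in 0-based transcript coordinates, so introns do not need to be handled. The function name and the toy numbers are hypothetical.

```python
def in_frame_psites(orf_start, orf_end, psite_counts):
    """Sum P-sites that fall inside the ORF and in its reading frame.

    orf_start / orf_end: 0-based, half-open transcript coordinates of the ORF.
    psite_counts: {transcript position: number of P-sites at that position}.
    """
    total = 0
    for pos, count in psite_counts.items():
        inside = orf_start <= pos < orf_end
        # A P-site is in frame when it sits on the first base of a codon.
        if inside and (pos - orf_start) % 3 == 0:
            total += count
    return total


if __name__ == "__main__":
    # Toy ORF covering positions 10..18 (three codons).
    psites = {10: 4, 11: 1, 13: 3, 16: 2, 19: 5}
    print(in_frame_psites(10, 19, psites))  # 4 + 3 + 2 = 9
```

Counting only frame-0 P-sites helps separate the signal of an ORF from that of overlapping ORFs in another frame.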

03: Prediction

This is a subset of Amalia's pipeline for checking interesting characteristics of the newly found ncORFs. So far, I have only looked at the MHC predictions for the ncORFs with netMHCpan and used the peptides linked to them to select potential immunotherapy candidates for further investigation.
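
To make the link between an ncORF and its candidate peptides concrete, the sketch below enumerates the peptide windows of a protein sequence. The lengths 8 to 11 are an assumption (typical for MHC class I ligands) and the function name is hypothetical; netMHCpan can also split protein sequences into peptides itself, so this is an illustration rather than a required preprocessing step.

```python
def peptide_windows(protein, lengths=(8, 9, 10, 11)):
    """Return the unique peptides of the given lengths, in order of first appearance."""
    seen = set()
    peptides = []
    for k in lengths:
        # Slide a window of size k across the protein sequence.
        for i in range(len(protein) - k + 1):
            pep = protein[i:i + k]
            if pep not in seen:
                seen.add(pep)
                peptides.append(pep)
    return peptides


if __name__ == "__main__":
    # Toy 10-residue microprotein; only 9-mers are requested here.
    print(peptide_windows("MKTAYIAKQR", lengths=(9,)))
```

Sequences shorter than a requested length simply contribute no peptides of that length.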

04: deltaTE

The idea was to look at the translational efficiency (TE) of the tumor organoid models, contrasting fusion-negative (FN) and fusion-positive (FP) samples. However, due to time constraints, this never took off. The plan was to convert the CDS regions of the translated ORFs to sequences and use those as input for salmon on both the RNA-seq and the ribosome profiling data of the tumor organoid samples.

integrated_omics
- Some code I wrote in which multiple data streams are connected.
rnaseq_TE
- Pipeline to align RNA-seq reads in exactly the same way as ribo-seq reads. It did not meet our requirements for direct RNA-seq and ribo-seq comparisons.
salmon_te_quant
- The actual TE processing pipeline. It only requires the processed ORFs as the loci to quantify in both the RNA-seq and the ribosome profiling data.
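
As a rough illustration of the quantity this section is after, translational efficiency can be written as the ratio of ribo-seq to RNA-seq abundance for the same ORF. The sketch below is a plain per-sample ratio without any statistical modelling; the pseudocount of 1 and the function names are assumptions and do not come from the repository.

```python
import math


def log2_te(ribo_tpm, rna_tpm, pseudocount=1.0):
    """log2 translational efficiency of one ORF in one sample."""
    return math.log2((ribo_tpm + pseudocount) / (rna_tpm + pseudocount))


def delta_te(fp, fn):
    """Difference in mean log2 TE between FP and FN samples.

    fp / fn: lists of (ribo_tpm, rna_tpm) pairs, one pair per sample.
    """
    def mean_te(pairs):
        return sum(log2_te(ribo, rna) for ribo, rna in pairs) / len(pairs)

    return mean_te(fp) - mean_te(fn)
```

A positive `delta_te` would suggest that the ORF is translated more efficiently in FP than in FN samples, although a proper analysis would need replicates and a statistical test.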

05: ChIP-seq

Using a particularly information-rich dataset of RMS tissue and cell lines, I generated IGV visualisation tracks using the code in this section.

06: HLA typing

We were interested in the HLA types of the tumor organoid models for downstream wet-lab validation of the MHC binders predicted in section 03. This pipeline uses arcasHLA to call the HLA types.
