Transcript Assembly with Cufflinks
Progress
- populate this page accordingly https://wikis.utexas.edu/display/bioiteam/Differential+expression+with+splice+variant+analysis+aug2012#Differentialexpressionwithsplicevariantanalysisaug2012-Step2:Runcufflinks
- more on exercises https://wikis.utexas.edu/display/bioiteam/Tuxedo+Suite+For+Splice+Variant+Analysis+and+Identifying+Novel+Transcripts+II
- examining output file https://wikis.utexas.edu/display/bioiteam/Introduction+to+RNA+Seq+Short+Course+Commands
- https://wiki.hpcc.msu.edu/display/TEAC/2015-09-21%3A+RNA-Sequencing+Tools?src=search
Issue
- http://seqanswers.com/forums/showthread.php?t=31074
- https://groups.google.com/forum/#!topic/tuxedo-tools-users/CnCmSlvC9Xc
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.
Cufflinks command usage:
```
cufflinks [options] <aligned_reads.(sam/bam)>
```
Example of a basic cufflinks command:
```
cufflinks -G <gtffile> -o <outputdirectory> accepted_hits.bam
```
Input:
- `accepted_hits.bam`: BAM file produced by TopHat
- `gtffile`: GTF file with transcript information
- `outputdirectory`: directory to place output files in
Output: see the Cufflinks documentation for more details on the output file formats.
Cufflinks produces three output files:
- `transcripts.gtf`: This GTF file contains Cufflinks’ assembled isoforms.
- `isoforms.fpkm_tracking`: This file contains the estimated isoform-level expression values in the generic FPKM Tracking Format.
- `genes.fpkm_tracking`: This file contains the estimated gene-level expression values in the generic FPKM Tracking Format.
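For a quick look at one of these files, the sketch below prints the most highly expressed genes from a `genes.fpkm_tracking` file. It assumes the file is tab-separated with a header line containing `gene_id` and `FPKM` columns, as in the FPKM Tracking Format; `top_genes_by_fpkm` is a hypothetical helper name, not part of Cufflinks.

```bash
#!/bin/bash
# Hypothetical helper: print the N genes with the highest FPKM (default 10).
# Assumes a tab-separated file whose header names the gene_id and FPKM columns.
top_genes_by_fpkm() {
  local tracking_file=$1
  local n=${2:-10}
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      # Locate the columns by name rather than by fixed position
      for (i = 1; i <= NF; i++) {
        if ($i == "gene_id") g = i
        if ($i == "FPKM") f = i
      }
      if (!g || !f) { print "gene_id or FPKM column not found" > "/dev/stderr"; exit 1 }
      next
    }
    { print $g, $f }
  ' "$tracking_file" | sort -t $'\t' -k2,2gr | head -n "$n"
}
```

Usage: `top_genes_by_fpkm <outputdirectory>/genes.fpkm_tracking 10`. Sorting with `-g` (general numeric) also handles FPKM values written in scientific notation.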
### Step 1| Read alignment with TopHat2
We did this in the previous section.
### Step 2| Transcript assembly with Cufflinks
To assemble transcripts for each sample, first create a script, `cufflinks_job.sh`, in the current directory, where the TopHat2 outputs are located:
```
#!/bin/bash
#BSUB -q short
#BSUB -W 5:00
#BSUB -n 8
#BSUB -R "rusage[mem=5125] span[hosts=1]"
#BSUB -J "cufflinks_job"
#BSUB -o cufflinks.out
#BSUB -e cufflinks.err
# -n 8 with span[hosts=1] reserves 8 cores on one node to match "cufflinks -p 8"

module load samtools/1.3
module load cufflinks/2.2.1

# Assemble transcripts separately for each sample (conditions C1/C2, replicates R1-R3)
cufflinks -p 8 -o C1_R1_clout C1_R1_thout/accepted_hits.bam
cufflinks -p 8 -o C1_R2_clout C1_R2_thout/accepted_hits.bam
cufflinks -p 8 -o C1_R3_clout C1_R3_thout/accepted_hits.bam
cufflinks -p 8 -o C2_R1_clout C2_R1_thout/accepted_hits.bam
cufflinks -p 8 -o C2_R2_clout C2_R2_thout/accepted_hits.bam
cufflinks -p 8 -o C2_R3_clout C2_R3_thout/accepted_hits.bam
```
Again, this has already been run for you, and the generated output is in your directory. If you want to run it yourself, submit the job:
```
bsub < cufflinks_job.sh
```
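The six `cufflinks` lines in `cufflinks_job.sh` differ only in the sample name. A minimal sketch of the same work as a loop, assuming the `C1`/`C2` condition and `R1` to `R3` replicate naming used in the script; `run_all_cufflinks` is a hypothetical function name, not part of Cufflinks.

```bash
#!/bin/bash
# Hypothetical helper: run cufflinks once per sample, using the same
# <sample>_thout input and <sample>_clout output naming as cufflinks_job.sh.
run_all_cufflinks() {
  local threads=${1:-8}
  local cond rep sample
  for cond in C1 C2; do
    for rep in R1 R2 R3; do
      sample="${cond}_${rep}"
      # Stop at the first sample that fails instead of continuing silently
      cufflinks -p "$threads" -o "${sample}_clout" "${sample}_thout/accepted_hits.bam" || return 1
    done
  done
}
```

In the job script, the function definition followed by a call such as `run_all_cufflinks 8` (after the `module load` lines) would replace the six hand-written commands.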
When the job is done, go to one of the output directories, say `C1_R1_clout`:
```
cd ./C1_R1_clout
ls
# You should see all the files as listed below:
genes.fpkm_tracking isoforms.fpkm_tracking transcripts.gtf
```
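Before merging, it can help to confirm that every sample produced all three files, not just the one directory inspected above. A sketch, assuming the `*_clout` directory names from `cufflinks_job.sh`; `check_cufflinks_outputs` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: report any expected Cufflinks output file that is
# missing or empty in the given directories; returns non-zero if any are.
check_cufflinks_outputs() {
  local missing=0 dir f
  for dir in "$@"; do
    for f in transcripts.gtf isoforms.fpkm_tracking genes.fpkm_tracking; do
      if [ ! -s "$dir/$f" ]; then
        echo "missing or empty: $dir/$f" >&2
        missing=$((missing + 1))
      fi
    done
  done
  [ "$missing" -eq 0 ]
}
```

Usage, from the directory that contains the output folders: `check_cufflinks_outputs C[12]_R[123]_clout && echo "all assemblies present"`.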
### Step 3| Merging assemblies using cuffmerge
Create a file listing the paths of all per-sample transcripts.gtf files, then pass that to cuffmerge:
`nano assemblies.txt`
Add the following lines to the file and save it:
```
./C1_R1_clout/transcripts.gtf
./C2_R2_clout/transcripts.gtf
./C1_R2_clout/transcripts.gtf
./C2_R1_clout/transcripts.gtf
./C1_R3_clout/transcripts.gtf
./C2_R3_clout/transcripts.gtf
```
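Instead of typing the paths in nano, the same list can be written from the shell. A sketch, assuming the output directories follow the `C1`/`C2`, `R1` to `R3` naming above; `write_assembly_list` is a hypothetical helper, and it writes the paths in sorted (glob) order rather than the order shown above.

```bash
#!/bin/bash
# Hypothetical helper: write one transcripts.gtf path per line to the given
# file (default assemblies.txt), skipping anything the glob did not match.
write_assembly_list() {
  local out=${1:-assemblies.txt}
  local gtf
  : > "$out"   # start from an empty file
  for gtf in ./C[12]_R[123]_clout/transcripts.gtf; do
    [ -e "$gtf" ] || continue   # glob matched nothing
    echo "$gtf" >> "$out"
  done
}
```

Usage: `write_assembly_list assemblies.txt`, then `cat assemblies.txt` to check the result.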
Then, run Cuffmerge on all your assemblies to create a single merged transcriptome annotation:
```
cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt
```
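A mistyped path in `assemblies.txt` is an easy way for the cuffmerge run to fail. A small sketch that checks every listed file exists and is non-empty before running the command above; `check_assembly_list` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: verify that each non-blank line of the list file names
# an existing, non-empty file; returns non-zero if any path is bad.
check_assembly_list() {
  local list=$1 path bad=0
  # "|| [ -n ... ]" also handles a last line without a trailing newline
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    if [ ! -s "$path" ]; then
      echo "not found or empty: $path" >&2
      bad=$((bad + 1))
    fi
  done < "$list"
  [ "$bad" -eq 0 ]
}
```

Usage: `check_assembly_list assemblies.txt && cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt`.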
Take a look at the output files produced by cuffmerge in `./merged_asm` (cuffmerge's default output directory):
```
ls ./merged_asm
# You should see all the files as listed below:
genes.fpkm_tracking
isoforms.fpkm_tracking
logs
merged.gtf
skipped.gtf
tmp
transcripts.gtf
```
The most important file is `merged.gtf`, which contains the consensus transcriptome annotation calculated by cuffmerge.
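To get a rough sense of the size of the merged annotation, the sketch below counts distinct `gene_id` and `transcript_id` values in a GTF file. It assumes standard GTF attribute syntax (`key "value";`) in column 9; `count_gtf_features` is a hypothetical helper. It counts IDs rather than lines of a given feature type, in case the file contains only exon records.

```bash
#!/bin/bash
# Hypothetical helper: count distinct gene_id and transcript_id values in a GTF.
count_gtf_features() {
  local gtf=$1
  awk -F'\t' '
    # Return the quoted value of attribute key from a GTF attribute string,
    # or an empty string if absent. A leading space is added so that gene_id
    # does not match inside a longer key such as ref_gene_id.
    function attr(s, key,    t, v) {
      t = " " s
      if (!match(t, " " key " \"[^\"]+\"")) return ""
      v = substr(t, RSTART, RLENGTH)
      sub(/^[^"]*"/, "", v)   # drop the key and the opening quote
      sub(/"$/, "", v)        # drop the closing quote
      return v
    }
    /^#/ { next }
    {
      gid = attr($9, "gene_id");       if (gid != "") genes[gid] = 1
      tid = attr($9, "transcript_id"); if (tid != "") tx[tid] = 1
    }
    END {
      ng = 0; nt = 0
      for (k in genes) ng++
      for (k in tx) nt++
      printf "genes\t%d\ntranscripts\t%d\n", ng, nt
    }
  ' "$gtf"
}
```

Usage: `count_gtf_features merged_asm/merged.gtf`.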
Next up, we will identify differentially expressed genes and isoforms using cuffdiff.
---
| [[Previous Section|Read Alignment with TopHat2]] | [[This Section|Transcript Assembly with Cufflinks]] | [[Next Section|Differential Analysis with Cuffdiff]] |
|:------------------------------------:|:--------------------------:|:--------------------------------------------:|
| [[Read Alignment with TopHat2]]| [[Transcript Assembly with Cufflinks]]| [[Differential Analysis with Cuffdiff]]