Transcript Assembly with Cufflinks

Jeanie Lim edited this page Jul 13, 2016 · 12 revisions

Cufflinks

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.

Running Cufflinks

Cufflinks command usage:

cufflinks [options] <aligned_reads.(sam/bam)>

Example of basic cufflinks command:

cufflinks -G <gtffile> -o <outputdirectory> accepted_hits.bam

Input:

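  • accepted_hits.bam : BAM file of aligned reads produced by TopHat
  • gtffile : GTF file with the reference transcript annotation. With -G, Cufflinks estimates expression for these transcripts only and does not assemble novel ones
  • outputdirectory : directory in which to place the output files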

Output: see the Cufflinks manual for more details on the output file formats.

Cufflinks produces three output files:

  • transcripts.gtf : This GTF file contains Cufflinks’ assembled isoforms.
  • isoforms.fpkm_tracking : This file contains the estimated isoform-level expression values in the generic FPKM Tracking Format.
  • genes.fpkm_tracking : This file contains the estimated gene-level expression values in the generic FPKM Tracking Format.
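
For a quick look at the expression estimates, a tracking file can be sorted by its FPKM column. The sketch below is a minimal example, assuming the file is tab-delimited with a header row that contains a column named exactly `FPKM`; the helper name `top_fpkm` is hypothetical and not part of Cufflinks.

```bash
#!/bin/bash
# top_fpkm FILE [N]: print the N rows with the highest FPKM (default 10).
# Hypothetical helper. Assumes a tab-delimited file whose header row
# contains a column named exactly "FPKM".
top_fpkm() {
    local file="$1" n="${2:-10}"
    awk -F'\t' -v OFS='\t' '
        NR == 1 {
            # Locate the FPKM column by name instead of hard-coding its position.
            for (i = 1; i <= NF; i++) if ($i == "FPKM") col = i
            if (!col) { print "no FPKM column found" > "/dev/stderr"; exit 1 }
            next
        }
        { print $1, $col }   # first column (tracking_id) and the FPKM value
    ' "$file" | sort -t$'\t' -k2,2gr | head -n "$n"
}
```

For example, `top_fpkm C1_R1_clout/genes.fpkm_tracking 5` would list the five genes with the highest estimated FPKM in that sample.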

Running on MGHPCC

Step 1| Read alignment with tophat2, which we did in the previous section.

Step 2| Transcript Assembly with cufflinks

To assemble transcripts for each sample, first create a script, cufflinks_job.sh, in the current directory, which is where the TopHat2 outputs are.

#!/bin/bash

#BSUB -q short
#BSUB -W 5:00
#BSUB -R "rusage[mem=5125]"
#BSUB -J "cufflinks_job"
#BSUB -o cufflinks.out
#BSUB -e cufflinks.err

module load samtools/1.3
module load cufflinks/2.2.1

cufflinks -p 8 -o C1_R1_clout C1_R1_thout/accepted_hits.bam
cufflinks -p 8 -o C1_R2_clout C1_R2_thout/accepted_hits.bam
cufflinks -p 8 -o C1_R3_clout C1_R3_thout/accepted_hits.bam
cufflinks -p 8 -o C2_R1_clout C2_R1_thout/accepted_hits.bam
cufflinks -p 8 -o C2_R2_clout C2_R2_thout/accepted_hits.bam
cufflinks -p 8 -o C2_R3_clout C2_R3_thout/accepted_hits.bam
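
The six cufflinks commands differ only in the sample name, so they could also be produced by a loop. The sketch below is one way to do that, assuming the same `C{1,2}_R{1,2,3}` directory naming as in the script; the function name `run_cufflinks_all` and its optional `echo` argument (a dry run that prints the commands instead of running them) are hypothetical.

```bash
#!/bin/bash
# run_cufflinks_all [RUNNER]: run cufflinks once per sample.
# Hypothetical helper. Pass "echo" as RUNNER to print the commands
# (dry run); pass nothing to actually execute them.
run_cufflinks_all() {
    local runner="${1:-}"
    local cond rep
    for cond in C1 C2; do
        for rep in R1 R2 R3; do
            # $runner is left unquoted on purpose: when empty it expands to nothing.
            $runner cufflinks -p 8 -o "${cond}_${rep}_clout" \
                "${cond}_${rep}_thout/accepted_hits.bam"
        done
    done
}

# Dry run: print the six commands without executing cufflinks.
run_cufflinks_all echo
```

Inside cufflinks_job.sh, the dry-run line would be replaced by a plain `run_cufflinks_all` call after the `module load` lines.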

Again, this has been run for you and the generated output is in your directory. To run it yourself, submit the job:

bsub < cufflinks_job.sh

When the job is done, go to one of the output directories, say C1_R1_clout:

```
cd ./C1_R1_clout
ls
# You should see all the files as listed below:
genes.fpkm_tracking  isoforms.fpkm_tracking  transcripts.gtf
```
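
Rather than inspecting each output directory by hand, all of them can be checked in one pass. The following is a minimal sketch, assuming the three file names listed above; the helper name `check_clout_dirs` is hypothetical.

```bash
#!/bin/bash
# check_clout_dirs DIR...: verify that each directory holds the three
# expected Cufflinks output files and that none of them is empty.
# Hypothetical helper; returns 0 if everything is present, 1 otherwise.
check_clout_dirs() {
    local missing=0 dir f
    for dir in "$@"; do
        for f in transcripts.gtf isoforms.fpkm_tracking genes.fpkm_tracking; do
            if [ ! -s "$dir/$f" ]; then
                echo "missing or empty: $dir/$f" >&2
                missing=1
            fi
        done
    done
    return "$missing"
}
```

Run it from the directory that contains the `*_clout` folders, for example `check_clout_dirs ./*_clout && echo "all outputs present"`.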

### Step 3| Merging assemblies using cuffmerge 
Create a file listing the paths of all per-sample transcripts.gtf files, then pass that to cuffmerge:

`nano assemblies.txt`

Write the following lines in the file and save it.
```
./C1_R1_clout/transcripts.gtf 
./C2_R2_clout/transcripts.gtf 
./C1_R2_clout/transcripts.gtf 
./C2_R1_clout/transcripts.gtf 
./C1_R3_clout/transcripts.gtf 
./C2_R3_clout/transcripts.gtf
```
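
As an alternative to typing the paths by hand, the list could be generated from the directory layout. This is a sketch, assuming every Cufflinks output directory in the current directory ends in `_clout`; the helper name `write_assembly_list` is hypothetical, and the paths come out in alphabetical order rather than the order shown above.

```bash
#!/bin/bash
# write_assembly_list [OUTFILE]: write the path of every
# ./*_clout/transcripts.gtf file to OUTFILE (default: assemblies.txt).
# Hypothetical helper; returns non-zero if no assembly was found.
write_assembly_list() {
    local out="${1:-assemblies.txt}" gtf
    : > "$out"                            # start from an empty list
    for gtf in ./*_clout/transcripts.gtf; do
        if [ -e "$gtf" ]; then            # skips the literal pattern when nothing matches
            echo "$gtf" >> "$out"
        fi
    done
    [ -s "$out" ]                         # succeed only if at least one path was written
}
```

Calling `write_assembly_list` with no arguments writes `assemblies.txt` in the current directory.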

Then, run Cuffmerge on all your assemblies to create a single merged transcriptome annotation: 
```
cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt
```
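
Because the list file is written by hand, a typo in one path is easy to miss. The sketch below checks that every listed path exists before cuffmerge is started; the helper name `check_assembly_list` is hypothetical, and it assumes one path per line with blank lines and trailing whitespace ignored.

```bash
#!/bin/bash
# check_assembly_list [LISTFILE]: confirm that every path listed in
# LISTFILE (default: assemblies.txt) points to an existing file.
# Hypothetical helper; returns 0 if all paths exist, 1 otherwise.
check_assembly_list() {
    local list="${1:-assemblies.txt}" path bad=0
    while IFS= read -r path || [ -n "$path" ]; do
        path="${path%"${path##*[![:space:]]}"}"   # strip trailing whitespace
        [ -z "$path" ] && continue                # ignore blank lines
        if [ ! -f "$path" ]; then
            echo "not found: $path" >&2
            bad=1
        fi
    done < "$list"
    return "$bad"
}
```

It can be chained in front of the merge step, for example `check_assembly_list assemblies.txt && cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt`.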

Take a look at the output files produced by cuffmerge in its default output directory, `./merged_asm`:
```
ls ./merged_asm
# You should see all the files as listed below:
genes.fpkm_tracking
isoforms.fpkm_tracking
logs 
merged.gtf
skipped.gtf
tmp
transcripts.gtf
```

The most important file is merged.gtf, which contains the consensus transcriptome annotation that cuffmerge has computed.
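
A simple sanity check on the merged annotation is to count how many distinct genes and transcripts it contains. The sketch below assumes standard GTF layout (nine tab-separated columns, with `key "value";` pairs in the ninth) and that attribute values contain no semicolons; the helper name `count_gtf_ids` is hypothetical.

```bash
#!/bin/bash
# count_gtf_ids GTF KEY: count the distinct values of attribute KEY
# (e.g. gene_id or transcript_id) in column 9 of a GTF file.
# Hypothetical helper; assumes attribute values contain no semicolons.
count_gtf_ids() {
    local gtf="$1" key="$2"
    awk -F'\t' -v key="$key" '
        $0 !~ /^#/ {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                a = attrs[i]
                sub(/^ +/, "", a)                 # drop leading spaces
                # Keep only attributes whose name is exactly KEY.
                if (index(a, key " ") == 1) seen[a] = 1
            }
        }
        END { c = 0; for (k in seen) c++; print c }
    ' "$gtf"
}
```

For example, `count_gtf_ids merged.gtf transcript_id` prints the number of distinct transcripts in the merged annotation, and `count_gtf_ids merged.gtf gene_id` the number of distinct genes.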

Next up, we will identify differentially expressed genes and isoforms using cuffdiff.

---

| [[Previous Section|Read Alignment with TopHat2]] | [[This Section|Transcript Assembly with Cufflinks]] | [[Next Section|Differential Analysis with Cuffdiff]] |
|:------------------------------------:|:--------------------------:|:--------------------------------------------:|
| [[Read Alignment with TopHat2]] | [[Transcript Assembly with Cufflinks]] | [[Differential Analysis with Cuffdiff]] |