ANNEXA wiki

Thomas Derrien edited this page Sep 19, 2024 · 3 revisions

ANNEXA report description

See an example report.

ANNEXA reports quality-control (QC) plots at three levels of annotation: gene, transcript and exon.

Nomenclature of input files:

  • bamsample: corresponds to the input BAM file(s) (e.g. 501Mel_1-3_OSS_CM-R_R1.sorted.bam)
  • refannot: corresponds to the input reference annotation used to launch ANNEXA (e.g. gencode.v46.annotation.gtf.gz)
  • refgenome: corresponds to the input reference genome used to launch ANNEXA (e.g. Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa)

⚠️ Make sure that the genome assembly and the transcriptome (annotation) file use the same assembly version and the same chromosome naming (with or without the chr prefix).
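The naming check in the warning above can be sketched in a few lines of Python. This is only an illustration and not part of ANNEXA: the helper names `fasta_names`, `gtf_names` and `shared_fraction` are hypothetical, and the functions take already-opened text lines (for a .gz file, open it with `gzip.open(path, "rt")`).

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines (text after '>' up to the first space)."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def gtf_names(lines):
    """Collect the values of the first GTF column (seqname), skipping comments and blank lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def shared_fraction(annot_names, genome_names):
    """Fraction of annotation sequence names that also exist in the genome."""
    if not annot_names:
        return 0.0
    return len(annot_names & genome_names) / len(annot_names)


# Toy input: the genome uses "1", "2" while the annotation uses "chr1".
genome = fasta_names([">1 dna:chromosome", "ACGT", ">2 dna:chromosome", "TTGA"])
annot = gtf_names(["##description: toy", "chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id \"g1\";"])
print(shared_fraction(annot, genome))  # a value near 0 suggests a naming mismatch
```

A fraction close to 1 suggests that refannot and refgenome name their sequences consistently; a fraction close to 0 usually points to a chr-prefix or assembly mismatch.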

GENE characterization

  • Page 3: Number of genes

Number of genes per biotype (lncRNAs/mRNAs) and source (from refannot, i.e. known, or novel).

  • Page 4: Gene length distribution

Distribution of gene lengths per biotype (lncRNAs/mRNAs) and source. Gene length is computed by summing the exon lengths of the longest isoform.
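As a minimal sketch of this definition (not ANNEXA's own code), the Python function below sums exon lengths per transcript and keeps the largest value for each gene. It assumes that "longest isoform" means the isoform with the largest summed exon length and that coordinates are 1-based and inclusive, as in GTF; the function name `gene_lengths` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def gene_lengths(exons):
    """Return {gene_id: length}, where length is the summed exon length of the longest isoform.

    exons: iterable of (gene_id, transcript_id, start, end) with 1-based inclusive coordinates.
    """
    tx_length = defaultdict(int)
    tx_gene = {}
    for gene_id, tx_id, start, end in exons:
        tx_length[tx_id] += end - start + 1  # inclusive coordinates
        tx_gene[tx_id] = gene_id

    lengths = {}
    for tx_id, length in tx_length.items():
        gene_id = tx_gene[tx_id]
        lengths[gene_id] = max(lengths.get(gene_id, 0), length)
    return lengths


toy_exons = [
    ("g1", "t1", 100, 199),  # t1: 100 + 50 = 150
    ("g1", "t1", 300, 349),
    ("g1", "t2", 100, 399),  # t2: 300
    ("g2", "t3", 10, 19),    # t3: 10
]
print(gene_lengths(toy_exons))  # {'g1': 300, 'g2': 10}
```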

  • Page 5: Proportion of mono versus multi-isoform genes

Proportion of genes with 1 isoform (striped) versus at least 2 isoforms (not striped), according to biotype and source.

  • Page 6: Distribution of gene counts

Distribution of gene expression per biotype (lncRNAs/mRNAs) and source across all bamsample files (sum of gene_count).

  • Page 7: Breadth of expression

Number of genes expressed in N samples (gene_count > 1).
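The breadth-of-expression tally can be illustrated with a short Python sketch. It assumes a simple in-memory mapping from gene to per-sample counts; the function name `breadth_of_expression` and the input layout are hypothetical and do not reflect ANNEXA's internal format.

```python
from collections import Counter


def breadth_of_expression(counts, threshold=1):
    """Return a Counter {number_of_samples: number_of_genes}.

    counts: {gene_id: [gene_count in sample 1, sample 2, ...]}
    A gene counts as expressed in a sample when its gene_count is strictly above the threshold.
    """
    return Counter(
        sum(count > threshold for count in per_sample)
        for per_sample in counts.values()
    )


toy_counts = {
    "g1": [0, 5, 2],   # expressed in 2 samples
    "g2": [1, 1, 1],   # expressed in 0 samples (1 is not > 1)
    "g3": [10, 3, 7],  # expressed in 3 samples
}
breadth = breadth_of_expression(toy_counts)
print(sorted(breadth.items()))  # [(0, 1), (2, 1), (3, 1)]
```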

  • Page 8: Distribution of gene counts with respect to breadth of expression

Number of novel genes (log scale) according to the number of isoforms and the number of samples with gene_count > 1.

  • Page 9: Number of 5' and/or 3' gene extensions

Number of known genes (per biotype) that are extended by novel isoforms at the 3'-end (first stripe pattern), at the 5'-end (second stripe pattern) or at both the 5'- and 3'-ends (both stripe patterns).

  • Page 10: Distribution of gene extension lengths

Distribution of gene extension lengths (at the genomic level) for genes from the input annotation that have novel isoforms.
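One way to picture the 5'/3' extensions of pages 9 and 10 is the Python sketch below. It is an illustration under stated assumptions, not ANNEXA's implementation: extensions are measured on genomic coordinates, a novel isoform extends a known gene when it reaches beyond the gene boundaries, and on the minus strand the 5'-end is the rightmost coordinate. The function name `classify_extension` is hypothetical.

```python
def classify_extension(gene_start, gene_end, strand, iso_start, iso_end):
    """Return (label, five_prime_length, three_prime_length) for one novel isoform.

    Coordinates are genomic with start <= end; strand is "+" or "-".
    """
    left = max(0, gene_start - iso_start)  # bases added before the known gene start
    right = max(0, iso_end - gene_end)     # bases added after the known gene end

    # On the minus strand the 5'-end lies on the right-hand side of the gene.
    five, three = (left, right) if strand == "+" else (right, left)

    if five and three:
        label = "5' and 3'"
    elif five:
        label = "5'"
    elif three:
        label = "3'"
    else:
        label = "none"
    return label, five, three


print(classify_extension(1000, 2000, "+", 900, 2000))   # ("5'", 100, 0)
print(classify_extension(1000, 2000, "-", 900, 2000))   # ("3'", 0, 100)
print(classify_extension(1000, 2000, "+", 950, 2100))   # ("5' and 3'", 50, 100)
```

To get a per-gene value from several novel isoforms, one option would be to take the maximum extension on each side.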

TRANSCRIPT level

  • Page 12: Number of transcripts

Number of transcripts, according to biotype and source: transcripts already in the input annotation (darker), novel isoforms of known genes (intermediate) and novel transcripts from novel genes (lighter).

  • Page 13: Transcript length distribution

Same as page 4, at the transcript level.

  • Page 14: Proportion of mono- versus multi-exonic transcripts

Proportion of transcripts with 1 exon (striped) versus at least 2 exons (not striped), according to source and biotype.

EXON level

  • Page 16: Number of exons

Number of exons per biotype (lncRNAs/mRNAs) and source (from refannot, i.e. known, or novel).

  • Page 17: Exon length distribution

Same as page 4, at the exon level.
