Somatic variants report

Coverage and total number of variants

Sample name: COLO829_60-30
Mean tumor coverage (fold): 63.05
Mean normal coverage (fold): 32.82
Total number of small variants: 45086
- Total number of SNVs: 43042
- Total number of small deletions: 1629
- Total number of small insertions: 415
Total number of structural variants (SV): 148

SV type	BND	DEL	DUP	INS	INV
Count	24	59	9	19	14
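The three small-variant classes add up to the reported total (43042 + 1629 + 415 = 45086). As an illustration only, the sketch below shows one common way to assign a REF/ALT pair to these classes by comparing allele lengths. This is an assumption, not the pipeline's documented rule: `classify_small_variant` and `count_small_variants` are hypothetical names, each record is assumed to carry a single ALT allele, and same-length multi-base substitutions fall into an "other" class.

```python
from collections import Counter


def classify_small_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length (hypothetical helper)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) > len(alt):
        return "deletion"
    if len(alt) > len(ref):
        return "insertion"
    return "other"  # e.g. multi-nucleotide substitutions such as AC>GT


def count_small_variants(pairs) -> Counter:
    """Tally variant classes for an iterable of (REF, ALT) tuples."""
    return Counter(classify_small_variant(ref, alt) for ref, alt in pairs)


if __name__ == "__main__":
    # Made-up alleles, purely to show the call pattern.
    example = [("A", "G"), ("ACT", "A"), ("C", "CTT"), ("G", "T")]
    print(count_small_variants(example))
```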

Tumor purity and ploidy

Note that purity and ploidy estimates can be unreliable at low coverage (<30X) and at low tumor purity (<50%).
Estimated tumor purity (Purple): 0.99
Estimated tumor ploidy (Purple): 3
Inferred gender (Purple): MALE
Whole-genome doubling (Purple): TRUE

Homologous recombination deficiency prediction (CHORD HRD)

CHORD has not been tested extensively on long-read datasets, so its predictions may not be accurate.
In particular, we observed that CHORD can give incorrect predictions for samples with < 15X effective tumor coverage (effective tumor coverage = tumor coverage * tumor purity).
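The effective-coverage formula above can be checked directly against the values in this report: 63.05 * 0.99 is about 62.4X, well above the 15X level below which CHORD was observed to misbehave. The helper names below are hypothetical; the 15X threshold and the formula are taken from the note above.

```python
def effective_tumor_coverage(tumor_coverage: float, tumor_purity: float) -> float:
    """Effective tumor coverage = tumor coverage * tumor purity."""
    return tumor_coverage * tumor_purity


def chord_may_be_unreliable(tumor_coverage: float, tumor_purity: float,
                            min_effective: float = 15.0) -> bool:
    """True if effective coverage falls below the 15X level noted in this report."""
    return effective_tumor_coverage(tumor_coverage, tumor_purity) < min_effective


if __name__ == "__main__":
    # Mean tumor coverage and Purple purity reported above for COLO829_60-30.
    eff = effective_tumor_coverage(63.05, 0.99)
    print(f"effective tumor coverage: {eff:.2f}X")
    print("CHORD caution:", chord_may_be_unreliable(63.05, 0.99))
```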

Sample	Probability of BRCA1-type HRD	Probability of BRCA2-type HRD	Probability of HRD	HRD status	HRD type	Remarks on HRD status	Remarks on HRD type
COLO829_60-30	0	0	0	HR_proficient	none	NA	NA

Whole-genome copy number profile (Purple)

For visualization purposes, the plot caps the major copy number at 5 (values above 5 are drawn as 5).

Small variants (SNV/INDEL) coverage and variant allele frequency (VAF) distribution

Mutational signatures

Mutational signatures are estimated with the R package MutationalPatterns using SNVs only (INDELs are ignored).
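Signature analysis of SNVs is typically done on the standard 96 trinucleotide-context channels (6 substitution types, each with 4 possible 5' and 4 possible 3' neighbours). The sketch below illustrates that convention in plain Python: substitutions with a purine (A/G) reference are reverse-complemented so that the reference base is always C or T. It is not MutationalPatterns code; `sbs96_channel` is a hypothetical name, and the caller is assumed to supply the forward-strand reference trinucleotide centred on the variant.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(trinucleotide: str, alt: str) -> str:
    """Return an SBS96 label such as 'A[C>T]G' for one single-base substitution.

    `trinucleotide` is the reference context (5' base, REF, 3' base) on the
    forward strand; `alt` is the single ALT base.
    """
    trinucleotide, alt = trinucleotide.upper(), alt.upper()
    if len(trinucleotide) != 3 or len(alt) != 1 or trinucleotide[1] == alt:
        raise ValueError("expected a 3-base context and a different single ALT base")
    if trinucleotide[1] in "AG":
        # Report on the pyrimidine strand: reverse-complement context and ALT.
        trinucleotide = reverse_complement(trinucleotide)
        alt = alt.translate(COMPLEMENT)
    five_prime, ref, three_prime = trinucleotide
    return f"{five_prime}[{ref}>{alt}]{three_prime}"


if __name__ == "__main__":
    # A G>A change in CGT context is the same channel as C>T in ACG context.
    print(sbs96_channel("ACG", "T"), sbs96_channel("CGT", "A"))
```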

Notes on small variants (SNV/INDEL) filtering

Variants are kept if they meet any of the following criteria:
- IMPACT is HIGH
- Existing_variation contains COS (COSMIC variants)
- CLIN_SIG contains pathogenic
- CANCER_TYPE is not NA (variants in genes listed in the IntOGen Cancer Gene Census)
- MAX_AF (maximum population allele frequency) is less than 3%
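The criteria above can be written as a single predicate over one annotated row. This is a minimal sketch, assuming rows are dicts keyed by the column names listed above, that MAX_AF is a fraction (3% = 0.03), and that a missing or "NA" MAX_AF does not satisfy the allele-frequency criterion (the report does not say how missing values are handled). `passes_small_variant_filter` is a hypothetical name, not the pipeline's function.

```python
import math


def _text(row: dict, key: str) -> str:
    """Column value as a string; missing values become ''."""
    value = row.get(key)
    return "" if value is None else str(value)


def _float_or_none(value):
    """Parse a number; return None for missing, 'NA' or NaN values."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def passes_small_variant_filter(row: dict, max_af_cutoff: float = 0.03) -> bool:
    """True if the row meets ANY of the listed criteria (hypothetical sketch)."""
    max_af = _float_or_none(row.get("MAX_AF"))
    return any([
        _text(row, "IMPACT") == "HIGH",
        "COS" in _text(row, "Existing_variation"),  # COSMIC IDs, e.g. COSV...
        # Substring match, so values like "likely_pathogenic" also pass.
        "pathogenic" in _text(row, "CLIN_SIG"),
        _text(row, "CANCER_TYPE") not in ("", "NA"),
        max_af is not None and max_af < max_af_cutoff,
    ])
```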
CANCER_TYPE_ROLE and CANCER_TYPE_CGC_GENE are merged from the CANCER_TYPE, ROLE and CGC_CANCER_GENE columns, with the values collapsed into single semicolon-separated entries to keep the table readable. For example, CANCER_TYPE = “Breast;Prostate” and ROLE = “LoF;Act” means that the gene is a loss-of-function (LoF) gene in breast cancer and an activating (Act) gene in prostate cancer.
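Because the merged entries are paired by position, they can be split back into (cancer type, role) pairs when needed. A minimal sketch; `pair_cancer_roles` is a hypothetical helper, not part of the pipeline.

```python
def pair_cancer_roles(cancer_type: str, role: str) -> list[tuple[str, str]]:
    """Pair semicolon-separated CANCER_TYPE and ROLE entries position by position."""
    cancer_types = [c.strip() for c in cancer_type.split(";") if c.strip()]
    roles = [r.strip() for r in role.split(";") if r.strip()]
    if len(cancer_types) != len(roles):
        raise ValueError("CANCER_TYPE and ROLE have different numbers of entries")
    return list(zip(cancer_types, roles))


if __name__ == "__main__":
    # The example from the note above.
    print(pair_cancer_roles("Breast;Prostate", "LoF;Act"))
```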

Small variants (SNV/INDEL) table

Notes on structural variants (SVs)

Only SVs overlapping genes in the IntOGen Cancer Gene Census (CGC) are shown.
Annotation is based on AnnotSV. To keep the output readable, some columns with very long content (e.g. “_coord” and “_source”) are removed; please refer to the original AnnotSV output for these.
Upper-case columns are from the IntOGen CGC. Please see the README of the IntOGen release for more information.
- CANCER_TYPE_ROLE and CANCER_TYPE_CGC_GENE are merged from the CANCER_TYPE, ROLE and CGC_CANCER_GENE columns, with the values collapsed into single semicolon-separated entries to keep the table readable. For example, CANCER_TYPE = “Breast;Prostate” and ROLE = “LoF;Act” means that the gene is a loss-of-function (LoF) gene in breast cancer and an activating (Act) gene in prostate cancer.
Each SV can affect multiple genes. AnnotSV splits these into separate entries (one per gene), which is why multiple rows can share the same AnnotSV_ID.
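To see all genes touched by one SV, the split rows can be collapsed back by AnnotSV_ID. A minimal sketch, assuming each table row is a dict and that the gene symbol is in a column named `Gene_name` (an assumption; adjust to the column name in the table); `genes_per_sv` is a hypothetical helper.

```python
from collections import defaultdict


def genes_per_sv(rows) -> dict:
    """Map each AnnotSV_ID to the distinct genes listed across its split rows."""
    genes = defaultdict(list)
    for row in rows:
        sv_genes = genes[row["AnnotSV_ID"]]  # register the SV even if it has no gene
        gene = row.get("Gene_name")          # assumed column name
        if gene and gene not in sv_genes:
            sv_genes.append(gene)
    return dict(genes)
```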
For insertions, the ALT allele is shown as “Too long” in the table. Please refer to the original AnnotSV output for the full sequence.
Note that Severus can report duplications as BND events, and AnnotSV tends to annotate these as DEL events because it does not use the “STRAND” information. Therefore, the “SV_type” column is not reliable for BND events (these can be recognized by SEVERUS_BND in the ID column).
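Since SV_type is unreliable for BND records, the breakend orientation can instead be read from the ALT field itself. The sketch below applies the four breakend ALT forms from the VCF specification (t[p[, t]p], ]p]t, [p[t) to a single record: a same-chromosome join that skips sequence is a deletion, one that re-reads sequence is a tandem duplication, and a reverse-complemented join is inversion-type. It is a simplified illustration written for this report, not part of Severus or AnnotSV; `classify_bnd` is a hypothetical name, and it assumes standard bracket notation in ALT.

```python
import re

# The four VCF breakend ALT forms: t[p[, t]p], ]p]t and [p[t,
# where t is the local sequence and p is the mate position "chrom:pos".
BND_ALT = re.compile(
    r"^(?P<before>[A-Za-z.]*)(?P<open>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)"
    r"(?P<close>[\[\]])(?P<after>[A-Za-z.]*)$"
)


def classify_bnd(chrom: str, pos: int, alt: str) -> str:
    """Infer the SV type implied by one breakend record (simplified sketch)."""
    m = BND_ALT.match(alt)
    if m is None or m.group("open") != m.group("close"):
        raise ValueError(f"not a breakend ALT: {alt!r}")
    if m.group("chrom") != chrom:
        return "TRA"  # mate lies on another chromosome
    mate = int(m.group("pos"))
    bracket = m.group("open")
    local_first = m.group("before") != ""  # forms t[p[ and t]p]
    if bracket == "[" and local_first:
        # t[p[: sequence to the right of the mate follows t.
        # A downstream mate skips sequence (DEL); an upstream mate repeats it (DUP).
        return "DEL" if mate > pos else "DUP"
    if bracket == "]" and not local_first:
        # ]p]t: sequence to the left of the mate precedes t.
        # A downstream mate repeats sequence (DUP); an upstream mate skips it (DEL).
        return "DUP" if mate > pos else "DEL"
    return "INV"  # t]p] or [p[t: reverse-complemented (inversion-type) join
```

For example, `classify_bnd("chr1", 200, "N[chr1:100[")` describes a join from position 200 back to 100, i.e. a tandem duplication of that segment.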
The “SAMPLE” column holds the sample's values for the fields listed in the VCF FORMAT column. For Severus, these fields are “GT:GQ:VAF:hVAF:DR:DV”.
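The colon-separated SAMPLE string can be matched back to its FORMAT keys to read individual fields. A minimal sketch; `parse_sample_field` is a hypothetical helper, and the sample value used below is made up. Following the VCF specification, trailing FORMAT fields may be omitted from a sample, so missing trailing values are filled with ".".

```python
def parse_sample_field(sample: str, fmt: str = "GT:GQ:VAF:hVAF:DR:DV") -> dict:
    """Split a colon-separated SAMPLE string into {FORMAT key: value}."""
    keys = fmt.split(":")
    values = sample.split(":")
    if len(values) > len(keys):
        raise ValueError("SAMPLE has more fields than FORMAT")
    # Trailing FORMAT fields may be dropped in VCF; mark them as missing.
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


if __name__ == "__main__":
    print(parse_sample_field("0/1:60:0.45:.:22:18"))  # made-up values
```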

Structural variants (SVs) table

Notes on DMR filtering

The table shows DMRs, called with DSS in the pipeline output, that overlap promoters of genes in the IntOGen Cancer Gene Census (CGC).
Only DMRs with nCG >= 50 that overlap known promoter regions (annotated using annotatr) are shown. The pipeline output also contains DMRs in other annotated regions, such as exonic and intronic CpG islands, but these are not shown here.
meanMethyl1 refers to the mean methylation level in tumor.
meanMethyl2 refers to the mean methylation level in normal.
length refers to the length of the DMR.
nCG refers to the number of CpG sites in the DMR. By default, the workflow requires at least 50 CpG sites in any DMR region.
areaStat refers to the area statistic of the DMR (in DSS, the sum of the test statistics of all CpG sites in the DMR). The larger its absolute value, the more significant the DMR.
annot.X columns are produced by annotatr, and all upper-case columns are extracted from the IntOGen Compendium of Cancer Genes TSV file.
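The promoter filter and the areaStat-based ranking described above can be sketched as follows. This assumes DMRs and promoters are closed genomic intervals (chromosome, start, end) in the same coordinate system, and that DMRs are ranked by the absolute value of areaStat; the `Dmr` class, `overlaps` and `promoter_dmrs` are hypothetical and not part of DSS, annotatr, or the pipeline.

```python
from dataclasses import dataclass


@dataclass
class Dmr:
    chrom: str
    start: int
    end: int
    nCG: int
    meanMethyl1: float  # mean methylation in tumor
    meanMethyl2: float  # mean methylation in normal
    areaStat: float

    @property
    def diff(self) -> float:
        """Tumor minus normal methylation; positive means higher in tumor."""
        return self.meanMethyl1 - self.meanMethyl2


def overlaps(a_chrom, a_start, a_end, b_chrom, b_start, b_end) -> bool:
    """Closed intervals on the same chromosome overlap unless one ends before the other starts."""
    return a_chrom == b_chrom and a_start <= b_end and b_start <= a_end


def promoter_dmrs(dmrs, promoters, min_cg: int = 50):
    """Keep DMRs with nCG >= min_cg that overlap any promoter, ranked by |areaStat|."""
    kept = [
        d for d in dmrs
        if d.nCG >= min_cg
        and any(overlaps(d.chrom, d.start, d.end, *p) for p in promoters)
    ]
    return sorted(kept, key=lambda d: abs(d.areaStat), reverse=True)
```

Ranking by the absolute value keeps strongly hypomethylated DMRs (large negative areaStat) next to strongly hypermethylated ones.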

Table of DMRs overlapping with promoters of IntOGen CGC genes