Quality control (QC) of human genome and exome sequencing data is necessary to ensure that the data are of sufficient quality for downstream analyses. While several QC tools are available to measure quality parameters at various levels post-sequencing, their output must be reviewed and interpreted in a manual, time-consuming process. Such manual review is a major obstacle to standardization and consistency, as the process can be subjective depending on the reviewer. To address these difficulties, we have developed QuaC, which implements, integrates, and standardizes QC best practices at our Center. It performs three major steps: (1) runs several QC tools using data produced by read alignment (BAM) and small variant calling (VCF) as input, and optionally accepts QC output for raw sequencing reads (FASTQ); (2) executes QuaC-Watch to perform a QC checkup based on the expected thresholds for quality metrics; and (3) aggregates QC metrics produced by all the QC tools, as well as QuaC-Watch results, into a single, self-contained MultiQC report, both at the per-sample and across-project levels. This report provides aggregate summaries for all samples within a project/cohort for efficient, comprehensive review while still allowing granular review down to individual metrics for a single sample. Finally, we have developed a “Sample QC review system” schema to standardize QC reviewers’ logging of results and simplify downstream users’ interpretation of the reviewers’ findings.
The application of genome sequencing (GS) and exome sequencing (ES) based approaches has increased dramatically for both research and clinical purposes over the last decade. Several quality control (QC) tools have become available to help ensure that sequenced reads meet expected measures of quality, and to identify process-related errors such as sample swaps or contamination. In recent years, efforts have been made to define QC metrics and acceptable thresholds for QC standardization across research groups (
QuaC is a configurable pipeline developed using Snakemake and Python. QuaC provides a command-line interface (CLI), written in Python, to support user input, configuration, and execution. System-level tests, along with mock data and example input configuration files, are included in QuaC to verify correct operation after installation and to test future developments. Unit jobs triggered by QuaC are executed in a Singularity container environment, as this setup provides reproducibility and portability across various user environments. QuaC is run at the project level, and samples in the project are provided as input in a pedigree file format (
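As an illustration of project-level input, the following minimal sketch reads sample records from a pedigree file. It assumes the conventional whitespace-delimited, six-column PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype); the exact layout QuaC expects is not stated here, and the `PedRecord` class and `read_pedigree` function are hypothetical names, not part of QuaC.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class PedRecord:
    """One row of a six-column PED file (assumed layout, not QuaC's own type)."""
    family_id: str
    sample_id: str
    father_id: str
    mother_id: str
    sex: str        # conventionally "1" = male, "2" = female, other = unknown
    phenotype: str


def read_pedigree(handle):
    """Parse whitespace-delimited PED lines, skipping blanks and '#' comments."""
    records = []
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(
                f"line {line_number}: expected 6 columns, got {len(fields)}"
            )
        records.append(PedRecord(*fields[:6]))
    return records


# Mock trio used only for illustration.
example = io.StringIO(
    "#family_id sample_id father mother sex phenotype\n"
    "FAM1 child1 dad1 mom1 1 2\n"
    "FAM1 dad1 0 0 1 1\n"
    "FAM1 mom1 0 0 2 1\n"
)
samples = read_pedigree(example)
sample_ids = [record.sample_id for record in samples]
```

Reading the pedigree into typed records makes both the list of samples and their expected sex and family relationships available to later checks.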
QuaC runs several QC tools (
QC tools used in QuaC. Note that this list does not include tools that QuaC can consume when run with

Tool | Usage in QuaC | QC type |
---|---|---|
Qualimap | Summarizes several alignment metrics using BAM file | BAM quality |
Picard-CollectMultipleMetrics | Summarizes alignment metrics from BAM file using several modules | BAM quality |
Picard-CollectWgsMetrics | Collects metrics about coverage and performance using BAM file | BAM quality |
Mosdepth | Fast alignment depth calculation using BAM file | BAM quality |
Indexcov | Estimate coverage from BAM index for GS (Skipped in exome mode) | BAM quality |
Covviz | Identifies large, coverage-based anomalies for GS using Indexcov output (Skipped in exome mode) | BAM quality |
Bcftools stats | Summarizes VCF file stats | VCF quality |
VerifyBamID2 | Estimates within-species (i.e., cross-sample) contamination using BAM file | Within-species contamination |
Somalier | Estimation of sex, ancestry and relatedness using BAM file | Sex, ancestry, and relatedness estimation |
QuaC includes a tool called QuaC-Watch, which consumes results from the above-mentioned QC tools, compares QC metrics against the acceptable thresholds, and summarizes results using color-coded pass/fail flags for efficient review (
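The threshold comparison described above can be sketched as follows. This is a minimal illustration and not QuaC-Watch's actual implementation: the metric names, cutoff values, and the `Threshold` and `check_sample` helpers are all hypothetical and chosen only to show the pattern of mapping each metric to a pass/fail flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """Inclusive lower and/or upper bound for a single QC metric."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def passes(self, value: float) -> bool:
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


# Hypothetical metric names and cutoffs, for illustration only.
THRESHOLDS = {
    "mean_coverage": Threshold(minimum=30.0),
    "pct_duplication": Threshold(maximum=20.0),
    "contamination_freemix": Threshold(maximum=0.02),
}


def check_sample(metrics):
    """Return a flag ('pass', 'fail', or 'missing') for every configured metric."""
    results = {}
    for name, threshold in THRESHOLDS.items():
        if name not in metrics:
            results[name] = "missing"
        elif threshold.passes(metrics[name]):
            results[name] = "pass"
        else:
            results[name] = "fail"
    return results


report = check_sample({"mean_coverage": 34.2, "pct_duplication": 27.5})
```

Keeping the thresholds in a single mapping separates the expected values from the comparison logic, so cutoffs can be adjusted per project without touching the code that applies them.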
To minimize the time needed to review QC metrics and assess the quality of samples across a project, QuaC aggregates results produced by all the QC tools and QuaC-Watch using MultiQC (
Aggregation and visualization of QC tool output and QuaC-Watch output using MultiQC at the project level. The QuaC-Watch section shown here enables quick review of samples’ QC results and helps identify samples that need further review. Users may optionally toggle columns to view values for QC metrics of interest and hover over a column title to view the thresholds used by QuaC-Watch (highlighted by the red arrow). In addition to this project-level report, a similar MultiQC report is created for each sample, showing summarized QC results for that sample only.
Consistent and understandable dissemination of QC review results can be challenging when quality issues are identified, and even more so when these issues hamper accurate downstream analyses or interpretation. To reduce this burden, we devised a “Sample QC review system” in which QC review results are flagged as pass, acceptable, poor, or fail, along with a free-text field for review comments (
Fields logged in Sample QC database using controlled flags. Type 1 flags are pass, acceptable, poor, and fail. Type 2 flags are pass, fail, and not applicable.

Field | Explanation | Allowed values |
---|---|---|
Sample - Overall Status | Overall QC status considering results of all QC performed | Type 1 flags |
FASTQ | Overall QC status considering results of all QC performed at FASTQ level | Type 1 flags |
FASTQ Comment | Comments on QC at FASTQ level (e.g., small insert size, high adapter content, etc.) | Free text |
BAM | Overall QC status considering results of all QC performed at BAM level | Type 1 flags |
BAM Comment | Comments on QC at BAM level (e.g., low mean coverage, high duplication rate, etc.) | Free text |
VCF | Overall QC status considering results of all QC performed at VCF level | Type 1 flags |
VCF Comment | Comments on QC at VCF level (e.g., small insert size, high adapter content, etc.) | Free text |
Other Species Contamination | Sample contamination status due to other species’ genomic material | Type 1 flags |
Human Cross-contamination | Sample contamination status due to other human’s genomic material | Type 1 flags |
Sex Check | Did the predicted sex match the expected sex? | Type 2 flags |
Relatedness Check | Did the predicted relatedness match expected relatedness? | Type 2 flags |
Ancestry Check | Did the predicted ancestry match expected ancestry? | Type 2 flags |
Other Comments/Notes | Any other comments/notes concerning QC | Free text |
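The controlled vocabulary in the table above lends itself to automated validation of review records. The sketch below is one possible way to do this and is not part of QuaC: the `validate_review` function is hypothetical, and treating every flag field as required and every comment field as optional is an assumption. Field names and allowed flags are taken directly from the table.

```python
TYPE1_FLAGS = {"pass", "acceptable", "poor", "fail"}
TYPE2_FLAGS = {"pass", "fail", "not applicable"}

# Allowed values per field; None marks a free-text field.
FIELD_RULES = {
    "Sample - Overall Status": TYPE1_FLAGS,
    "FASTQ": TYPE1_FLAGS,
    "FASTQ Comment": None,
    "BAM": TYPE1_FLAGS,
    "BAM Comment": None,
    "VCF": TYPE1_FLAGS,
    "VCF Comment": None,
    "Other Species Contamination": TYPE1_FLAGS,
    "Human Cross-contamination": TYPE1_FLAGS,
    "Sex Check": TYPE2_FLAGS,
    "Relatedness Check": TYPE2_FLAGS,
    "Ancestry Check": TYPE2_FLAGS,
    "Other Comments/Notes": None,
}


def validate_review(record):
    """Return a list of problems found in a review record (empty if valid)."""
    problems = []
    for name, allowed in FIELD_RULES.items():
        value = record.get(name)
        if allowed is None:
            # Assumption: free-text fields are optional but must be strings.
            if value is not None and not isinstance(value, str):
                problems.append(f"{name}: expected free text")
        elif value not in allowed:
            # Assumption: flag fields are required.
            problems.append(f"{name}: {value!r} is not one of {sorted(allowed)}")
    for name in record:
        if name not in FIELD_RULES:
            problems.append(f"{name}: unknown field")
    return problems


good_review = {
    "Sample - Overall Status": "acceptable",
    "FASTQ": "pass",
    "BAM": "acceptable",
    "BAM Comment": "low mean coverage",
    "VCF": "pass",
    "Other Species Contamination": "pass",
    "Human Cross-contamination": "pass",
    "Sex Check": "pass",
    "Relatedness Check": "not applicable",
    "Ancestry Check": "pass",
}
bad_review = dict(good_review, **{"Sex Check": "poor"})  # Type 1 flag in a Type 2 field

good_problems = validate_review(good_review)
bad_problems = validate_review(bad_review)
```

Here `good_problems` is empty, while `bad_problems` holds a single entry reporting that "poor" is not an allowed value for Sex Check.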
Source code for QuaC is available for download at https://github.com/uab-cgds-worthey/quac under the GNU GPLv3 license. Installation, setup, configuration, and usage documentation is available at https://quac.readthedocs.io.
We would like to thank Donna Brown for providing feedback on the utility of QuaC-Watch in research projects.
This work was supported in part by an award from the CF Foundation to Dr. Worthey (WORTHE19A0) and from UAB SOM Start-up funds to Dr. Worthey.