Copyright (c) 2023-Present, Ryan L. Collins, Noah Fields, and the Van Allen, Gusev, and Haigis laboratories at Dana-Farber Cancer Institute.
Distributed under terms of the GNU GPL v2.0 License (see LICENSE).
Note: this repository is under active development. More documentation will be added as the project evolves.
This repository contains the working code and scripts used to detect, genotype, filter, and annotate germline variants from germline whole-genome sequencing (WGS) data across cancer types.
| Directory | Description |
| --- | --- |
| `docker/` | Instructions for building project-related Docker images |
| `scripts/` | Stand-alone scripts called by various workflows |
| `shell/` | Shell snippets for running specific processes or analyses |
| `wdl/` | Stand-alone WDL workflows |
All sample-level input data is stored in a secure Google Cloud bucket. Note that permissions must be granted for bucket access.
The bucket is organized as follows:
- One directory per cohort
- Each cohort has the following subdirectories:
  - `gatk-hc/reblocked`: reblocked GATK-HC gVCFs and indexes
  - `gatk-sv`: evidence and metrics files collected by GATK-SV
    - `gatk-sv/coverage`: coverage counts files
    - `gatk-sv/metrics`: per-sample metrics generated during GATK-SV module 01
    - `gatk-sv/pesr`: PE/SR metadata files
  - `manta`: raw Manta VCFs and indexes
  - `melt`: raw MELT VCFs and indexes
  - `wham`: raw Wham VCFs and indexes
Note: all raw gVCFs previously hosted in `gatk-hc/` were deleted on Feb 7, 2024, but the nested directory structure of `gatk-hc/reblocked/` was preserved for compatibility with legacy code.