Open a new terminal and pull required images.
docker pull broadinstitute/gatk:latest
docker pull biocontainers/samtools:v1.9-4-deb_cv1
docker pull biocontainers/fastqc:v0.11.9_cv8
Change to your data directory and decompress the files.
gzip -d *.gz
Filter chromossome 22 from dbSNP. Output: dbSNP_chr22.vcf
grep -w '^#\|^#CHROM\|^22' 00-common_all.vcf > dbSNP_chr22.vcf
Correct 22 to chr22. Output: dbSNP_chr22_corrected.vcf
awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' dbSNP_chr22.vcf > dbSNP_chr22_corrected.vcf
Open a new terminal start a container of the FastQC image, specifying the filesystem location you want to mount.
docker run -v ~/GATK/DesafioMendelics:/data -it biocontainers/fastqc:v0.11.9_cv8 bash
Run FastQC outputs: amostra-lbb_R1_fastqc.html, amostra-lbb_R2_fastqc.html, amostra-lbb_R1_fastqc.zip, amostra-lbb_R2_fastqc.zip
fastqc -t 4 -o amostra-lbb_R1.fq amostra-lbb_R2.fq
Start python webserver to visualize the results on http://localhost:8000/
and check some quality parameters.
python3 -m http.server
Create reference fasta index. Open another terminal and start a samtools container. Output: grch38.chr22.fasta.fai.
docker run -v ~/GATK/DesafioMendelics:/data -it biocontainers/samtools:v1.9-4-deb_cv1 bash
samtools faidx grch38.chr22.fasta
Open a new terminal and start a container of the GATK4 image, specifying the filesystem location you want to mount.
docker run -v ~/GATK/DesafioMendelics:/gatk/my_data -it broadinstitute/gatk:latest bash
Output: unaligned_read_pairs.bam
../gatk FastqToSam -F1 amostra-lbb_R1.fq -F2 amostra-lbb_R2.fq -O unaligned_read_pairs.bam --SAMPLE_NAME AMOSTRA-LBB
An index allows querying features by a genomic interval. The -java-options -Xmx6G argument specify memory allocation. Outputs: dbSNP_chr22_corrected.vcf.idx, pequeno-gabarito.vcf.idx.
../gatk --java-options -Xmx6G IndexFeatureFile -I dbSNP_chr22_corrected.vcf
../gatk --java-options -Xmx6G IndexFeatureFile -I pequeno-gabarito.vcf
Tools that utilize BWA-MEM (e.g. BwaSpark, PathSeqBwaSpark) require an index image file of the reference sequences. Output: reference.img
../gatk BwaMemIndexImageCreator -I grch38.chr22.fasta -O grch38.chr22.fasta.img
Output: reference.dict.
../gatk CreateSequenceDictionary -R grch38.chr22.fasta
Outputs: aligned_reads.bam and aligned_reads.bam.sbi
../gatk --java-options -Xmx6G BwaSpark -I unaligned_read_pairs.bam -R grch38.chr22.fasta -O aligned_reads.bam
Sequencing error propagate in duplicates, flagging them mitigates duplication artifacts. Outputs: marked_dup_metrics.txt, marked_duplicates.bam, marked_duplicates.bam.bai, marked_duplicates.bam.sbi
../gatk --java-options -Xmx6G MarkDuplicatesSpark -I aligned_reads.bam -O marked_duplicates.bam -M marked_dup_metrics.txt
Output: marked_duplicates_rg.bam
../gatk --java-options -Xmx6G AddOrReplaceReadGroups -I marked_duplicates.bam -O marked_duplicates_rg.bam -LB AMOSTRA-LBB -PL ILLUMINA -PU AMOSTRA-LBB -SM AMOSTRA-LBB
Output: recal_data.table
../gatk --java-options -Xmx6G BaseRecalibrator -I marked_duplicates_rg.bam -R grch38.chr22.fasta --known-sites dbSNP_chr22_corrected.vcf -O recal_data.table
Outputs: analysis_ready_reads.bai, analysis_ready_reads.bam
../gatk --java-options -Xmx6G ApplyBQSR -I marked_duplicates_rg.bam -R grch38.chr22.fasta --bqsr-recal-file recal_data.table -O analysis_ready_reads.bam
Use the dbSNP and capture files to improve quality of the analysis. Outputs: variants.vcf, variants.vcf.idx.
../gatk --java-options -Xmx6G HaplotypeCaller -I analysis_ready_reads.bam -R grch38.chr22.fasta --dbsnp dbSNP_chr22_corrected.vcf -L coverage.bed -ip 100 -O variants.vcf
Outputs: variants_score.vcf, variants_score.vcf.idx, variants_filtered.vcf, variants_filtered.vcf.idx
../gatk --java-options -Xmx6G CNNScoreVariants -R grch38.chr22.fasta -V variants.vcf -O variants_score.vcf
../gatk --java-options -Xmx6G FilterVariantTranches -V variants_score.vcf --resource dbSNP_chr22_corrected.vcf --info-key CNN_1D -O variants_filtered.vcf
- Data Source Downloader Tool.
../gatk FuncotatorDataSourceDownloader --germline --validate-integrity --extract-after-download
- Change to funcotator_dataSources.v1.7.20200521g directory
cd funcotator_dataSources.v1.7.20200521g
- Extract gnomAD_exome.tar.gz
tar -zxf gnomAD_exome.tar.gz
- Execute Funcotator annotation. Output: variants_annotation.vcf
../gatk --java-options -Xmx6G Funcotator -V variants_filtered.vcf -R grch38.chr22.fasta -O variants_annotation --output-file-format VCF --data-sources-path funcotator_dataSources.v1.7.20200521g --ref-version hg38
outputs: summary_variants.variant_calling_detail_metrics, summary_variants.variant_calling_summary_metrics
../gatk --java-options -Xmx6G CollectVariantCallingMetrics -I variants_filtered.vcf --DBSNP dbSNP_chr22_corrected.vcf -O summary_variants