bcftools 1.10 Unexpected type 0 #1123

Closed
bryce-turner opened this issue Dec 7, 2019 · 7 comments · Fixed by samtools/htslib#1000
Labels
bug htslib-dependent Cannot be fixed until htslib is fixed P1: Urgent

Comments

@bryce-turner

After testing with the latest release (1.10) we've encountered an error when using bcftools view and filter:

chr1 17000202 . A C . clustered_events;haplotype;normal_artifact;strand_bias CONTQ=64;DP=125;ECNT=3;GERMQ=93;MBQ=17,29;MFRL=193,164;MMQ=60,60;MPOS=6;NALOD=-0.8027;NLOD=15.98;POPAF=6;ROQ=69;SEQQ=53;STRANDQ=1;TLOD=11.37 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:38,6:0.143:44:8,3:4,1:0|1:17000202_A_C:17000202:8,30,6,0 0|0:68,1:0.028:69:12,1:22,0:0|1:17000[E::bcf_fmt_array] Unexpected type 0

However if we look at this same line with zcat we see:
chr1 17000202 . A C . clustered_events;haplotype;normal_artifact;strand_bias CONTQ=64;DP=125;ECNT=3;GERMQ=93;MBQ=17,29;MFRL=193,164;MMQ=60,60;MPOS=6;NALOD=-8.027e-01;NLOD=15.98;POPAF=6.00;ROQ=69;SEQQ=53;STRANDQ=1;TLOD=11.37 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:38,6:0.143:44:8,3:4,1:0|1:17000202_A_C:17000202:8,30,6,0 0|0:68,1:0.028:69:12,1:22,0:0|1:17000202_A_C:17000202:11,57,1,0

We don't encounter this [E::bcf_fmt_array] Unexpected type 0 when using bcftools 1.9 though.
Additionally here is our header, excluding the contigs:

##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=base_qual,Description="alt median base quality">
##FILTER=<ID=clustered_events,Description="Clustered events observed in the tumor">
##FILTER=<ID=contamination,Description="contamination">
##FILTER=<ID=duplicate,Description="evidence for alt allele is overrepresented by apparent duplicates">
##FILTER=<ID=fragment,Description="abs(ref - alt) median fragment length">
##FILTER=<ID=germline,Description="Evidence indicates this site is germline, not somatic">
##FILTER=<ID=haplotype,Description="Variant near filtered variant on same haplotype.">
##FILTER=<ID=low_allele_frac,Description="Allele fraction is below specified threshold">
##FILTER=<ID=map_qual,Description="ref - alt median mapping quality">
##FILTER=<ID=multiallelic,Description="Site filtered because too many alt alleles pass tumor LOD">
##FILTER=<ID=n_ratio,Description="Ratio of N to alt exceeds specified ratio">
##FILTER=<ID=normal_artifact,Description="artifact_in_normal">
##FILTER=<ID=numt_chimera,Description="NuMT variant with too many ALT reads originally from autosome">
##FILTER=<ID=numt_novel,Description="Alt depth is below expected coverage of NuMT in autosome">
##FILTER=<ID=orientation,Description="orientation bias detected by the orientation bias mixture model">
##FILTER=<ID=panel_of_normals,Description="Blacklisted site in panel of normals">
##FILTER=<ID=position,Description="median distance of alt variants from end of reads">
##FILTER=<ID=slippage,Description="Site filtered due to contraction of short tandem repeat region">
##FILTER=<ID=strand_bias,Description="Evidence for alt allele comes from one read direction only">
##FILTER=<ID=strict_strand,Description="Evidence for alt allele is not represented in both directions">
##FILTER=<ID=weak_evidence,Description="Mutation does not meet likelihood threshold">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=FilterMutectCalls,CommandLine="FilterMutectCalls  --output exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U.bwa.mutect2.all.vcf.gz --stats temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/merged.stats --filtering-stats temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/filtering.stats --max-alt-allele-count 2 --contamination-table temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/contamination.table --tumor-segmentation temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/segments.table --orientation-bias-artifact-priors temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/artifact-priors.tar.gz --variant temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa.mutect2.raw.vcf.gz --reference /home/tgenref/homo_sapiens/grch38_hg38/hg38tgen/genome_reference/GRCh38tgen_decoy_alts_hla.fa --tmp-dir temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/temp_filter/  --threshold-strategy OPTIMAL_F_SCORE --f-score-beta 1.0 --false-discovery-rate 0.05 --initial-threshold 0.1 --mitochondria-mode false --max-events-in-region 2 --unique-alt-read-count 0 --min-median-mapping-quality 30 --min-median-base-quality 20 --max-median-fragment-length-difference 10000 --min-median-read-position 1 --max-n-ratio Infinity --min-reads-per-strand 0 --autosomal-coverage 0.0 --max-numt-fraction 0.85 --min-allele-fraction 0.0 --contamination-estimate 0.0 --log-snv-prior -13.815510557964275 --log-indel-prior -16.11809565095832 
--log-artifact-prior -2.302585092994046 --normal-p-value-threshold 0.001 --min-slippage-length 8 --pcr-slippage-rate 0.1 --distance-on-haplotype 100 --long-indel-length 5 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false",Version="4.1.4.0",Date="December 7, 2019 11:15:33 AM MST">
##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2  --f1r2-tar-gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/1.f1r2.tar.gz --tumor-sample MMRF_1923_1_BM_CD138pos_T1 --normal-sample MMRF_1923_1_PB_WBC_C2 --germline-resource /home/tgenref/homo_sapiens/grch38_hg38/public_databases/gnomad/r3.0/gnomad.genomes.r3.0.sites.pass.AnnotationReference.vcf.gz --independent-mates true --output temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/1.mutect2.vcf.gz --intervals chr1:10001-207666 --intervals chr1:257667-297968 --intervals chr1:347969-535988 --intervals chr1:585989-2702781 --intervals chr1:2746291-12954384 --intervals chr1:13004385-16799163 --intervals chr1:16849164-29552233 --input exome/alignment/bwa/MMRF_1923_1_PB_WBC_C2_KHS5U/MMRF_1923_1_PB_WBC_C2_KHS5U.bwa.bam --input exome/alignment/bwa/MMRF_1923_1_BM_CD138pos_T1_KHS5U/MMRF_1923_1_BM_CD138pos_T1_KHS5U.bwa.bam --reference /home/tgenref/homo_sapiens/grch38_hg38/hg38tgen/genome_reference/GRCh38tgen_decoy_alts_hla.fa --tmp-dir temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/temp_mutect2_1/  --f1r2-median-mq 50 --f1r2-min-bq 20 --f1r2-max-depth 200 --genotype-pon-sites false --genotype-germline-sites false --af-of-alleles-not-in-resource -1.0 --mitochondria-mode false --tumor-lod-to-emit 3.0 --initial-tumor-lod 2.0 --pcr-snv-qual 40 --pcr-indel-qual 40 --max-population-af 0.01 --downsampling-stride 1 --callable-depth 10 --max-suspicious-reads-per-alignment-start 0 --normal-lod 2.2 --ignore-itr-artifacts false --gvcf-lod-band -2.5 --gvcf-lod-band -2.0 --gvcf-lod-band -1.5 --gvcf-lod-band -1.0 --gvcf-lod-band -0.5 --gvcf-lod-band 0.0 --gvcf-lod-band 0.5 --gvcf-lod-band 1.0 --minimum-allele-fraction 0.0 --disable-adaptive-pruning false --dont-trim-active-regions false --max-extension 25 --padding-around-indels 150 --padding-around-snps 20 
--kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --max-unpruned-variants 100 --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-waterman JAVA --emit-ref-confidence NONE --max-mnp-distance 1 --force-call-filtered-alleles false --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false 
--use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false --max-read-length 2147483647 --min-read-length 30 --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.4.0",Date="December 7, 2019 10:25:48 AM MST">
##INFO=<ID=CONTQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to contamination">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of events in this haplotype">
##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not germline variants">
##INFO=<ID=MBQ,Number=R,Type=Integer,Description="median base quality">
##INFO=<ID=MFRL,Number=R,Type=Integer,Description="median fragment length">
##INFO=<ID=MMQ,Number=R,Type=Integer,Description="median mapping quality">
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="median distance from end of read">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="Negative log 10 odds of artifact in normal with same allele fraction as tumor">
##INFO=<ID=NCount,Number=1,Type=Integer,Description="Count of N bases in the pileup">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="Normal log 10 likelihood ratio of diploid het or hom alt genotypes">
##INFO=<ID=OCM,Number=1,Type=Integer,Description="Number of alt reads whose original alignment doesn't match the current contig.">
##INFO=<ID=PON,Number=0,Type=Flag,Description="site found in panel of normals">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="negative log 10 population allele frequencies of alt alleles">
##INFO=<ID=ROQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to read orientation artifact">
##INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=SEQQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not sequencing errors">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##INFO=<ID=STRANDQ,Number=1,Type=Integer,Description="Phred-scaled quality of strand bias artifact">
##INFO=<ID=STRQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles in STRs are not polymerase slippage errors">
##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">
##INFO=<ID=UNIQ_ALT_READ_COUNT,Number=1,Type=Integer,Description="Number of ALT reads with unique start and mate end positions at a variant site">
##MutectVersion=2.2
##bcftools_concatCommand=concat --output-type z --output temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa.mutect2.raw.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/1.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/2.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/3.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/4.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/5.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/6.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/7.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/8.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/9.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/10.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/11.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/12.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/13.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/14.mutect2.vcf.gz 
temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/15.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/16.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/17.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/18.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/19.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/20.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/21.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/22.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/23.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/24.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/25.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/26.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/27.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/28.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/29.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/30.mutect2.vcf.gz 
temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/31.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/32.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/33.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/34.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/35.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/36.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/37.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/38.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/39.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/40.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/41.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/42.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/43.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/44.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/45.mutect2.vcf.gz; Date=Sat Dec  7 11:15:16 2019
##bcftools_concatVersion=1.10+htslib-1.10
@pd3
Member

pd3 commented Dec 7, 2019

I am unable to reproduce the error with the header and the data line you provided. What is the exact command you are running? Any chance you could provide a test case?

@PedalheadPHX

Happy to provide the example file. Do you have a DM link for the files?

@pd3
Member

pd3 commented Dec 8, 2019

Thank you for the test case. The problem was introduced when 64-bit support was added to htslib. A minimal example to reproduce the problem:

$ cat test.vcf
##fileformat=VCFv4.2
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="dummy">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="dummy">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="dummy">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="dummy">
##contig=<ID=chr1,length=248956422>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    1   .   G   C   .   .   MPOS=-2147483648;NALOD=-8.279e-01;NLOD=15.45;POPAF=6.00

$ bcftools view test.vcf
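
A short sketch of why the MPOS value in this minimal example is special: -2147483648 is the smallest 32-bit signed integer, and BCF uses that bit pattern as the int32 "missing" sentinel, so it cannot be stored as an ordinary int32. The ranges below assume the lowest eight values of each width are reserved, as I understand the BCF2 specification; the names `INT_RANGES` and `smallest_int_type` are made up for illustration and are not htslib functions.

```python
# Usable ranges for BCF2 typed integers, assuming the lowest eight values of
# each width are reserved for sentinels such as "missing" and "end of vector".
INT_RANGES = [
    ("int8", -120, 127),
    ("int16", -32760, 32767),
    ("int32", -2147483640, 2147483647),
]

INT32_MISSING = -2**31  # the bit pattern 0x80000000


def smallest_int_type(value):
    """Return the narrowest BCF2 integer type that can hold value, or None."""
    for name, lo, hi in INT_RANGES:
        if lo <= value <= hi:
            return name
    return None  # would need a wider in-memory type than BCF2 offers


print(smallest_int_type(6))            # int8
print(smallest_int_type(-2147483648))  # None
print(INT32_MISSING == -2147483648)    # True
```

Under these assumptions the MPOS value falls outside every BCF2 integer type, which is consistent with it ending up on the 64-bit code path mentioned above.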

@pd3 pd3 added bug htslib-dependent Cannot be fixed until htslib is fixed P1: Urgent labels Dec 8, 2019
jkbonfield added a commit to jkbonfield/htslib that referenced this issue Dec 9, 2019
Any 64-bit INFO field that wasn't the last in the list would cause
subsequent fields to be decoded incorrectly.

This commit fixes that, plus updates the tests accordingly so the bug
could be triggered.

Fixes samtools#999
Fixes samtools/bcftools#1123
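
To illustrate the kind of failure the commit message describes, here is a toy sketch (not htslib's code and not the real BCF layout) of a decoder that walks a list of typed fields. The type codes, the `encode`/`decode` helpers and the size tables are all hypothetical. If the decoder skips a 64-bit payload as though it were 32 bits wide, the next "type byte" it reads is really payload data, so every later field is misread; when the 64-bit field is last, nothing is read after it and the mistake goes unnoticed.

```python
import struct

# Toy type codes mapped to struct formats (hypothetical, not the BCF encoding).
FORMATS = {1: "<b", 2: "<h", 3: "<i", 4: "<q"}
CORRECT_SIZES = {1: 1, 2: 2, 3: 4, 4: 8}
BUGGY_SIZES = {1: 1, 2: 2, 3: 4, 4: 4}  # 64-bit payload skipped as if 32-bit


def encode(fields):
    """Serialise (type_code, value) pairs as type byte + little-endian payload."""
    out = bytearray()
    for type_code, value in fields:
        out.append(type_code)
        out += struct.pack(FORMATS[type_code], value)
    return bytes(out)


def decode(buf, n_fields, sizes):
    """Read n_fields values, advancing by sizes[type_code] after each payload."""
    values, pos = [], 0
    for _ in range(n_fields):
        type_code = buf[pos]
        if type_code not in FORMATS:
            raise ValueError(f"Unexpected type {type_code}")
        pos += 1
        values.append(struct.unpack_from(FORMATS[type_code], buf, pos)[0])
        pos += sizes[type_code]
    return values


wide_first = encode([(4, -2147483648), (1, 6)])
wide_last = encode([(1, 6), (4, -2147483648)])

print(decode(wide_first, 2, CORRECT_SIZES))  # [-2147483648, 6]
print(decode(wide_last, 2, BUGGY_SIZES))     # [6, -2147483648]
try:
    decode(wide_first, 2, BUGGY_SIZES)
except ValueError as err:
    print(err)                               # Unexpected type 255
```

The exact type number reported depends on which stray byte gets read, so this toy reports 255 rather than the 0 seen in the issue.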
@jmarshall
Member

jmarshall commented Dec 10, 2019

Is it the case that the problematic line (from which Petr has distilled a minimal example) is in fact the line following the chr1 17000202 . A C line shown in @TGEN-BTurner's original report? (And if so it would be great if you'd use zcat to post that line here too.)

(Or it may be several lines further on — the way that line has been clipped at …|1:17000 suggests that the ‘final’ line of output you're seeing is an artefact of stdout buffering.)
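
The buffering effect described above can be pictured with a small simulation. This is only a sketch: `BlockBuffered` is a made-up stand-in for a block-buffered stdout, the 9-character buffer is chosen purely so the clip lands in the same place as in the report, and the error message is appended directly to mimic an unbuffered stderr sharing the same terminal.

```python
class BlockBuffered:
    """Toy block-buffered stream: only full blocks reach the sink."""

    def __init__(self, sink, size):
        self.sink = sink
        self.size = size
        self.pending = ""

    def write(self, text):
        self.pending += text
        while len(self.pending) >= self.size:
            self.sink.append(self.pending[:self.size])
            self.pending = self.pending[self.size:]


terminal = []                          # what the user sees, in arrival order
stdout = BlockBuffered(terminal, 9)    # buffered record output

stdout.write("0|1:17000202_A_C\n")     # end of a record that printed fine
terminal.append("[E::bcf_fmt_array] Unexpected type 0\n")  # unbuffered stderr

print("".join(terminal), end="")
# 0|1:17000[E::bcf_fmt_array] Unexpected type 0
```

The record was written in full, but its tail is still sitting in the buffer when the error appears, so the last line visible before the error need not be the one that triggered it.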

@jkbonfield
Contributor

jkbonfield commented Dec 11, 2019

Indeed we still haven't seen the original data which triggered the whole problem. @pd3 - was the MPOS field you constructed for your example the same name and value that was culled from the test data you were provided? It would really help the bug report to know that the issue we found and fixed is in fact the same one. @TGEN-BTurner can you please check whether PR samtools/htslib#1000 fixes your problem?

@bryce-turner
Author

I can confirm that samtools/htslib#1000 fixes the problem. I tested on a different sample than before, but here is the output before and after applying the fix:

Before:

chr1	43290221	.	T	A	.	base_qual;haplotype;weak_evidence	CONTQ=12;DP=12;ECNT=2;GERMQ=20;MBQ=32,10;MFRL=176,180;MMQ=60,60;MPOS=12;NALOD=1;NLOD=2.7;POPAF=6;ROQ=57;SEQQ=1;STRANDQ=16;TLOD=3.42	GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB	0|1:2,1:0.4:3:2,0:0,0:0|1:43290221_T_A:43290221:2,0,1,0	0|0:9,0:0.091:9:4,0:3,0:0|1:43290221_T_A:43290221:5,4,0,0
chr1	43290242	.	C	A	.	haplotype;weak_evidence	CONTQ=13;DP=13;ECNT=2;GERMQ=22;MBQ=37,31;MFRL=176,180;MMQ=60,60;MPOS=33;NALOD=1.04;NLOD=3;POPAF=6;ROQ=60;SEQQ=1;STRANDQ=18;TLOD=3.67	GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB	0|1:1,1:0.5:2:1,0:0,1:0|1:43290221_T_A:43290221:1,0,1,0	0|0:10,0:0.083:10:6,0:3,0:0|1:43290221_T_A:43290221:6,4,0,0
chr1	43314053	.	TTGTG	T,TTG	.	germline;normal_artifact	CONTQ=93;DP=208;ECNT=1;GERMQ=1;MBQ=38,38,38;MFRL=186,174,189;MMQ=60,60,60;MPOS=28,26;NALOD=-4.571,-20.77;[E::bcf_fmt_array] Unexpected type 0

After:

chr1    43290221        .       T       A       .       base_qual;haplotype;weak_evidence       CONTQ=12;DP=12;ECNT=2;GERMQ=20;MBQ=32,10;MFRL=176,180;MMQ=60,60;MPOS=12;NALOD=1;NLOD=2.7;POPAF=6;ROQ=57;SEQQ=1;STRANDQ=16;TLOD=3.42  GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB     0|1:2,1:0.4:3:2,0:0,0:0|1:43290221_T_A:43290221:2,0,1,0 0|0:9,0:0.091:9:4,0:3,0:0|1:43290221_T_A:43290221:5,4,0,0
chr1    43290242        .       C       A       .       haplotype;weak_evidence CONTQ=13;DP=13;ECNT=2;GERMQ=22;MBQ=37,31;MFRL=176,180;MMQ=60,60;MPOS=33;NALOD=1.04;NLOD=3;POPAF=6;ROQ=60;SEQQ=1;STRANDQ=18;TLOD=3.67    GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB     0|1:1,1:0.5:2:1,0:0,1:0|1:43290221_T_A:43290221:1,0,1,0 0|0:10,0:0.083:10:6,0:3,0:0|1:43290221_T_A:43290221:6,4,0,0
chr1    43314053        .       TTGTG   T,TTG   .       germline;normal_artifact        CONTQ=93;DP=208;ECNT=1;GERMQ=1;MBQ=38,38,38;MFRL=186,174,189;MMQ=60,60,60;MPOS=28,26;NALOD=-4.571,-20.77;NLOD=3.44,-18.72;POPAF=6,6;ROQ=93;RPA=11,9,10;RU=TG;SEQQ=93;STR;STRANDQ=44;STRQ=93;TLOD=3.28,17.87     GT:AD:AF:DP:F1R2:F2R1:SB        0/1/2:61,2,9:0.037,0.133:72:27,2,3:28,0,4:9,52,1,10     0/0:48,3,10:0.058,0.172:61:24,0,5:21,3,5:12,36,4,9
chr1    43363190        .       G       GT      .       normal_artifact;slippage;weak_evidence  CONTQ=30;DP=443;ECNT=1;GERMQ=93;MBQ=38,34;MFRL=181,184;MMQ=60,60;MPOS=21;NALOD=-3.447;NLOD=35.08;POPAF=6;ROQ=93;RPA=10,11;RU=T;SEQQ=1;STR;STRANDQ=54;STRQ=1;TLOD=3.29   GT:AD:AF:DP:F1R2:F2R1:SB        0/1:173,7:0.034:180:107,3:65,3:79,94,3,4        0/0:166,7:0.036:173:88,4:73,2:70,96,4,3
chr1    43422694        .       T       C       .       haplotype;normal_artifact;position;strand_bias  CONTQ=69;DP=269;ECNT=2;GERMQ=93;MBQ=37,31;MFRL=173,182;MMQ=60,60;MPOS=0;NALOD=-18.25;NLOD=8.84;POPAF=6;ROQ=64;SEQQ=93;STRANDQ=1;TLOD=21.52      GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB     0|1:127,8:0.066:135:62,6:55,2:0|1:43422694_T_C:43422694:75,52,8,0       0|0:127,7:0.059:134:61,3:59,2:0|1:43422694_T_C:43422694:89,38,7,0
chr1    43422696        .       T       C       .       haplotype;normal_artifact;strand_bias   CONTQ=69;DP=279;ECNT=2;GERMQ=93;MBQ=38,33;MFRL=172,182;MMQ=60,60;MPOS=-2147483648;NALOD=-18.27;NLOD=8.58;POPAF=6;ROQ=55;SEQQ=93;STRANDQ=1;TLOD=21.51    GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB     0|1:127,8:0.065:135:66,4:60,1:0|1:43422694_T_C:43422694:75,52,8,0       0|0:126,7:0.059:133:64,2:58,3:0|1:43422694_T_C:43422694:89,37,7,0
chr1    43499804        .       GT      G       .       slippage;weak_evidence  CONTQ=15;DP=19;ECNT=1;GERMQ=8;MBQ=39,36;MFRL=168,220;MMQ=60,60;MPOS=15;NALOD=0.715;NLOD=2.36;POPAF=6;ROQ=93;RPA=10,9;RU=T;SEQQ=1;STR;STRANDQ=14;STRQ=1;TLOD=3.67        GT:AD:AF:DP:F1R2:F2R1:SB        0/1:6,2:0.303:8:5,2:1,0:1,5,0,2 0/0:8,0:0.097:8:7,0:1,0:4,4,0,0
chr1    43592587        .       G       A       .       contamination;weak_evidence     CONTQ=1;DP=145;ECNT=1;GERMQ=93;MBQ=39,39;MFRL=194,159;MMQ=60,60;MPOS=33;NALOD=1.86;NLOD=21.07;POPAF=4.85;ROQ=44;SEQQ=1;STRANDQ=8;TLOD=3.56      GT:AD:AF:DP:F1R2:F2R1:SB        0/1:61,2:0.045:63:35,2:25,0:3,58,0,2    0/0:70,0:0.014:70:45,0:25,0:3,67,0,0
chr1    43621836        .       C       T       .       contamination;weak_evidence     CONTQ=1;DP=209;ECNT=1;GERMQ=93;MBQ=39,39;MFRL=197,217;MMQ=60,60;MPOS=35;NALOD=2.02;NLOD=30.39;POPAF=6;ROQ=63;SEQQ=1;STRANDQ=8;TLOD=3.08 GT:AD:AF:DP:F1R2:F2R1:SB        0/1:96,2:0.03:98:58,1:35,1:90,6,2,0     0/0:101,0:0.009441:101:65,0:36,0:92,9,0,0

@jkbonfield
Contributor

On request, the proposal is now a bit different: MPOS=-2147483648 will become MPOS=. (the missing value), so that such data can be written to BCF. That's over in samtools/htslib#1004.

I think this is fine. The -2147483648 is just the result of a ghastly bug due to failure to initialise a variable correctly. Replacing it with the "missing" value is the most accurate representation of what happened.
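
As a rough picture of the proposed behaviour (a sketch only, not the code in samtools/htslib#1004), an integer INFO value equal to the int32 "missing" bit pattern would be rendered as `.` in VCF text. The helper name `format_info_ints` is hypothetical.

```python
INT32_MISSING = -2**31  # -2147483648, the int32 "missing" sentinel in BCF


def format_info_ints(key, values):
    """Render an integer INFO field, writing '.' for the missing sentinel."""
    rendered = ("." if v == INT32_MISSING else str(v) for v in values)
    return key + "=" + ",".join(rendered)


print(format_info_ints("MPOS", [-2147483648]))  # MPOS=.
print(format_info_ints("MPOS", [28, 26]))       # MPOS=28,26
```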

pd3 added a commit that referenced this issue Dec 16, 2019