You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We are finding examples where batch level R2 > 0.3 for all batches but the output R2 from hds-util is < 0.3 and thus the variants do not end up in the output data when using the -m 0.3 option. While this is a very small number of variants (just a few out of 22K HLA imputed variants) we are curious how this can happen in the calculation of the merged R2. Thanks for any explanation you can provide.
Apologies for the confusion. When I ran using "-m 0.3" the hds-util did not provide the output for this variant. I just ran the merge for this variant separately removing the -m filter and this is the result.
6 31324002 AA_B_167_31323991_exon3_SW A T . . IMPUTED;AF=0.99989;MAF=0.00010
9732;AVG_CS=0.999958;R2=0.174244
The R2 field is an estimated r-squared based on the mean the HDS values. It's plausible for the chunked data to "fit" their respective means better than the merged data. This is also plausible in empirical r-squared. In an extreme example where the x is unimodal and the y is bimodal due to two underlying distributions with different means, the r-squared of the combined data could be much worse than if you were to compute r-squared on the two underlying distributions separately.
I just double checked that the R2 INFO field was being computed correctly by separating an imputed VCF into two sample sets and then merging back together with hds-util. This will give equivalent results (barring precision errors).
We are finding examples where batch level R2 > 0.3 for all batches but the output R2 from hds-util is < 0.3 and thus the variants do not end up in the output data when using the -m 0.3 option. While this is a very small number of variants (just a few out of 22K HLA imputed variants) we are curious how this can happen in the calculation of the merged R2. Thanks for any explanation you can provide.
Example
hds-util -f GT,HDS,GP,DS -m 0.3 data1.dose.vcf.gz data2.dose.vcf.gz data3.dose.vcf.gz data4.dose.vcf.gz -O vcf.gz > merged.dose.vcf.gz
data1
AA_B_167_31323991_exon3_SW
6 31324002 AA_B_167_31323991_exon3_SW A T . .IMPUTED;AF=0.999851;MAF=0.00014919;AVG_CS=0.999931;R2=0.543911
data2
[batzler@mforgehn5 TAPphase1QCimp]$ zcat v4/HLA/HLAoutput/2/chr6.info.gz | grep AA_B_167_31323991_exon3_SW
6 31324002 AA_B_167_31323991_exon3_SW A T . .IMPUTED;AF=0.99989;MAF=0.000109911;AVG_CS=0.999933;R2=0.402355
data3
6 31324002 AA_B_167_31323991_exon3_SW A T . .IMPUTED;AF=0.999909;MAF=9.11355e-05;AVG_CS=0.999944;R2=0.384272
data4
6 31324002 AA_B_167_31323991_exon3_SW A T . .IMPUTED;AF=0.999825;MAF=0.000174999;AVG_CS=0.999938;R2=0.646454
merged
CHROM POS SNP REF ALT N ALT_Frq MAF minRsq minEmpRsq vtype RSQ
1 6 31324002 AA_B_167_31323991_exon3_SW A T 4 1.00 0.000131 0.384 NA AA NA
Thanks
The text was updated successfully, but these errors were encountered: