Different site lists in empirical dose files from MIS #3

Open
dtaliun opened this issue Jul 17, 2024 · 2 comments

@dtaliun

dtaliun commented Jul 17, 2024

Hi Jonathon,

When merging dose files from MIS, the imputed site lists are guaranteed to be identical as long as no R2 filters are applied, because MIS outputs even quasi-monomorphic variants. Empirical dose files, however, may differ. When we split a large GWAS into two batches, we sometimes end up with a few typed variants that are monomorphic in one batch but not in the other. MIS drops monomorphic typed variants, so these variants appear in the empirical dose file of one batch but not the other. As a result, the site lists differ and the merge fails.
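To see which typed variants cause the mismatch, the two empirical dose files can be compared site by site. The sketch below is an illustration, not part of MIS or hds-util: the function names `site_keys` and `sites_only_in_one` are hypothetical, it treats a site as the (CHROM, POS, REF, ALT) tuple, and it assumes one record per site and that the files can be read with plain `gzip` (bgzip output is gzip-compatible).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of MIS or hds-util): list the sites that are
# present in only one of two bgzipped empirical dose VCFs.

site_keys() {
  # Drop header lines, keep CHROM, POS, REF, ALT as a tab-separated key,
  # and sort in the C locale so comm sees a consistent order.
  gzip -dc "$1" | grep -v '^#' | cut -f1,2,4,5 | LC_ALL=C sort
}

sites_only_in_one() {
  # comm -3 hides lines common to both inputs. Sites only in the first file
  # are printed as-is; sites only in the second file get one leading tab.
  LC_ALL=C comm -3 <(site_keys "$1") <(site_keys "$2")
}
```

For example, `sites_only_in_one batch1.empiricalDose.vcf.gz batch2.empiricalDose.vcf.gz` would print the handful of typed variants that one batch kept and the other dropped; empty output means the site lists already match.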

Would you happen to have any suggestions on how to overcome this (without redoing the imputation)? Can the empirical dose files still be merged? Given that these are typically just a few variants, do you think downstream MetaMinimac will complain if we remove them from the empirical dose files?

Thanks,
Daniel

@jonathonl
Contributor

jonathonl commented Jul 17, 2024

MetaMinimac wouldn't know the difference if you manually removed such variants before merging. I think that's the only option at the moment. It's unfortunate that MIS drops monomorphic variants (I'm not sure why they would do that). Thanks for raising this issue. I'll mention this to the MIS team, but I may end up modifying hds-util to automatically drop such variants when merging.

@dtaliun
Author

dtaliun commented Jul 18, 2024

Thanks Jonathon,

I'll leave a quick bcftools recipe here for removing such variants manually, in case someone else runs into this issue:

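```sh
# Keep only sites present in both batches (inputs must be bgzipped and indexed).
bcftools isec -n=2 batch1.empiricalDose.vcf.gz batch2.empiricalDose.vcf.gz -p temp
# temp/0000.vcf and temp/0001.vcf hold batch1's and batch2's records at the shared sites.
hds-util -Ovcf.gz -o merged.empiricalDose.vcf.gz temp/0000.vcf temp/0001.vcf
rm -rf temp
```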
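Before running hds-util on the `bcftools isec` output, it may be worth confirming that the two intermediate files really do list the same sites in the same order. This is a minimal sketch under stated assumptions: `same_sites` is a hypothetical helper, it reads uncompressed VCFs (which is what `isec -p` writes by default), and it compares only CHROM, POS, REF and ALT, so differing IDs or other columns are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if two uncompressed VCFs contain the same
# sites (CHROM, POS, REF, ALT) in the same order.
same_sites() {
  # cmp -s prints nothing and exits 0 only when both key streams are identical.
  cmp -s <(grep -v '^#' "$1" | cut -f1,2,4,5) \
         <(grep -v '^#' "$2" | cut -f1,2,4,5)
}

# Example (paths from the recipe above):
# same_sites temp/0000.vcf temp/0001.vcf || echo "site lists still differ" >&2
```

Comparing in file order is stricter than comparing sets of sites, which seems the safer choice for a tool that merges files record by record.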
