bcftools view: locate problematic genotype in GT field #2309
Comments
When the underlying htslib parser fails, using bcftools will not help with investigating a corrupted file. Instead try something like this
to check what the genotypes look like and how many there are.
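One way such a check could look, as a sketch rather than the exact command from this comment: it reads the raw VCF text with awk, so it does not depend on the htslib parser. The function name `tabulate_gt` is hypothetical, and it assumes sample columns start at column 10 and GT is the first FORMAT subfield.

```shell
# Tabulate the GT strings of one VCF record from plain text (no VCF parser involved).
# Assumptions: samples start at column 10; GT is the first colon-separated subfield.
tabulate_gt() {
    # $1 = position to match in column 2; uncompressed VCF text is read from stdin
    awk -F'\t' -v pos="$1" '
        /^#/ { next }                          # skip header lines
        $2 == pos {
            split("", n)                       # reset counts for this record
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")              # f[1] holds the GT subfield
                n[f[1]]++
            }
            for (g in n) print n[g] "\t" g     # count, then genotype string
            print NF - 9 "\tsamples_total"     # number of sample columns in the record
        }'
}

# Hypothetical usage on the file from this thread:
# zcat chr13.hg38.vcf.gz | tabulate_gt 19451111 | sort -k1,1nr
```

Any genotype string that occurs only once or twice, or a `samples_total` that differs from the number of sample names in the header, would be the first thing to look at.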
Thanks @pd3! I tried as you suggested, based on which I also deleted all
This should leave only
It seems I failed to locate the corrupted genotype. I was wondering if it's OK if I ignore the error and keep working with the
No, I wouldn't want to work with a file like that. Any chance you could share the
Sure! Thank you so much for spending time on this issue.
Mmm, I can't find anything wrong with that line. You mentioned it's a UKB file. I have access, can you tell me which file it is? I can try to debug the problem right there.
Hi @pd3! The file I'm working with is the whole exome sequencing 450k final release in pVCF format for chr13. We have performed genotype QC and merged blocks into a whole chromosome.
Now I'm not sure why I faced the error during the first run, but now the result seems fine. Probably this issue can be closed?
Hi! I was trying to split multiallelic variants and keep only SNPs in my vcf file when I encountered an error message saying:
It seems the genotypes of some individuals were corrupted (or coded in the wrong way I guess), so I retrieved the GT with the following command:
bcftools view --regions 13:19451111 chr13.hg38.vcf.gz -o chr13_error_snp.vcf
and
zcat chr13.hg38.vcf.gz | grep "19451111" > error_snp.tsv
A preview of the tsv is presented here, and I also attached the genotype of 13:19451111 for all individuals in text format.
error_snp.txt
The genotypes seem fine at first glance. Since there were ~460k individuals (from the UK Biobank) in the vcf file, I was wondering how I could locate the individual with the problematic genotype. Or was the error message given by mistake?
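For illustration, a rough sketch of how one might map an odd genotype back to a sample name: it pairs each sample column with its name from the `#CHROM` header line and prints only the samples whose GT does not match a simple allele pattern. The function name `find_odd_gt` and the pattern are assumptions made for this sketch (integers or `.` separated by `/` or `|`, any ploidy), not something bcftools provides.

```shell
# Print "sample<TAB>GT" for samples whose GT at a position does not look like
# integers or "." joined by "/" or "|" (e.g. 0/1, 1|2, ./., 0).
# Assumptions: samples start at column 10; GT is the first FORMAT subfield.
find_odd_gt() {
    # $1 = position to match in column 2; uncompressed VCF text is read from stdin
    awk -F'\t' -v pos="$1" '
        BEGIN { valid = "^([0-9]+|[.])([/|]([0-9]+|[.]))*$" }
        /^#CHROM/ {                             # remember sample names by column
            hdr = NF
            for (i = 10; i <= NF; i++) name[i] = $i
            next
        }
        /^#/ { next }                           # skip the other header lines
        $2 == pos {
            if (hdr && NF != hdr)               # warn if the record is too short or too long
                print "column count mismatch: header " hdr ", record " NF > "/dev/stderr"
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")               # f[1] holds the GT subfield
                if (f[1] !~ valid) print name[i] "\t" f[1]
            }
        }'
}

# Hypothetical usage on the file from this thread:
# zcat chr13.hg38.vcf.gz | find_odd_gt 19451111
```

No output would mean that every GT at that position fits the pattern and the column count agrees with the header, which would suggest the problem lies elsewhere in the record or the file.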
Thank you!