Majority of SNPs have allele frequency = 0.000000 #2321

Open
ryanskiba opened this issue Nov 19, 2024 · 0 comments

Hi,

I'm using bcftools to call and filter markers for a haploid biparental population of 118 individuals. I have Illumina sequencing for all progeny. After indexing my reference genome (one of the parents of the biparental population) and converting my .fq files to .sam and then to sorted .bam files, I called markers using:
bcftools mpileup -Ou -f genomic.fasta *.bam | bcftools call -mv -Ob --ploidy 1 --threads 4 -o calls1.bcf

and converted the result to .vcf using: bcftools view -Ov calls1.bcf > test.vcf

I then ran bcftools stats calls1.bcf > stats.txt.

I don't understand why, out of the 3.2 million SNPs in my .bcf file, roughly 1.9 million have an allele frequency of 0.000000. Additionally, when I use grep -c "AC=0" calls1.vcf, it gives me a result of 0, so I'm not sure how to look at these markers that are supposedly present at such low frequency. I also don't understand why markers would be called if they truly aren't present in the population.
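One way to look at individual sites, rather than grepping for a literal string, is to pull the allele count and allele number out of each record and compute the ratio. The following is only a minimal sketch: `vcf_ac_an` is a hypothetical helper name, and it assumes an uncompressed VCF whose INFO column carries `AC=` and `AN=` tags. For multi-allelic records only the first ALT count ends up in the ratio.

```shell
#!/bin/sh
# Hypothetical helper (not part of bcftools): print CHROM, POS, AC, AN and
# AC/AN for each record of an uncompressed VCF.
# Assumption: INFO (column 8) contains AC= and AN= tags.
vcf_ac_an() {
    awk -F'\t' '
        /^#/ { next }                       # skip header lines
        {
            ac = ""; an = ""
            n = split($8, info, ";")        # INFO is a ;-separated tag list
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^AC=/) ac = substr(info[i], 4)
                if (info[i] ~ /^AN=/) an = substr(info[i], 4)
            }
            # skip records where the tags are missing or AN is zero
            if (ac == "" || an == "" || an + 0 == 0) next
            printf "%s\t%s\t%s\t%s\t%.6f\n", $1, $2, ac, an, ac / an
        }' "$1"
}
```

For example, `vcf_ac_an test.vcf | sort -k5,5n | head` would list the sites with the lowest AC/AN ratio first, which makes it possible to check what AC and AN actually are at the sites in question.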

Here's a portion of the data from bcftools stats:

AF, Stats by non-reference allele frequency:

AF [2]id [3]allele frequency [4]number of SNPs [5]number of transitions [6]number of transversions [7]number of indels [8]repeat-consistent [9]repeat-inconsistent [10]not applicable

AF 0 0.000000 1925317 1318327 606990 271 0 0 271
AF 0 0.008475 25134 23556 1578 34 0 0 34
AF 0 0.016949 86417 79970 6447 110 0 0 110
AF 0 0.025424 82610 76945 5665 115 0 0 115
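As a quick arithmetic check on the excerpt above: the non-zero bin labels match k/118 for k = 1, 2, 3, that is, allele counts of 1, 2 and 3 out of 118 haploid samples. The sketch below only reproduces that arithmetic; `af_for_counts` is a hypothetical helper name, and it says nothing about how bcftools stats decides which sites go into the 0.000000 bin.

```shell
#!/bin/sh
# Hypothetical helper: print allele count k and k/n for k = 0..kmax,
# formatted with six decimals like the AF column in the excerpt.
af_for_counts() {
    awk -v n="$1" -v kmax="$2" 'BEGIN {
        for (k = 0; k <= kmax; k++) printf "%d\t%.6f\n", k, k / n
    }'
}

# 118 haploid samples, allele counts 0 through 3
af_for_counts 118 3
# 0	0.000000
# 1	0.008475
# 2	0.016949
# 3	0.025424
```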

I can get rid of these by filtering on minor allele frequency, but I'd like to know why they were present in the first place.
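For reference, the minor-allele-frequency filter mentioned above can be sketched on the text VCF as follows. bcftools view also has a `--min-af` option (for example `-q 0.05:minor`) that serves the same purpose; the awk version here is only an illustration under stated assumptions: `vcf_maf_filter` is a hypothetical helper name, the 0.05 threshold in the usage line is arbitrary, and records are assumed to be biallelic with `AC=` and `AN=` in INFO.

```shell
#!/bin/sh
# Hypothetical helper: keep header lines plus records whose minor allele
# frequency, computed as min(AC/AN, 1 - AC/AN), is at least the threshold.
# Assumption: biallelic records with AC= and AN= present in INFO (column 8).
vcf_maf_filter() {
    awk -F'\t' -v min="$2" '
        /^#/ { print; next }                # pass header lines through
        {
            ac = ""; an = ""
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^AC=/) ac = substr(info[i], 4)
                if (info[i] ~ /^AN=/) an = substr(info[i], 4)
            }
            # drop records where the frequency cannot be computed
            if (ac == "" || an == "" || an + 0 == 0) next
            af = ac / an
            maf = (af < 0.5) ? af : 1 - af
            if (maf >= min + 0) print
        }' "$1"
}
```

Usage would look like `vcf_maf_filter test.vcf 0.05 > filtered.vcf`, where `filtered.vcf` is just a placeholder output name.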

Thanks!
