I'm using bcftools to call and filter markers for a haploid biparental population of 118 individuals. I have Illumina sequencing for all progeny. After indexing my reference genome (one of the parents of the biparental population), converting my .fq files to .sam to sorted .bam files, I called markers using:
bcftools mpileup -Ou -f genomic.fasta *.bam | bcftools call -mv -Ob --ploidy 1 --threads 4 -o calls1.bcf
and converted to .vcf using: bcftools view -Ov calls1.bcf > test.vcf
I then ran bcftools stats calls1.bcf > stats.txt.
I don't understand why, out of the 3.2 million SNPs in my .bcf file, roughly 1.9 million have an allele frequency of 0.000000. Additionally, when I run grep -c "AC=0" calls1.vcf, I get 0, so I'm not sure how to look at the markers that are supposedly present at such a low frequency. I also don't understand why markers would be called if they truly aren't present in the population.
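One way to look at individual records, rather than grepping for a literal string, is to pull the AC and AN values out of the INFO column. This is a minimal sketch, assuming the VCF is uncompressed text and that the records carry AC and AN tags in INFO; `vcf_ac_an` is a hypothetical helper name, not a bcftools command.

```shell
# vcf_ac_an: read an uncompressed VCF on stdin and print
# CHROM, POS, AC, AN (tab-separated) for every record.
# A tag that is missing from INFO is printed as ".".
vcf_ac_an() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        {
            ac = "."; an = "."
            n = split($8, info, ";")       # column 8 is INFO
            for (i = 1; i <= n; i++) {
                # anchor on the tag name so e.g. "MAC=" is not matched
                if (info[i] ~ /^AC=/) ac = substr(info[i], 4)
                if (info[i] ~ /^AN=/) an = substr(info[i], 4)
            }
            print $1 "\t" $2 "\t" ac "\t" an
        }'
}
```

Something like `vcf_ac_an < test.vcf | head` would then show what allele counts the records actually carry.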
Here's a portion of the data from bcftools stats:
AF, Stats by non-reference allele frequency:

| AF | [2]id | [3]allele frequency | [4]number of SNPs | [5]number of transitions | [6]number of transversions | [7]number of indels | [8]repeat-consistent | [9]repeat-inconsistent | [10]not applicable |
|---|---|---|---|---|---|---|---|---|---|
| AF | 0 | 0.000000 | 1925317 | 1318327 | 606990 | 271 | 0 | 0 | 271 |
| AF | 0 | 0.008475 | 25134 | 23556 | 1578 | 34 | 0 | 0 | 34 |
| AF | 0 | 0.016949 | 86417 | 79970 | 6447 | 110 | 0 | 0 | 110 |
| AF | 0 | 0.025424 | 82610 | 76945 | 5665 | 115 | 0 | 0 | 115 |
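For reference, the nonzero values in the [3] column match k/118, that is, an allele count of k over the 118 haploid samples. A quick check of that arithmetic, as a sketch (`af_bins` is a made-up helper name for illustration):

```shell
# af_bins N MAX_K: print k and k/N for k = 0..MAX_K, with the
# frequency formatted to six decimals like the stats output.
af_bins() {
    awk -v n="$1" -v m="$2" 'BEGIN {
        for (k = 0; k <= m; k++) printf "%d\t%.6f\n", k, k / n
    }'
}

af_bins 118 3
# prints:
# 0	0.000000
# 1	0.008475
# 2	0.016949
# 3	0.025424
```

So the 0.008475 row lines up with a count of 1 out of 118, while the 0.000000 row is the one that remains unexplained.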
I can get rid of these by filtering on minor allele frequency, but I'd like to know why they were present in the first place.
Thanks!