Accuracy calculations comparing Williams lab simulations to RFmix runs
Python script to calculate the accuracy of RFmix local ancestry calls against the simulated truth ancestry calls. The program expects two input files:
- An ancestral truth file with the chromosome, bp position, and space-delimited phased ancestry calls. It is currently hard-coded to expect 50 haplotypes (25 individuals) and a 2-way admixed scenario (ancestry calls can be 0 or 1). An example with two individuals (four haplotypes) at two SNPs:
1 570178 1 1 1 1
1 752566 1 1 0 1
I generated my truth dataset by processing the bp-converted simulated output from the Williams lab simu-mix program.
- An RFmix output msp file.
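A minimal sketch of reading one line from each input, assuming the layouts described above. The helper names `parse_truth_line` and `parse_msp_line` are hypothetical and are not taken from the script. For the msp file, the sketch assumes the RFmix v2 layout: `#` header lines, then tab-delimited rows whose first six columns are chm, spos, epos, sgpos, egpos and n snps, followed by one ancestry code per haplotype.

```python
# Hypothetical helpers; file layouts are assumptions based on the description above.
N_HAPLOTYPES = 50     # hard-coded expectation: 25 diploid individuals
VALID_CALLS = {0, 1}  # 2-way admixture


def parse_truth_line(line, n_haplotypes=N_HAPLOTYPES):
    """Parse 'chrom bp call call ...' into (chrom, bp, [calls])."""
    fields = line.split()
    chrom, bp = fields[0], int(fields[1])
    calls = [int(x) for x in fields[2:]]
    if len(calls) != n_haplotypes:
        raise ValueError(f"expected {n_haplotypes} haplotype calls, got {len(calls)}")
    if not set(calls) <= VALID_CALLS:
        raise ValueError(f"unexpected ancestry code in {calls}")
    return chrom, bp, calls


def parse_msp_line(line):
    """Parse one msp data row into (chrom, spos, epos, [calls]); None for '#' headers.

    Assumed columns: chm, spos, epos, sgpos, egpos, n snps, then one call per haplotype.
    """
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    chrom, spos, epos = fields[0], int(fields[1]), int(fields[2])
    calls = [int(x) for x in fields[6:]]  # skip the six window-description columns
    return chrom, spos, epos, calls


# The two-individual truth example above has 4 haplotypes:
print(parse_truth_line("1 752566 1 1 0 1", n_haplotypes=4))  # ('1', 752566, [1, 1, 0, 1])
```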
Both files can cover a single chromosome or the entire genome. The script first matches on chromosome, then finds the RFmix window that contains each truth bp position and compares the calls in that window to the truth calls.
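The chromosome-then-window matching can be sketched with a sorted list of window start positions per chromosome and a binary search via `bisect`. This is one plausible approach, not necessarily what `GlobalLAIaccuracy-userflags.py` does. The sketch assumes windows are inclusive `[spos, epos]`, that haplotype columns appear in the same order in both files, and that chromosome labels match exactly (`1` and `chr1` would not match).

```python
import bisect
from collections import defaultdict


def build_window_index(msp_rows):
    """Group (chrom, spos, epos, calls) windows by chromosome, sorted by start."""
    index = defaultdict(list)
    for chrom, spos, epos, calls in msp_rows:
        index[chrom].append((spos, epos, calls))
    starts = {}
    for chrom, windows in index.items():
        windows.sort(key=lambda w: w[0])
        starts[chrom] = [w[0] for w in windows]
    return index, starts


def find_window(index, starts, chrom, bp):
    """Return the calls of the window containing bp, or None if no window does."""
    if chrom not in starts:
        return None
    i = bisect.bisect_right(starts[chrom], bp) - 1  # last window starting at or before bp
    if i < 0:
        return None
    spos, epos, calls = index[chrom][i]
    return calls if bp <= epos else None  # assumes inclusive window ends


def accuracy(truth_rows, msp_rows):
    """Return (fraction of matching haplotype calls, number of unplaced truth sites)."""
    index, starts = build_window_index(msp_rows)
    matches = total = unplaced = 0
    for chrom, bp, true_calls in truth_rows:
        calls = find_window(index, starts, chrom, bp)
        if calls is None:
            unplaced += 1
            continue
        if len(calls) != len(true_calls):
            raise ValueError(f"haplotype count mismatch at {chrom}:{bp}")
        for t, c in zip(true_calls, calls):
            total += 1
            matches += (t == c)
    return (matches / total if total else float("nan")), unplaced


# Toy data (made up): second truth SNP disagrees on one haplotype; chr 2 has no windows.
truth = [("1", 570178, [1, 1, 1, 1]), ("1", 752566, [1, 1, 0, 1]), ("2", 100, [0, 0, 0, 0])]
msp = [("1", 500000, 700000, [1, 1, 1, 1]), ("1", 700001, 900000, [1, 1, 1, 1])]
print(accuracy(truth, msp))  # (0.875, 1)
```

Truth positions outside every window are counted separately rather than scored, so gaps between windows or missing chromosomes do not silently shift the accuracy.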
Usage:
python GlobalLAIaccuracy-userflags.py --Anc [TRUTH_FILE] --msp [RFMIX_MSPFILE]
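The two flags map naturally onto Python's `argparse`. A hypothetical sketch of that wiring follows; the script's real argument handling may differ, and the file names are placeholders.

```python
import argparse


def parse_args(argv=None):
    """Parse the --Anc and --msp flags shown in the usage line."""
    parser = argparse.ArgumentParser(
        description="Accuracy of RFmix ancestry calls against simulated truth calls.")
    parser.add_argument("--Anc", required=True, help="ancestral truth file")
    parser.add_argument("--msp", required=True, help="RFmix output msp file")
    return parser.parse_args(argv)


args = parse_args(["--Anc", "truth.txt", "--msp", "run.msp.tsv"])
print(args.Anc, args.msp)  # truth.txt run.msp.tsv
```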
Other files in this repository were hard-coded for a specific project involving a highly multi-way admixed population. They could be adapted manually for other uses.