Can Haplocheck be used to detect contamination in non-human species data? #18

Open
maruiqi0710 opened this issue Apr 21, 2023 · 2 comments

Comments

@maruiqi0710

I have sequenced a batch of yeast samples and would like to use Haplocheck to detect contamination in the WGS data. I note that the examples in "Contamination detection in sequencing studies using the mitochondrial phylogeny" all use human sequencing data, and that the software is based on Phylotree. Can Haplocheck also be used for non-human species?
Thanks.

@haansi
Member

haansi commented Apr 21, 2023

In principle it would be possible, since we have now updated the underlying Haplogrep to version 3, which supports phylogenetic trees other than the human one. However, we would need a recent mt-phylogeny for yeast, mouse, etc. in the form of a mutation-annotated tree, with haplogroups/clades as identifiers. So it is not straightforward, but theoretically possible. In summary, we would need a phylogenetic tree represented similarly to Phylotree (here is an example: http://phylotree.org/tree/A.htm), plus the reference sequence and its annotations in a GFF3 file. With those we could update Haplogrep to work with the new tree and integrate it into Haplocheck.
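For readers unfamiliar with the term, here is a minimal Python sketch of what a "mutation-annotated tree with clades as identifiers" means conceptually. It is not Haplogrep's file format or API. The `Clade` class, the `expected_mutations` helper, and the clade names and variants are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Clade:
    """One node of a toy mutation-annotated tree (hypothetical structure)."""
    name: str                                            # clade / haplogroup identifier
    mutations: list[str] = field(default_factory=list)   # variants defining this branch
    children: list[Clade] = field(default_factory=list)


def expected_mutations(node, target, inherited=()):
    """Return all mutations accumulated from the root down to `target`.

    Returns None if no clade named `target` exists below `node`.
    """
    path = list(inherited) + node.mutations
    if node.name == target:
        return path
    for child in node.children:
        found = expected_mutations(child, target, path)
        if found is not None:
            return found
    return None


# Toy data only: these clade names and variants are made up.
tree = Clade("root", [], [
    Clade("A", ["100T"], [Clade("A1", ["250C"])]),
    Clade("B", ["300G"]),
])

print(expected_mutations(tree, "A1"))  # ['100T', '250C']
```

The point is that each clade inherits the mutations of its ancestors, so a sample can be assigned to the clade whose accumulated mutations best match its variants. That is why a species-specific tree of this kind is a prerequisite.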

@maruiqi0710
Author

maruiqi0710 commented Apr 21, 2023

Thanks for your reply. Could you provide a detailed tutorial for beginners? I found the documentation at https://haplogrep.readthedocs.io/en/latest/trees, but the description is too brief. I also looked at the example at https://github.com/genepi/phylotree-rcrs-17/tree/main/src. The only files there that I know how to create are the ones generated by bwa index and the *.dict file (generated by gatk CreateSequenceDictionary). I don't know how to create the others, such as the yaml file, rules.csv, tree.xml, weights.txt, and the annotations folder, which together contain a lot of information. Another important question: how can the results of Haplogrep 3 be integrated into Haplocheck?
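As a rough aid while assembling such a directory, the sketch below reports which of the files named above are still missing. It only checks for their presence. It does not validate their contents and is not part of Haplogrep or Haplocheck. The function name `missing_tree_inputs` is hypothetical, and since the thread does not give the yaml file's name, the check accepts any `*.yaml` or `*.yml` file as an assumption.

```python
from pathlib import Path


def missing_tree_inputs(tree_dir):
    """List the tree inputs named in this thread that are absent from tree_dir."""
    root = Path(tree_dir)
    missing = []

    # Plain files mentioned in the thread.
    for name in ("rules.csv", "tree.xml", "weights.txt"):
        if not (root / name).is_file():
            missing.append(name)

    # The annotations folder.
    if not (root / "annotations").is_dir():
        missing.append("annotations/")

    # Assumption: any yaml file counts, since its exact name is not given here.
    if not (any(root.glob("*.yaml")) or any(root.glob("*.yml"))):
        missing.append("*.yaml")

    return missing
```

Calling `missing_tree_inputs("path/to/my-yeast-tree/src")` (a placeholder path) returns an empty list once every item is in place.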
