NGS reads and Hifi reads which will be better for running Merqury? #31

Open
Simon-Huang1 opened this issue Mar 3, 2021 · 3 comments
Labels
best practices How do I run Merqury?

Comments

@Simon-Huang1

Hi,
I assembled a genome using HiFi reads. However, I am unsure whether NGS reads or HiFi reads are better for evaluating the assembly. I think NGS reads are biased in high-GC and repetitive regions, which might make them less than ideal for evaluating a genome, but the Merqury cookbook only explains how to run Merqury with NGS reads. So my question is: which will be better for running Merqury, NGS reads or HiFi reads?

@arangrhie
Contributor

Hi @Simon-Huang1, although Illumina reads are known to have GC biases, HiFi reads are still somewhat error-prone in homopolymers, which affects a larger number of k-mers.
You could make a hybrid k-mer db, combining k-mers from both Illumina and HiFi, for measuring QV.
I would still recommend using Illumina for the other metrics, such as completeness and spectrum analysis.
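
As a toy illustration of the hybrid db idea (a minimal sketch, not how meryl or Merqury implement it): count canonical k-mers from each read set, sum the counts, and see how many assembly k-mers have no read support in each db. All function names, the tiny sequences, and `k=5` are hypothetical and chosen only for this example; counting unsupported k-mers by occurrence rather than by distinct k-mer is also an assumption of the sketch.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each k-mer as the smaller of itself and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(seqs, k):
    """Count canonical k-mers over a collection of sequences."""
    counts = Counter()
    for seq in seqs:
        counts.update(canonical_kmers(seq, k))
    return counts


def hybrid_db(illumina_db, hifi_db):
    """Union of two k-mer tables; counts of shared k-mers are summed."""
    return illumina_db + hifi_db


def assembly_only(asm_db, read_db, min_count=1):
    """Assembly k-mer occurrences seen fewer than min_count times in the reads."""
    return sum(n for kmer, n in asm_db.items() if read_db[kmer] < min_count)


# Hypothetical toy data, only to show the effect of combining the two sets.
K = 5
asm = count_kmers(["ACGTTAGC"], K)
illumina = count_kmers(["ACGTTA"], K)
hifi = count_kmers(["CGTTAG"], K)
hybrid = hybrid_db(illumina, hifi)

print(assembly_only(asm, illumina), assembly_only(asm, hifi), assembly_only(asm, hybrid))
```

In this toy case each read set alone leaves two assembly k-mers unsupported, while the combined set leaves one. The summed counts no longer reflect a single platform's coverage, so any `min_count` threshold has to be chosen again for the hybrid table.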

@arangrhie arangrhie added the best practices How do I run Merqury? label Mar 15, 2021
@ASLeonard

Hi Arang,
To follow up on this, do you recommend the HiFi+Illumina hybrid db only for QV, while using an Illumina-only db for the other metrics? After a quick check, Illumina-only has QV50, while hybrid is closer to QV65. Maybe this is due to the higher filtering threshold (gt3 for Illumina but gt16 for hybrid), or could HiFi biases in the assembly also be present in the HiFi-biased db?

hybrid   29689	3067099605	63.3635	4.60946e-07
illum_only    1788932	3067099605	45.5623	2.77822e-05

This also relates to the new Merfin best practices, where Illumina reads are suggested for the db. I guess in this context as well, Illumina-only is preferred over hybrid?
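
For readers checking the two rows above: the last two columns can be reproduced from the first two with the k-mer based error estimate described for Merqury, E = 1 - (1 - asm_only / total)^(1/k) and QV = -10 log10(E). This is a sketch under stated assumptions: the columns are taken to be (name, assembly-only k-mers, total assembly k-mers, QV, error rate), and k = 21 is not stated in the thread but is inferred because it reproduces the posted values. The function name is hypothetical.

```python
import math


def kmer_qv(asm_only, total, k=21):
    """Return (QV, per-base error rate) from assembly-only and total k-mer counts.

    k=21 is an assumption inferred from the numbers posted in this thread.
    """
    frac_missing = asm_only / total
    # 1 - (1 - frac_missing) ** (1 / k), written with log1p/expm1 for accuracy
    error = -math.expm1(math.log1p(-frac_missing) / k)
    return -10 * math.log10(error), error


rows = {
    "hybrid": (29689, 3067099605),
    "illum_only": (1788932, 3067099605),
}
for name, (asm_only, total) in rows.items():
    qv, error = kmer_qv(asm_only, total)
    print(f"{name}\t{asm_only}\t{total}\t{qv:.4f}\t{error:.5e}")
```

Because the total is identical in both rows, the roughly 18-point QV gap comes entirely from the 60-fold difference in assembly-only k-mers, which is where the choice of read set and filtering threshold enters.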

@arangrhie
Contributor

Hi @ASLeonard,

I assume the illum_only also comes from a filtered version?
It is not possible to entirely remove biases;
I'd say provide the QVs measured from each platform and from the hybrid set.
Ultimately, each is an estimate from a given sequencing platform, and all of the data at least support that the assembly is of high quality.

For Merfin, I appreciate your very fast access :) I just posted it and you are asking about it the next day!
Merfin relies on k-mer multiplicity, which is difficult to measure reliably from a hybrid set.
Although Illumina is known for its GC biases, homopolymer / microsatellite (simple tandem repeat) errors in HiFi affected the k-mer spectrum more broadly across the genome, making accurate copy number estimates more difficult than with Illumina.
So yes, I would say use Illumina for Merfin.

Merqury's QV is less affected as it does not account for the expected multiplicity.


3 participants