NGS reads and Hifi reads which will be better for running Merqury? #31

Simon-Huang1 · 2021-03-03T09:24:36Z

Hi,
I assembled a genome using HiFi reads, but I am unsure whether NGS reads or HiFi reads are better for evaluating the assembly. I think NGS reads are biased in high-GC and repetitive regions, which may make them less than ideal for evaluating a genome, but the Merqury cookbook only explains how to run Merqury with NGS reads. So my question is: which is better for running Merqury, NGS reads or HiFi reads?

arangrhie · 2021-03-03T18:26:17Z

Hi @Simon-Huang1, although it is known that Illumina reads have GC biases, HiFi reads are still somewhat error-prone in homopolymers, which affects a larger number of k-mers.
You could build a hybrid k-mer db, combining k-mers from both Illumina and HiFi, for measuring QV.
I would still recommend using Illumina for the other metrics, such as completeness and spectrum analysis.
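
To make the idea of a hybrid k-mer set concrete, here is a toy Python sketch. It is not how Merqury or its k-mer counter is implemented: the function names, the `min_count` parameter, and the tiny sequences are all hypothetical, only the forward strand is counted, and summing counts before filtering is just one plausible way to combine the two read sets.

```python
from collections import Counter

def count_kmers(seqs, k):
    """Count every k-mer (forward strand only, for simplicity) in a list of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def hybrid_db(illumina_counts, hifi_counts, min_count=1):
    """Sum per-k-mer counts from both platforms, then keep k-mers seen at least min_count times.

    min_count is a hypothetical stand-in for a low-frequency (likely error) filter.
    """
    combined = illumina_counts + hifi_counts  # Counter addition sums the counts
    return {kmer for kmer, n in combined.items() if n >= min_count}

# Toy data: each "read set" covers a different part of the assembly.
k = 3
illumina = count_kmers(["ACGTAC"], k)
hifi = count_kmers(["GTACCC"], k)
assembly_kmers = set(count_kmers(["ACGTACCCG"], k))

# k-mers present in the assembly but absent from the read-derived set
# are the ones that count against QV.
asm_only_illumina = assembly_kmers - set(illumina)
asm_only_hybrid = assembly_kmers - hybrid_db(illumina, hifi)

print(sorted(asm_only_illumina))
print(sorted(asm_only_hybrid))
```

In this toy case the hybrid set leaves fewer assembly-only k-mers than the Illumina-only set, simply because it contains more k-mers. The same mechanism can raise QV on real data, whether the extra k-mers are true sequence or platform-specific errors shared with the assembly.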

ASLeonard · 2021-07-15T12:02:09Z

Hi Arang,
To follow up on this, do you recommend the HiFi+Illumina hybrid db only for QV, while using an Illumina-only db for the other metrics? After a quick check, Illumina-only has QV50, while hybrid is closer to QV65. Maybe this is due to the higher filtering threshold (gt3 for Illumina but gt16 for hybrid), or could it be that HiFi biases in the assembly now also exist in the HiFi-biased db?

hybrid	29689	3067099605	63.3635	4.60946e-07
illum_only	1788932	3067099605	45.5623	2.77822e-05

This also relates to the new Merfin best practices, where Illumina reads are suggested for the db. I guess in this context as well, Illumina-only is preferred over hybrid?

arangrhie · 2021-07-16T19:18:01Z

Hi @ASLeonard ,

I assume the illum_only also comes from a filtered version?
It is not possible to entirely remove biases;
I'd say provide QVs measured from each platform and from the hybrid set.
Ultimately it is an estimate from a given sequencing platform, and all the data at least support that the assembly is of high quality.

For Merfin, I appreciate your very fast access :) I just posted it and you are asking about it the next day!
Merfin relies on k-mer multiplicity, so it is difficult to measure this reliably from a hybrid set.
Although Illumina is known for its GC biases, homopolymer / microsatellite (simple tandem) errors in HiFi affected the k-mer spectrum more broadly across the genome, making accurate copy number estimates more difficult than with Illumina.
So yes, I would say use Illumina for Merfin.

Merqury's QV is less affected as it does not account for the expected multiplicity.
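
As a worked example of why QV depends only on presence or absence of k-mers, the sketch below recomputes the two QV rows quoted earlier in this thread. It assumes the second column is the number of k-mers found only in the assembly, the third is the total number of assembly k-mers, and that k = 21 was used; k is not stated in the thread and is an assumption here. The function names are illustrative.

```python
import math

def kmer_error_rate(asm_only, total, k):
    """Per-base error rate implied by the fraction of assembly k-mers absent from the reads."""
    # Probability that a k-mer is fully correct, converted to a per-base probability.
    per_base_correct = (1.0 - asm_only / total) ** (1.0 / k)
    return 1.0 - per_base_correct

def kmer_qv(asm_only, total, k):
    """Phred-scaled consensus quality from the k-mer based error rate."""
    return -10.0 * math.log10(kmer_error_rate(asm_only, total, k))

K = 21  # assumption: not stated in the thread

# (assembly-only k-mers, total assembly k-mers) from the two rows above
rows = {
    "hybrid": (29689, 3067099605),
    "illum_only": (1788932, 3067099605),
}

for name, (asm_only, total) in rows.items():
    print(name, round(kmer_qv(asm_only, total, K), 4), kmer_error_rate(asm_only, total, K))
```

With k = 21 this agrees with the reported QV and error-rate columns to within rounding. No k-mer multiplicity enters the calculation, which is why a change in the k-mer set (hybrid versus Illumina-only, or a different filtering threshold) shifts QV only through the count of assembly-only k-mers.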

arangrhie added the best practices How do I run Merqury? label Mar 15, 2021
