Hi,
I assembled a genome using HiFi reads. However, I am unsure whether NGS reads or HiFi reads are better for evaluating the assembly. I think NGS reads are biased in high-GC and repetitive regions, which may make them less than ideal for evaluating a genome, but the Merqury cookbook only explains how to run Merqury with NGS reads. So my question is: which are better for running Merqury, NGS reads or HiFi reads?
Hi @Simon-Huang1, although it is known that Illumina reads have GC biases, HiFi reads are still somewhat error-prone in homopolymers, which affects a larger number of k-mers.
You could build a hybrid k-mer db, combining k-mers from both Illumina and HiFi, for measuring QV.
I would still recommend using Illumina for the other metrics, such as completeness and spectrum analysis.
Hi Arang,
To follow up on this, do you recommend the HiFi+Illumina hybrid db only for QV, while using an Illumina-only db for the other metrics? After a quick check, Illumina-only gives QV50, while the hybrid is closer to QV65. Maybe this is due to the higher filtering threshold (gt3 for Illumina but gt16 for hybrid), or could HiFi biases in the assembly now also be present in the HiFi-biased db?
This also relates to the new Merfin best practices, where Illumina reads are suggested for the db. I guess in this context as well, Illumina-only is preferred over hybrid?
I assume the illum_only also comes from a filtered version?
It is not possible to remove biases entirely.
I'd say report the QVs measured from each platform and from the hybrid set.
Ultimately, each QV is an estimate based on a given sequencing platform, and all of the data at least support that the assembly is of high quality.
For Merfin, I appreciate how quickly you found it :) I just posted it and you are asking about it the next day!
Merfin relies on k-mer multiplicity, which is difficult to measure reliably from a hybrid set.
Although Illumina is known for its GC biases, homopolymer / microsatellite (simple tandem repeat) errors in HiFi affected the k-mer spectrum more broadly across the genome, making it harder to get accurate copy number estimates than with Illumina.
So yes, I would say use Illumina for Merfin.
Merqury's QV is less affected as it does not account for the expected multiplicity.
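For anyone following along, here is a rough Python sketch of why the QV only needs presence/absence of assembly k-mers in the read db, not their multiplicity. It follows the QV formula as I understand it from the Merqury paper: P = (1 - K_asm_only / K_total)^(1/k), QV = -10 * log10(1 - P). The function names and the toy k-mers and counts are made up for illustration; this is not Merqury's actual code.

```python
import math

def count_unsupported(asm_kmers, read_db):
    """Count assembly k-mers absent from the read k-mer db.

    read_db maps k-mer -> read count, but only presence matters here:
    a k-mer seen 3 times and one seen 50 times are both 'supported'.
    """
    return sum(1 for kmer in asm_kmers if kmer not in read_db)

def kmer_qv(asm_only, asm_total, k):
    """Phred-scaled QV from the number of assembly-only k-mers."""
    if asm_only == 0:
        return math.inf  # no unsupported k-mers: QV is unbounded
    p_correct = (1 - asm_only / asm_total) ** (1 / k)
    return -10 * math.log10(1 - p_correct)

# Toy data (hypothetical): three assembly 3-mers and two read dbs.
asm_kmers = ["AAA", "AAC", "ACG"]
illumina_db = {"AAA": 50, "AAC": 3}
hybrid_db = {"AAA": 80, "AAC": 3, "ACG": 20}

for name, db in [("illumina", illumina_db), ("hybrid", hybrid_db)]:
    missing = count_unsupported(asm_kmers, db)
    print(name, missing, kmer_qv(missing, len(asm_kmers), k=3))
```

In the toy data, the hybrid db recovers a k-mer that the Illumina-only db lacks, so fewer assembly k-mers look like errors and the QV goes up. That is the same direction as the QV50 vs. QV65 difference reported above, although the real cause there may differ.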