
Unusual k-mer multiplicity of HiFi reads and assemblies generated by hifiasm and NextDenovo #145

ZexuanZhao opened this issue Oct 29, 2024 · 0 comments


Hi!

First of all, thanks for developing Merqury!

I'm assembling a fig wasp genome. Because the wasps are tiny, we pooled multiple (40-60) diploid female offspring of one or a few mothers from the same fig as input material for HiFi sequencing. At this point we do not know for sure how many mothers these wasps belong to, but given the biology of fig wasps, most likely a single mother mated with a single father. Because fig wasps are haplodiploid, if we sequence all the female offspring of a single mother, the sample should be pseudo-triploid (25% from the mother's hap1, 25% from the mother's hap2, and 50% from the father). However, because mating usually happens between siblings and fig wasps are consequently highly inbred, I would expect almost no genetic differences among the three haplotypes.
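Under the single-mother, single-father model above, the possible k-mer peaks can be worked out directly. The sketch below assumes the pooled reads come 25% from maternal hap1, 25% from maternal hap2 and 50% from the paternal haplotype, and takes a hypothetical total coverage `C` (the multiplicity of k-mers shared by all three haplotypes); it lists the read multiplicity expected for a k-mer depending on which haplotypes carry it. The names `HAPLOTYPE_FRACTIONS` and `expected_multiplicities` are illustrative only.

```python
from itertools import combinations

# Assumed haplotype composition of the pooled female offspring
# (single mother x single father, haplodiploid): each daughter carries one
# of the two maternal haplotypes plus the paternal haplotype.
HAPLOTYPE_FRACTIONS = {
    "maternal_hap1": 0.25,
    "maternal_hap2": 0.25,
    "paternal": 0.50,
}


def expected_multiplicities(total_coverage: float) -> dict[tuple[str, ...], float]:
    """Expected read multiplicity of a k-mer for each set of haplotypes carrying it.

    total_coverage is the hypothetical multiplicity of k-mers shared by all
    three haplotypes, i.e. the full pooled coverage.
    """
    haps = list(HAPLOTYPE_FRACTIONS)
    result: dict[tuple[str, ...], float] = {}
    for r in range(1, len(haps) + 1):
        for subset in combinations(haps, r):
            share = sum(HAPLOTYPE_FRACTIONS[h] for h in subset)
            result[subset] = share * total_coverage
    return result


if __name__ == "__main__":
    # Illustrative coverage of 100x (hypothetical, not taken from the data).
    for subset, cov in sorted(expected_multiplicities(100).items(), key=lambda kv: kv[1]):
        print(f"{cov:6.1f}  {' + '.join(subset)}")
```

With `C = 100` the possible peaks fall at 25, 50, 75 and 100, i.e. evenly spaced at `C/4`; if inbreeding makes the three haplotypes nearly identical, almost all k-mers would fall into the single peak at `C`.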

The k-mer multiplicity histogram of the HiFi reads looks unusual: there are four peaks, at multiplicities 20, 85, 140 and 228:

[Screenshot: k-mer multiplicity histogram of the HiFi reads]

The peak positions do not seem to be a result of ploidy: ploidy-driven peaks would be evenly spaced, at integer multiples of a single base coverage.
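To make the spacing argument concrete, here is a minimal check, assuming the four peak positions quoted above are accurate and that a ploidy ladder would place peaks at 1x, 2x, 3x, ... of one base coverage with no rung missing. The function names and the 10% tolerance are hypothetical choices for illustration.

```python
# Peak positions (k-mer multiplicity) reported for the HiFi read histogram above.
OBSERVED_PEAKS = [20, 85, 140, 228]


def peak_spacings(peaks: list[float]) -> list[float]:
    """Gaps between consecutive peaks, measured starting from multiplicity 0."""
    ladder = [0, *sorted(peaks)]
    return [b - a for a, b in zip(ladder, ladder[1:])]


def looks_like_ploidy_ladder(peaks: list[float], rel_tol: float = 0.1) -> bool:
    """Crude check that peaks sit at integer multiples of one base coverage.

    Assumes every rung of the ladder is present; rel_tol is an arbitrary tolerance
    on how far each gap may deviate from the mean gap.
    """
    gaps = peak_spacings(peaks)
    mean_gap = sum(gaps) / len(gaps)
    return all(abs(g - mean_gap) <= rel_tol * mean_gap for g in gaps)


if __name__ == "__main__":
    print(peak_spacings(OBSERVED_PEAKS))
    print(looks_like_ploidy_ladder(OBSERVED_PEAKS))
```

Measured from zero, the gaps are 20, 65, 55 and 88, so the observed peaks do not form such a ladder, whereas a pattern like 25, 50, 75, 100 would.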

We assembled the reads with hifiasm, but the assembly size was 1.5 Gbp, whereas we expect around 450-500 Mbp. The copy-number spectrum plot shows that the assembly is highly duplicated, consistent with 90.8% duplicated BUSCOs:

[Figure: Merqury copy-number spectrum plot (spectra-cn, stacked), Fcitrifolia_pollinator_hifiasm_merqOutput, Fcitrifolia_pollinator_hifiasm PRI]
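Merqury's spectra-cn plot breaks the read k-mer histogram down by how many times each k-mer occurs in the assembly, so sequence that is present twice in an assembly shows up as a 2-copy layer. The toy tally below only illustrates that idea; it is not Merqury's implementation (Merqury works on meryl k-mer databases), and the function names, the `max_cn` cap and the toy sequences are assumptions.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(seqs: list[str], k: int) -> Counter:
    """Count canonical k-mers over all sequences."""
    return Counter(canonical(s[i:i + k]) for s in seqs for i in range(len(s) - k + 1))


def copy_number_spectrum(reads: list[str], assembly: list[str], k: int, max_cn: int = 4) -> Counter:
    """Tally k-mers by (read multiplicity, copy number in the assembly).

    Copy number is capped at max_cn; k-mers found only in the assembly are
    reported with read multiplicity 0.
    """
    read_counts = count_kmers(reads, k)
    asm_counts = count_kmers(assembly, k)
    spectrum: Counter = Counter()
    for kmer, n_reads in read_counts.items():
        spectrum[(n_reads, min(asm_counts.get(kmer, 0), max_cn))] += 1
    for kmer, n_asm in asm_counts.items():
        if kmer not in read_counts:
            spectrum[(0, min(n_asm, max_cn))] += 1
    return spectrum


if __name__ == "__main__":
    reads = ["AAACCC"] * 2  # toy "reads": every k-mer is seen twice
    print(copy_number_spectrum(reads, ["AAACCC"], k=3))            # single copy
    print(copy_number_spectrum(reads, ["AAACCC", "AAACCC"], k=3))  # duplicated contig
```

Duplicating a contig moves all of its k-mers from the 1-copy to the 2-copy class without changing their read multiplicity, which is the kind of shift a duplicated assembly produces in the spectra-cn plot.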

Although the assembly is much larger than expected, it is quite faithful to the reads: the QV is 45.8 and the k-mer completeness is 97.69%, with 44,059 assembly-only k-mers out of 1,511,469,476 total k-mers.
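For reference, Merqury derives QV from the fraction of assembly k-mers that never occur in the reads, and k-mer completeness from the fraction of solid read k-mers that occur in the assembly. A minimal sketch of both formulas, assuming the k-mer size `k` is known (it is not stated above); the function names are hypothetical.

```python
import math


def merqury_style_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Phred-scaled consensus quality estimated from assembly-only k-mers.

    P(base correct) = (1 - asm_only / total) ** (1 / k); QV = -10 * log10(1 - P).
    """
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers, so the estimated error rate is zero
    frac_unsupported = asm_only_kmers / total_asm_kmers
    # Numerically stable form of 1 - (1 - frac_unsupported) ** (1 / k).
    error_rate = -math.expm1(math.log1p(-frac_unsupported) / k)
    return -10.0 * math.log10(error_rate)


def kmer_completeness(solid_read_kmers_in_asm: int, solid_read_kmers: int) -> float:
    """Percentage of solid read k-mers that are also present in the assembly."""
    return 100.0 * solid_read_kmers_in_asm / solid_read_kmers
```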

We also assembled the reads with NextDenovo, which seems to be better at avoiding duplication. The assembly size is 498 Mbp, as expected, and only 2.9% of BUSCOs are duplicated.

[Screenshot: NextDenovo assembly]

The QV of this assembly is 48.85, but the k-mer completeness is only 66.58%.
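QV and k-mer completeness look at opposite sides of the read/assembly comparison: QV counts assembly k-mers absent from the reads, while completeness counts read k-mers absent from the assembly. The toy example below (hypothetical sequences, k = 3, no solid-k-mer filtering or canonicalization, unlike Merqury) shows how an assembly that keeps only one of two haplotypes adds no assembly-only k-mers, so the QV input is unchanged, while losing the other haplotype's specific k-mers and therefore completeness. It only illustrates the two metrics and is not a diagnosis of the NextDenovo assembly.

```python
def kmer_set(seqs: list[str], k: int) -> set[str]:
    """Distinct k-mers (forward strand only) across all sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def compare_to_reads(reads: list[str], assembly: list[str], k: int) -> tuple[int, float]:
    """Return (number of assembly-only k-mers, % of read k-mers found in the assembly)."""
    read_kmers = kmer_set(reads, k)
    asm_kmers = kmer_set(assembly, k)
    asm_only = len(asm_kmers - read_kmers)  # the quantity QV is based on
    completeness = 100.0 * len(read_kmers & asm_kmers) / len(read_kmers)
    return asm_only, completeness


if __name__ == "__main__":
    hap_a = "AAAAACCCCC"
    hap_b = "AAAAGCCCCC"  # differs from hap_a at one position
    reads = [hap_a, hap_b]
    print(compare_to_reads(reads, [hap_a, hap_b], k=3))  # both haplotypes kept
    print(compare_to_reads(reads, [hap_a], k=3))         # one haplotype dropped
```

Both toy assemblies have zero assembly-only k-mers, but dropping `hap_b` lowers completeness from 100% to about 57% (4 of the 7 read k-mers).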

################################
For background, in case you are curious about using sibling wasps as HiFi input, we had another case in which the assembly was much more satisfactory:

In that case, the k-mer multiplicity of the HiFi reads is unimodal, as expected:

[Screenshot: k-mer multiplicity of the HiFi reads, second case]

The hifiasm assembly is again duplicated and oversized (1 Gbp), with a QV of 61, k-mer completeness of 98.96% and 77% duplicated BUSCOs:

[Screenshot: hifiasm assembly, second case]

But the NextDenovo assembly is fine: the assembly size is 496 Mbp, with a QV of 57.15, k-mer completeness of 97.98% and 0.7% duplicated BUSCOs:

[Screenshot: NextDenovo assembly, second case]

So in this case, the NextDenovo assembly is not only less duplicated but also did not lose k-mer completeness.

I would greatly appreciate any ideas on what causes the unusual multimodal k-mer multiplicity of the HiFi reads in the first case, and on what the NextDenovo assembly might tell us about the assembly and the reads.

Thanks,
Zexuan
