Hi!
First of all, thanks for developing merqury!
I'm assembling a fig wasp genome. Because the wasps are tiny, we pooled 40-60 diploid female offspring from one or a few mothers in the same fig as input material for HiFi sequencing. At this point we do not know for sure how many mothers these wasps belong to, but from fig wasp biology the most likely case is a single mother mated with a single father. Because fig wasps are haplodiploid, sequencing all the female offspring of a single mother should give a pseudo-triploid sample (25% from the mother's hap1, 25% from the mother's hap2, and 50% from the father). However, mating usually happens between siblings, so fig wasps are highly inbred, and I would expect almost no genetic differences among the 3 haplotypes.
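To make the pseudo-triploid expectation explicit, here is a minimal sketch of the haplotype shares in the pooled sample. The function names and the `n_mothers` parameter are my own, and it assumes every mother mated with exactly one haploid father and contributed the same number of daughters, which we cannot confirm for this sample.

```python
from fractions import Fraction


def pooled_haplotype_fractions(n_mothers=1):
    """Expected share of pooled daughter DNA per parental haplotype.

    Assumptions (not verified for the real sample): one haploid father per
    mother, and equal numbers of daughters from each family.
    """
    per_family = Fraction(1, n_mothers)
    shares = {}
    for i in range(1, n_mothers + 1):
        # Each diploid daughter inherits one of her mother's two haplotypes
        # with equal probability, so each maternal haplotype gets 1/4.
        shares[f"mother{i}_hap1"] = per_family * Fraction(1, 4)
        shares[f"mother{i}_hap2"] = per_family * Fraction(1, 4)
        # The father is haploid, so every daughter carries the same
        # paternal haplotype, which makes up 1/2 of the pool.
        shares[f"father{i}"] = per_family * Fraction(1, 2)
    return shares


def expected_kmer_depth(total_depth, shares):
    """Expected depth of k-mers private to each haplotype at a given total depth."""
    return {name: float(total_depth * share) for name, share in shares.items()}


single = pooled_haplotype_fractions(1)
for name, share in single.items():
    print(name, share)
```

With a hypothetical total depth, `expected_kmer_depth(100, single)` gives the depth at which haplotype-private k-mers would be expected to peak, while k-mers shared by all haplotypes would sit at the full depth.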
The k-mer multiplicity of the HiFi reads looks unusual, with 4 peaks at 20, 85, 140 and 228:
The peak positions do not seem to be explained by ploidy, which would give evenly spaced peaks.
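As a quick check of the spacing argument, this sketch computes the ratio of each peak to the lowest one and the gaps between neighbouring peaks. The helper names and the 10% tolerance are arbitrary choices of mine, not anything Merqury reports.

```python
def peak_spacing(peaks):
    """Return each peak's ratio to the lowest peak and the gaps between neighbours."""
    base = peaks[0]
    ratios = [p / base for p in peaks]
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return ratios, gaps


def is_evenly_spaced(peaks, rel_tol=0.1):
    """True if every gap is within rel_tol of the mean gap (tolerance is an assumption)."""
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return all(abs(g - mean_gap) <= rel_tol * mean_gap for g in gaps)


observed = [20, 85, 140, 228]  # peak multiplicities read off the k-mer histogram
ratios, gaps = peak_spacing(observed)
print("ratios to lowest peak:", ratios)
print("gaps between peaks:", gaps)  # [65, 55, 88]
print("evenly spaced:", is_evenly_spaced(observed))
```

The gaps of 65, 55 and 88 differ by more than the tolerance, and the ratios to the lowest peak are not consecutive integers, which is what makes a simple 1x/2x/3x/4x copy-number reading hard to fit.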
We assembled the reads with hifiasm, but the assembly came out at 1.5 Gbp, whereas we expect around 450-500 Mbp. The copy number spectrum plot shows that the assembly is highly duplicated, and 90.8% of BUSCOs are duplicated:
Although the assembly is much larger than expected, it is quite faithful to the reads: the QV score is 45.8 and the k-mer completeness is 97.69%, with 44059 assembly-only k-mers out of 1511469476 total k-mers.
We also assembled the reads with NextDenovo, which seems to be better at deduplication. The assembly size is 498 Mbp, as expected, and only 2.9% of BUSCOs are duplicated.
The QV score of this assembly is 48.85, but the k-mer completeness is only 66.58%.
################################
As background, in case you are curious about using sibling wasps as HiFi input, we had another case where the assembly was much more satisfactory:
In this case the k-mer multiplicity of the HiFi reads is unimodal, as expected:
The hifiasm assembly is again duplicated and too large (1 Gbp), with QV score 61, k-mer completeness 98.96% and 77% duplicated BUSCOs:
But the NextDenovo assembly is totally fine, with assembly size 496 Mbp, QV score 57.15, k-mer completeness 97.98% and 0.7% duplicated BUSCOs:
So in this case, the NextDenovo assembly is not only less duplicated but also keeps the k-mer completeness high.
I would greatly appreciate any ideas on what happened in the first case: what could cause the odd multimodal k-mer multiplicity of the HiFi reads, and what does the NextDenovo assembly tell us about the assembly and the reads?
Thanks,
Zexuan