Question: Non parental hap-mer identification #6
Comments
Hi @bmansfeld, Fabulous question. :) Will leave this issue open to see if anyone else can chime in. -Arang
Hey Arang,
Basically, I aligned raw Illumina reads to the phase0 and phase1 FASTAs and classified them. I converted the BAMs back to FASTQs and then used meryl to count the k-mers associated with each of these.
My specific question is whether I should first filter these k-mer sets to be unique (using meryl difference), or whether I should use the full sets including the shared k-mers.
I know this may be a bit circular, but I'm trying to define these reads as hap-mers to see if classifying by alignment can help identify phase switches and make blob plots without the real parent data.
Here are the blobs if I A) use and B) if I don't use
Not sure how to interpret them. A) looks too good to be meaningful, and B) looks too bad to be useful?
If you have any other feedback or things I can try, that would be great. If this is really stupid, please let me know too. I know it might be.
I'm thinking about this as a sort of "testing your de novo transcriptome by aligning your reads back to it": if they align well, it's kind of what you expect, but if they don't, you know something is wrong... Maybe that analogy is not the best for this approach...
Thanks,
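To make the two options in the comment above concrete, here is a toy sketch in plain Python (not meryl itself). The function names, the reads, and the tiny k are all hypothetical, and the sketch ignores canonical k-mers and count thresholds. It only illustrates what filtering each set down to its haplotype-specific k-mers does compared with keeping the full sets.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurring in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def difference(a, b):
    """Keep k-mers found in `a` but absent from `b` (set-difference semantics)."""
    return Counter({kmer: n for kmer, n in a.items() if kmer not in b})

# Hypothetical reads assigned to each phase by alignment.
phase0_reads = ["ACGTACGA"]
phase1_reads = ["ACGTACGC"]
k = 4  # toy value; real analyses use much larger k

phase0 = count_kmers(phase0_reads, k)
phase1 = count_kmers(phase1_reads, k)

# Option A: filtered sets, only k-mers specific to one phase.
phase0_only = difference(phase0, phase1)
phase1_only = difference(phase1, phase0)

# Option B keeps the full sets, so shared k-mers mark both phases.
shared = set(phase0) & set(phase1)

print("phase0-specific:", sorted(phase0_only))
print("phase1-specific:", sorted(phase1_only))
print("shared:", sorted(shared))
```

In this toy case most k-mers are shared, so with the unfiltered sets nearly every k-mer counts as a marker for both phases, while the filtered sets separate perfectly by construction.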
Hi @arangrhie, Building on the conversation started by Ben Mansfeld in Issue #6 regarding hap-mer generation without access to parental genomes, I've adopted a similar strategy using Illumina reads alongside ONT R10 data to construct and evaluate phased genome assemblies. After assembly improvement steps with tools like purge_dups, NextPolish, and RagTag, and phasing with HapDup, I'm now in the process of quality assessment for these "dual" assemblies. This is a diploid genome of a highly heterozygous plant pathogen. Workflow and Issue Description: Approach for Hapmer Creation and Blob Plot Generation:
head i8735_canu_hetopt_purged_polished_scaffolded_hapdup_dual_hap1_vs_hap2.qv
i8735_canu_hetopt_purged_polished_scaffolded_hapdup_dual_1 207900 134365932 40.6539 8.60222e-05
i8735_canu_hetopt_purged_polished_scaffolded_hapdup_dual_2 203620 134563709 40.7507 8.41261e-05
Both 411520 268929641 40.7021 8.50734e-05
The concern arises when the blob plots show an overly distinct separation of hap-mers, potentially indicating over-filtering or other issues in the hap-mer generation pipeline. Concern: Request for Input: Specific Questions:
Thank you for your time and consideration,
Camilo
PS. The plots are raw as they come out, so there is some overlap with legends.
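For anyone reading the .qv rows in the comment above: in Merqury's QV output the columns are the assembly name, the k-mers found only in the assembly, the total k-mers in the assembly, the QV, and the error rate. A minimal sketch of how the last two follow from the first two is below. The function name is hypothetical, and the k-mer size is not stated in the thread; k = 18 is an assumption, chosen because it is consistent with the posted numbers.

```python
import math

def merqury_style_qv(asm_only_kmers, total_kmers, k):
    """Return (QV, per-base error rate) from k-mer survival.

    A k-mer is error-free only if all k of its bases are correct, so the
    per-base accuracy is the k-th root of the fraction of assembly k-mers
    that are also supported by the read set.
    """
    p_base_correct = (1 - asm_only_kmers / total_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct
    return -10 * math.log10(error_rate), error_rate

K = 18  # assumed k-mer size, not given in the thread

# First row of the posted table (the "_dual_1" assembly).
qv, err = merqury_style_qv(207900, 134365932, K)
print(f"QV = {qv:.4f}, error rate = {err:.5e}")
```

Because QV depends only on k-mers absent from the read set, it measures base-level consensus accuracy and says nothing about whether the two haplotypes are phased correctly.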
Hello,
I'm very excited to test and assess the quality of our phased assembly (FALCON-Unzip -> FALCON-Phase).
However, due to the nature of our system, we are unable to acquire the parents of our sequenced individual.
In the preprint you mention:
Though this is beyond the scope of Merqury's pipeline, could you direct me to methods for creating such a hap-mer set using Hi-C data?
While it is a little circular (Hi-C was used to phase the genome), it would be great to be able to assess the phasing accuracy using this method.
Thanks,
Ben