
When scaffolding against a high quality reference, what do outputs of "hybrid" scaffolds most likely indicate? #84

Open
DaRinker opened this issue May 8, 2023 · 4 comments


@DaRinker

DaRinker commented May 8, 2023

I am scaffolding several ONT+Illumina assemblies (Flye) against a T2T reference genome of a sister species. Each assembly represents a specific strain of my species of interest, so while I expect some variation, I don't necessarily expect massive structural rearrangements.

MOST of my output scaffolds show 1-to-1 correspondence with the T2T reference. However, some of my output scaffolds are less clear-cut. For example, in 70% of my samples I get a single scaffold corresponding to "chr_1" (using the T2T headers), but in 30% I see "chr1" PLUS "chr1_chr6" (which looks like a chunk of chr1 moved to chr6). And it's not random: when these "hybrid" scaffolds appear, most of them involve the same pairs of reference chromosomes.

Since I've tried multiple strategies (varying assembly parameters, using different Ragout reference sequences), I'm beginning to think that what I'm seeing is at least supported by my sequencing data. Are there any "sanity checks" I can do (within the Ragout framework) to convince myself that what appear to be chromosomal translocations are real?

@DaRinker
Author

DaRinker commented May 8, 2023

Looking at my output scaffolds in more detail, I don't think they're all correct. Not sure why, but for one reference chromosome, Ragout consistently inserts many small fragments that (a) do not align well to my reference and (b) end up extending some scaffolds by over 2 Mb(!!).

UPDATE: I tried soft-masking all my contigs as well as the T2T assembly, but nothing I try seems to stop this behavior. And it occurs in ALL my de novo assembled samples, so it's more than a random edge case...

@mikolmogorov
Owner

Can you post the log file? Are you using the repeat resolution mode? How small are the fragments? You could perhaps adjust the synteny block size to prevent them from being inserted.

In general, Ragout won't make a connection (e.g., a chr1-chr6 fusion) unless there is evidence for it in at least one of the reference genomes or in the target genome. Each iteration should produce a file with the synteny block order in each genome, from which you may be able to tell which reference supports the fusion. The links file also lists the genomes that support each adjacency, so if you can pinpoint which adjacency corresponds to the fusion, you can see which genomes support it.
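The "which genomes support this adjacency" check can be illustrated with a small Python sketch. This is not Ragout code and does not parse Ragout's output files: it assumes the synteny block order of each genome has already been loaded into a hypothetical in-memory structure (sequence name mapped to a list of signed block IDs, negative meaning reverse-complemented). Each adjacency is stored as an unordered pair of block ends, so the same junction read in either orientation compares equal.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical in-memory layout (not Ragout's file format): each genome maps
# chromosome/scaffold names to an ordered list of signed synteny block IDs;
# a negative ID means the block appears reverse-complemented.
Genome = Dict[str, List[int]]
Extremity = Tuple[int, str]  # (block ID, "t" = tail or "h" = head)
Adjacency = FrozenSet[Extremity]


def left_end(block: int) -> Extremity:
    """Block end encountered first when reading left to right."""
    return (abs(block), "t" if block > 0 else "h")


def right_end(block: int) -> Extremity:
    """Block end encountered last when reading left to right."""
    return (abs(block), "h" if block > 0 else "t")


def adjacency(a: int, b: int) -> Adjacency:
    """Orientation-independent junction between consecutive blocks a and b."""
    return frozenset({right_end(a), left_end(b)})


def adjacencies(genome: Genome) -> Set[Adjacency]:
    """All junctions between consecutive blocks on each linear sequence."""
    return {
        adjacency(a, b)
        for blocks in genome.values()
        for a, b in zip(blocks, blocks[1:])
    }


def supporting_genomes(adj: Adjacency, genomes: Dict[str, Genome]) -> List[str]:
    """Names of the genomes in which the given adjacency is observed."""
    return sorted(name for name, g in genomes.items() if adj in adjacencies(g))


# Toy example: blocks 1-3 lie on reference chr1, blocks 4-6 on reference chr6.
genomes = {
    "reference": {"chr1": [1, 2, 3], "chr6": [4, 5, 6]},
    "target": {"ctg7": [1, 2, 5, 6], "ctg9": [4], "ctg12": [3]},
}
fusion = adjacency(2, 5)  # chr1 block 2 joined to chr6 block 5
print(supporting_genomes(fusion, genomes))  # ['target']
```

In this toy data the chr1-chr6 junction is found only in the target, i.e. the evidence for it would come from the assembly itself rather than from any reference.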

@DaRinker
Author

In general, Ragout won't make a connection (e.g. chr1 - chr6 fusion) unless there is evidence of it in at least one of the reference genomes or the target genome.

This is useful. Since no references support the translocation, it sounds like I can assume the evidence is coming from the target assembly itself. And since the translocations I'm seeing DO make (parsimonious) sense within the phylogenetic context, I'm starting to think they may be real.

@mikolmogorov
Owner

Could be! If I remember correctly, Ragout may keep an adjacency that is unsupported by the references if (i) it sees complementary breakpoints (e.g., an inversion should have two) and (ii) the adjacency does not alter the chromosome structure significantly. Does the chr1-chr6 fusion lead to a karyotype change? If so, I'm not sure why this happens. But if it's more like a smaller translocation, all of its breakpoints should be contained in your assembled genome.
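The "complementary breakpoints" condition can be made concrete with a toy check. The sketch below is a simplified stand-in, not Ragout's actual logic: it uses a hypothetical representation of each genome as lists of signed synteny block IDs and calls a rearrangement complementary when every reference junction broken by a new target adjacency is closed again by another new target adjacency (both junctions of an inversion, or both of a reciprocal translocation). It does not model the second condition about changes to chromosome structure.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy layout: sequence name -> ordered signed synteny block IDs.
Genome = Dict[str, List[int]]
Extremity = Tuple[int, str]  # (block ID, "t" = tail or "h" = head)
Adjacency = FrozenSet[Extremity]


def left_end(block: int) -> Extremity:
    return (abs(block), "t" if block > 0 else "h")


def right_end(block: int) -> Extremity:
    return (abs(block), "h" if block > 0 else "t")


def adjacencies(genome: Genome) -> Set[Adjacency]:
    """Orientation-independent junctions between consecutive blocks."""
    return {
        frozenset({right_end(a), left_end(b)})
        for blocks in genome.values()
        for a, b in zip(blocks, blocks[1:])
    }


def reference_partners(reference: Genome) -> Dict[Extremity, Extremity]:
    """Map each block end to the block end it is joined to in the reference."""
    partners: Dict[Extremity, Extremity] = {}
    for blocks in reference.values():
        for a, b in zip(blocks, blocks[1:]):
            x, y = right_end(a), left_end(b)
            partners[x] = y
            partners[y] = x
    return partners


def breakpoints_complementary(reference: Genome, target: Genome) -> bool:
    """True if every reference junction broken by a novel target adjacency
    is re-joined by another novel adjacency observed in the target.

    Block ends with no reference partner (chromosome ends) are skipped, so a
    simple end-to-end fusion passes this check trivially.
    """
    partners = reference_partners(reference)
    novel = adjacencies(target) - adjacencies(reference)
    touched = {end for adj in novel for end in adj}
    return all(partners[end] in touched for end in touched if end in partners)


reference = {"chr1": [1, 2, 3], "chr6": [4, 5, 6]}
# Both junctions of a reciprocal chr1/chr6 translocation are assembled:
both_sides = {"a": [1, 2, 5, 6], "b": [4, 3]}
# Only one junction is observed; the reciprocal one is missing:
one_side = {"a": [1, 2, 5, 6], "b": [4], "c": [3]}
print(breakpoints_complementary(reference, both_sides))  # True
print(breakpoints_complementary(reference, one_side))    # False
```

Under this simplification, a "hybrid" scaffold whose reciprocal junction is never seen in the contigs would fail the check, which is one way to flag joins worth inspecting by hand.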
