How to check validity of results #86

Open
Alexkortsi opened this issue Jul 22, 2023 · 1 comment

@Alexkortsi

Hi,

I have ONT and Illumina PE data for a fungal genome with a size of 40 Mbp. I have tried several methods to reach a final assembly. My final assembly with Flye has 30 contigs (I would like to complete the assembly to the 7 chromosomes of this organism). My questions are:

  1. I used Ragout with a reference genome (assembled into chromosomes) of the same species but a different strain. What would be the next step to validate the correctness of the 7 chromosomes assembled for my strain? Given that a certain degree of rearrangement must have happened, could I have lost any important information by using this method? Is there any way I can manually check the results?

  2. Regarding the unplaced contigs, how should I handle them if they are not placed within the 7 chromosomes created by Ragout?

Thanks a lot for your time and help!

@mikolmogorov
Owner

Hi,

With regard to validation, the approach is usually tailored to the particular project. Your final assembly is based on information from the long reads and the reference genome, so ideally you would need some kind of orthogonal data to validate it, but this is of course not always possible. You may look into the various assembly metrics computed by QUAST, or the methods used in recent genome assembly papers (e.g. https://www.nature.com/articles/s41586-022-05325-5).
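As a rough illustration of the kind of summary metrics QUAST reports (number of sequences, total length, largest sequence, N50), here is a minimal Python sketch that computes them from FASTA-formatted lines. This is not QUAST's implementation, and the function names `read_fasta_lengths`, `n50` and `summarize` are made up for this example. Comparing these numbers for the Flye contigs and for the Ragout output is only a quick sanity check, not a validation of scaffold correctness.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else f"seq{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths: Dict[str, int]) -> Dict[str, int]:
    """Collect a few basic assembly statistics."""
    values = list(lengths.values())
    return {
        "sequences": len(values),
        "total_length": sum(values),
        "largest": max(values, default=0),
        "n50": n50(values),
    }


# Toy input; in practice, pass an open file handle for the assembly FASTA.
toy = """>chr1
ACGTACGTAC
>chr2
ACGTAC
>unplaced_1
ACGT
""".splitlines()
print(summarize(read_fasta_lengths(toy)))
```

Running the same summary separately on the placed chromosomes and on the unplaced contigs shows how much sequence was left outside the 7 chromosomes.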

For unplaced contigs, you may look into gap-filling tools, but they likely won't be able to help much. The completeness and quality of the original long-read assembly are the main limitation.
