Add local assembly support for breakpoint validation in Nanopore inputs #305

Open
1 of 4 tasks
zhemingfan opened this issue Feb 3, 2022 · 1 comment

Comments

@zhemingfan
Collaborator

zhemingfan commented Feb 3, 2022

Overview

In internal testing, wtdbg2 (a long-read assembler) performed well at assembling breakpoint regions, a capability MAVIS needs to support for Nanopore inputs.

The following changes must be made to integrate wtdbg2:

  • Ensure that local assemblies can validate known breakpoints
  • @zhemingfan will add a test for collecting informative reads from a BAM file (add a function to the gather.py file, with accompanying unit tests)
  • @creisle will add a function to the bam.read module to process/simplify long-read assembly alignments by removing indels below a certain size threshold, as they are likely to be artifacts. This will ensure we can still use the CIGAR string of the alignment to call events downstream
  • TODO: add an option to the config to support BAM types (long read vs. paired-end short read, etc.)
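
A minimal sketch of the indel-simplification idea in the third item, not the planned bam.read function. Everything here is an assumption for illustration: the function name `drop_small_indels`, the `min_size` parameter, and the choice to work on plain `(operation, length)` tuples using the SAM CIGAR operation codes. Small insertions are dropped from the contig sequence, small deletions are filled in from the reference, and `=`/`X` are folded into `M` so that neighbouring blocks can merge.

```python
from typing import List, Tuple

# SAM CIGAR operation codes: M, I, D, N, S, H, P, =, X
MATCH, INS, DEL, REF_SKIP, SOFT_CLIP, HARD_CLIP, PAD, SEQ_MATCH, SEQ_MISMATCH = range(9)

Cigar = List[Tuple[int, int]]


def drop_small_indels(cigar: Cigar, query: str, ref: str, min_size: int) -> Tuple[Cigar, str]:
    """Remove indels shorter than min_size (hypothetical helper).

    cigar: (operation, length) tuples for one alignment
    query: contig/read sequence, including any soft-clipped bases
    ref:   reference sequence starting at the alignment's reference start
    Returns the simplified CIGAR and the adjusted query sequence.
    """
    new_cigar: Cigar = []
    new_query: List[str] = []
    qpos = 0  # position in the query sequence
    rpos = 0  # position in the reference slice

    def push(op: int, length: int) -> None:
        # Merge with the previous block when the operation is the same
        if new_cigar and new_cigar[-1][0] == op:
            new_cigar[-1] = (op, new_cigar[-1][1] + length)
        else:
            new_cigar.append((op, length))

    for op, length in cigar:
        if op in (MATCH, SEQ_MATCH, SEQ_MISMATCH):
            push(MATCH, length)
            new_query.append(query[qpos:qpos + length])
            qpos += length
            rpos += length
        elif op == INS:
            if length >= min_size:
                push(INS, length)
                new_query.append(query[qpos:qpos + length])
            # a small insertion is skipped: its bases are dropped
            qpos += length
        elif op == DEL:
            if length >= min_size:
                push(DEL, length)
            else:
                # a small deletion becomes aligned sequence copied from the reference
                push(MATCH, length)
                new_query.append(ref[rpos:rpos + length])
            rpos += length
        elif op == SOFT_CLIP:
            push(SOFT_CLIP, length)
            new_query.append(query[qpos:qpos + length])
            qpos += length
        elif op == REF_SKIP:
            push(REF_SKIP, length)
            rpos += length
        else:  # hard clip and padding consume neither sequence
            push(op, length)
    return new_cigar, "".join(new_query)
```

With a threshold of 3, a 1 bp insertion and a 2 bp deletion are absorbed into the surrounding match block while a 5 bp deletion is kept, so the CIGAR still describes the larger event for downstream calling.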
@zhemingfan zhemingfan added enhancement long read support Support for long read sequence data, e.g. from Oxford Nanopore or PacBio labels Feb 3, 2022
@zhemingfan zhemingfan added this to the v3.0.0 milestone Feb 3, 2022
@zhemingfan zhemingfan self-assigned this Feb 3, 2022
@creisle creisle modified the milestones: v3.0.0, v3.1.0 Feb 22, 2022
@creisle creisle self-assigned this May 6, 2022
@creisle
Member

creisle commented Jul 19, 2022

For the initial long read assembly integration tests

Changes to MAVIS

  • incorporate option for long read assembler
  • choose "weird" reads in the evidence gathering step
  • assemble with long read assembler
  • re-align assemblies to the reference genome with long-read aligner
  • continue usual downstream processing for now
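
One way the "weird" read selection above could be sketched. This is not MAVIS's evidence-gathering code: the `Alignment` class, the function names, and the thresholds (`window`, `min_clip`, `min_indel`) are all hypothetical, and the criteria (supplementary alignment, long soft clip, or large indel near the breakpoint) are only an assumption about what makes a long read informative.

```python
from dataclasses import dataclass
from typing import List, Tuple

# SAM CIGAR operation codes used below
INS, DEL, SOFT_CLIP = 1, 2, 4


@dataclass
class Alignment:
    """Stand-in for an aligned read (hypothetical, not a pysam object)."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    cigar: List[Tuple[int, int]]
    is_supplementary: bool = False


def is_informative(aln: Alignment, chrom: str, pos: int,
                   window: int = 500, min_clip: int = 50, min_indel: int = 50) -> bool:
    """True if the read lies near the breakpoint and looks unusual."""
    if aln.chrom != chrom or aln.end < pos - window or aln.start > pos + window:
        return False
    if aln.is_supplementary:
        return True  # split alignments suggest a rearrangement
    for op, length in aln.cigar:
        if op == SOFT_CLIP and length >= min_clip:
            return True
        if op in (INS, DEL) and length >= min_indel:
            return True
    return False


def gather_informative(alns: List[Alignment], chrom: str, pos: int, **kwargs) -> List[Alignment]:
    """Collect the informative reads around one breakpoint."""
    return [a for a in alns if is_informative(a, chrom, pos, **kwargs)]
```

The selected reads would then be written out and passed to the long-read assembler. Keeping the thresholds as parameters makes it easy to expose them through the config later.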

To create our ground truth sequences

  • pick several events found and validated by short reads
  • create the breakpoint sequence from the reference genome, taking ~20 base pairs on either side of the breakpoint (or another flank length)
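
A rough sketch of building such a ground truth sequence, assuming the reference is available as a dict of chromosome name to sequence and that both sides of the event are on the same strand (for example a deletion). The function name, the 1-based coordinate convention, and the `untemplated` parameter are illustrative choices, not part of MAVIS.

```python
from typing import Dict


def breakpoint_sequence(reference: Dict[str, str],
                        chrom1: str, pos1: int,
                        chrom2: str, pos2: int,
                        flank: int = 20, untemplated: str = "") -> str:
    """Join the flanks of a same-strand breakpoint pair (hypothetical helper).

    pos1 is the last retained base left of the event and pos2 the first
    retained base right of it, both 1-based and inclusive.
    """
    left_start = pos1 - flank   # 0-based start of the left flank
    right_start = pos2 - 1      # 0-based start of the right flank
    if left_start < 0 or right_start + flank > len(reference[chrom2]):
        raise ValueError("flank extends past the end of the reference sequence")
    left = reference[chrom1][left_start:pos1]
    right = reference[chrom2][right_start:right_start + flank]
    return left + untemplated + right
```

For a deletion of positions 6 to 15 on a toy 20 bp chromosome, `breakpoint_sequence(ref, "chr1", 5, "chr1", 16, flank=3)` joins the three bases ending at position 5 with the three bases starting at position 16. Inversions and other opposite-strand events would additionally need a reverse complement on one side, which this sketch leaves out.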

For each test

  • check that validate gathers all the reads you expect
  • check that the assembly it builds contains the ground truth sequence
    • align the ground truth to the assembly sequence using minimap2 and check that the alignment looks reasonable
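
Before running a real alignment, a cruder pre-check can catch obvious failures. The sketch below is not the minimap2 comparison described above: it only looks for an exact copy of the ground truth in the contig on either strand, so it will miss assemblies that differ by even one base. The function names are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ground_truth(contig: str, truth: str) -> Optional[str]:
    """Return '+' or '-' for the strand on which truth occurs exactly, else None."""
    contig = contig.upper()
    truth = truth.upper()
    if truth in contig:
        return "+"
    if reverse_complement(truth) in contig:
        return "-"
    return None
```

In an integration test this could be asserted for each assembled contig, with the minimap2 alignment kept as the fallback for contigs that carry small sequencing errors.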
