Add local assembly support for breakpoint validation in Nanopore inputs #305

zhemingfan · 2022-02-03T19:31:19Z

Overview

In internal testing, wtdbg2 (a long-read assembler) performed well at assembling breakpoints, a capability that MAVIS needs in order to validate breakpoints from Nanopore inputs.

The following changes must be made to integrate wtdbg2:

Ensure that local assemblies can validate known breakpoints
@zhemingfan will add a test for collecting informative reads from a BAM file (add a function to the gather.py file, with accompanying unit tests)
@creisle will add a function to the bam.read module to process/simplify long read assembly alignments by removing indels below a certain size threshold as they are likely to be artifacts. This will ensure we can still use the CIGAR string of the alignment to call events downstream
TODO: add an option to the config to support different BAM types (long read vs. paired-end short read, etc.)
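
The indel-simplification step described above could look roughly like the following. This is a minimal sketch, not the MAVIS implementation: the function name `simplify_small_indels`, the `min_indel_size` parameter, and the choice to fill removed deletions with `N` are all assumptions made for illustration. It works on pysam-style `(operation, length)` CIGAR tuples and needs no external library.

```python
from typing import List, Tuple

# CIGAR operation codes, numbered as in the SAM specification
CIGAR_M, CIGAR_I, CIGAR_D, CIGAR_N, CIGAR_S, CIGAR_H, CIGAR_P, CIGAR_EQ, CIGAR_X = range(9)
# Operations that consume bases of the read (query) sequence
QUERY_CONSUMING = {CIGAR_M, CIGAR_I, CIGAR_S, CIGAR_EQ, CIGAR_X}

Cigar = List[Tuple[int, int]]


def simplify_small_indels(cigar: Cigar, seq: str, min_indel_size: int) -> Tuple[Cigar, str]:
    """Hypothetical helper: drop indels shorter than min_indel_size.

    Small insertions are removed from both the CIGAR and the read sequence.
    Small deletions are converted to aligned bases and filled with 'N'
    (an assumption; reference bases could be used instead).
    The reference start of the alignment is unchanged.
    """
    new_cigar: Cigar = []
    new_seq: List[str] = []
    pos = 0  # current offset into the original read sequence

    for op, length in cigar:
        if op == CIGAR_I and length < min_indel_size:
            pos += length  # skip the inserted bases entirely
            continue
        if op == CIGAR_D and length < min_indel_size:
            op = CIGAR_M  # treat the gap as aligned sequence
            new_seq.append("N" * length)
        elif op in QUERY_CONSUMING:
            new_seq.append(seq[pos:pos + length])
            pos += length

        # merge with the previous operation when the types match
        if new_cigar and new_cigar[-1][0] == op:
            new_cigar[-1] = (op, new_cigar[-1][1] + length)
        else:
            new_cigar.append((op, length))

    return new_cigar, "".join(new_seq)
```

With a threshold of 10, a 2 bp insertion and a 1 bp deletion are folded into the surrounding match while a 50 bp deletion survives, so the remaining CIGAR still describes the event to be called downstream.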

creisle · 2022-07-19T18:44:50Z

For the initial long read assembly integration tests

Changes to MAVIS

To create our ground truth sequences

pick several events found and validated by short reads
create the breakpoint sequence from the reference genome, taking ~20 base pairs on either side of the breakpoint (or another flank length)
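
One way to build such a ground truth sequence is sketched below. Everything here is illustrative: `build_breakpoint_sequence`, the `(chrom, pos, orient)` tuple layout, and the in-memory `reference` dictionary are hypothetical names, and the orientation convention (`L` keeps sequence up to and including the 1-based position, `R` keeps sequence from the position onward) is an assumption rather than a statement of how MAVIS represents breakpoints.

```python
from typing import Dict, Tuple

Breakpoint = Tuple[str, int, str]  # (chromosome, 1-based position, 'L' or 'R')

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def flank_sequence(reference: Dict[str, str], chrom: str, pos: int, orient: str, flank: int) -> str:
    """Return the retained reference bases next to a breakpoint."""
    seq = reference[chrom]
    if orient == "L":
        # bases ending at (and including) pos
        return seq[max(0, pos - flank):pos]
    if orient == "R":
        # bases starting at (and including) pos
        return seq[pos - 1:pos - 1 + flank]
    raise ValueError(f"unknown orientation: {orient!r}")


def build_breakpoint_sequence(
    reference: Dict[str, str], break1: Breakpoint, break2: Breakpoint, flank: int = 20
) -> str:
    """Join the flanks of two breakpoints into one expected junction sequence."""
    chrom1, pos1, orient1 = break1
    chrom2, pos2, orient2 = break2

    first = flank_sequence(reference, chrom1, pos1, orient1, flank)
    if orient1 == "R":
        # the first half must end at the junction, so flip it
        first = reverse_complement(first)

    second = flank_sequence(reference, chrom2, pos2, orient2, flank)
    if orient2 == "L":
        # the second half must start at the junction, so flip it
        second = reverse_complement(second)

    return first + second
```

For a deletion on a toy reference `AAAAACCCCCGGGGGTTTTT` with breakpoints at positions 5 (`L`) and 16 (`R`) and a 5 bp flank, this joins `AAAAA` to `TTTTT`.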

For each test

check that validate gathers all the reads you expect
check that the assembly it builds contains the ground truth sequence
- align the ground truth to the assembly sequence using minimap2 and check that it looks alright
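
The two per-test checks above might be sketched as follows. This is a simplified stand-in, not the planned test code: instead of aligning with minimap2 it slides the ground truth along the assembly on both strands and allows a configurable number of mismatches (no indels), which is stricter than an alignment-based check. The names `missing_reads` and `assembly_contains` are hypothetical.

```python
from typing import Iterable, List

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def missing_reads(expected: Iterable[str], gathered: Iterable[str]) -> List[str]:
    """Return the expected read names that were not gathered, sorted."""
    return sorted(set(expected) - set(gathered))


def assembly_contains(assembly: str, truth: str, max_mismatches: int = 0) -> bool:
    """Check whether truth occurs in assembly on either strand.

    Only substitutions are tolerated; an alignment-based check (for example
    with minimap2) would also tolerate small indels.
    """
    for target in (truth, reverse_complement(truth)):
        for start in range(len(assembly) - len(target) + 1):
            window = assembly[start:start + len(target)]
            mismatches = 0
            for a, b in zip(window, target):
                if a != b:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break
            else:
                # the window was scanned fully without exceeding the limit
                return True
    return False
```

In a unit test, `missing_reads` should return an empty list for the gathered read set, and `assembly_contains` should be true for each ground truth sequence. A small nonzero `max_mismatches` may be needed because long read assemblies can carry residual base errors.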

zhemingfan added the enhancement and long read support labels Feb 3, 2022

zhemingfan added this to the v3.0.0 milestone Feb 3, 2022

zhemingfan self-assigned this Feb 3, 2022

creisle modified the milestones: v3.0.0, v3.1.0 Feb 22, 2022

creisle self-assigned this May 6, 2022