Add support for long-read bams (genome) #210

oneillkza · 2020-05-27T15:09:31Z

Reading in vcfs from variant callers that run on long-read bams is only part of the problem. MAVIS still needs bam files for most operations. Long-read bams have a few key differences from short-read ("NGS") bams:

- Single-end rather than paired-end
- Variable (and long) read length
- Relatively high error rate (5-10%), especially in homopolymers

The read length makes them very good for detecting large structural variants, especially since they can map through low-complexity regions, while the error rate makes them less well suited to smaller variants.
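As a rough illustration of the differences listed above, here is a minimal sketch of a heuristic that guesses whether a sample of reads looks like long-read data. It is not MAVIS code: the function name, the thresholds, and the `(flag, read_length)` input format are assumptions made for the example. The only fixed fact it relies on is that SAM flag bit 0x1 marks a read as part of a pair.

```python
from statistics import mean, pstdev

PAIRED_FLAG = 0x1  # SAM flag bit: template has multiple segments (paired-end)


def looks_like_long_read(reads, min_mean_length=1000, min_length_cv=0.2):
    """Guess whether sampled reads come from a long-read bam.

    reads: iterable of (sam_flag, read_length) tuples (hypothetical input format).
    The thresholds are illustrative defaults, not values taken from MAVIS.
    """
    reads = list(reads)
    if not reads:
        raise ValueError("no reads sampled")
    # Long-read bams are single-end, so any paired read rules them out.
    if any(flag & PAIRED_FLAG for flag, _ in reads):
        return False
    lengths = [length for _, length in reads]
    average = mean(lengths)
    if average == 0:
        return False
    # Coefficient of variation: long reads vary a lot in length,
    # whereas short reads usually share one fixed length.
    variation = pstdev(lengths) / average
    return average >= min_mean_length and variation >= min_length_cv
```

In practice the tuples could be pulled from the first few thousand alignments of a bam; a check like this would only ever be a sanity check, not a replacement for the user declaring the bam type.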

This ticket is to track work on reading in long-read genome bams.

oneillkza · 2020-06-10T19:34:45Z

So, the first major design decision is to create a new file type, genome_longread, for long-read genomic bams. This is distinct from genome, which is for short-read paired-end genomic bams. I'm probably going to be copying a lot of the code that handles the genome bam type, but I think that'll be cleaner than having if statements everywhere.

For example, in stats I've created compute_genome_longread_bam_stats, which is a modified copy of compute_genome_bam_stats.
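To give a feel for what long-read stats might look like, here is a minimal sketch that summarises read lengths only, since single-end reads have no insert size to measure. This is not the actual compute_genome_longread_bam_stats: the `LongReadStats` class, the function names, and the choice of median and N50 as summary values are all assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class LongReadStats:
    """Hypothetical container for long-read bam statistics."""
    read_count: int
    median_length: float
    n50: int
    max_length: int


def n50(lengths):
    """Return the length L such that reads of length >= L hold half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_of_bases = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_bases:
            return length
    raise ValueError("no read lengths given")


def summarise_long_read_lengths(lengths):
    """Summarise an iterable of read lengths sampled from a long-read bam."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no read lengths given")
    return LongReadStats(
        read_count=len(lengths),
        median_length=median(lengths),
        n50=n50(lengths),
        max_length=max(lengths),
    )
```

The median and N50 are used here rather than a mean and standard deviation because long-read length distributions tend to be heavily skewed.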

oneillkza · 2020-06-17T19:04:17Z

OK, got it as far as being able to do config and setup. Clustering works, but it fails on validate.

ValueError: ('protocol error', 'genome_longread')

This is somewhat unsurprising. Looks like the next step is to create a class in validate/evidence.py, and a case in validate/main.py that matches the genome_longread protocol to that class.
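The shape of that fix can be sketched as a lookup from protocol name to evidence class. Everything here is a stand-in: the class names, the mapping, and the helper function are hypothetical and do not reflect what validate/evidence.py or validate/main.py actually contain. Only the protocol names and the form of the ValueError come from this thread.

```python
class GenomeEvidence:
    """Stand-in for the existing short-read, paired-end evidence class."""


class GenomeLongReadEvidence:
    """Stand-in for the new long-read evidence class (hypothetical name)."""


# Hypothetical registry: one entry per supported protocol.
EVIDENCE_BY_PROTOCOL = {
    'genome': GenomeEvidence,
    'genome_longread': GenomeLongReadEvidence,
}


def evidence_class_for(protocol):
    """Return the evidence class registered for a protocol name.

    Unknown protocols raise a ValueError shaped like the one quoted above.
    """
    try:
        return EVIDENCE_BY_PROTOCOL[protocol]
    except KeyError:
        raise ValueError('protocol error', protocol) from None
```

With a separate class per protocol, the long-read behaviour stays in one place instead of being spread across if statements, which matches the design decision from the earlier comment.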

oneillkza added the "long read support" label (Support for long read sequence data, e.g. from Oxford Nanopore or PacBio) May 27, 2020

oneillkza added the enhancement label Jun 10, 2020
