Reading in VCFs from variant callers that run on long-read BAMs is only part of the problem. MAVIS still needs BAM files for most operations. Long-read BAMs have a few key differences from short-read ("NGS") BAMs:
Single end rather than paired-end
Variable (and long) read length
Relatively high error rate (5-10%), especially for homopolymers
Long reads are therefore very good for detecting large structural variants, especially since they can map through low-complexity regions, but their error rate makes them less suited to smaller variants.
This ticket is to track work on reading in long-read genome bams.
So, the first major design decision is to create a new file type, genome_longread, for long-read genomic BAMs. This is distinct from genome, which covers short-read paired-end genomic BAMs. I'm probably going to be copying a lot of the code that handles the genome BAM type, but I think that'll be cleaner than having if statements everywhere.
e.g. in stats I've created compute_genome_longread_bam_stats, which is a modified copy of compute_genome_bam_stats
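To illustrate why a separate stats function is reasonable here: single-end long reads have no paired-end insert size, so a long-read summary would plausibly be built around the read length distribution instead. The sketch below is only an assumption about what such a summary could look like. `LongReadStats` and `summarize_long_read_lengths` are hypothetical names, not the actual contents of compute_genome_longread_bam_stats, and the function takes plain read lengths rather than a BAM handle so it stays self-contained.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev
from typing import Iterable


@dataclass
class LongReadStats:
    """Hypothetical container; not the stats object MAVIS actually uses."""
    read_count: int
    mean_read_length: float
    median_read_length: float
    stdev_read_length: float


def summarize_long_read_lengths(read_lengths: Iterable[int]) -> LongReadStats:
    """Summarise variable read lengths for single-end long reads.

    Assumption: the caller has already sampled read lengths from the BAM.
    Reads with a non-positive length are skipped.
    """
    lengths = [n for n in read_lengths if n > 0]
    if not lengths:
        raise ValueError('no reads with positive length')
    return LongReadStats(
        read_count=len(lengths),
        mean_read_length=mean(lengths),
        median_read_length=median(lengths),
        stdev_read_length=pstdev(lengths),  # population standard deviation
    )
```

Keeping this in its own function, rather than branching inside the short-read version, matches the copy-and-modify approach described above.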
OK, got it as far as being able to do config and setup. Clustering works, but it fails on validate.
ValueError: ('protocol error', 'genome_longread')
This is somewhat unsurprising. It looks like the next step is to create an evidence class for the genome_longread protocol in validate/evidence.py, and to add a case in validate/main.py that maps the protocol to that class.
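As a rough sketch of that plan, assuming the validate step picks an evidence class based on the protocol string: the class names and the lookup table below are hypothetical stand-ins, not the real code in validate/evidence.py or validate/main.py. Only the shape of the error, `ValueError('protocol error', protocol)`, is taken from the traceback above.

```python
class ShortReadGenomeEvidence:
    """Hypothetical stand-in for the existing short-read genome evidence class."""
    protocol = 'genome'


class LongReadGenomeEvidence:
    """Hypothetical new class for single-end long-read genome BAMs."""
    protocol = 'genome_longread'


# Assumed lookup from protocol name to evidence class.
EVIDENCE_BY_PROTOCOL = {
    cls.protocol: cls
    for cls in (ShortReadGenomeEvidence, LongReadGenomeEvidence)
}


def evidence_class_for(protocol: str) -> type:
    """Return the evidence class registered for a protocol.

    Unknown protocols raise the same style of error seen in the traceback.
    """
    try:
        return EVIDENCE_BY_PROTOCOL[protocol]
    except KeyError:
        raise ValueError('protocol error', protocol) from None
```

With a table like this, supporting a new protocol means adding one class and one entry, which avoids scattering protocol checks through the validate code.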