Add support for svim and sniffles vcf as input #208
Comments
OK, so the first issue, looking at sniffles vcfs, is that sniffles has an svtype "DEL/INV", eg:
However, there are only two of those, and they both look like false positives/artifacts. I've also noticed that the same variants seem to get called in other samples. I checked a case where I know there is a combined inv/del event: that event is not called as DEL/INV, but the artifactual ones are. It's also not clear how these events would fit into the vcf format, since the combined event has three breakpoints, and I believe vcf only allows for specifying two. I think the correct behaviour would be to ignore lines with SVTYPE=DEL/INV.
There's also a DUP/INS:
which is a bit of a mess when I look in IGV: there does appear to be a real insertion, with maybe a real duplication, but they're in the middle of a poly(T) region. The breakpoints of the reported variant seem to correspond to the insertion. Also, when I BLASTed the insertion, it seemed to be a real germline variation reported in this paper: https://www.ncbi.nlm.nih.gov/pubmed/28250455 However, there is only one of these, and again it may be easiest to ignore them, since it isn't going to be clear which of the variants the breakpoints are referring to. I'll also make a ticket over at the Sniffles repo about this. This is also a note to myself: for insertions, Sniffles reports bp2 = bp1 + svlen. I guess this looks nicer in genome browsers, but will likely need correcting when loading in MAVIS.
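The insertion correction noted above could look roughly like the following. This is only a sketch: the record is a hypothetical plain dict with `svtype`, `pos`, `end` and `svlen` keys (not a MAVIS or Sniffles data structure), and it assumes the right fix is to move the second breakpoint back onto the first.

```python
def collapse_insertion_end(record):
    """Return a copy of the record with an insertion's end moved back to its start.

    `record` is a hypothetical dict with 'svtype', 'pos' and 'end' keys.
    """
    fixed = dict(record)
    if fixed['svtype'] == 'INS':
        # Assumption: the inserted sequence sits at `pos` on the reference,
        # so both breakpoints share that coordinate. The length is still
        # available in 'svlen' if it is needed later.
        fixed['end'] = fixed['pos']
    return fixed


# Illustrative values only, not taken from real data.
call = {'svtype': 'INS', 'pos': 1000, 'end': 1250, 'svlen': 250}
print(collapse_insertion_end(call))
```

Other SV types pass through unchanged, so the function can be applied to every row before loading.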
Lastly, sniffles uses the SVTYPE INVDUP, for an inverted duplication. This one at least should only have two breakpoints, and may be best treated like a translocation, or ignored. The only places these are called in the COLO829 test data are MT and GL000225.1/GL000220.1. It might be safe to assume that they are always artifactual.
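To check where each SVTYPE turns up, a quick tally per chromosome is enough. The sketch below uses only the standard library and assumes a tab-separated vcf body with the INFO field in the eighth column; `tally_svtypes` is a hypothetical helper name, and the example rows are made up for illustration.

```python
from collections import Counter


def tally_svtypes(vcf_lines):
    """Count records per (SVTYPE, chromosome) pair from raw vcf lines."""
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith('#') or not line.strip():
            continue
        fields = line.rstrip('\n').split('\t')
        chrom, info = fields[0], fields[7]
        svtype = None
        for entry in info.split(';'):
            if entry.startswith('SVTYPE='):
                svtype = entry[len('SVTYPE='):]
                break
        counts[(svtype, chrom)] += 1
    return counts


# Made-up rows, for illustration only.
example = [
    '##fileformat=VCFv4.2',
    'MT\t100\t1\tN\t<INVDUP>\t.\tPASS\tSVTYPE=INVDUP;END=500',
    '1\t2000\t2\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=2600',
]
for key, n in sorted(tally_svtypes(example).items()):
    print(key, n)
```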
OK, got it to load in the rows by ignoring any unrecognised SVTYPEs. Next is to add some checks/fixes for breakpoints being in the wrong order:
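A minimal sketch of both steps, skipping unrecognised SVTYPEs and putting same-chromosome breakpoints in ascending order. It is not the MAVIS implementation: the record layout (`chrom`, `pos`, `chrom2`, `end`, `svtype`), the `SUPPORTED_SVTYPES` set and the function name are all assumptions, and breakpoint orientation is not handled.

```python
# Hypothetical list of types to keep; anything else is dropped.
SUPPORTED_SVTYPES = {'DEL', 'DUP', 'INS', 'INV', 'TRA'}


def normalise_calls(records):
    """Drop unrecognised SVTYPEs and order breakpoints on the same chromosome."""
    kept = []
    for rec in records:
        if rec['svtype'] not in SUPPORTED_SVTYPES:
            # e.g. DEL/INV, DUP/INS, INVDUP: ignored rather than guessed at.
            continue
        rec = dict(rec)  # copy so the caller's data is not modified
        same_chrom = rec['chrom'] == rec['chrom2']
        if same_chrom and rec['pos'] > rec['end']:
            # Breakpoints are in the wrong order: swap them.
            rec['pos'], rec['end'] = rec['end'], rec['pos']
        kept.append(rec)
    return kept


# Illustrative records only.
calls = [
    {'svtype': 'DEL/INV', 'chrom': '1', 'pos': 100, 'chrom2': '1', 'end': 900},
    {'svtype': 'DEL', 'chrom': '1', 'pos': 500, 'chrom2': '1', 'end': 100},
]
print(normalise_calls(calls))
```

Calls that span two chromosomes are left untouched here, since there is no natural coordinate ordering between them without also agreeing on a chromosome order.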
Sniffles vcf conversion seems to all be working, and now has test coverage. I've merged that into the long read branch in #211
Svim and sniffles are two SV callers specifically for long-read sequence data. It would be highly beneficial to be able to use their output as input to MAVIS, in order to cluster calls with each other, do somatic calling, and integrate them with short-read sequence data.
My initial tests suggest that the 'vcf' input in MAVIS crashes for vcfs from both tools. This is somewhat unsurprising given the lack of standardisation for representing SVs in vcf format. So I'll undertake to create load scripts for the vcfs from these two tools.