output vcf format #11

Open
drvenki opened this issue Apr 3, 2019 · 2 comments
drvenki commented Apr 3, 2019

Hi,
I am trying to use svict on prostate cancer cfDNA data and I am having trouble understanding the format of the output VCF.

#CHROM  POS ID  REF ALT QUAL  FILTER  INFO
1	26608813	.	<DEL>	C	.	PASS	SVTYPE=DEL;END=26608889;CLUSTER=2;CONTIG=2;SUPPORT=17;

Why is the REF column showing a DEL tag? And could you please explain what this particular variant means?

Thanks,
Venki
Venkatesh Chellappa

@yenyilin yenyilin self-assigned this Apr 3, 2019
Collaborator

yenyilin commented Apr 3, 2019

We have pinpointed the issue with the header and will resolve it in the next few days. For the REF base, we currently annotate DEL for long deletions, in this case the deletion between 26608814 and 26608889. In the next update we will use the actual deleted bases for deletion events (at least for deletions within 100 bp or so).
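To make the reading of that record concrete, here is a minimal Python sketch that pulls POS and END out of the line from the original question and reports the deleted interval. It assumes that POS is the base just before the deletion and that the interval 26608814 to 26608889 is inclusive, which is how the reply above reads; `parse_info` and `deletion_span` are hypothetical helper names, not part of svict.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'SVTYPE=DEL;END=26608889;' into a dict."""
    fields = {}
    for item in info.strip().strip(";").split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        # Flag-style keys without '=' are stored as True.
        fields[key] = value if value else True
    return fields


def deletion_span(pos: int, info: str) -> tuple:
    """Return (first_deleted_base, last_deleted_base, length).

    Assumption: POS is the base preceding the deletion and END is the
    last deleted base, so the deleted bases are POS+1 .. END inclusive.
    """
    end = int(parse_info(info)["END"])
    return pos + 1, end, end - pos


# Record copied from the question above (tab-separated).
record = "1\t26608813\t.\t<DEL>\tC\t.\tPASS\tSVTYPE=DEL;END=26608889;CLUSTER=2;CONTIG=2;SUPPORT=17;"
cols = record.split("\t")
first, last, length = deletion_span(int(cols[1]), cols[7])
print(f"chr{cols[0]}: deletion of {length} bp covering {first}-{last}")
```

Under those assumptions the record describes a 76 bp deletion on chromosome 1. The remaining INFO keys (CLUSTER, CONTIG, SUPPORT) are left as raw strings because their exact meaning is not spelled out in this thread.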

Author

drvenki commented Apr 5, 2019

Thanks for the explanation!! :)
But by default, downstream VCF processing programs do not accept this VCF as input and throw errors about the format.
Example scenario 1: IGV expects A, G, C, T, ., or N in the REF column. So when REF contains DEL and special characters like "<>", the file does not conform to the VCF specification.

As a workaround, we could have "N" in the REF column and "<DEL>" in the ALT column. Would that make the VCF friendlier to downstream tools?

#CHROM  POS ID  REF ALT QUAL  FILTER  INFO
1	26608813	.	N	<DEL> or <INS>	.	PASS	SVTYPE=DEL;END=26608889;CLUSTER=2;CONTIG=2;SUPPORT=17;
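Until the output is fixed upstream, the workaround could be applied as a post-processing step. The following is only a sketch under stated assumptions: `fix_record` is a hypothetical helper, it uses "N" as the padding base as suggested above, and it discards whatever was originally in the ALT column (here "C"), since this thread does not say what that base represents.

```python
import re

# An angle-bracketed symbolic allele such as <DEL> or <INS>.
SYMBOLIC = re.compile(r"<[^<>,\s]+>")


def fix_record(line: str, pad_base: str = "N") -> str:
    """Move a symbolic allele from REF to ALT and put a padding base in REF.

    Header lines and records whose REF is already a base string are
    returned unchanged (minus the trailing newline).
    """
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line
    cols = line.split("\t")
    ref = cols[3]
    if SYMBOLIC.fullmatch(ref):
        cols[3] = pad_base  # assumption: "N" stands in for the unknown base
        cols[4] = ref       # the original ALT value is dropped
    return "\t".join(cols)


original = "1\t26608813\t.\t<DEL>\tC\t.\tPASS\tSVTYPE=DEL;END=26608889;CLUSTER=2;CONTIG=2;SUPPORT=17;"
print(fix_record(original))
```

Using the real reference base at POS instead of "N" would be closer to the specification, but that needs the reference FASTA, which this sketch deliberately avoids.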

Please look at page 4 of the VCF 4.2 specification (https://samtools.github.io/hts-specs/VCFv4.2.pdf) for how to represent the REF and ALT alleles.

4. REF - reference base(s): Each base must be one of A,C,G,T,N (case insensitive). Multiple bases are permitted. The value in the POS field refers to the position of the first base in the String. For simple insertions and deletions in which either the REF or one of the ALT alleles would otherwise be null/empty, the REF and ALT Strings must include the base before the event (which must be reflected in the POS field), unless the event occurs at position 1 on the contig in which case it must include the base after the event; this padding base is not required (although it is permitted) for e.g. complex substitutions or other events where all alleles have at least one base represented in their Strings. If any of the ALT alleles is a symbolic allele (an angle-bracketed ID String “<ID>”) then the padding base is required and POS denotes the coordinate of the base preceding the polymorphism. Tools processing VCF files are not required to preserve case in the allele Strings. (String, Required).

5. ALT - alternate base(s): Comma separated list of alternate non-reference alleles. These alleles do not have to be called in any of the samples. Options are base Strings made up of the bases A,C,G,T,N,*, (case insensitive) or an angle-bracketed ID String (“<ID>”) or a breakend replacement string as described in the section on breakends. The ‘*’ allele is reserved to indicate that the allele is missing due to a upstream deletion. If there are no alternative alleles, then the missing value should be used. Tools processing VCF files are not required to preserve case in the allele String, except for IDs, which are case sensitive. (String; no whitespace, commas, or angle-brackets are permitted in the ID String itself) 
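The two rules quoted above can be turned into a quick sanity check. This is a rough sketch, not a full VCF validator: `check_alleles` is a hypothetical helper, it only covers base strings, the missing value "." and symbolic "<ID>" alleles, and it does not recognize breakend replacement strings, so those would be reported as problems.

```python
import re

REF_RE = re.compile(r"[ACGTN]+", re.IGNORECASE)        # rule 4: REF bases
ALT_BASES_RE = re.compile(r"[ACGTN*]+", re.IGNORECASE)  # rule 5: ALT bases
ALT_SYMBOLIC_RE = re.compile(r"<[^<>,\s]+>")            # rule 5: <ID>


def check_alleles(ref: str, alt: str) -> list:
    """Return a list of human-readable problems; empty means both look fine."""
    problems = []
    if not REF_RE.fullmatch(ref):
        problems.append(f"REF {ref!r} is not a string of A,C,G,T,N")
    for allele in alt.split(","):
        if allele == ".":
            continue  # missing value: no alternate alleles
        if not (ALT_BASES_RE.fullmatch(allele) or ALT_SYMBOLIC_RE.fullmatch(allele)):
            problems.append(f"ALT {allele!r} is neither bases nor <ID>")
    return problems


print(check_alleles("<DEL>", "C"))   # the record as currently written
print(check_alleles("N", "<DEL>"))   # the proposed workaround
```

The first call flags the REF column, while the second returns an empty list, which matches the behaviour described for downstream tools in this comment.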

Page 11 of the VCF 4.2 specification shows how to encode structural variants! :)

Secondly, parsing the VCF programmatically is easy and friendly. I especially like the information provided, such as TWO_BP and the supporting reads.
Great job, guys.
Thanks
