[Feature Request] Additional info during ClinVar parsing #83
Comments
Thanks Christophe! I brought this up with the team during this morning's stand-up meeting. We'll investigate how this is represented in the XML file so that we can provide useful haplotype information.
Hi @olingerc, it would be really helpful if you could describe a set of RCVs that are connected via this mechanism, and explain in more detail which fields indicate the relationship between them and how. In short, I am asking for a description of your use case with real examples so that we can better understand the feature you are requesting. Thanks.
Hi @rajatshuvro, an example variant would be 1-171076966-G-A. Nirvana gives me the following ClinVar list (v3.18.1). There are a total of 3 different (alleleSpecific) VCVs:
However, when opening the ClinVar pages of the two pathogenic variants (here and here), it is obvious that they are only pathogenic when coupled with another variant (haplotype). It would be very helpful to have the "Haplotype" information. It is stored in the Type attribute of the MeasureSet element:
<MeasureSet Type="Haplotype" ID="217371" Acc="VCV000217371" Version="1">
</MeasureSet>
(extracted from the full XML). If I read your code correctly, you almost read this information already here. Here are all possible values:
A bonus would be knowing which other variant is in the haplotype. A quick fix would be to extract the title:
<ClinVarResult-Set>
<ClinVarSet ID="101183654">
<RecordStatus>current</RecordStatus>
<Title>
NM_006894.4(FMO3):c.[472G>A;560T>C] AND Trimethylaminuria
</Title>
<ReferenceClinVarAssertion ID="477812" DateLastUpdated="2022-06-24" DateCreated="2015-10-30">
...
Within the brackets, we see the identification of the second variant. Having the full list of variants would of course be nice as well, but I guess this would mean more changes to your code. Thanks for considering the request! Here is the corresponding line from a VCF file:
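For illustration only, the two ideas in this comment (reading the `Type` attribute of `MeasureSet` and pulling the bracketed variants out of the title) could be sketched in Python as follows. This is not Nirvana's parser: the sample record is hand-closed from the fragments quoted above, the nesting of `MeasureSet` inside `ReferenceClinVarAssertion` is an assumption, and the helper names `measure_set_types` and `haplotype_members` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Minimal sample assembled from the fragments quoted in this comment.
# Assumption: MeasureSet sits inside ReferenceClinVarAssertion; the real
# record contains many more elements that are omitted here.
SAMPLE = """
<ClinVarSet ID="101183654">
  <RecordStatus>current</RecordStatus>
  <Title>
    NM_006894.4(FMO3):c.[472G>A;560T>C] AND Trimethylaminuria
  </Title>
  <ReferenceClinVarAssertion ID="477812" DateLastUpdated="2022-06-24" DateCreated="2015-10-30">
    <MeasureSet Type="Haplotype" ID="217371" Acc="VCV000217371" Version="1">
    </MeasureSet>
  </ReferenceClinVarAssertion>
</ClinVarSet>
"""


def measure_set_types(clinvar_set):
    """Return the Type attribute of every MeasureSet found under the record."""
    # iter() searches at any depth, so the exact nesting does not matter here.
    return [ms.get("Type") for ms in clinvar_set.iter("MeasureSet")]


def haplotype_members(title):
    """Return the variants listed inside the first [...] of the title, if any."""
    match = re.search(r"\[([^\]]+)\]", title)
    if match is None:
        return []
    return [part.strip() for part in match.group(1).split(";")]


record = ET.fromstring(SAMPLE)
title = record.findtext("Title", default="").strip()

print(measure_set_types(record))  # ['Haplotype']
print(haplotype_members(title))   # ['472G>A', '560T>C']
```

Parsing the title is only the "quick fix" mentioned above; it depends on the title following the bracketed HGVS convention and returns an empty list otherwise.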
Thanks @olingerc. We are actively considering this as an upcoming feature.
Dear Nirvana team,
I'm sorry to misuse the issue tracker for a feature request. I was not sure how best to approach you.
Thanks for the detailed information on how you compile the ClinVar entries (HERE). Quite often we have many ClinVar entries at a single position. Even after reducing to isAlleleSpecific entries, it is sometimes difficult to understand which entries are relevant to our variant.
This is especially true for ClinVar entries that relate to variants at multiple sites (meaning they only make sense when multiple variants are present, i.e. a haplotype). This information is stored in the Measure and GenotypeSet fields. Would it be possible to at least include Measure? The example below from your documentation displays "single nucleotide variant", but we would be interested in identifying cases for which this value would be "Haplotype" or "Genotype". This way we could remove VCVs if they only make sense when all variants are present.
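To make the intended use concrete, here is a rough Python sketch of the downstream filtering we have in mind. It assumes each ClinVar entry in the annotation output carried a field holding the Measure type; the field name `variationType` is hypothetical (this is exactly the information being requested), and the IDs below are made-up placeholders rather than real VCVs.

```python
# Types that only make sense when several variants are present together.
MULTI_SITE_TYPES = {"Haplotype", "Genotype"}


def drop_multi_site_entries(entries):
    """Keep only entries whose (hypothetical) variationType is a single-site type."""
    return [e for e in entries if e.get("variationType") not in MULTI_SITE_TYPES]


# Placeholder entries; only isAlleleSpecific mirrors a field mentioned above.
entries = [
    {"id": "VCV-A", "variationType": "single nucleotide variant", "isAlleleSpecific": True},
    {"id": "VCV-B", "variationType": "Haplotype", "isAlleleSpecific": True},
    {"id": "VCV-C", "variationType": "Genotype", "isAlleleSpecific": True},
]

kept = drop_multi_site_entries(entries)
print([e["id"] for e in kept])  # ['VCV-A']
```

Entries without the field are kept, so the filter would stay harmless on output that does not yet include the type.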