Beacon Instance Variant Query Options and Information #151
mbaudis started this conversation in Long-term Schema Plans
Replies: 1 comment 1 reply
-
I wonder if these points could be addressed by aligning on the VRS 2 SequenceReference class?
Beacon is missing an option to inform about features & content
While beacons can inform about filteringTerms and their scopes and generally indicate which entities they support, such information is mostly missing for genomic variants beyond the general indication of supported request parameters (which probably sometimes just represent the default). So in the current state there is no direct support for topics like:

Which genomes and genome editions can be queried?
Beacon provides a default query parameter assemblyId which can be used to specify the genome edition (e.g. GRCh38) in a query, but the Beacon itself does not have an endpoint to expose supported assemblies, and according to the documentation no specific format is required in queries. Therefore it isn't clear whether a beacon supports or understands a requested assembly, and no specific error will be given upon a failed request.
Additionally, the assemblyId parameter might clash with version-specific referenceName parameters.

Suggestions
- Drop assemblyId and move to versioned sequence identifiers such as refseq:NC_000007.14
- Leave conversion to front ends / implementations, outside of the Beacon specification

Which sequence identifiers are supported?
As mentioned above, everything is permitted (although we suggest "assembly specific RefSeqId").
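To make the idea of versioned sequence identifiers concrete, here is a minimal sketch of the kind of conversion a front end could perform outside of the Beacon specification. The function name `to_versioned_id`, the lookup table and the `refseq:` prefix handling are assumptions for illustration only; the table is deliberately tiny and would need to cover all sequences in practice.

```python
# Illustrative lookup only: a real front end would load a complete table.
# Keys are (assemblyId, chromosome name); values are versioned RefSeq accessions.
REFSEQ_BY_ASSEMBLY = {
    ("GRCh37", "7"): "NC_000007.13",
    ("GRCh38", "7"): "NC_000007.14",
    ("GRCh37", "X"): "NC_000023.10",
    ("GRCh38", "X"): "NC_000023.11",
}


def to_versioned_id(assembly_id: str, reference_name: str) -> str:
    """Translate an assemblyId plus chromosome name into a versioned identifier.

    Hypothetical helper, not part of any Beacon implementation.
    """
    name = reference_name.strip()
    if name.lower().startswith("chr"):
        name = name[3:]
    try:
        accession = REFSEQ_BY_ASSEMBLY[(assembly_id.strip(), name.upper())]
    except KeyError:
        # Fail loudly instead of silently returning "no match".
        raise ValueError(
            f"unsupported assembly/sequence: {assembly_id!r} / {reference_name!r}"
        )
    return f"refseq:{accession}"


if __name__ == "__main__":
    print(to_versioned_id("GRCh38", "chr7"))
```

Raising an explicit error for an unknown assembly is a design choice here: it addresses the point above that a failed request currently gives no specific error.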
Suggestions
What identifiers can be used for the variantType parameter? [1]
This is as above for sequence identifiers. Here a more stringent use of well-documented variant types (for CNVs from EFO, for sequence variants from SO as a first guess; possibly VCF symbolic alleles if better documented...) should be settled on, and also indicated through some informational endpoint.
How are variants normalized? [2]
The positional normalization of variants will affect positional matching, particularly for sequence variations with shortened sequences; and there is no documentation / enforcement of a given style. This will lead e.g. to non-matches for VCF style sequence deletions when using the VRS concept of full left normalization.
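The mismatch can be illustrated with a toy example. The sketch below takes one deletion inside a short repeat and derives two representations: a VCF-style record (left-aligned, with an anchor base) and a representation spanning the whole region over which the deletion can be shifted, which is my reading of the VRS-style normalization and should be treated as an assumption. The reference sequence and all function names are made up for illustration.

```python
from typing import Tuple


def shift_bounds(ref: str, start: int, end: int) -> Tuple[int, int]:
    """Leftmost start and rightmost end (0-based, interbase) that a
    deletion of ref[start:end] can take without changing the result."""
    left, left_end = start, end
    while left > 0 and ref[left - 1] == ref[left_end - 1]:
        left -= 1
        left_end -= 1
    right_start, right = start, end
    while right < len(ref) and ref[right_start] == ref[right]:
        right_start += 1
        right += 1
    return left, right


def vcf_style(ref: str, start: int, end: int) -> Tuple[int, str, str]:
    """Left-aligned deletion as (POS, REF, ALT), POS 1-based with anchor base."""
    left, _ = shift_bounds(ref, start, end)
    if left == 0:
        raise ValueError("no anchor base available at sequence start")
    length = end - start
    return left, ref[left - 1:left + length], ref[left - 1]


def spanning_style(ref: str, start: int, end: int) -> Tuple[int, int, str]:
    """Deletion expanded over the whole ambiguous region as
    (start, end, replacement sequence), 0-based interbase coordinates."""
    left, right = shift_bounds(ref, start, end)
    length = end - start
    return left, right, ref[left:right - length]


if __name__ == "__main__":
    REF = "GATCACACAGT"  # toy reference with a CA repeat
    print(vcf_style(REF, 5, 7))       # one "CA" unit deleted
    print(spanning_style(REF, 5, 7))
```

Both results describe the same deletion, but their start and end coordinates differ, so an exact positional query built from one style will not match data stored in the other.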
Suggestions
What types of variant queries are supported and/or what types of variants exist in a given beacon or in its datasets?
The correct representation of supported variant parameters does not necessarily indicate what types of queries will be supported (and that is compounded by the lack of indication of supported
What to do next...
Footnotes
[1] For CNVs we have a standards documentation, developed with GA4GH GKS and the ELIXIR hCNV community, inside the Beacon documentation.
[2] The topic was brought up by @ahwagner at a variant scouts meeting.