-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add optional values field to response filtering terms #160
base: develop
Are you sure you want to change the base?
Conversation
This solution and PR has been presented to the Beacon Filters Scout and we are asking for votes on whether to accept. During the Filters Scout meeting today, there was a vote of 5 in favour of accepting and 0 against. We want to give those who couldn't attend the meeting, or need more time to consider things, an opportunity to vote. Therefore, leave +1 or -1 before the next Filters Scout meeting on 25th November 2024. A final decision will be made then. |
I think, the proposed schema should be more strict. "dependentSchemas": {
"values": {
"properties": {
"type": { "const": "enumeration" }
}
}
} So basically allow "values" only when "type" is "enumeration" Cheers, Dmitry |
Hi @gsfk and thanks for this PR. I'm 100% in favor of accepting this proposal, but I would first direct it towards dev and then, when we decide which milestone this fits in, merge it to main with the corresponding announcement. I know this is not a breaking change but this is not an urgent fix but a feature, so we should follow the branch management rules. Why do you want to make the request to main directly? +1 |
Didn't get it... Do we need explicit string values for the "ontology" or "alphanumeric" filters? +1 D. |
Regarding the target branch, no trouble to merge to develop first. When I wrote this pr, develop was trailing main (rather than the other way around) |
+1 That being said: In the long run the same structure could remodel the current structure of filtering terms since it is basically turning the current structure on its head:
When turning this around for the ontology terms one could do the same... e.g. instead of ~800 objects for |
Agree with the PR to better align FilteringTerm in the request with FilteringTerm in the response. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this in current form and leaving additional changes (enum discussion etc.) for another time.
@costero-e Can you take up the merge? Thanks! |
PR for issue #79, where I suggested adding a field to filtering terms, so here is the change.
Targeting the main branch since dev is trailing main.The proposal is to bring FilteringTerm in the request into better alignment with FilteringTerm in the response from various
/filtering_terms
endpoints. I ignore a third, similar definition of filters available at /framework/src/configuration/filteringTermSchema, this file is orphaned and is falling behind the rest of the spec.This PR proposes an optional addition to the response; this is a non-breaking change (see this example) which among other selling points reduces the verbosity of /filtering_terms responses. See these slides for further discussion.
Motivation / use cases
An alphanumeric query has required fields
id
,operator
,value
; an example query (from the docs) isBut the response filtering term discards both the
operator
andvalue
fields. The expected workaround among some beacon users appears to be to package the id, operator and value into a single string so it can be returned as an id, eg:or something similar. But this is unsupported by the specification and presents other limitations, such as verbose unstructured responses from /filtering_terms endpoints, see slides linked above for discussion. The proposed solution is to not discard the "value" field.
The proposed change was motivated by the addition of beacon v2 to the Bento platform, where we encountered a few issues with alphanumeric search:
(1) String fields not covered by a specification: In order to handle data from the Quebec covid bioank (Biobanque québécoise de la COVID-19), we added an extra property field for subject domicile, with values matching the clinical data collected:
Some fields (such as "homeless") map well to existing ontology terms, but others involve Quebec-specific terminology such as "CHSLD"; domicile data is stored as a string from an enum of acceptable values. We anticipate similar issues handling data from the upcoming Pan-Canadian Genome Library.
(2) Binned Search: Bento beacon is focused on public anonymous search, so some fields are searchable by predefined bins only, to inhibit re-identification attacks. Here are some example bins for BMI, again from the covid biobank:
but these bins are not discoverable without workarounds. Some fields may have a large number of bins, which makes the expected workaround above needlessly verbose.
Discussion
The goal here is to improve alignment between request and response filtering terms, since only the latter is discoverable.
The problem is not simply that
value
andoperator
go missing when we make a /filtering_term response, but that there is also an ambiguity in how different queries treat theid
field. For OntologyFilter, id represents a value to be found in the data, but for AlphanumericFilter, id is a field to be queried. The design of response filtering terms seems to have considered only the first kind ofid
. Consequently, only OntologyFilter is truly discoverable, other kinds of query require workarounds outside of the specification.What if I don't like this PR?
The discoverability problem for alphanumeric filters still exists! There are alternative solutions:
Both of these alternatives leave us with a flattened, verbose /filtering_terms response which this PR eliminates, but these are better than changing nothing. Or, as I've heard suggested, do a complete redesign of filters. But that looks like Beacon v3, which is too far in the future.