Combination of categorical filter and filtering on "unknown" in a numerical filter produces a discrepancy #11204

Open
alisman opened this issue Nov 18, 2024 · 5 comments

@alisman
Contributor

alisman commented Nov 18, 2024

curl 'http://localhost:8082/api/column-store/filtered-samples/fetch' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  --data-raw '{
    "clinicalDataFilters": [
      {"attributeId": "CANCER_TYPE_DETAILED", "values": [{"value": "BREAST INVASIVE DUCTAL CARCINOMA"}]},
      {"attributeId": "AGE_AT_SEQ_REPORT", "values": [{"start": 65, "end": 70}, {"start": 70, "end": 75}, {"start": 75, "end": 80}]}
    ],
    "studyIds": ["genie_public"],
    "alterationFilter": {
      "copyNumberAlterationEventTypes": {"AMP": true, "HOMDEL": true},
      "mutationEventTypes": {"any": true},
      "structuralVariants": null,
      "includeDriver": true,
      "includeVUS": true,
      "includeUnknownOncogenicity": true,
      "includeUnknownTier": true,
      "includeGermline": true,
      "includeSomatic": true,
      "includeUnknownStatus": true,
      "tiersBooleanMap": {}
    }
  }'

@onursumer
Member

This curl command doesn't reproduce the discrepancy. I think it is missing the UNKNOWN filter value.

Updating the filters to something like this reproduces the issue:

{
  "clinicalDataFilters": [
    {"attributeId": "CANCER_TYPE_DETAILED", "values": [{"value": "BREAST INVASIVE DUCTAL CARCINOMA"}]},
    {"attributeId": "AGE_AT_SEQ_REPORT", "values": [{"start": 80, "end": 85}, {"start": 85}, {"value": "UNKNOWN"}]}
  ],
  "studyIds": ["genie_public"],
  "alterationFilter": {
    "copyNumberAlterationEventTypes": {"AMP": true, "HOMDEL": true},
    "mutationEventTypes": {"any": true},
    "structuralVariants": null,
    "includeDriver": true,
    "includeVUS": true,
    "includeUnknownOncogenicity": true,
    "includeUnknownTier": true,
    "includeGermline": true,
    "includeSomatic": true,
    "includeUnknownStatus": true,
    "tiersBooleanMap": {}
  }
}

@alisman
Contributor Author

alisman commented Nov 19, 2024

@onursumer any idea why this happens?

@onursumer
Member

Still looking into it. It happens when UNKNOWN is selected together with numerical values. It doesn't happen when only UNKNOWN is selected.

@onursumer
Member

@alisman It looks like we are not properly handling the case where numerical and categorical values appear together in the same ClinicalDataFilter.

We have a method that checks whether a filter is categorical, but it only looks at the first value in the list:

value="studyViewFilterHelper.isCategoricalClinicalDataFilter(clinicalDataFilter)" />

public boolean isCategoricalClinicalDataFilter(ClinicalDataFilter clinicalDataFilter) {
    // Only the first value is inspected; any later values are ignored.
    var filterValue = clinicalDataFilter.getValues().getFirst();
    return filterValue.getValue() != null;
}

So, for example, we treat the clinical data filter below as a numerical filter because its first value is numerical.

[Screenshot of the clinical data filter]
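A minimal sketch of a classification that inspects every value instead of only the first one. `FilterValue`, `FilterKind` and `FilterClassifier` are hypothetical stand-ins, not the project's classes; the categorical test (`value != null`) mirrors the helper quoted above. This sketch does not single out "NA", which the numerical SQL already handles, so a range plus "NA" would also be reported as `MIXED`.

```java
import java.util.List;

// Hypothetical stand-in for one entry in ClinicalDataFilter.getValues():
// either a categorical value (e.g. "UNKNOWN") or a numerical range.
record FilterValue(String value, Double start, Double end) {
    static FilterValue category(String value) {
        return new FilterValue(value, null, null);
    }

    static FilterValue range(Double start, Double end) {
        return new FilterValue(null, start, end);
    }

    // Same test the existing helper applies to the first value.
    boolean isCategorical() {
        return value != null;
    }
}

enum FilterKind { CATEGORICAL, NUMERICAL, MIXED }

final class FilterClassifier {
    private FilterClassifier() {}

    // Current behaviour: only the first value decides the filter type.
    static boolean isCategoricalFirstOnly(List<FilterValue> values) {
        return values.get(0).isCategorical();
    }

    // Looks at every value, so ranges mixed with categories are reported as MIXED.
    static FilterKind classify(List<FilterValue> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("filter has no values");
        }
        boolean anyCategorical = values.stream().anyMatch(FilterValue::isCategorical);
        boolean anyNumerical = values.stream().anyMatch(v -> !v.isCategorical());
        if (anyCategorical && anyNumerical) {
            return FilterKind.MIXED;
        }
        return anyCategorical ? FilterKind.CATEGORICAL : FilterKind.NUMERICAL;
    }
}
```

For the AGE_AT_SEQ_REPORT values from the reproduction request (80 to 85, 85 and above, UNKNOWN), the first-value check reports "not categorical", while `classify` reports `MIXED`, which is the case the SQL currently cannot express.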

The SQL that handles numerical filters only checks for NA and ignores any other non-numerical value:

<!-- if both 'NA' and non-NA are selected, union them together -->
<if test="userSelectsNA and userSelectsNumericalValue">
UNION ALL
</if>
<!-- if non-NA is selected, prepare non-NA samples -->
<if test="userSelectsNumericalValue">

So, in this example, UNKNOWN is always ignored. I think we need to improve the SQL so that it can handle numerical and categorical values in the same filter.
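To illustrate the intended result (in memory, not SQL, and not the project's code), here is a minimal sketch that splits one filter into its categorical values and numerical ranges and lets a sample through if it matches either part, the same union idea the mapper already applies to NA. `FilterValue`, `SplitFilter` and the range boundary convention (start exclusive, end inclusive, `null` meaning unbounded) are assumptions.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for one filter value: a category or a numerical range.
record FilterValue(String value, Double start, Double end) {
    static FilterValue category(String value) { return new FilterValue(value, null, null); }
    static FilterValue range(Double start, Double end) { return new FilterValue(null, start, end); }
    boolean isCategorical() { return value != null; }

    // Assumed boundary convention: start exclusive, end inclusive; null means unbounded.
    boolean contains(double x) {
        return (start == null || x > start) && (end == null || x <= end);
    }
}

// One clinical data filter split into a categorical part and a numerical part.
record SplitFilter(Set<String> categories, List<FilterValue> ranges) {
    static SplitFilter of(List<FilterValue> values) {
        Set<String> categories = values.stream()
                .filter(FilterValue::isCategorical)
                .map(FilterValue::value)
                .collect(Collectors.toSet());
        List<FilterValue> ranges = values.stream()
                .filter(v -> !v.isCategorical())
                .toList();
        return new SplitFilter(categories, ranges);
    }

    // Union semantics: a sample passes if its raw value is a selected category
    // OR it parses as a number that falls inside any selected range.
    boolean matches(String rawValue) {
        if (rawValue == null) {
            return false;
        }
        if (categories.contains(rawValue)) {
            return true;
        }
        Double number = parseOrNull(rawValue);
        return number != null && ranges.stream().anyMatch(r -> r.contains(number));
    }

    private static Double parseOrNull(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

With the reproduction filter, a sample whose value is "UNKNOWN" now passes alongside samples aged above 80, instead of being dropped. In the real fix, the categorical part would presumably be queried like a categorical filter and combined with the numerical result via UNION ALL; that is a suggestion, not the merged implementation.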
