Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: catalog query filtering without elasticsearch #675

Merged
merged 1 commit into from
Sep 28, 2023

Conversation

johnnagro
Copy link
Contributor

@johnnagro johnnagro commented Sep 21, 2023

Description

Given a catalog query and some content metadata - can we determine if the content matches the query without using Elasticsearch (ie search/all)?

This will be useful when and if:

  • crawl content once, determine catalogs from our copy (rather than re-crawl content for every query it belongs to)

  • grind search/all less

  • we want to process content metadata update events (from event bus)

  • support catalog contents with similar query syntax but broader sources of content metadata (not everything in discovery)

  • https://2u-internal.atlassian.net/browse/ENT-7712

Testing

  • I wrote some tests
  • i parsed all our production queries and they're syntactically valid based on this code
  • if we like this, i'll write a job in prod to compare our query tool to Elasticsearch

@johnnagro johnnagro force-pushed the johnnagro/ENT-7712/0 branch 3 times, most recently from b740d90 to f3edd94 Compare September 21, 2023 18:47
Comment on lines 33 to 38
if raw_query_key == 'aggregration__key':
return 'aggregation_key'
elif raw_query_key == 'aggregation__key':
return 'aggregation_key'
elif raw_query_key == 'org__exempt':
return 'org__exclude'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could put this in a dict to reduce duplication and allow adding more cases without further stacking the elif tower

Suggested change
if raw_query_key == 'aggregration__key':
return 'aggregation_key'
elif raw_query_key == 'aggregation__key':
return 'aggregation_key'
elif raw_query_key == 'org__exempt':
return 'org__exclude'
corrections_for_typos = {
'aggregation_key': ['aggregration__key', 'aggregation__key'],
'org__exclude': ['org__exempt']
}
for correction, typos in corrections_for_typos.items():
if raw_query in typos:
return correction;

field = None
# comparison_kind defaults to "exact match"
comparison_kind = 'exact'
if len(raw_query_key.split("__")) == 2:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw_query_key.split("__") expression gets duplicated a couple times

Suggested change
if len(raw_query_key.split("__")) == 2:
field__comparison = raw_query_key.split("__")
if len(field__comparison) == 2:

Comment on lines 116 to 127
if comparison_kind == 'exact':
field_result = any(field_results)
else:
field_result = all(field_results)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the logic be reversed here? I would expect 'exact' matching to require matching all results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its confusing but "exact" in this case is really mean to be more like "includes" so any of them need to match. if the course key is "edX+Demo" and the query key is like "course_key: [ "edX+Demo", "MIT+Demo"]" you'd want this to match where as the else case is an "excludes" which means NONE of them can match... lemme think about the right way to document this and/or if we need to code this so its easier to understand.

@johnnagro johnnagro force-pushed the johnnagro/ENT-7712/0 branch 2 times, most recently from c4618f6 to f496ea3 Compare September 25, 2023 19:17
@johnnagro johnnagro force-pushed the johnnagro/ENT-7712/0 branch 5 times, most recently from 7290c19 to 296dde5 Compare September 28, 2023 18:43
@johnnagro johnnagro merged commit 53eaa93 into master Sep 28, 2023
6 checks passed
@johnnagro johnnagro deleted the johnnagro/ENT-7712/0 branch September 28, 2023 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants