-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate Citation detector with Metrics #126
Conversation
** Why are these changes being introduced: Now that the Detection::Citation model is in place, we need to integrate it with the existing Metrics reporting. ** Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/tco-107 ** How does this address that need: * Adds a citation field (integer) to the metrics_algorithms table * Defines a match_citations method in the Algorithms model, to pass search events through the Citation detector and count those where one is found (via the detection? method). This also includes expanding the event_matches logic to account for this in the unmatched calculations. * Creates a new search_event fixture, using the citation term fixture, to give the algorithm something to count. * As a side effect, we have to tweak the isbn term fixture, because it was unintentionally tripping the citation as well. See the side effect section for a longer discussion of this. * Adds tests to the Algorithm model to account for everything above, copying from other calculations. ** Document any side effects to this change: The fact that this needed to tweak a different term fixture is an indication that our detectors are starting to overlap. This is a real phenomenon in production, and is expected as we start to build more detectors. I do not have a notion for now about how to address this in the more constrained test environment, where we can carefully craft fixtures to do exactly what we intend. It feels like this behavior should be confirmed via explicit tests that address this overlap, but the tests we have so far are focused on one-and-only-one detector, so I feel okay about tweaking things to keep them separate. We do have a term fixture that trips multiple detectors, confirming that they get counted in all applicable categories, so we may be covered already?
journal_exact = process_journals(event, matches) | ||
suggested_resource_exact = process_suggested_resources(event, matches) | ||
lcshs = match_lcsh(event, matches) | ||
|
||
matches[:unmatched] += 1 if ids.detections.blank? && lcshs.detections.blank? && journal_exact.count.zero? && suggested_resource_exact.count.zero? | ||
matches[:unmatched] += 1 if ids.detections.blank? && !citation.detection? && lcshs.detections.blank? && journal_exact.count.zero? && suggested_resource_exact.count.zero? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to normalize Citation
a bit more after seeing how it behaves differently. Not in this PR, but just eventually. This could take the form of adding the .detection?
to all of our Detectors and/or normalizing what .detection
returns. No changes are should be made as part of this work, but this difference made me pause and have to understand why it was done this way. It makes sense based on how Citation currently works but feels like it may be confusing in the future as to why it is different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that having a more standardized set of methods across our detectors would be helpful. Some differences might be unavoidable (the citation detector doesn't extract specific patterns, while the standard identifier one does), but looking now I think the others (which all treat the phrase holistically) could benefit from a .detection?
method.
This integrates the citation detector with the monthly metrics setup. It includes a tweak to one of the term fixtures, because the ISBN fixture had ended up getting counted in the citation detector (which is a real phenomenon, but our test fixtures feel like they should be crafted to keep things separate).
Quick note about the side effect
The fact that this needed to tweak a different term fixture is an indication that our detectors are starting to overlap. This is a real phenomenon in production, and is expected as we start to build more detectors.
I do not have a notion for now about how to address this in the more constrained test environment, where we can carefully craft fixtures to do exactly what we intend. It feels like this behavior should be confirmed via explicit tests that address this overlap, but the tests we have so far are focused on one-and-only-one detector, so I feel okay about tweaking things to keep them separate.
We do have a term fixture that trips multiple detectors, confirming that they get counted in all applicable categories, so we may be covered already?
Developer
Ticket(s)
https://mitlibraries.atlassian.net/browse/TCO-107
Accessibility
all issues introduced by these changes have been resolved or opened
as new issues (link to those issues in the Pull Request details above)
Documentation
ENV
Stakeholders
Dependencies and migrations
NO dependencies are updated
YES migrations are included
Reviewer
Code
added technical debt.
Documentation
(not just this pull request message).
Testing