-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Detections model #107
Conversation
4031e29
to
91438d2
Compare
91438d2
to
57f9fc0
Compare
57f9fc0
to
70e8840
Compare
e3153e1
to
8eacc21
Compare
** Why are these changes being introduced: * The application needs a way to record the outcome of the application's Detectors when they review the Terms we receive. These outcomes will be needed in a subsequent step, the application tries to place these Terms into the various Categories. ** Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/tco-33 ** How does this address that need: This defines a Detections model, which is a linkage between a Term and a Detector. In order to allow the application's behavior to improve over time, we also define a DETECTOR_VERSION value in the environment, which will be updated as the application becomes more capable. Detections are the unique combination of a Term, a Detector, for a given DETECTOR_VERSION. The Detection model includes several scopes, which are consulted during the before_create lifecycle hook to prevent duplicate records from being created. I've attempted to write tests to indicate how this works. ** Document any side effects to this change: This change deviates from our earlier plans in two notable ways: * The confidence value which we originally intended to store with the Detection has been abandoned. Because a Detector might be linked with multiple Categories, each with a differing confidence value, there is no way to record a Term-Detector confidence value without losing information. The confidence values will still be used during Categorization, but the values cannot be cached in the Detection records. * The various record_* methods are no longer defined as part of the Detection class. Because a single Term will be joined to multiple Detection records, it made more sense for me to define the record_* methods on the Term class instead.
8eacc21
to
48ef85b
Compare
Jeremy pointed out during code review that it might be worth using find_or_create_by in the Term methods, rather than a combination of the create method and a guard method in the Detection model. This makes that change, adopting find_or_create_by inside Term and removing the check_for_current method from the Detection model. Along the way we update the specific assertion being looked for in the Detection tests. Please note that there are tests in the Term model that confirm the record_* methods an be called repeatedly without either creating duplicate records, or throwing errors.
Okay, @JPrevost - I think this is ready for another review. I've elected to keep the scopes in place - I think there's going to be a clear need for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I believe this works. I have two comments (one small and one potential refactor). Neither are blocking so I am approving. If you choose to make any changes please dismiss my review when you are ready for a re-review.
- Explicitly return nil in Term methods where no return is needed - Update code comments for Detection and Detector classes - Use ClimateControl to modify environment when testing
** Why are these changes being introduced: The original proposal for implementing the Detection workflow involved a series of record_* methods in the Term model, which then called methods across the application to coordinate the creation of needed records. During code review, a more modular arrangement was suggested, where the record_* methods would be distributed in the Detector subclasses, where the work was actually being done. Those subclasses would then each manage their own Detection record creation. ** Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/tco-33 ** How does this address that need: This makes that change, moving the work of recording Detections to the Detector subclasses. The tests are refactored as well, to match this arrangement. ** Document any side effects to this change: Some of the record methods feel very close to being repetetive, which makes me wonder whether we should add them to the base Detector class, and allow overrides when a specific detector needs a different logic (for example, the Journal and SuggestedResource look up against a list, while StandardIdentifiers might provide its own regex-based record method).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. record
will likely always hurt my brain as I'm also thinking of that as a noun and not a verb. Hopefully we get used to it and if not we can change it.
This defines the
Detections
model, which links our Term and Detector records. The details have gotten a bit messier than I expected, and some changes to the class diagram may be necessary. This PR updates that diagram to help make it clear what's changing:First, the confidence value is no longer stored as part of the Detections record. Because a Detector might be mapped to multiple Categories, each with its own confidence value, but the Detections record has no relationship to Category, there's no way to comprehensively cache the appropriate confidence value here. The values will still be used during the Categorization process, but we'll have to look them up in the graph rather than cache them with the Detection.
Second, the various record_* methods are defined in the Term model, and not in the Detection model.
I've tried to write thorough tests confirming the various constraints, and walking through the intended uses of this model. I'm not sure that I've gotten everything, but hopefully nothing is glaringly wrong.
Developer
Ticket(s)
https://mitlibraries.atlassian.net/browse/TCO-33
Accessibility
all issues introduced by these changes have been resolved or opened
as new issues (link to those issues in the Pull Request details above)
Documentation
ENV
Stakeholders
Dependencies and migrations
NO dependencies are updated
YES migrations are included
Reviewer
Code
added technical debt.
Documentation
(not just this pull request message).
Testing