CV2-5789 don't send null values out to presto from check-api #2155
Description
We have a minor, persistent Sentry error caused by sending null items from Check API to Alegre (and eventually Presto) to match them for similarity. This ticket skips those cases, as it should, which avoids unnecessary work and stops the minor error stream.
References: CV2-5789
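The skip described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the actual Check API change: the function names, the `send` callback, and the choice to treat empty or whitespace-only strings the same as null are all assumptions made for the example. The two field names are the ones mentioned in the testing notes.

```python
from typing import Callable, List, Mapping, Optional

# Field names taken from the PR description; the real code may check others.
TEXT_FIELDS = ("original_title", "original_description")


def is_blank(value: Optional[str]) -> bool:
    """Return True for None, empty, or whitespace-only text (assumed definition)."""
    return value is None or not value.strip()


def send_text_fields_for_similarity(
    item: Mapping[str, Optional[str]],
    send: Callable[[str, str], None],
) -> List[str]:
    """Call send(field, text) for each non-blank field and return the fields sent.

    `send` is a hypothetical stand-in for whatever forwards text to the
    similarity service; blank fields are skipped so no null request goes out.
    """
    sent = []
    for field in TEXT_FIELDS:
        text = item.get(field)
        if is_blank(text):
            continue  # nothing to vectorize, so skip the downstream request
        send(field, text)
        sent.append(field)
    return sent


# Usage: only the populated field is forwarded.
calls = []
item = {"original_title": "Breaking news", "original_description": None}
sent = send_text_fields_for_similarity(item, lambda f, t: calls.append((f, t)))
```

Guarding at the call site keeps the check next to the fields that are known to produce null requests, which matches the narrow scope of this change.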
How has this been tested?
I opened and deployed a PR for Presto to get more instrumentation on the origin of null vectorization requests. They came through as vanilla requests for encoding against the original_title and original_description fields, as we do in these lines. This is likely the most common place where this happens. I looked for a more "unified" approach but couldn't come up with a generic preflight check that didn't add more complexity than was warranted.
Things to pay attention to during code review
Any other candidates? I'm of the opinion that we solve these ones and see if we see more cases rather than hunt them all down pre-emptively but I can be persuaded!
Checklist