Consider caching the results of "related items" SOLR searches #2832

eddierubeiz · 2024-12-23T21:16:16Z

Background
Since 2022, we've been using Solr's "more-like-this" feature to fetch up to 3 works from the index that look similar, based on metadata. This is what allows us to show two other letters to Gabor Levy under "Related Works" on the work page for this letter to Gabor Levy.

Right now, every time you load that letter in a browser, our website contacts Solr (at least in theory) to retrieve those other two letters, even though the likelihood of Solr changing its answer between any two consecutive calls is vanishingly small.

Let's consider caching the results of that call to Solr (the one that says "Tell me 3 items that are similar to this one").

Recipe:
For a given work,

If we have more-like-this info for that work that's current, then show that.
If we have no such info, or the info is stale, then ask Solr for the info.
If Solr answers, store the new info, discarding any stale info, and keep track of the current date. The info we just retrieved will go stale in (e.g.) one week.
If Solr doesn't answer in time (happens a lot), then fall back on any stored info we might have, stale or not.
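
A rough sketch of the recipe above in plain Ruby, just to make the control flow concrete. Everything named here (RelatedWorksCache, fetcher, store, max_age, clock) is hypothetical and not taken from our codebase: the fetcher stands in for whatever actually queries Solr and is assumed to raise when Solr doesn't answer in time, and the Hash stands in for a real cache.

```ruby
# Hypothetical sketch of the recipe; none of these names exist in the app.
class RelatedWorksCache
  Entry = Struct.new(:ids, :fetched_at)

  ONE_WEEK = 7 * 24 * 60 * 60 # seconds

  # fetcher: callable taking a work id and returning an array of similar work
  #          ids; assumed to raise if Solr does not answer in time.
  # store:   Hash-like object keyed by work id (stand-in for a real cache).
  # clock:   callable returning the current Time (injectable for testing).
  def initialize(fetcher:, store: {}, max_age: ONE_WEEK, clock: -> { Time.now })
    @fetcher = fetcher
    @store = store
    @max_age = max_age
    @clock = clock
  end

  def related_ids(work_id)
    entry = @store[work_id]
    # Current info on hand: show that, without contacting Solr.
    return entry.ids if entry && fresh?(entry)

    begin
      # No info, or stale info: ask Solr.
      ids = @fetcher.call(work_id)
      # Solr answered: replace any stale info and record when we fetched it.
      @store[work_id] = Entry.new(ids, @clock.call)
      ids
    rescue StandardError
      # Solr failed or timed out: fall back on stored info, stale or not.
      entry ? entry.ids : []
    end
  end

  private

  def fresh?(entry)
    @clock.call - entry.fetched_at < @max_age
  end
end
```

When Solr fails and nothing is stored for the work, this sketch returns an empty list, which would amount to showing no Related Works section for that page load.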

Pro:

This will reduce our dependence on SearchStax (the externally hosted service that provides us with our search results).
The page will load faster in all cases; in some cases it will load much faster (up to roughly a second faster).
In a vast majority of cases, we believe the results will be the same with and without the cache.
We hardly ever delete works, so we don't have to worry about a broken more-like-this link.

Con:
Especially in collections that are undergoing active editing, the Related Works section will fail to include newly-added or recently-edited more-like-this matches.

jrochkind · 2024-12-23T21:33:56Z

We should use the standard Rails cache mechanism to do this, probably caching work primary keys as a list. (Can still fetch the works from db, not solr, on page display, by id).

https://guides.rubyonrails.org/caching_with_rails.html#low-level-caching-using-rails-cache
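
For illustration, a minimal sketch of what that low-level caching could look like. The helper name, the key format and the one-week TTL are assumptions, and the block passed in stands in for whatever code actually asks Solr for more-like-this ids. `Rails.cache.fetch` with `expires_in:` is the standard API; the `cache:` keyword is only there so the helper can be exercised without a running Rails app.

```ruby
# Hypothetical helper; the name, key format and TTL are assumptions.
RELATED_WORKS_TTL = 7 * 24 * 60 * 60 # one week, in seconds

# Returns the cached list of related work primary keys for work_id.
# The caller's block is only invoked on a cache miss (or after expiry) and
# should return an array of primary keys obtained from Solr.
def cached_related_work_ids(work_id, cache: Rails.cache)
  cache.fetch("related_works/#{work_id}", expires_in: RELATED_WORKS_TTL) do
    yield work_id
  end
end

# Possible use on the work page (method and model names are made up):
#   ids = cached_related_work_ids(work.id) { |id| more_like_this_ids_from_solr(id) }
#   related = Work.where(id: ids)
```

One thing this does not cover is the "fall back on stale info if Solr doesn't answer" step from the recipe: once an entry expires, `fetch` no longer returns it, so that behavior would need something extra, such as storing the fetch time alongside the ids and using a much longer expiry.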

We should consider our choices of how to configure Rails cache. Right now we don't actually have Rails cache configured -- by default it is a per-machine in-memory cache (I think), which works if we only have ONE web dyno. If we had more than one, they might each have their own copy of a cache. And even with one, the cache will be reset every night when the dyno restarts.

That might be fine for this use case, or we could consider configuring a more persistent shared cache. We already have a memcached instance for the rack-attack cache; perhaps we could use that for both purposes (or replace it with a redis used for both purposes, either way). We could also consider using the new solid_cache db-based cache, possibly for both purposes as well, although it might be too slow for rack-attack, which runs on every single request.
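
To make the options concrete, here is a small sketch of picking a store based on what is available in the environment. The method name and the environment variable names (`REDIS_URL`, `MEMCACHED_URL`) are assumptions, not our actual config; the store symbols themselves (`:redis_cache_store`, `:mem_cache_store`, `:memory_store`) are the standard Rails ones.

```ruby
# Hypothetical helper: returns a value suitable for config.cache_store.
# Env var names are assumptions; adjust to whatever the add-ons provide.
def cache_store_config(env = ENV)
  if env["REDIS_URL"]
    # Shared and persistent across dynos and restarts.
    [:redis_cache_store, { url: env["REDIS_URL"] }]
  elsif env["MEMCACHED_URL"]
    # Shared across dynos; could be the same memcached rack-attack uses.
    [:mem_cache_store, env["MEMCACHED_URL"]]
  else
    # Per-process and in-memory: lost on restart, not shared between dynos.
    [:memory_store]
  end
end

# In config/environments/production.rb this might be wired up as:
#   config.cache_store = cache_store_config
# With the solid_cache gem installed, the db-backed alternative would be:
#   config.cache_store = :solid_cache_store
```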
