Reduce concurrent runs of WDK cache population #49

Open
ryanrdoherty opened this issue Sep 12, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@ryanrdoherty
Member

Per the Sept 9 scrum, we would like to reduce the number of times a particular application tries to concurrently fill a specific cache. If one search is taking too long, the user may reload the page or try the search again, consuming yet more resources (CPU + DB connection + network) for no benefit. A possible solution was worked out previously:

Background on WDK Cache Maintenance and a suggested solution (which we are not taking):

The way we combat WDK cache insertion race conditions works; however, since we moved to a service + thick-client architecture (vs. JSP pages), it is no longer optimal. We should find a way to reduce concurrent query execution while still providing the guarantees the current system provides.

Review: right now, if a result needs to be put into a cache table, a new cache table name is procured (atomically) and the result is placed in that table. Once this succeeds, a row is added/replaced in the query_instance table, which maps query hashes to cache table names. Any request that comes in needing the same cached result (before its row exists in the query_instance table) performs the same logic concurrently, with the last cache table to complete "winning" and having its name associated with the query hash.
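The flow above can be sketched with the database replaced by in-memory maps. This is not actual WDK code; the class and method names are illustrative, and only the three steps (atomic name procurement, table fill, last-writer-wins mapping) mirror the mechanism described.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the current cache-fill flow (names are illustrative):
// the "database" is simulated with in-memory maps.
public class CacheFillSketch {
    private final AtomicInteger tableSeq = new AtomicInteger();          // atomic table-name procurement
    final Map<String, String> queryInstance = new ConcurrentHashMap<>(); // query hash -> cache table name
    final Map<String, String> cacheTables = new ConcurrentHashMap<>();   // cache table name -> result rows

    String fillCache(String queryHash, String result) {
        // 1. procure a fresh cache table name atomically
        String tableName = "QueryResult" + tableSeq.incrementAndGet();
        // 2. run the search and write its rows into the new table
        cacheTables.put(tableName, result);
        // 3. map the hash to the table; concurrent fills for the same
        //    hash each do steps 1-2, and the last put here "wins"
        queryInstance.put(queryHash, tableName);
        return tableName;
    }
}
```

Note that two concurrent fills for the same hash both complete steps 1-2, so the losing table is orphaned; the mapping simply points at whichever fill finished last.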

This worked quite well in the Servlet/JSP architecture because the entire result page was rendered by a single request. The servlet for that page would fill the cache table, then any other code or JSP tag that needed the values would use them. The race condition only happened if the user asked for the same results page in two separate tabs, so actual concurrency was rare and inexpensive.

In our new architecture, different parts of the result page render in parallel (strategy panel, organism filter, counts display, result table, etc.), with their data fetched by simultaneous requests. If the result is not yet in the cache, this can kick off many queries performing the same search (and filling separate cache tables), only for one of them to be used in subsequent requests. This is inefficient; instead, a single load could happen per query hash, with secondary requests waiting for that query to complete. Our existing mechanism protecting against separate tabs, or even separate sites configured with the same appDB+login, could remain.
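One common way to get "a single load per query hash" within one application instance is a single-flight pattern: concurrent callers for the same hash share one in-flight future instead of each filling their own table. A minimal sketch, assuming nothing about WDK internals (the class name, method, and supplier are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical single-flight loader keyed by query hash; not a WDK API.
public class SingleFlightCache {
    private final Map<String, CompletableFuture<String>> inFlight = new ConcurrentHashMap<>();

    /**
     * Returns the cache table name for a query hash, running the expensive
     * population at most once per hash at a time; concurrent callers block
     * on the same future instead of filling duplicate cache tables.
     */
    public String getCacheTable(String queryHash, Supplier<String> populate) {
        CompletableFuture<String> future = inFlight.computeIfAbsent(
                queryHash, h -> CompletableFuture.supplyAsync(populate));
        try {
            return future.join();
        } finally {
            // drop the completed future so a later miss (e.g. after cache
            // expiry) can trigger a fresh population
            inFlight.remove(queryHash, future);
        }
    }
}
```

This only deduplicates work within one JVM; the existing query_instance last-writer-wins logic would still be needed as the cross-tab/cross-site safety net the issue describes.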

@ryanrdoherty ryanrdoherty added the enhancement New feature or request label Sep 12, 2022
@aurreco-uga
Copy link
Member

(I would describe this issue as: "reloading the set/results page while the step is loading generates new connections to the database", I guess to create a new cache table.)
