Reduce concurrent runs of WDK cache population #49

Open
ryanrdoherty opened this issue Sep 12, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@ryanrdoherty
Member

Per the Sept 9 scrum, we would like to reduce the number of times a particular application tries to concurrently fill a specific cache. If one search is taking too long, the user may reload the page or try the search again, consuming yet more resources (CPU + DB connection + network) for no benefit. A possible solution was worked out previously:

Background on WDK Cache Maintenance and a suggested solution (which we are not taking):

The way we combat WDK cache insertion race conditions works; however, since we moved to a service + thick-client architecture (vs. JSP pages), it is no longer optimal. We should find a way to reduce concurrent query execution while still providing the guarantees the current system provides.

Review: right now, if a result needs to be put into a cache table, a new cache table name is procured (atomically) and the result is placed in that table. Once this succeeds, a row is added/replaced in the query_instance table, which maps query hashes to cache table names. Any request that comes in needing the same cached result (before its row exists in the query_instance table) performs the same logic concurrently, with the last cache table to complete "winning" and having its name associated with the query hash.
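The flow above can be sketched with the database replaced by in-memory maps. This is not actual WDK code; the class and method names are illustrative, and only the three steps (atomic name procurement, table fill, last-writer-wins mapping) mirror the mechanism described.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the current cache-fill flow (names are illustrative):
// the "database" is simulated with in-memory maps.
public class CacheFillSketch {
    private final AtomicInteger tableSeq = new AtomicInteger();          // atomic table-name procurement
    final Map<String, String> queryInstance = new ConcurrentHashMap<>(); // query hash -> cache table name
    final Map<String, String> cacheTables = new ConcurrentHashMap<>();   // cache table name -> result rows

    String fillCache(String queryHash, String result) {
        // 1. procure a fresh cache table name atomically
        String tableName = "QueryResult" + tableSeq.incrementAndGet();
        // 2. run the search and write its rows into the new table
        cacheTables.put(tableName, result);
        // 3. map the hash to the table; concurrent fills for the same
        //    hash each do steps 1-2, and the last put here "wins"
        queryInstance.put(queryHash, tableName);
        return tableName;
    }
}
```

Note that two concurrent fills for the same hash both complete steps 1-2, so the losing table is orphaned; the mapping simply points at whichever fill finished last.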

This worked quite well in the Servlet/JSP architecture because the entire result page was rendered by a single request. The servlet for that page would fill the cache table, then any other code or JSP tag that needed the values would use them. The race condition only happened if the user asked for the same results page in two separate tabs, so actual concurrency was rare and inexpensive.

In our new architecture, different parts of the result page render in parallel (strategy panel, organism filter, counts display, result table, etc.), with their data fetched by simultaneous requests. If the result is not yet in the cache, this can kick off many queries performing the same search (and filling separate cache tables), only for one of them to be used in subsequent requests. This is inefficient; instead, a single load could happen per query hash, with secondary requests waiting for that query to complete. Our existing mechanism protecting against separate tabs, or even separate sites configured with the same appDB+login, could remain.
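One common way to get "a single load per query hash" within one application instance is a single-flight pattern: concurrent callers for the same hash share one in-flight future instead of each filling their own table. A minimal sketch, assuming nothing about WDK internals (the class name, method, and supplier are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical single-flight loader keyed by query hash; not a WDK API.
public class SingleFlightCache {
    private final Map<String, CompletableFuture<String>> inFlight = new ConcurrentHashMap<>();

    /**
     * Returns the cache table name for a query hash, running the expensive
     * population at most once per hash at a time; concurrent callers block
     * on the same future instead of filling duplicate cache tables.
     */
    public String getCacheTable(String queryHash, Supplier<String> populate) {
        CompletableFuture<String> future = inFlight.computeIfAbsent(
                queryHash, h -> CompletableFuture.supplyAsync(populate));
        try {
            return future.join();
        } finally {
            // drop the completed future so a later miss (e.g. after cache
            // expiry) can trigger a fresh population
            inFlight.remove(queryHash, future);
        }
    }
}
```

This only deduplicates work within one JVM; the existing query_instance last-writer-wins logic would still be needed as the cross-tab/cross-site safety net the issue describes.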

@ryanrdoherty ryanrdoherty added the enhancement New feature or request label Sep 12, 2022
@aurreco-uga
Copy link
Member

(I would describe this issue as: "reloading the set/results page while the step is loading generates new connections to the database", I guess to create a new cache table.)
