
Locking to support concurrency (not using Python multiprocessing/subprocess) #10

Open
mzpqnxow opened this issue Dec 29, 2023 · 1 comment


mzpqnxow commented Dec 29, 2023

Hello there!

I'm still using this project and recently ran into a case where I was going to have multiple instances of an application running concurrently. They needed to share the same cache.

Because the cache writes in cache_to_disk are currently not atomic, this isn't safe to do and can cause all sorts of erroneous cache misses once you have enough concurrent processes. You probably hadn't thought much about this use case, but I'm pretty sure you'd consider it a known limitation/design choice rather than a bug.

I would like to implement this to solve my use case and would like to send it as a PR if you're interested.

It could be an optional behavior, though I don't think there's much to lose in making it the default: the performance hit should be negligible in cases where it isn't necessary, since there will never be lock contention, and these locks are not heavyweight at all.

I'm thinking of implementing it as a file-based lock (via flock()) rather than introducing a lot of complexity with something like shared memory. It's only an "advisory" lock - any process could still mess with the files if it really wanted to - but that's not really a concern here.

I do understand that Python multiprocessing can handle all of this somewhat seamlessly with its own lock classes, but my use case doesn't involve concurrency via the Python multiprocessing (or even subprocess) packages. Instead, each instance of my application is created completely independently of the others. Imagine, for example, a non-Python web application that invokes a standalone Python application (which happens to use cache_to_disk).

Any thoughts on this?

It should be minimally invasive because my use case is rather simple; in the end, so is the logic.

For the cache reads, roughly:

lock = get_exclusive_lock(cache_file)
data = read_cache(cache_file)
release_exclusive_lock(cache_file, lock)

For the cache writes, roughly:

lock = get_exclusive_lock(cache_file)
write_cache(cache_file, data)
release_exclusive_lock(cache_file, lock)
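To make the pseudocode above concrete, here's a rough, self-contained sketch of what those two helpers might look like on POSIX using fcntl.flock(). The sidecar ".lock" file next to each cache file is my own assumption for illustration - cache_to_disk's actual on-disk layout may call for something different:

```python
import fcntl
import os


def get_exclusive_lock(cache_file):
    """Block until an exclusive advisory lock is held for cache_file.

    Uses a hypothetical sidecar "<cache_file>.lock" file so the lock
    survives independently of how the cache file itself is rewritten.
    Returns the open lock file descriptor.
    """
    fd = os.open(cache_file + ".lock", os.O_CREAT | os.O_RDWR)
    fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other process holds it
    return fd


def release_exclusive_lock(cache_file, lock):
    """Release the advisory lock and close its descriptor."""
    fcntl.flock(lock, fcntl.LOCK_UN)
    os.close(lock)
```

Since flock() locks are tied to the open file description, closing the descriptor also drops the lock if a process dies mid-write, which is exactly the crash-safety property we want here.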

I have to re-familiarize myself with the code because I don't recall exactly how things are stored on the filesystem, but I don't see any reason this general approach wouldn't work.

It's possible this could even be done quite neatly with decorators, I guess
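For example, the decorator version might look something like this - again just a sketch with hypothetical names, assuming the wrapped function takes the cache file path as its first argument:

```python
import fcntl
import functools
import os


def with_exclusive_lock(func):
    """Hold an exclusive flock() on a sidecar lock file for the call's duration."""
    @functools.wraps(func)
    def wrapper(cache_file, *args, **kwargs):
        fd = os.open(cache_file + ".lock", os.O_CREAT | os.O_RDWR)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)  # serialize with other processes
            return func(cache_file, *args, **kwargs)
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)
    return wrapper


@with_exclusive_lock
def write_cache(cache_file, data):
    with open(cache_file, "w") as f:
        f.write(data)
```

That would let both the read and write paths pick up locking by decorating the existing functions, without touching their bodies.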

Any feedback/ideas are welcome

Thanks!

@mzpqnxow
Contributor Author

I added a PR with a rough shot at it, if you're interested in taking a look and have time.
