
Locking to support concurrency (not using Python multiprocessing/subprocess) #10

Open
mzpqnxow opened this issue Dec 29, 2023 · 1 comment


mzpqnxow commented Dec 29, 2023

Hello there!

I'm still using this project and recently ran into a case where I was going to have multiple instances of an application running concurrently. They needed to share the same cache.

Because the cache writes in cache_to_disk are currently not atomic, this isn't safe to do and can cause all sorts of erroneous cache misses once you have enough concurrent processes. You probably hadn't thought much about this use case, but I'm pretty sure you'd consider it a known limitation/design choice rather than a bug.

I would like to implement this to solve my use case and would like to send it as a PR if you're interested.

It could be an optional behavior, though I don't think there's much to lose in making it the default: the performance hit should be negligible in cases where it isn't necessary, since there will never be lock contention, and these locks are not heavyweight at all.

I'm thinking of implementing it as a file-based lock (via flock()) rather than introducing a lot of complexity with something like shared memory. It's only an "advisory" lock - any process could still mess with the files if it really wanted to - but that's not really a concern here.

I do understand that Python multiprocessing can handle all of this somewhat seamlessly with its own lock classes, but my use case doesn't involve concurrency via the Python multiprocessing (or even subprocess) packages. Instead, each instance of my application is created completely independently of the others. Imagine, for example, a non-Python web application that invokes a standalone Python application (which happens to use cache_to_disk).

Any thoughts on this?

It should be minimally invasive because my use case is rather simple; in the end, so is the logic.

For the cache reads, roughly:

lock = get_exclusive_lock(cache_file)
data = read_cache(cache_file)
release_exclusive_lock(cache_file, lock)

For the cache writes, roughly:

lock = get_exclusive_lock(cache_file)
write_cache(cache_file, data)
release_exclusive_lock(cache_file, lock)
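To make the pseudocode above concrete, here's a rough, self-contained sketch of what those two helpers might look like on POSIX using fcntl.flock(). The sidecar ".lock" file next to each cache file is my own assumption for illustration - cache_to_disk's actual on-disk layout may call for something different:

```python
import fcntl
import os


def get_exclusive_lock(cache_file):
    """Block until an exclusive advisory lock is held for cache_file.

    Uses a hypothetical sidecar "<cache_file>.lock" file so the lock
    survives independently of how the cache file itself is rewritten.
    Returns the open lock file descriptor.
    """
    fd = os.open(cache_file + ".lock", os.O_CREAT | os.O_RDWR)
    fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other process holds it
    return fd


def release_exclusive_lock(cache_file, lock):
    """Release the advisory lock and close its descriptor."""
    fcntl.flock(lock, fcntl.LOCK_UN)
    os.close(lock)
```

Since flock() locks are tied to the open file description, closing the descriptor also drops the lock if a process dies mid-write, which is exactly the crash-safety property we want here.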

I have to re-familiarize myself with the code because I don't recall exactly how things are stored on the filesystem, but I don't see any reason this general approach wouldn't work.

It's possible this could even be done quite neatly with decorators, I guess
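For example, the decorator version might look something like this - again just a sketch with hypothetical names, assuming the wrapped function takes the cache file path as its first argument:

```python
import fcntl
import functools
import os


def with_exclusive_lock(func):
    """Hold an exclusive flock() on a sidecar lock file for the call's duration."""
    @functools.wraps(func)
    def wrapper(cache_file, *args, **kwargs):
        fd = os.open(cache_file + ".lock", os.O_CREAT | os.O_RDWR)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)  # serialize with other processes
            return func(cache_file, *args, **kwargs)
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)
    return wrapper


@with_exclusive_lock
def write_cache(cache_file, data):
    with open(cache_file, "w") as f:
        f.write(data)
```

That would let both the read and write paths pick up locking by decorating the existing functions, without touching their bodies.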

Any feedback/ideas are welcome

Thanks!

@mzpqnxow
Contributor Author

I added a PR with a rough shot at it, if you're interested in taking a look and have time.
