JWST Pipeline Memory Leaks when run as a Subprocess #8404
Thanks for opening the issue and for sharing the minimal example. What version of jwst are you using? I tried to replicate this locally (macOS, M1, jwst main) and I'm not seeing a memory leak for the following (which is slightly modified from the example you provided):
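A minimal sketch of what that snippet likely looked like (the original code block was not preserved in this export; the filename and pool size are placeholders, not values from the report):

```python
# Hedged reconstruction -- the original snippet was lost in this export.
# The filename and pool size are placeholders, not values from the report.
import multiprocessing as mp

from jwst.pipeline import Detector1Pipeline


def process(uncal_file):
    # Run the full Stage 1 pipeline on a single _uncal exposure.
    Detector1Pipeline.call(uncal_file, save_results=True)


if __name__ == "__main__":
    files = ["jw_example_uncal.fits"] * 8  # placeholder inputs
    with mp.Pool(processes=4) as pool:
        pool.map(process, files)
```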
Running the above, the memory usage climbs to ~240 MB per process (the size of the input file) and remains constant throughout the run. I do get the leaked semaphore warning. Is it possible the minimal example didn't capture the issue? Is it possible to share more of the code?
Thanks for the quick response! I am pulling directly from the GitHub repo (jwst:1.14.1.dev2+gdd295809); however, I believe the exact tag I was using unfortunately no longer exists.

I may have been slightly hasty in diagnosing this issue. When I run the code you provided within an HPC cluster environment, the job reports hundreds of GBs of memory use, even with just ~50 input files in parallel. This can cause the job to be cancelled for exceeding its memory budget. I thought it was related to the leaking semaphore warning, but I have no proof of this. I have also seen the same level of memory usage on macOS. Either CentOS is not freeing resources related to the semaphores, or the issue is unrelated to the semaphores entirely. Or, as you suggested, it may be related to how processes are created on different OSes (e.g. spawn vs fork vs forkserver; see the sketch below).

I'm about to head on vacation, but will look into this issue further when I return. I am hoping that if the semaphore issue is solved, it will also solve my problem.
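A self-contained sketch of how the start method can be pinned explicitly when testing that hypothesis (the trivial worker here is a stand-in for the per-file pipeline call):

```python
import multiprocessing as mp
import os


def worker(_):
    # Trivial stand-in for the per-file pipeline call.
    return os.getpid()


if __name__ == "__main__":
    # "fork" is the historical default on Linux; "spawn" has been the
    # default on macOS since Python 3.8. Pinning one explicitly makes
    # runs comparable across operating systems.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=4) as pool:
        print(pool.map(worker, range(4)))
```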
Closed as I believe the source of the problem is high memory usage for Stage 1 Pipeline processing. Perhaps related to #2144?
I suspect this is not related to #2144, as that is TSO data, which by its nature has very large input files, and the solution there was to segment _uncal files into integration chunks. That said, if there's a memory leak in …
I am running the Stage 1 Pipeline over ~1000 UNCAL images. To leverage multiple CPU cores, I am using the Python multiprocessing library to parallelize the operations, since each UNCAL image is independent.
However, when I do so, each Stage 1 Pipeline run leaks enough memory that by the end of processing 1000 files I have ~600 GB of memory usage, forcing me to use high-memory nodes. The problem scales with the number of files, leaking roughly 500 MB per file processed.
Here is a minimal working example:
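(The example code was lost in this export; the following is a hedged sketch consistent with the description below, with the pipeline import at module level and placeholder filenames.)

```python
# Hedged reconstruction of the lost example. The jwst import sits at
# module level, so it is inherited by every forked worker process.
import multiprocessing as mp

from jwst.pipeline import Detector1Pipeline  # module-level import


def run_stage1(uncal_file):
    Detector1Pipeline.call(uncal_file, save_results=True)


if __name__ == "__main__":
    # Placeholder filenames standing in for the ~1000 real UNCAL inputs.
    uncal_files = [f"exposure_{i:04d}_uncal.fits" for i in range(1000)]
    with mp.Pool(processes=8) as pool:
        pool.map(run_stage1, uncal_files)
```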
While this produces the expected results, it emits the following warning:
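(The warning text was not captured here; presumably it is CPython's standard resource-tracker warning, which reads, with a varying count:)

```
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
```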
And if this is run over thousands of images, the leaked memory continues to accumulate.
This only occurs if the relevant Pipeline step is imported at module level, rather than inside the worker function itself. That is, the following does not incur the same memory leakage:
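(Again a hedged reconstruction, differing from the sketch above only in where the import happens.)

```python
import multiprocessing as mp


def run_stage1(uncal_file):
    # Importing inside the worker means the parent process never imports
    # jwst, so its module state is not inherited by forked children; this
    # is the workaround described below.
    from jwst.pipeline import Detector1Pipeline

    Detector1Pipeline.call(uncal_file, save_results=True)


if __name__ == "__main__":
    # Placeholder filenames, as in the sketch above.
    uncal_files = [f"exposure_{i:04d}_uncal.fits" for i in range(1000)]
    with mp.Pool(processes=8) as pool:
        pool.map(run_stage1, uncal_files)
```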
It appears that when the JWST pipeline modules are copied into the new process, something goes wrong that causes memory to leak. For now I am wrapping my JWST imports inside the functions called by the multiprocessing workers. Perhaps this workaround should be documented as the way to avoid the leaks, or the root problem should be determined.
I have tested this problem on macOS (M2) and Linux (Rocky Linux, CentOS) and it appears in all cases.
Thanks to @jdavies-st for helping diagnose this issue.