memory issue with pangolin version 4 and pangoLEARN #395
Comments
Just noticed this myself too. The error message doesn't say it explicitly, but memory usage swelled when testing pangolearn mode with the supplied test sequences. Logs for failure here: |
Note to self: this is a perfect candidate for profiling with Scalene. |
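Scalene itself is run as a command-line profiler. As a lighter stand-in, the standard library's `tracemalloc` can report the peak allocation of a single step. This is a minimal sketch, not pangolin code: `load_model` is a hypothetical placeholder that allocates a buffer instead of loading the real joblib model, and `tracemalloc` only sees memory that goes through Python's allocators, so it may under-report native allocations.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced allocation in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


def load_model():
    # Hypothetical stand-in: allocates about 8 MiB rather than a real model.
    return bytearray(8 * 1024 * 1024)


model, peak = peak_memory_mib(load_model)
print(f"peak while loading: {peak:.1f} MiB")
```

Wrapping each pipeline step this way would show which one dominates, which is the question raised later in the thread.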
The changes to pangoLEARN in pangolin 4 include a shift to a random forest model by @emilyscher. This model performs more robustly with missing data and homoplasies and is less overfit than the decision tree model, so it is a definite welcome shift. I'll add a warning about memory usage for the user, as 12GB is a lot; I had been under the impression that it uses 5GB of RAM. That said, I was struggling with GitHub Actions for Ubuntu most of Thursday, and I believe their max is 7GB, so this might check out. Unfortunately my fix wasn't an actual solution; it was just to remove the GitHub Actions test for Ubuntu, as the macOS test (which is allocated a max of 14GB) ran fine. I'll add that warning about RAM, but is this something that will need resolving, or is a warning enough? |
I am not sure how many users process their data on <16GB RAM systems, but that was possible with pangolin v3. A warning would be great. An optimization would be nice for users with fewer resources at hand, but perhaps that is a very small subset of users? |
Thanks for posting, this solved a mystery of crashes when I increased
I was trying to run 64 pangolins at once on 200,000 FASTA sequences split into 64 chunks, so each one was trying to use at least 30GB of RAM. This only happens in pangolearn mode, not usher. |
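The arithmetic behind those crashes is easy to check before launching jobs. A minimal sketch, assuming each process needs a fixed amount of RAM; the 256 GB machine size is a made-up example, not a figure from this thread.

```python
def max_concurrent_jobs(total_ram_gb, per_job_gb, reserve_gb=0.0):
    """Return how many jobs of per_job_gb fit in RAM after a reserve."""
    if per_job_gb <= 0:
        raise ValueError("per_job_gb must be positive")
    usable = total_ram_gb - reserve_gb
    return max(0, int(usable // per_job_gb))


# 64 simultaneous jobs at roughly 30 GB each would need this much RAM in total.
needed_gb = 64 * 30
# On a hypothetical 256 GB machine, only this many such jobs fit at once.
fits = max_concurrent_jobs(256, 30)
print(needed_gb, fits)
```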
Thanks for flagging this. 30GB of RAM sounds like a lot more than we had expected. The model itself shouldn't be that large; it could be an issue of holding lots of sequences in memory at once.
The random forest has been shown to be a more robust model, but because it's a bunch of decision trees together it's going to be more memory intensive. I was told it would need 5GB of RAM, but it seems that isn't the case! @emilyscher might have the profiling results from before, but if not I can run some tests and see which step is taking so much memory. I know @rmcolq was working on parallelising the model (on this branch) so that chunking up a massive FASTA file wouldn't be necessary (and then only one decompressed model would be held in memory at a time), but I can't remember whether it was finished. I'll revisit that branch this week too, as that might be a good compromise if the model file is going to be very large. |
As a reference point, the German RKI DESH [1] sequences currently require "a bit" over 8GiB of main memory (I could go up to 10 on one VM, and then it finished). [1] https://github.com/robert-koch-institut/SARS-CoV-2-Sequenzdaten_aus_Deutschland |
I've found that a single sample with pangolin v4.0.2 requires roughly 11.5 GB of RAM. Anything less will run OOM, though it would be more thorough if the memory usage were profiled as @fanninpm suggested! |
Similar finding: batches of 2000 samples could not run on a 12GB RAM system, but increasing to 16GB was successful.
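The idea of holding one model in memory and feeding it the whole input in blocks, rather than running one process (and one model copy) per chunk, can be sketched as follows. This is an illustration of the pattern only, not the code on the branch mentioned above: `assign_all`, `iter_blocks`, and the fake model are hypothetical names.

```python
from itertools import islice


def iter_blocks(records, block_size):
    """Yield successive lists of at most block_size records."""
    it = iter(records)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def assign_all(records, load_model, block_size=1000):
    """Load the model once, then classify the input block by block."""
    model = load_model()  # the only model copy held in memory
    results = []
    for block in iter_blocks(records, block_size):
        results.extend(model(block))
    return results


loads = []


def fake_load():
    # Hypothetical stand-in for the expensive model load.
    loads.append(1)
    return lambda block: [f"len={len(seq)}" for seq in block]


out = assign_all(["ACGT", "AC", "A"], fake_load, block_size=2)
print(out, len(loads))
```

Because `records` can be any iterator, the input never has to be fully materialised either.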
|
Wondering if anyone is seeing an increase in memory usage with the latest pangolin-data version 1.8? My jobs are failing when requesting 16GB. I did not have that issue with pangolin-data version 1.6. Thanks! |
@rrdavis77 Same here. Roughly need about 16-17GB of RAM with |
16 wasn't enough, but 16.5GB was enough. I'm guessing the minimum is between 16 and 16.5GB?
# limiting docker container to 16GB memory
$ docker run --rm -ti -m 16000000000 -v $PWD:/data staphb/pangolin:4.0.6-pdata-1.8 pangolin EPI_ISL_6825395-B.1.1.529-omicron.fasta --analysis-mode pangolearn -o test-16GB-memory.csv
****
Pangolin running in pangolearn mode.
****
Warning: pangoLEARN mode may use a significant amount of RAM, be aware that it will not suit every system.
Maximum ambiguity allowed is 0.3.
****
Query file: /data/EPI_ISL_6825395-B.1.1.529-omicron.fasta
****
Data files found:
plearn_model: /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/randomForest_v1.joblib
plearn_header: /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/randomForestHeaders_v1.joblib
****
Job stats:
job                      count    min threads    max threads
---------------------  -------  -------------  -------------
align_to_reference           1              1              1
all                          1              1              1
cache_sequence_assign        1              1              1
create_seq_hash              1              1              1
get_constellations           1              1              1
merged_info                  1              1              1
scorpio                      1              1              1
sequence_qc                  1              1              1
total                        8              1              1
****
Query sequences collapsed from 1 to 1 unique sequences.
****
1 sequences assigned via designations.
****
Running sequence QC
Total passing QC: 1
Job stats:
job                  count    min threads    max threads
-----------------  -------  -------------  -------------
all                      1              1              1
pangolearn               1              1              1
pangolearn_output        1              1              1
total                    3              1              1
Running pangoLEARN assignment
Loading model 05/02/2022, 21:40:25
Killed
Exiting because a job execution failed. Look above for error message
# now with 16.5 GB RAM
$ docker run --rm -ti -m 16500000000 -v $PWD:/data staphb/pangolin:4.0.6-pdata-1.8 pangolin EPI_ISL_6825395-B.1.1.529-omicron.fasta --analysis-mode pangolearn -o test-16.5GB-memory.csv
****
Pangolin running in pangolearn mode.
****
Warning: pangoLEARN mode may use a significant amount of RAM, be aware that it will not suit every system.
Maximum ambiguity allowed is 0.3.
****
Query file: /data/EPI_ISL_6825395-B.1.1.529-omicron.fasta
****
Data files found:
plearn_model: /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/randomForest_v1.joblib
plearn_header: /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/randomForestHeaders_v1.joblib
****
Job stats:
job                      count    min threads    max threads
---------------------  -------  -------------  -------------
align_to_reference           1              1              1
all                          1              1              1
cache_sequence_assign        1              1              1
create_seq_hash              1              1              1
get_constellations           1              1              1
merged_info                  1              1              1
scorpio                      1              1              1
sequence_qc                  1              1              1
total                        8              1              1
****
Query sequences collapsed from 1 to 1 unique sequences.
****
1 sequences assigned via designations.
****
Running sequence QC
Total passing QC: 1
Job stats:
job                  count    min threads    max threads
-----------------  -------  -------------  -------------
all                      1              1              1
pangolearn               1              1              1
pangolearn_output        1              1              1
total                    3              1              1
Running pangoLEARN assignment
Loading model 05/02/2022, 21:41:33
Finished loading model 05/02/2022, 21:41:59
Processing block of 1 sequences 05/02/2022, 21:41:59
Complete 05/02/2022, 21:42:00
****
Output file written to: /data/test-16.5GB-memory.csv/lineage_report.csv
|
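Narrowing the minimum between a failing and a passing limit, as done by hand above, is a bisection. A minimal sketch, assuming a caller-supplied `runs_ok(limit_bytes)` predicate that reports whether a run completed under that limit; here it is faked with a made-up threshold, whereas in practice it would launch the container with that `-m` value and check the exit status.

```python
def min_passing_limit(runs_ok, lo, hi, tolerance):
    """Bisect for the smallest passing limit, to within tolerance.

    lo is expected to fail and hi is expected to pass.
    """
    if tolerance < 1:
        raise ValueError("tolerance must be at least 1")
    if not runs_ok(hi):
        raise ValueError("upper bound does not pass")
    if runs_ok(lo):
        return lo
    # Invariant: lo fails, hi passes.
    while hi - lo > tolerance:
        mid = (lo + hi) // 2
        if runs_ok(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical threshold used only to exercise the search.
FAKE_THRESHOLD = 16_300_000_000


def fake_runs_ok(limit_bytes):
    return limit_bytes >= FAKE_THRESHOLD


found = min_passing_limit(fake_runs_ok, 16_000_000_000, 16_500_000_000, 10_000_000)
print(found)
```

Each probe is a full pangolin run, so a coarse tolerance keeps the number of runs small.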
With the latest pangolin-data version 1.9, my jobs are failing when requesting 20GB. The RAM requirements seem to grow after each pangolin-data update :( |
^ Can confirm: in my tests using pangolin-data v1.9, it required approximately 19GB of RAM to run pangolearn mode to completion. It failed when 18GB of RAM was allocated.
|
Thanks for the updated information. With so many categories and the random forest as it is, I'm not sure what I can do in the short term to resolve this. We've discussed potentially splitting the model into hierarchical, variant-specific random forests, which should reduce the memory requirements but will also slow it down. With UShER as the default inference now (and the developer of pangoLEARN in a new job), we may need to just have the warning in place for the time being. If there are any machine-learning aficionados who see something that could help, I'm very happy to take PRs too! |
Quick update: I've found it requires roughly 32 GB of RAM to use pangolearn mode with pangolin-data v1.16. I'm simply documenting this as the models grow larger; it's not an issue for me since I use the usher analysis mode. |
Thanks! It's good to keep track of this. We may need to make a decision about this approach soon, because at the training end we're close to maxing out the server's RAM. Thanks for keeping us updated though! |
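One way to picture the hierarchical idea: a small top-level router picks a variant, and only that variant's sub-model is loaded at any moment. The sketch below is purely illustrative and is not a pangolin design; `HierarchicalClassifier`, the router, and the loaders are hypothetical stand-ins, with plain callables in place of random forests.

```python
class HierarchicalClassifier:
    """Route each input to a sub-model, keeping one sub-model loaded."""

    def __init__(self, router, sub_model_loaders):
        self.router = router              # maps an input to a sub-model key
        self.loaders = sub_model_loaders  # key -> zero-argument loader
        self._loaded_key = None
        self._loaded_model = None

    def _get(self, key):
        if key != self._loaded_key:
            # Drop the previous sub-model before loading the next one.
            self._loaded_model = None
            self._loaded_model = self.loaders[key]()
            self._loaded_key = key
        return self._loaded_model

    def predict(self, sequences):
        # Group inputs by key so each sub-model loads at most once per call.
        groups = {}
        for idx, seq in enumerate(sequences):
            groups.setdefault(self.router(seq), []).append(idx)
        out = [None] * len(sequences)
        for key, indices in groups.items():
            model = self._get(key)
            for i in indices:
                out[i] = model(sequences[i])
        return out


load_counts = {"A": 0, "B": 0}


def make_loader(key):
    def loader():
        load_counts[key] += 1
        return lambda seq: f"{key}.{len(seq)}"
    return loader


clf = HierarchicalClassifier(lambda seq: seq[0], {k: make_loader(k) for k in "AB"})
labels = clf.predict(["Axx", "Byy", "Azz"])
print(labels, load_counts)
```

The extra routing step and the repeated loads are where the slowdown mentioned above would come from.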
Performed the update to version 4 today and ran a test query on ~50 samples and it worked as expected in UShER mode. However, in pangoLEARN mode (--analysis-mode fast), I am running out of memory. I only have 12GB of RAM on this test system, but I can run pangolin version 3.* with no problems on the same system. Have the requirements for version 4 changed in terms of memory use?
Thanks!