memory issue with pangolin version 4 and pangoLEARN #395

rrdavis77 · 2022-04-01T23:03:48Z

Performed the update to version 4 today and ran a test query on ~50 samples, and it worked as expected in UShER mode. However, in pangoLEARN mode (--analysis-mode fast), I am running out of memory. I only have 12GB of RAM on this test system, but I can run pangolin version 3.* with no problems on the same system.
Have the requirements for version 4 changed in terms of memory use?

Thanks!

kapsakcj · 2022-04-02T00:41:21Z

Just noticed this myself too. The error message doesn't say it explicitly, but memory usage swelled when testing pangolearn mode with the supplied test sequences:

$ pangolin /pangolin/test/test_seqs.fasta --analysis-mode pangolearn -o test_seqs-output-plearn

Logs for failure here:
https://github.com/StaPH-B/docker-builds/runs/5796292215?check_suite_focus=true

fanninpm · 2022-04-02T02:36:26Z

Note to self: this is a perfect candidate for profiling with Scalene.

aineniamh · 2022-04-02T07:46:21Z

The changes to pangoLEARN in pangolin 4 include a shift to a random forest model by @emilyscher. This model is more robust to missing data and homoplasies and is less overfit than the decision tree model, so it is a definite welcome shift.

I'll add a warning about memory usage for the user, as 12GB is a lot. I had been under the impression that it uses 5GB of RAM. That said, I was struggling with GitHub Actions on Ubuntu most of Thursday, and I believe their max is 7GB, so this might check out. Unfortunately my fix wasn't an actual solution: I just removed the GitHub Actions test for Ubuntu, as the macOS test (which is allocated a max of 14GB) ran fine.

I'll add that warning in about RAM, but is this something that will need resolving or is a warning enough?

rrdavis77 · 2022-04-02T15:11:51Z

I am not sure how many users process their data on <16GB RAM systems, but that was possible with pangolin v3. A warning would be great. An optimization would be nice for users with fewer resources at hand, but perhaps that is a very small subset of users?

tseemann · 2022-04-03T02:35:27Z

Thanks for posting, this solved a mystery of crashes when I increased parallel -j XX for pangolearn.
Finally checked the system logs and saw it was being killed by the kernel OOM killer:

[Sun Apr  3 11:18:43 2022] Out of memory: Killed process 3307066 (python) total-vm:31142324kB, anon-rss:10222624kB, file-rss:0kB, shmem-rss:0kB, UID:1424802263 pgtables:21788kB oom_score_adj:0
[Sun Apr  3 11:18:44 2022] oom_reaper: reaped process 3307066 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

I was trying to run 64 pangolins at once on 200,000 FASTA split into 64 chunks, so each one was trying to use 30GB RAM at least.

This only happens in pangolearn mode, not usher.
It didn't happen on v3 - I could use 256 in parallel then.
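For anyone reading the OOM line above: as far as I understand the kernel message, total-vm is the virtual address space of the process, while anon-rss is the anonymous memory actually resident when it was killed. The sketch below converts the kB fields to GiB and estimates how many such jobs would fit on a host. The 256 GiB host size is a made-up figure for illustration only, and the resident size at kill time is a lower bound on what a job really needs.

```python
import re

# Kernel OOM line copied from the log above.
OOM_LINE = (
    "[Sun Apr  3 11:18:43 2022] Out of memory: Killed process 3307066 (python) "
    "total-vm:31142324kB, anon-rss:10222624kB, file-rss:0kB, shmem-rss:0kB, "
    "UID:1424802263 pgtables:21788kB oom_score_adj:0"
)

KIB_PER_GIB = 1024 * 1024


def oom_fields_gib(line):
    """Return every '<name>:<number>kB' field on the line, converted to GiB."""
    return {
        name: int(kib) / KIB_PER_GIB
        for name, kib in re.findall(r"([\w-]+):(\d+)kB", line)
    }


def max_parallel_jobs(total_ram_gib, per_job_gib):
    """Number of jobs of the given resident size that fit in RAM at once."""
    return int(total_ram_gib // per_job_gib)


fields = oom_fields_gib(OOM_LINE)
print(f"total-vm: {fields['total-vm']:.1f} GiB")
print(f"anon-rss: {fields['anon-rss']:.1f} GiB")

# Hypothetical 256 GiB host, used only to illustrate the arithmetic.
print("jobs that fit:", max_parallel_jobs(256, fields["anon-rss"]))
```

Read this way, each process was holding roughly 10 GiB resident when it was killed, and the ~30 GiB figure is its virtual size.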

aineniamh · 2022-04-04T08:26:22Z

Thanks for flagging this. 30GB of RAM sounds like a lot more than we had expected. The model itself shouldn't be that large; it could be an issue of holding lots of sequences in memory at once.
I guess one solution could be to have options:

usher
pangolearn-rf (the new random forest option)
pangolearn-dt (the original decision tree option available in v3)

The RF has been shown to be a more robust model, but because it's a bunch of decision trees together it's going to be more memory intensive. I was told it would need 5GB of RAM, but it seems that isn't the case! @emilyscher might have the profiling results from before, but if not I can run some tests and see what step is taking so much memory.

I know @rmcolq was working on parallelising the model on this branch, so that chunking up a massive fasta file wouldn't be necessary (and then only one decompressed model would be held in memory at a time), but I can't remember if it was done or not. I'll revisit that branch this week too, as that might be a good compromise if the model file is going to be very large.
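To illustrate the "one model in memory, no external chunking" idea in general terms, here is a minimal sketch that streams a FASTA input in blocks through a single already-loaded model. It is not the code on that branch: `model` is a hypothetical callable standing in for the loaded classifier, and `block_size` is an arbitrary illustrative parameter.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def blocks(records, size):
    """Group an iterator of records into lists of at most `size` items."""
    records = iter(records)
    while True:
        block = list(islice(records, size))
        if not block:
            return
        yield block


def assign_in_blocks(lines, model, block_size=1000):
    """Apply one loaded model to the whole input, one block at a time.

    Only the current block of sequences is held in memory alongside the model.
    """
    for block in blocks(read_fasta(lines), block_size):
        for header, seq in block:
            yield header, model(seq)


# Stand-in "model" that just reports sequence length.
demo = [">a", "ACGT", "AC", ">b", "GG"]
print(list(assign_in_blocks(demo, len, block_size=1)))
```

With this shape, memory scales with one model plus one block, rather than with the number of chunks processed in parallel.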

rgerhards · 2022-04-04T10:03:46Z

As a reference point, the German RKI DESH [1] sequences currently require "a bit" over 8GiB of main memory (I could go up to 10 on one VM, and then it finished).

[1] https://github.com/robert-koch-institut/SARS-CoV-2-Sequenzdaten_aus_Deutschland

kapsakcj · 2022-04-07T19:53:09Z

I've found that for a single sample with pangolin v4.0.2, it requires roughly 11.5 GB RAM. Anything less will run OOM.

Though it would be more thorough if the memory usage were profiled as @fanninpm suggested!
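Short of a full profiler run, a rough peak figure can be had from the operating system. The sketch below is one way to do that on Linux or macOS with the standard library; it assumes a local (non-Docker) install, since a containerised process is not a child of the calling script. The demo command is a stand-in workload, not pangolin.

```python
import resource
import subprocess
import sys


def peak_child_rss_mib(cmd):
    """Run `cmd` and return the peak resident set size in MiB.

    The value is the largest peak among all child processes waited for so
    far by this script, so call it once per script for a clean reading.
    """
    subprocess.run(cmd, check=False)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


# Stand-in workload: a child Python process that allocates about 50 MB.
demo_cmd = [sys.executable, "-c", "x = bytearray(50_000_000)"]
print(f"peak RSS: {peak_child_rss_mib(demo_cmd):.0f} MiB")
```

Swapping `demo_cmd` for something like `["pangolin", "test_seqs.fasta", "--analysis-mode", "pangolearn"]` would give a single number per run. It does not say which step used the memory, which is what a line-level profiler would add.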

rrdavis77 · 2022-04-07T20:27:35Z

Similar finding: batches of 2000 samples could not run on a 12GB RAM system, but increasing to 16GB was successful.

rrdavis77 · 2022-04-28T22:46:27Z

Wondering if anyone is seeing an increase in memory usage with the latest pangolin-data version 1.8? My jobs are failing when requesting 16GB. I did not have that issue with pangolin-data version 1.6. Thanks!

kapsakcj · 2022-05-02T21:35:22Z

@rrdavis77 Same here. I need roughly 16-17GB of RAM with --analysis-mode pangolearn and pangolin-data 1.8.

kapsakcj · 2022-05-02T21:47:00Z

16 wasn't enough, but 16.5GB was enough. I'm guessing the minimum is between 16 and 16.5GB?

# limiting docker container to 16GB memory
$ docker run --rm -ti -m 16000000000 -v $PWD:/data staphb/pangolin:4.0.6-pdata-1.8 pangolin EPI_ISL_6825395-B.1.1.529-omicron.fasta --analysis-mode pangolearn -o test-16GB-memory.csv
****
Pangolin running in pangolearn mode.
****
Warning: pangoLEARN mode may use a significant amount of RAM, be aware that it will not suit every system.
Maximum ambiguity allowed is 0.3.
****
Query file:     /data/EPI_ISL_6825395-B.1.1.529-omicron.fasta
****
Data files found:
plearn_model:   /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/randomForest_v1.joblib
plearn_header:  /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/randomForestHeaders_v1.joblib
****
Job stats:
job                      count    min threads    max threads
---------------------  -------  -------------  -------------
align_to_reference           1              1              1
all                          1              1              1
cache_sequence_assign        1              1              1
create_seq_hash              1              1              1
get_constellations           1              1              1
merged_info                  1              1              1
scorpio                      1              1              1
sequence_qc                  1              1              1
total                        8              1              1

****
Query sequences collapsed from 1 to 1 unique sequences.
****
1 sequences assigned via designations.
****
Running sequence QC
Total passing QC: 1
Job stats:
job                  count    min threads    max threads
-----------------  -------  -------------  -------------
all                      1              1              1
pangolearn               1              1              1
pangolearn_output        1              1              1
total                    3              1              1

Running pangoLEARN assignment
Loading model 05/02/2022, 21:40:25
Killed
Exiting because a job execution failed. Look above for error message

# now with 16.5 GB RAM
$ docker run --rm -ti -m 16500000000 -v $PWD:/data staphb/pangolin:4.0.6-pdata-1.8 pangolin EPI_ISL_6825395-B.1.1.529-omicron.fasta --analysis-mode pangolearn -o test-16.5GB-memory.csv
****
Pangolin running in pangolearn mode.
****
Warning: pangoLEARN mode may use a significant amount of RAM, be aware that it will not suit every system.
Maximum ambiguity allowed is 0.3.
****
Query file:     /data/EPI_ISL_6825395-B.1.1.529-omicron.fasta
****
Data files found:
plearn_model:   /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/randomForest_v1.joblib
plearn_header:  /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/randomForestHeaders_v1.joblib
****
Job stats:
job                      count    min threads    max threads
---------------------  -------  -------------  -------------
align_to_reference           1              1              1
all                          1              1              1
cache_sequence_assign        1              1              1
create_seq_hash              1              1              1
get_constellations           1              1              1
merged_info                  1              1              1
scorpio                      1              1              1
sequence_qc                  1              1              1
total                        8              1              1

****
Query sequences collapsed from 1 to 1 unique sequences.
****
1 sequences assigned via designations.
****
Running sequence QC
Total passing QC: 1
Job stats:
job                  count    min threads    max threads
-----------------  -------  -------------  -------------
all                      1              1              1
pangolearn               1              1              1
pangolearn_output        1              1              1
total                    3              1              1

Running pangoLEARN assignment
Loading model 05/02/2022, 21:41:33
Finished loading model 05/02/2022, 21:41:59
Processing block of 1 sequences 05/02/2022, 21:41:59
Complete 05/02/2022, 21:42:00
****
Output file written to: /data/test-16.5GB-memory.csv/lineage_report.csv
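A note on units when comparing these limits with scheduler requests: a bare number passed to docker's -m is a byte count, so 16000000000 is 16 decimal GB, which is a little under 15 GiB. As far as I know, the suffixed form (for example -m 16g) is interpreted in binary units instead. A small sketch of the conversion:

```python
GB = 1000 ** 3   # decimal gigabyte
GIB = 1024 ** 3  # binary gibibyte


def describe_limit(limit_bytes):
    """Format a byte limit in both decimal GB and binary GiB."""
    return f"{limit_bytes} bytes = {limit_bytes / GB:.1f} GB = {limit_bytes / GIB:.2f} GiB"


# The two limits used in the runs above.
for limit in (16_000_000_000, 16_500_000_000):
    print(describe_limit(limit))
```

So the threshold observed here sits between roughly 14.9 and 15.4 GiB, which matters if a cluster scheduler counts in GiB.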

rrdavis77 · 2022-06-02T22:20:44Z

with the latest pangolin-data version 1.9, my jobs are failing when requesting 20GB. The RAM requirements seem to grow after each pangolin-data update :(

kapsakcj · 2022-06-03T18:06:44Z

^ Can confirm: in my tests with pangolin-data v1.9, it required approximately 19GB of RAM to run pangolearn mode to completion. It failed when 18GB of RAM was allocated.

# killed/OOM/failed
docker run --rm -ti -m 18000000000 -v $PWD:/data staphb/pangolin:4.0.6-pdata-1.9 pangolin EPI_ISL_6825395-B.1.1.529-omicron.fasta --analysis-mode pangolearn -o test-18GB-memory.csv

# passed
docker run --rm -ti -m 19000000000 -v $PWD:/data staphb/pangolin:4.0.6-pdata-1.9 pangolin EPI_ISL_6825395-B.1.1.529-omicron.fasta --analysis-mode pangolearn -o test-19GB-memory.csv
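The pass/fail pairs above bracket the minimum, and the same idea can be automated by bisecting on the memory limit. This is only a sketch under assumptions: it assumes that the outcome is monotonic in the limit, and that the container exits non-zero when the job is killed. `docker_passes` reuses the command from this comment without the -ti flags (no terminal is attached when run from a script) and is not exercised here.

```python
import os
import subprocess


def min_passing_limit(passes, low, high, step):
    """Bisect for the smallest limit, to within `step`, where passes(limit) is True.

    Assumes passes(low) is False, passes(high) is True, and that the
    outcome is monotonic in the limit.
    """
    while high - low > step:
        mid = (low + high) // 2
        if passes(mid):
            high = mid
        else:
            low = mid
    return high


def docker_passes(limit_bytes):
    """Run the test from this thread under a memory cap; True on exit code 0.

    Assumption: an OOM-killed run produces a non-zero exit code.
    """
    cmd = [
        "docker", "run", "--rm", "-m", str(limit_bytes),
        "-v", f"{os.getcwd()}:/data",
        "staphb/pangolin:4.0.6-pdata-1.9",
        "pangolin", "EPI_ISL_6825395-B.1.1.529-omicron.fasta",
        "--analysis-mode", "pangolearn",
        "-o", f"test-{limit_bytes}-memory.csv",
    ]
    return subprocess.run(cmd).returncode == 0


# Demonstration with a fake predicate (hypothetical 18.7 GB threshold).
fake = lambda limit: limit >= 18_700_000_000
print(min_passing_limit(fake, 18_000_000_000, 19_000_000_000, 100_000_000))
```

Starting from the 18 GB failure and the 19 GB pass, `min_passing_limit(docker_passes, 18_000_000_000, 19_000_000_000, 100_000_000)` would need four container runs to narrow the answer to within 0.1 GB.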

aineniamh · 2022-06-08T10:22:50Z

Thanks for the updated information. With so many categories and the random forest as it is, I'm not sure what I can do in the short term to resolve this.

We've discussed potentially splitting the model into hierarchical, variant-specific random forests, which should reduce the memory requirements, but will also slow it down. With UShER as the default inference now (and the developer of pangoLEARN in a new job) we may need to just have the warning in place for the time being.
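To make the trade-off concrete, here is a minimal sketch of what a hierarchical setup could look like: a small top-level model routes each sequence to a group, and only one group-specific model is resident at a time. Everything here is hypothetical (the class, the `top_model` callable, the loader functions); it is not an existing pangolin or pangoLEARN design.

```python
class HierarchicalClassifier:
    """Two-stage classifier that keeps at most one sub-model in memory."""

    def __init__(self, top_model, sub_model_loaders):
        self.top_model = top_model                  # callable: sequence -> group name
        self.sub_model_loaders = sub_model_loaders  # group name -> zero-arg loader
        self._loaded_group = None
        self._loaded_model = None

    def _sub_model(self, group):
        # Swap models only when the group changes. The previous model is
        # released first so two sub-models are never held together.
        if group != self._loaded_group:
            self._loaded_group = None
            self._loaded_model = None
            self._loaded_model = self.sub_model_loaders[group]()
            self._loaded_group = group
        return self._loaded_model

    def predict(self, sequences):
        # Route every sequence first, then handle one group at a time so
        # each sub-model is loaded at most once per call.
        by_group = {}
        for index, seq in enumerate(sequences):
            by_group.setdefault(self.top_model(seq), []).append((index, seq))
        results = [None] * len(sequences)
        for group, items in by_group.items():
            model = self._sub_model(group)
            for index, seq in items:
                results[index] = model(seq)
        return results
```

Peak memory is then bounded by the top-level model plus the largest sub-model, at the cost of an extra load for every group present in the input, which is the slowdown mentioned above.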

If there are any machine-learning aficionados who see something that could help, I'm very happy to take PRs too!

kapsakcj · 2022-11-09T21:53:37Z

Quick update: I've found it requires roughly 32 GB of RAM to use pangolearn mode with pangolin-data v1.16

I'm simply documenting as the models grow larger - it's not an issue for me since I use the usher analysis mode

aineniamh · 2022-11-10T08:59:39Z

Thanks! It's good to keep track of this- we may need to make a decision about this approach soon because at the training end we're close to maxing out the server's RAM.

Thanks for keeping us updated though!

kapsakcj mentioned this issue Apr 2, 2022

Add pangolin v4 StaPH-B/docker-builds#329

Merged

11 tasks

aineniamh mentioned this issue Apr 13, 2022

Testing of pangolearn analysis mode #433

Closed

wm75 mentioned this issue Apr 13, 2022

Updating tools/pangolin from version 3.1.20 to 4.0.5 galaxyproject/tools-iuc#4494

Merged

ArtPoon mentioned this issue May 3, 2022

Issues arising with Pangolin v4.0.x CoVaRR-NET/duotang#31

Closed
