Clarify the effect of --ncpus #86

Open
quantumdot opened this issue Dec 6, 2022 · 1 comment

Comments

@quantumdot

Can you clarify the exact effect of the CLI parameter --ncpus? Also, what are the recommended settings when running on a compute cluster? The reason I ask is that my university cluster system administrators keep complaining to me that my modelling jobs are over-subscribing CPU resources.

From what I can tell, the --ncpus parameter is simply passed into model.resample_model() within moseq2_model.train.util.train_model(), and within the model class (for instance ARWeakLimitStickyHDPHMM), this sets up a joblib.Parallel context with n_jobs=ncpus and the multiprocessing backend.
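
Roughly, I believe the call chain boils down to something like the following (a minimal sketch with made-up names such as `resample_states` and `states_list`, not the actual moseq2_model source):

```python
from joblib import Parallel, delayed

def resample_states(states_list, ncpus):
    # One worker process per state sequence, up to `ncpus` processes.
    # Each worker may additionally spawn OpenMP/native threads inside the
    # autoregressive extension, which is where oversubscription can begin.
    return Parallel(n_jobs=ncpus, backend="multiprocessing")(
        delayed(state.resample)() for state in states_list
    )
```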

But I recently found that the autoregressive library automatically parallelizes computation via OpenMP and native threads (releasing the Python GIL). So I suspect there are situations where different libraries simultaneously attempt to parallelize the work, thus oversubscribing the CPU cores.

If I ask Slurm for 8 cores and pass --ncpus=8, the job is oversubscribed (8 moseq2-model processes, each at roughly 90-150% CPU in top).

If I ask Slurm for 8 cores and do not pass --ncpus at all, the job is not oversubscribed but still uses most of the 8 cores (a single moseq2-model process at roughly 760% CPU in top).
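
As a workaround I have been capping the native thread pools so that only the joblib layer parallelizes. This is my own assumption rather than anything from the docs, and I have not confirmed it covers the OpenMP code in the autoregressive extension; exporting the same variables in the sbatch script before invoking moseq2-model should behave the same, since child processes inherit them.

```python
# Workaround sketch (assumption, not an official recommendation): pin the
# native/OpenMP thread pools to one thread per process before any numerical
# imports, so the only remaining parallelism is the --ncpus joblib processes.
import os

os.environ["OMP_NUM_THREADS"] = "1"       # OpenMP threads per process
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS threads per process
os.environ["MKL_NUM_THREADS"] = "1"       # MKL threads per process

import numpy as np  # noqa: E402 - numerical imports only after the caps are set
```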

Many of the "batching" command generators (for example, generating jobs for a kappa scan) incorporate ncpus into both the Slurm preamble and the moseq2-model command (see the sketch below).
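
For reference, that coupling looks roughly like this (illustration only, not the real generator code, and the exact subcommand and sbatch flags here are from memory):

```python
# Illustration only (not the actual batch generator): the same ncpus value
# is written both into the Slurm preamble and onto the model command line,
# which ties the Slurm allocation to the joblib process count.
def make_job_script(model_args, ncpus=8):
    preamble = (
        "#!/bin/bash\n"
        f"#SBATCH --cpus-per-task={ncpus}\n"
    )
    command = f"moseq2-model learn-model {model_args} --ncpus {ncpus}\n"
    return preamble + command
```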

@versey-sherry
Contributor

The --ncpus flag is meant to set up joblib.Parallel for multiprocessing. I checked the code and there is indeed additional parallelized computation via OpenMP, so it looks like the issue comes from having two layers of parallelism running at the same time. @wingillis @calebweinreb is this something we are interested in fixing?
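
One possible direction (just a sketch; I have not checked whether threadpoolctl can see the OpenMP runtime the autoregressive extension uses, and the function names here are placeholders) would be to clamp the native thread pools inside each joblib worker so total CPU use stays close to ncpus:

```python
from joblib import Parallel, delayed
from threadpoolctl import threadpool_limits

def _resample_one(state):
    # Keep OpenMP/BLAS pools at a single thread inside this worker process.
    with threadpool_limits(limits=1):
        return state.resample()

def resample_states(states_list, ncpus):
    # ncpus joblib processes, each restricted to one native thread.
    return Parallel(n_jobs=ncpus, backend="multiprocessing")(
        delayed(_resample_one)(s) for s in states_list
    )
```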
