CUDA memory requirement scaling with system size #26
Unanswered
yifan-henry-cao asked this question in Q&A
Replies: 1 comment
Hi All!

I have been running strong and weak scaling tests using several simple Allegro potentials I fitted earlier (with lmax=2, only one Allegro layer, and very few parameters). Initially my LAMMPS runs work fine for small system sizes, but I have noticed that the memory requirement of pair_allegro seems to scale linearly with the number of atoms per GPU. In my case, if I put more than ~20,000 atoms on a single GPU, all of my Allegro models fail with the following error message, indicating an out-of-memory issue from CUDA:

However, as seen in Fig. 5 of this paper (https://www.nature.com/articles/s41467-023-36329-y), it seems that people have managed to put half a million atoms on a single GPU without problems. I was wondering what I am doing wrong here. Is there a trick I can use to reduce the memory requirement of the simulation?

For reference, I am using LAMMPS (29 Sep 2021 - Update 2) compiled with KOKKOS acceleration for GPUs with CUDA support and the serial host backend (no OpenMP multithreading). Some of the parameters for the Allegro model fitting are attached:

Please let me know if additional information is needed for debugging this issue.

Best,
Yifan Cao
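For context, the linear scaling described above lends itself to a quick back-of-the-envelope check. The sketch below is purely illustrative: the VRAM size, fixed overhead, and measured run are hypothetical placeholders, not values taken from this thread.

```python
# Rough estimate of how many atoms fit on one GPU, assuming (as observed
# above) that pair_allegro memory grows roughly linearly with the number
# of atoms per GPU. All numbers below are hypothetical placeholders.

def max_atoms_per_gpu(vram_gb: float,
                      baseline_gb: float,
                      measured_atoms: int,
                      measured_peak_gb: float) -> int:
    """Linear extrapolation from a single run that still fits in memory.

    baseline_gb  -- fixed overhead (CUDA context, model weights, ...)
    measured_*   -- atom count and peak GPU memory of that run
    """
    per_atom_gb = (measured_peak_gb - baseline_gb) / measured_atoms
    return int((vram_gb - baseline_gb) / per_atom_gb)

# Hypothetical example: a 10,000-atom run peaking at ~8 GB on a 16 GB card,
# with ~1 GB of fixed overhead, would cap out at roughly 21,000 atoms.
print(max_atoms_per_gpu(vram_gb=16.0, baseline_gb=1.0,
                        measured_atoms=10_000, measured_peak_gb=8.0))
```

Peak usage for the measured run can be read from `nvidia-smi` while the job is running, or from PyTorch's `torch.cuda.max_memory_allocated()` if the model is instrumented directly.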
Hi Yifan,

Thanks for your interest in our code!

Yes, this is correct (see the scaling section of the original Allegro paper). How many atoms per GPU you can fit is, of course, a function of how much memory your particular GPUs have available; the experiments in our papers were run on NVIDIA A100s with 80 GB of VRAM. You can also try a smaller model, although your model is already quite small. Another critical parameter for memory use is the number of neighbors per atom (i.e. the neighbor density of the system, which is a function of the cutoff).
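To make the cutoff/neighbor-count point concrete, here is a minimal sketch of the usual geometric estimate: the average number of neighbors per atom is roughly the number density times the volume of the cutoff sphere, so the per-atom memory footprint grows roughly with the cube of the cutoff. The density value below is an illustrative assumption, not a number from this thread.

```python
import math

def neighbors_per_atom(number_density: float, cutoff: float) -> float:
    """Average neighbor count ~ density * volume of the cutoff sphere."""
    return number_density * (4.0 / 3.0) * math.pi * cutoff ** 3

# Illustrative number density (~0.085 atoms/A^3, roughly a dense metal).
density = 0.085
for r_c in (4.0, 5.0, 6.0, 7.0):
    n = neighbors_per_atom(density, r_c)
    print(f"cutoff = {r_c:.1f} A  ->  ~{n:.0f} neighbors/atom")
```

Because Allegro's learned features live on atom pairs (edges), trimming the cutoff, where the physics allows it, reduces memory much faster than trimming atoms; and since it is the per-GPU atom count that matters, distributing the same system over more GPUs is the other obvious lever.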