
How much GPU memory needed in general? #8

Open
sheng19331 opened this issue Nov 18, 2024 · 10 comments

@sheng19331

Great job! Thanks for sharing the tool.
Do you have recommendations for how much GPU memory is needed to run a prediction? I was trying to run the prediction in the examples with a 4090 (24 GB), but it failed with 'ran out of memory'. Is it possible to run a prediction with Boltz-1 on such a GPU?
Thanks.

@RuikangSun

I failed with an RTX 3090 in my lab, but surprisingly succeeded in CPU mode.

@hazirliver

I also failed on 24 GB (RTX 4090), but succeeded with an RTX A6000 (48 GB). The peak memory consumption for the example run ligand.fasta was approximately 33 GB.

@jwohlwend
Owner

Hi all, yes the example file is actually fairly large. I'll make a smaller one. We'll be adding an option today to lower memory consumption, with a bit of slowdown as a tradeoff. Will report back here when it's on the main branch!

@sheng19331
Author

> Hi all, yes the example file is actually fairly large. I'll make a smaller one. We'll be adding an option today to lower memory consumption, with a bit of slowdown as a tradeoff. Will report back here when it's on the main branch!

Thanks for the prompt feedback. It would be great to have an option to adjust memory consumption for GPUs with less memory!

@aggelos-michael-papadopoulos

On my RTX 3090 it takes about 11 GB and works fine.

@MKCarter

My RTX 4090 (16 GB) seems to handle this fine:

boltz predict --out_dir test/ test/ --diffusion_samples 5
Downloading data and model to /home/michael/.boltz. You may change this by setting the --cache flag.
Checking input data.
Processing input data.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 26.44it/s]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/michael/miniconda3/envs/boltz/lib/python3.12/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA GeForce RTX 4090 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Predicting DataLoader 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:27<00:00,  0.04it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:27<00:00,  0.04it/s]

This gives the following usage level:

nvidia-smi
Mon Nov 18 15:40:10 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   42C    P0             33W /  150W |    1423MiB /  16376MiB |     50%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2542      G   /usr/lib/xorg/Xorg                              4MiB |
|    0   N/A  N/A     90421      C   ...el/miniconda3/envs/boltz/bin/python       1380MiB |
+-----------------------------------------------------------------------------------------+

This is only running a single protein sequence and ligand, but I imagine it would be fine with multiple chains and ligands.
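
As an aside, the Tensor Cores warning in the log above refers to PyTorch's float32 matmul precision setting. A minimal sketch of what the warning suggests, assuming it is set in a small wrapper script before the model runs (whether boltz exposes a hook for this is an assumption on my part):

# Sketch of the setting referenced by the PyTorch warning above.
# 'high' or 'medium' trades float32 matmul precision for Tensor Core speed.
# This has to run in the same process, before the model executes.
import torch

torch.set_float32_matmul_precision("high")  # or "medium" for more speed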

@jadolfbr

The example failed on a 24 GB machine for me as well: https://instances.vantage.sh/aws/ec2/g5.2xlarge

Downloading data and model to /home/jadolfbr/.boltz. You may change this by setting the --cache flag.
Checking input data.
Processing input data.
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]| WARNING: ran out of memory, skipping batch
Predicting DataLoader 0: 100%|██████████| 1/1 [00:02<00:00,  0.39it/s]Number of failed examples: 1
Predicting DataLoader 0: 100%|██████████| 1/1 [00:02<00:00,  0.39it/s]
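
For what it's worth, the only memory-related knob visible in this thread is --diffusion_samples (set to 5 in the successful run above). Whether lowering it actually reduces peak GPU memory is an assumption on my part, but a lighter invocation to try would be:

boltz predict --out_dir test/ test/ --diffusion_samples 1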

@moritztng

> I failed with an RTX 3090 in my lab, but surprisingly succeeded in CPU mode.

What was the input and how long did it take?

@moritztng

The weights are only 6.5 GB, but when predicting with the CPU it uses up to 30 GB of RAM. What is taking so much memory?
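
In case it helps narrow this down, here is a minimal sketch for measuring peak memory around a prediction; run_prediction is a hypothetical placeholder, not part of the boltz API:

# Minimal sketch for measuring peak memory around an inference call.
import resource
import torch

def report_peak_memory():
    # Peak resident set size of this process (ru_maxrss is in KB on Linux).
    peak_rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"Peak CPU RSS: {peak_rss_kb / 1e6:.1f} GB")
    if torch.cuda.is_available():
        # Peak CUDA memory allocated by tensors in this process.
        print(f"Peak GPU allocation: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")

# run_prediction()  # hypothetical placeholder: run the model here first
report_peak_memory()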

@RuikangSun

> > I failed with an RTX 3090 in my lab, but surprisingly succeeded in CPU mode.
>
> What was the input and how long did it take?

1 receptor and 1 ligand; it took maybe a quarter of an hour, or a few quarters?
