Model does not run correctly on non-A100/H100 GPUs #59
If you're interested in stats: it works fine on an NVIDIA GeForce RTX 3080 (driver version 535.183.01).
Could you be specific about what was not working on your side on the V100? How can we recognize that there is a problem? Are nonsense structures THE indicator of numerical inaccuracies? The posts mentioning nonsense structures on the Quadro RTX 4000 and RTX 2060S were not run in your Docker environment... On my side, predictions look perfect on an old Quadro P3000 6GB for several proteins and ligand complexes (i.e. on a 6-year-old ThinkPad laptop with a mobile GPU of compute capability < 8.0). It also works great on an RTX 3090. Other than nonsense structures, what other observation could indicate numerical inaccuracy? Is there a controlled test we could run to identify potential numerical inaccuracy in our setup?
The nonsense structure is the indicator of the problem here: the output will look almost random. The problem appears related to bfloat16, which is not supported on older GPUs. We will continue to investigate next week. Interesting to know that it does work on some older GPUs; thanks for the report. Even if the major issue under investigation here isn't present, please note that we have not done any large-scale numerical verification of outputs on devices other than A100/H100.
Thank you for the clarification @joshabramson. I will watch for "exploded" structures and report the specifics if it ever happens on one of my GPUs. The P3000 definitely does not natively support BF16 (CUDA compute capability 6.1); I guess it emulates it via float32 compute. Since several people will quite probably try to run AF3 on whatever hardware they have, here are some details of my setup, where it has worked perfectly so far:
- Number of tokens (12 runs so far on that GPU): 167-334 tokens, so the largest bucket size tested was 512.
- Largest test:
- Typical inference speed for < 256 tokens: 150-190 seconds per seed (so typically less than 3 minutes).
- GPU: Quadro P3000, Pascal architecture, Compute Capability 6.1 (ThinkPad P71 laptop)
- Docker: default setup, NOT using unified memory
nvidia-smi
nvcc -V
deviceQuery
neofetch
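To make the bfloat16 discussion above concrete, here is a small pure-Python sketch (the helper names `round_to_bfloat16` and `bf16_running_sum` are hypothetical, not from AlphaFold 3 or JAX). It relies on the fact that bfloat16 is the upper 16 bits of a float32, so a device without native BF16 can still hold every BF16 value exactly in float32 registers; what BF16 loses is mantissa precision (about 3 significant decimal digits), which shows up quickly in long accumulations.

```python
import struct


def round_to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    bfloat16 keeps float32's sign bit and 8-bit exponent but only the top
    7 mantissa bits, i.e. it is the upper 16 bits of a float32 bit pattern.
    NaN/inf edge cases are ignored in this sketch, and x is first rounded
    to float32 (a double rounding from Python's float64).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-half-to-even on the 16 discarded low bits.
    bias = 0x7FFF + ((bits >> 16) & 1)
    upper = ((bits + bias) >> 16) & 0xFFFF
    # Widening back to float32 is exact: append 16 zero bits.
    return struct.unpack("<f", struct.pack("<I", upper << 16))[0]


def bf16_running_sum(values) -> float:
    """Accumulate values, rounding the running total to bfloat16 at each step."""
    total = 0.0
    for v in values:
        total = round_to_bfloat16(total + round_to_bfloat16(v))
    return total


if __name__ == "__main__":
    print(round_to_bfloat16(3.14159))        # 3.140625
    print(sum([0.01] * 1000))                # close to 10.0 in float64
    print(bf16_running_sum([0.01] * 1000))   # stalls far below 10
```

This only illustrates the number format; it says nothing about how XLA actually lowers BF16 ops on pre-Ampere GPUs, which is what the maintainers are investigating.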
We ran the "2PV7" example from the docs on all GPU models available on our cluster with the following results:
Specifically, a ranking score of -99 corresponds to noise/explosion, and a ranking score of 0.67 corresponds to a visually compelling output structure. Update (20.11): added driver/CUDA versions reported by nvidia-smi.
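The scores above suggest one possible controlled test for the question of how to detect numerical problems: run the same input (e.g. the 2PV7 example) on the GPU under test and compare its ranking score with a run on a known-good A100/H100. A minimal sketch, assuming that each prediction's summary JSON exposes a top-level `ranking_score` field and that any negative score is suspicious; both the threshold and the helper names are assumptions, not an official check.

```python
import json
from pathlib import Path

# Assumption: treat any negative ranking score as a likely "exploded" output.
# In this thread, noisy/exploded 2PV7 runs scored -99, while a visually
# good prediction scored about 0.67.
EXPLODED_THRESHOLD = 0.0


def looks_exploded(ranking_score: float, threshold: float = EXPLODED_THRESHOLD) -> bool:
    """Return True if a ranking score falls below the (assumed) threshold."""
    return ranking_score < threshold


def check_summary_file(path) -> bool:
    """Check one prediction's summary JSON file.

    Assumes the file holds a top-level "ranking_score" field; adjust the key
    if your AlphaFold 3 version writes it differently.
    """
    data = json.loads(Path(path).read_text())
    return looks_exploded(float(data["ranking_score"]))
```

A score that is positive but much lower than the reference run would not be caught by this threshold, so comparing against the reference score directly is the stricter variant.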
Thanks @jurgjn, this is incredibly useful information! These are the compute capabilities (see https://developer.nvidia.com/cuda-gpus) of the GPUs mentioned:
Looks like anything with compute capability < 8.0 produces bad results.
Just to add one more data point: I am using an RTX A6000 (compute capability 8.6), and so far all looks good.
The RTX A5000 (compute capability 8.6) works well too.
Could more people test with compute capability 6.x? Based on the result above from @smg3d, it looks like maybe only capability 7.x is broken, while 6.x (and >= 8.0) might be fine. I.e., the current theory:
I wonder if it could be a driver effect? I noticed several people mentioning that they are using older drivers. It might be useful to know which driver and CUDA version @jurgjn was using on his system. I was using driver 560.35.03 and CUDA V12.6.77 (I actually just upgraded to driver 565 today).
I could now try AF3 on a Quadro P4000 (Pascal), and, as @smg3d reported for the P3000, it works on this GPU. This test was done with the same driver and CUDA versions (565.57.01, cuda_12.5.r12.5) as the tests on the RTX 2060S (Turing) and Quadro RTX 4000 (Turing).
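When reporting results here, it helps to post the GPU name, driver version, and compute capability together. A minimal sketch, assuming an `nvidia-smi` recent enough to support the `compute_cap` query field; the helper names are hypothetical. Note that the "CUDA Version" in the `nvidia-smi` header is the highest CUDA version the driver supports, not necessarily the installed toolkit, which `nvcc -V` reports.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Tuple

# Assumes an nvidia-smi that knows the compute_cap query field.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version,compute_cap", "--format=csv,noheader"]


@dataclass
class GpuInfo:
    name: str
    driver_version: str
    compute_cap: Tuple[int, int]


def parse_gpu_csv(text: str) -> List[GpuInfo]:
    """Parse 'name, driver_version, compute_cap' rows, one GPU per line."""
    gpus = []
    for line in text.strip().splitlines():
        # rsplit keeps any commas inside the GPU name intact.
        name, driver, cap = (field.strip() for field in line.rsplit(",", 2))
        major, minor = cap.split(".")
        gpus.append(GpuInfo(name, driver, (int(major), int(minor))))
    return gpus


def query_local_gpus() -> List[GpuInfo]:
    """Run nvidia-smi (requires an NVIDIA driver) and parse its output."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Parsing is kept separate from the subprocess call so the format handling can be checked on a machine without a GPU.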
The V100 also produces "exploded" structures: NVIDIA-SMI 560.35.03, Driver Version: 560.35.03, CUDA Version: 12.6
The Quadro RTX 8000 also produced exploding structures (Driver Version 555.42.06, CUDA Version 12.5).
I can confirm that it runs well on the P100 (compute capability 6.0). So far it has been confirmed to run well on the following 6.x compute capabilities:
And so far there have been no reports of "exploded structures" on compute capability 6.x.
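The individual reports in this thread can be tallied by compute-capability major version to check the "only 7.x is broken" theory. A sketch using only GPUs named in the comments (plus A100/H100, which the maintainers state are supported); the capabilities come from NVIDIA's CUDA GPUs list, and the cluster results table referenced earlier is not included.

```python
from collections import defaultdict

# (GPU, compute capability, outcome) as reported or stated in this thread.
REPORTS = [
    ("A100", (8, 0), "ok"),
    ("H100", (9, 0), "ok"),
    ("GeForce RTX 3080", (8, 6), "ok"),
    ("GeForce RTX 3090", (8, 6), "ok"),
    ("RTX A6000", (8, 6), "ok"),
    ("RTX A5000", (8, 6), "ok"),
    ("Quadro P3000", (6, 1), "ok"),
    ("Quadro P4000", (6, 1), "ok"),
    ("Tesla P100", (6, 0), "ok"),
    ("Tesla V100", (7, 0), "exploded"),
    ("GeForce RTX 2060 SUPER", (7, 5), "exploded"),
    ("Quadro RTX 4000", (7, 5), "exploded"),
    ("Quadro RTX 8000", (7, 5), "exploded"),
]


def outcomes_by_major(reports):
    """Group reported outcomes by compute-capability major version."""
    grouped = defaultdict(set)
    for _name, (major, _minor), outcome in reports:
        grouped[major].add(outcome)
    return dict(grouped)


for major, outcomes in sorted(outcomes_by_major(REPORTS).items()):
    print(f"{major}.x: {sorted(outcomes)}")
```

With these reports, every 7.x device exploded and every 6.x, 8.x, and 9.x device worked, which is consistent with the theory but is still a small, self-reported sample.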
- Add explicit check for compute capability < 6.0
- Keep check for range [7.0, 8.0)
- Update error message to clarify working versions (6.x and 8.x)
- Addresses issue google-deepmind#59
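The check described in the commit message above could look roughly like this. This is a hypothetical sketch, not the code from the PR or from AlphaFold 3; it compares capabilities as `(major, minor)` tuples, which order lexicographically and avoid float-parsing pitfalls.

```python
from typing import Tuple

Capability = Tuple[int, int]


def is_supported_capability(cap: Capability) -> bool:
    """True unless cap < 6.0 or 7.0 <= cap < 8.0 (the ranges the PR rejects)."""
    return not (cap < (6, 0) or (7, 0) <= cap < (8, 0))


def check_compute_capability(cap: Capability) -> None:
    """Raise if the device falls in a range without reported good results."""
    if not is_supported_capability(cap):
        major, minor = cap
        raise ValueError(
            f"Compute capability {major}.{minor} is not supported; devices "
            "with compute capability 6.x or >= 8.0 are reported to work."
        )
```

For example, `check_compute_capability((6, 0))` passes for a P100, while `check_compute_capability((7, 0))` raises for a V100.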
I think it would be good for users to be able to use AlphaFold3 on Pascal GPUs without having to modify code. The data in this issue strongly suggest that the "exploded structures" problem does not affect Pascal GPUs (compute capability 6.x). Moreover, several clusters still have P100s, and these often have zero or very short wait times compared to the A100s. For example, on one of the Canadian national clusters, AF3 jobs on P100 currently start immediately, whereas jobs on the A100 (on the same cluster) often wait 10-30 minutes in the queue. So for a single inference job on a small-to-medium protein complex, we get our predictions back much faster with the P100, despite inference being ~5x slower (358 s vs 73 s on the tested dimer). I tested and submitted a small PR that allows Pascal GPUs to run without raising the error.
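The turnaround argument above is simple arithmetic: total time is queue wait plus runtime. A small sketch using the numbers from the comment (358 s on P100 with no wait, 73 s on A100 with a 10-30 minute wait); the helper name is hypothetical.

```python
def turnaround_s(queue_wait_s: float, runtime_s: float) -> float:
    """Wall-clock time from job submission to results."""
    return queue_wait_s + runtime_s


P100_RUNTIME_S = 358  # tested dimer on P100, job starts immediately
A100_RUNTIME_S = 73   # same dimer on A100

p100 = turnaround_s(0, P100_RUNTIME_S)
a100_best = turnaround_s(10 * 60, A100_RUNTIME_S)
a100_worst = turnaround_s(30 * 60, A100_RUNTIME_S)

# The A100 only wins if its queue wait is shorter than the runtime gap.
break_even_wait_s = P100_RUNTIME_S - A100_RUNTIME_S

print(f"slowdown: {P100_RUNTIME_S / A100_RUNTIME_S:.1f}x")  # slowdown: 4.9x
print(f"P100: {p100 / 60:.1f} min, A100: {a100_best / 60:.1f}-{a100_worst / 60:.1f} min")
print(f"break-even A100 queue wait: {break_even_wait_s} s")  # 285 s
```

So the P100 returns results in about 6 minutes versus roughly 11-31 minutes on the A100, and the A100 only comes out ahead when its queue wait drops below about 4.75 minutes.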
A note from us at Google DeepMind:
We have now tested accuracy on V100 and there are serious issues with the output (looks like random noise). Users have reported similar issues with RTX 2060S and RTX Quadro 4000.
For now the only supported and tested devices are A100 and H100.