I used Rancher 2.5.9 to build my cluster. I believe the installation steps are correct, since the same steps worked on another cluster that uses A100 40G GPUs. However, they fail on this cluster, which uses an A100 80G.
nvidia-smi gives the correct result:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  Off  | 00000000:00:08.0 Off |                    0 |
| N/A   39C    P0    60W / 300W |      0MiB / 80994MiB |     14%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Plugin cannot find my A100 80G
However, no GPU shows up in the cluster.
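One way to confirm this from the Kubernetes side is to look at each node's allocatable resources. The following is only a minimal sketch: it assumes the plugin advertises GPUs under the usual `nvidia.com/gpu` resource name, and the helper name `gpu_allocatable` is made up for illustration. It takes the JSON text produced by `kubectl get nodes -o json` and reports how many GPUs each node exposes.

```python
import json

# Assumption: the NVIDIA device plugin registers GPUs under this resource name.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_allocatable(nodes_json):
    """Return {node name: allocatable GPU count} from `kubectl get nodes -o json` text."""
    node_list = json.loads(nodes_json)
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # A node with no registered GPU has no such key, so default to 0.
        counts[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts
```

Saving the `kubectl get nodes -o json` output to a file and passing its contents to `gpu_allocatable` should show a count of 0 for the affected node if the plugin has not registered the A100 80G.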
I tried to find the reason; this is the log from the plugin's Pod.
Any idea why this happens? Is it possible that the plugin does not support the A100 80G?