Description
I installed JAX on a GPU node, but am getting a CUDA_ERROR_SYSTEM_NOT_READY error:
(base) $ python3 -c "import jax; jax.numpy.array(0)"
2024-11-12 15:37:25.005059: E external/xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: CUDA_ERROR_SYSTEM_NOT_READY: system not yet initialized
Traceback (most recent call last):
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 896, in backends
    backend = _init_backend(platform)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 982, in _init_backend
    backend = registration.factory()
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 674, in factory
    return xla_client.make_c_api_client(plugin_name, updated_options, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jaxlib/xla_client.py", line 200, in make_c_api_client
    return _xla.get_c_api_client(plugin_name, options, distributed_client)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: No visible GPU devices.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/numpy/lax_numpy.py", line 5426, in array
    out_array: Array = lax_internal._convert_element_type(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/lax/lax.py", line 587, in _convert_element_type
    return convert_element_type_p.bind(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/lax/lax.py", line 2981, in _convert_element_type_bind
    operand = core.Primitive.bind(convert_element_type_p, operand,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/core.py", line 438, in bind
    return self.bind_with_trace(find_top_trace(args), args, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/core.py", line 442, in bind_with_trace
    out = trace.process_primitive(self, map(trace.full_raise, args), params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/core.py", line 955, in process_primitive
    return primitive.impl(*tracers, **params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/dispatch.py", line 91, in apply_primitive
    outs = fun(*args)
           ^^^^^^^^^^
RuntimeError: Unable to initialize backend 'cuda': FAILED_PRECONDITION: No visible GPU devices. (you may need to uninstall the failing plugin package, or set JAX_PLATFORMS=cpu to skip this backend.)
--------------------
For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these.
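The error message above itself suggests setting JAX_PLATFORMS=cpu to skip the failing backend. A minimal sketch of that workaround, assuming it is acceptable to run on CPU while the GPU problem is investigated. It does not fix the cuInit failure. The try/except is only there so the snippet still runs in an environment where jax is not installed.

```python
import os

# Set the variable before importing jax so the setting is picked up.
os.environ["JAX_PLATFORMS"] = "cpu"

try:
    import jax
    import jax.numpy as jnp
except ImportError:
    jax = None  # jax is not installed in this environment

if jax is not None:
    # Same call as in the report, now dispatched to the CPU backend.
    x = jnp.array(0)
    print(jax.devices())
    print(x)
```

The same effect can be had from the shell by prefixing the original command with JAX_PLATFORMS=cpu.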
Here's some additional output:
(base) $ echo $CUDA_VISIBLE_DEVICES
0
(base) $ echo $LD_LIBRARY_PATH
(base) $ nvidia-smi
Tue Nov 12 15:38:53 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:07:00.0 Off |                    0 |
| N/A   25C    P0             43W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
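nvidia-smi lists the GPU, yet the log shows cuInit failing. nvidia-smi queries the driver through NVML rather than through cuInit, so the two can disagree. To check whether the failure is specific to JAX, cuInit can be called directly. The following is a rough sketch, not part of the original report: the helper names `describe_cu_result` and `probe_cuinit` are made up for illustration, the library name `libcuda.so.1` is an assumption about a typical Linux driver install, and the table holds only a few CUresult values from cuda.h.

```python
import ctypes

# A small subset of CUresult codes from the CUDA driver API header (cuda.h).
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE",
    802: "CUDA_ERROR_SYSTEM_NOT_READY",
    803: "CUDA_ERROR_SYSTEM_DRIVER_MISMATCH",
}


def describe_cu_result(code: int) -> str:
    """Return the symbolic name for a CUresult code if it is in the table."""
    return CU_RESULT_NAMES.get(code, f"CUresult {code}")


def probe_cuinit(library: str = "libcuda.so.1"):
    """Call cuInit(0) directly, bypassing JAX.

    Returns the integer CUresult, or None if the driver library
    cannot be loaded (for example on a machine without a driver).
    """
    try:
        libcuda = ctypes.CDLL(library)
    except OSError:
        return None
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


code = probe_cuinit()
if code is None:
    print("libcuda.so.1 could not be loaded")
else:
    print(f"cuInit returned {code} ({describe_cu_result(code)})")
```

If this probe also reports CUDA_ERROR_SYSTEM_NOT_READY, the problem lies in the driver or system setup rather than in the JAX installation.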
System info (python version, jaxlib version, accelerator, etc.)
2024-11-12 15:36:48.401160: E external/xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: CUDA_ERROR_SYSTEM_NOT_READY: system not yet initialized
Traceback (most recent call last):
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 896, in backends
    backend = _init_backend(platform)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 982, in _init_backend
    backend = registration.factory()
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 674, in factory
    return xla_client.make_c_api_client(plugin_name, updated_options, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jaxlib/xla_client.py", line 200, in make_c_api_client
    return _xla.get_c_api_client(plugin_name, options, distributed_client)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: No visible GPU devices.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/environment_info.py", line 49, in print_environment_info
    device info: {xb.devices()[0].device_kind}-{xb.device_count()}, {xb.local_device_count()} local devices"
                  ^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 1094, in devices
    return get_backend(backend).devices()
           ^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 1028, in get_backend
    return _get_backend_uncached(platform)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 1007, in _get_backend_uncached
    bs = backends()
         ^^^^^^^^^^
  File "/marvel/home/cgmartin/miniforge3/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 912, in backends
    raise RuntimeError(err_msg)
RuntimeError: Unable to initialize backend 'cuda': FAILED_PRECONDITION: No visible GPU devices. (you may need to uninstall the failing plugin package, or set JAX_PLATFORMS=cpu to skip this backend.)
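The traceback above shows print_environment_info failing at xb.devices(), so no version information was printed. A possible way to still collect the basics is to read them from package metadata, which does not initialize any backend. This is a sketch under assumptions: the helper names are hypothetical, and the list of packages (jax, jaxlib, numpy) is a guess at what is relevant here.

```python
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def basic_environment_info() -> dict:
    """Collect version info without importing jax or touching any device."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "jax": package_version("jax"),
        "jaxlib": package_version("jaxlib"),
        "numpy": package_version("numpy"),
    }


for key, value in basic_environment_info().items():
    print(f"{key}: {value}")
```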