Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework CUDA/native-library setup and diagnostics #1041

Merged
merged 3 commits into from
Mar 13, 2024

Conversation

akx
Copy link
Contributor

@akx akx commented Feb 6, 2024

This PR reworks the native-library setup code that was partially enmeshed with the diagnostics code. I'm sorry it's all in one big commit, this was a bit too hard to pull into smaller ones so it still made sense 😓

The main thing is that cuda_setup/ is gone; in its place are

  • cuda_specs, which is a simple dataclass containing information about the current cuda environment, and the native-library loading code is now in cextension.py. (This would be further reworked when more backends are introduced, of course; see Enable common device abstraction for 8bits/4bits #898.)
    • cextension introduces a minimal proxy object for the ctypes.CDLL (a precursor to a Backend), so a global CUDA-specific constant COMPILED_WITH_CUDA is not needed.
  • diagnostics/, which is a separate thing that's being run by python -m bitsandbytes.
    • There was overlapping DLL/.so discovering code in both cuda_setup and __main__.py; I got rid of the latter implementation in favor of the stuff that had already been there in cuda_setup. As far as I could tell, though, that code was only ever used to sniff around for CUDA runtime libraries, but that information was only shown to the user as a diagnostic.
      • Finding the libraries was made a bit looser; anything nvcuda or libcudart now smells like CUDA.
    • I think the new diagnostics are pretty much on par with the old ones – definitely not bit-by-bit exact though. I don't presently have a machine with broken CUDA, so I can't really tell. (I think the formatting should be made more GitHub friendly though; the ######## headers turn into monstrous Markdown headings. That's for another PR, though.)

Additionally, this PR trivially enables the Mac libbitsandbytes_cpu.dylib native library to be found by adding it to the DYNAMIC_LIBRARY_SUFFIX mapping.

I tested locally that:

  • on my Mac, no tests fail and the library gets loaded as expected
  • on my WSL2 machine, libbitsandbytes_cuda121_nocublaslt.so gets loaded as expected and tests pass as before.

This is still in no way perfect, but it's better IMO :)

Refs #918

Copy link

github-actions bot commented Feb 6, 2024

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@rickardp
Copy link
Contributor

rickardp commented Feb 6, 2024

Really nice refactoring, much easier to follow the code. I would argue the setup code should embrace further device abstraction, but I think this is being worked on already and this looks like a really nice improvement!

Copy link
Collaborator

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this huge work and refactor ! 🙏
I quickly tried this branch out, and I managed to compile bnb on your branch, but when calling python -m bitsandbytes I get:

Could not find the bitsandbytes CUDA binary at PosixPath('/bitsandbytes/bitsandbytes/libbitsandbytes_cuda121.so')
Could not load bitsandbytes native library: /bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/bitsandbytes/bitsandbytes/cextension.py", line 110, in <module>
    lib = get_native_library()
  File "/bitsandbytes/bitsandbytes/cextension.py", line 97, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
  File "/opt/conda/envs/peft/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
    return self._dlltype(name)
  File "/opt/conda/envs/peft/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory

CUDA Setup failed despite CUDA being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(8, 0), cuda_version_string='121', cuda_version_tuple=(12, 1))
PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: (8, 0).
Could not find the bitsandbytes CUDA binary at PosixPath('/bitsandbytes/bitsandbytes/libbitsandbytes_cuda121.so')
Traceback (most recent call last):
  File "/opt/conda/envs/peft/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/peft/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/bitsandbytes/bitsandbytes/__main__.py", line 4, in <module>
    main()
  File "/bitsandbytes/bitsandbytes/diagnostics/main.py", line 44, in main
    print_cuda_diagnostics(cuda_specs)
  File "/bitsandbytes/bitsandbytes/diagnostics/cuda.py", line 112, in print_cuda_diagnostics
    if not binary_path.exists():
AttributeError: 'NoneType' object has no attribute 'exists'

Any idea what I did wrong and how to fix this?

Copy link
Collaborator

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below will solve the error but I am still not able to compile and use bnb on a docker image

bitsandbytes/diagnostics/cuda.py Show resolved Hide resolved
@younesbelkada
Copy link
Collaborator

@akx managed to make it work following instructions from :#1042

@akx akx force-pushed the cuda-wagh branch 2 times, most recently from 116c910 to 42edea0 Compare February 7, 2024 09:06
@akx
Copy link
Contributor Author

akx commented Feb 7, 2024

@younesbelkada Rebased and fixed your excellent catch :)

Funnily enough this still works fine on my machine even if the diagnostics say

CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.

since what really matters for linkage (on Linux) is the ldconfig cache:

(venv) akx@AEON ~/bitsandbytes (cuda-wagh)> ldconfig -p | grep libcudart
	libcudart.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12
	libcudart.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so

– I think the env path diagnostics should probably be reworked too.

@Titus-von-Koeller
Copy link
Collaborator

Ok, this is looking really good so far. We're planning to merge #898 after a final review from Tim and me this end of week. After that I would like to merge this one here and #1060.

Also, I see a bunch of PRs from @rickardp: Any particular order you want these merged/ looked at? Please make me aware of any dependencies/ conflicts between all mentioned PRs, if you're aware of them or if you think anything should come first.

Thanks again all for the valuable work your contributing :)

@rickardp
Copy link
Contributor

The most important one of my PRs right now is #1050 but no one should block this one. I try to stay clear of the Python code as I know there's a lot going on there right now

@akx akx mentioned this pull request Feb 15, 2024
3 tasks
@akx
Copy link
Contributor Author

akx commented Mar 4, 2024

Rebased, incorporating #1064.

@akx
Copy link
Contributor Author

akx commented Mar 5, 2024

So yeah, @Titus-von-Koeller, this is again mergeable 😅

@akx
Copy link
Contributor Author

akx commented Mar 8, 2024

Works on my Windows machine too (though the output really does need some cleaning, as noted in the PR description).

(a) F:\temp\a>uv pip list
Package           Version
----------------- ------------
bitsandbytes      0.44.0.dev0
filelock          3.9.0
fsspec            2023.4.0
jinja2            3.1.2
markupsafe        2.1.3
mpmath            1.3.0
networkx          3.2.1
numpy             1.26.3
pillow            10.2.0
sympy             1.12
torch             2.2.1+cu121
torchaudio        2.2.1+cu121
torchvision       0.17.1+cu121
typing-extensions 4.8.0

(a) F:\temp\a>python -m bitsandbytes
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(6, 1), cuda_version_string='121', cuda_version_tuple=(12, 1))
PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: (6, 1).
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
If you run into issues with 8-bit matmul, you can try 4-bit quantization:
https://huggingface.co/blog/4bit-transformers-bitsandbytes
The directory listed in your path is found to be non-existent: ...
Found duplicate CUDA runtime files (see below).

We select the PyTorch default CUDA runtime, which is 12.1,
but this might mismatch with the CUDA version that is needed for bitsandbytes.
To override this behavior set the `BNB_CUDA_VERSION=<version string, e.g. 122>` environmental variable.

For example, if you want to use the CUDA version 122,
    BNB_CUDA_VERSION=122 python ...

OR set the environmental variable in your .bashrc:
    export BNB_CUDA_VERSION=122

In the case of a manual override, make sure you set LD_LIBRARY_PATH, e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2,
* Found CUDA runtime at: c:\windows\system32\nvcuda.dll
* Found CUDA runtime at: c:\windows\system32\nvcudadebugger.dll
* Found CUDA runtime at: C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll
* Found CUDA runtime at: C:\WINDOWS\system32\nvcuda.dll
* Found CUDA runtime at: C:\WINDOWS\system32\nvcudadebugger.dll
* Found CUDA runtime at: C:\Windows\System32\nvcuda.dll
* Found CUDA runtime at: C:\Windows\System32\nvcudadebugger.dll
* Found CUDA runtime at: C:\WINDOWS\system32\nvcuda.dll
* Found CUDA runtime at: C:\WINDOWS\system32\nvcudadebugger.dll
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
SUCCESS!
Installation was successful!

Comment on lines +1 to +12
from pathlib import Path
import platform

DYNAMIC_LIBRARY_SUFFIX = {
"Darwin": ".dylib",
"Linux": ".so",
"Windows": ".dll",
}.get(platform.system(), ".so")

PACKAGE_DIR = Path(__file__).parent
PACKAGE_GITHUB_URL = "https://github.com/TimDettmers/bitsandbytes"
NONPYTORCH_DOC_URL = "https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much cleaner like this, nice! also especially like the .get(platform.system(), ".so")

import torch


@dataclasses.dataclass(frozen=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, super nice and clean, the whole module! Really like how you factor out everything, much easier to follow and maintain now

return PACKAGE_DIR / library_name


class BNBNativeLibrary:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this refactor here, I also find very useful, much more pythonic and maintainable. thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course there's still a lot of work, but this is a significant step forward

@Titus-von-Koeller
Copy link
Collaborator

Ok, I just went through everything one more time and it's looking great, as always.

Thanks for providing tweaks for the things that came up in yesterdays review and that we discussed in Slack this morning.

I ran the test suite and despite the usual flakiness, there's only one issue:

――――――――――――――――――――――――――――――――――― test_manual_override ――――――――――――――――――――――――――――――――――――

requires_cuda = True

    def test_manual_override(requires_cuda):
        manual_cuda_path = str(Path('/mmfs1/home/dettmers/data/local/cuda-12.2'))
    
        pytorch_version = torch.version.cuda.replace('.', '')
    
        assert pytorch_version != 122  # TODO: this will never be true...
    
        os.environ['CUDA_HOME']='{manual_cuda_path}'
        os.environ['BNB_CUDA_VERSION']='122'
        #assert str(manual_cuda_path) in os.environ['LD_LIBRARY_PATH']
        import bitsandbytes as bnb
>       loaded_lib = bnb.cuda_setup.main.CUDASetup.get_instance().binary_name
E       AttributeError: module 'bitsandbytes' has no attribute 'cuda_setup'

tests/test_cuda_setup_evaluator.py:20: AttributeError

The Transformer integration tests also came through clean, not that I was expecting an issue, but just ran them to be sure.

I'm not sure to what extent this test is actually useful. Wdyt? We could also delete it, in case it doesn't.

Anyways, maybe that's another thing to discuss in dev corner, to what extent we would like test coverage for this setup stuff and to what extent that is even practically testable and if this is useful or not.

Otherwise, this is ready to merge. Great work! Really glad to have you on board at BNB ❤️

@akx
Copy link
Contributor Author

akx commented Mar 13, 2024

@Titus-von-Koeller Ah, good catch, yeah. Fixing that test actually uncovered a bug regarding the override mechanism (it was using an unused binary_name) :)

@akx akx requested a review from Titus-von-Koeller March 13, 2024 11:27
@Titus-von-Koeller Titus-von-Koeller merged commit fd723b7 into bitsandbytes-foundation:main Mar 13, 2024
23 checks passed
@Titus-von-Koeller
Copy link
Collaborator

Well, good that we caught it! Nice additions to the tests as well.

Thanks again for the great work on this cleanup!

Glad to merge.

Atry added a commit to Atry/nix-ml-ops that referenced this pull request Aug 3, 2024
- This PR removes $CUDA_HOME/lib from $LD_LIBRARY_PATH because bitsandbytes can follow PyTorch's CUDA runtime since bitsandbytes-foundation/bitsandbytes#1041, and setting $LD_LIBRARY_PATH for `bitsandbytes` is not necessary any more
- Hard-coded `/run/opengl-driver/lib` are removed because nix-gl-host can print the driver path for both Ubuntu and NixOS since numtide/nix-gl-host#16
- `link-nvidia-drivers.nix` is removed because drivers are instead copied to a cache directory by nix-gl-host
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants