[RFC] cross-platform: Refactoring bitsandbytes/cuda_setup #918
Hey, one thing to clarify: the device abstraction proposed in #894 and PR #898 is not only about CPU quantization but also about Intel GPUs. That is why we tried to abstract out the interfaces each backend needs to implement, and also provided a registration mechanism for supporting new devices.
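A registration mechanism of the kind described for #894/#898 could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the API from those PRs: Backend, register_backend, get_backend, and quantize_4bit are hypothetical names, and the CPU backend does a toy absmax scaling only so there is something concrete to call.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class Backend(ABC):
    """Hypothetical interface that every device backend would implement."""

    @abstractmethod
    def quantize_4bit(self, values: List[float]) -> List[int]:
        """Map floats to signed 4-bit integer codes."""


# Hypothetical registry keyed by device type string (e.g. "cpu", "cuda", "xpu").
_BACKENDS: Dict[str, Type[Backend]] = {}


def register_backend(device_type: str) -> Callable[[Type[Backend]], Type[Backend]]:
    """Class decorator that registers a backend implementation for a device type."""

    def decorator(cls: Type[Backend]) -> Type[Backend]:
        if device_type in _BACKENDS:
            raise ValueError(f"a backend for {device_type!r} is already registered")
        _BACKENDS[device_type] = cls
        return cls

    return decorator


def get_backend(device_type: str) -> Backend:
    """Instantiate the backend registered for device_type, failing with a clear message."""
    try:
        backend_cls = _BACKENDS[device_type]
    except KeyError:
        known = ", ".join(sorted(_BACKENDS)) or "none"
        raise RuntimeError(
            f"no backend registered for device {device_type!r} (available: {known})"
        ) from None
    return backend_cls()


@register_backend("cpu")
class CPUBackend(Backend):
    def quantize_4bit(self, values: List[float]) -> List[int]:
        # Toy absmax scaling into the signed range [-7, 7]; illustration only.
        scale = max((abs(v) for v in values), default=0.0) or 1.0
        return [round(v / scale * 7) for v in values]
```

With this shape, supporting Intel GPUs would mean defining another subclass and decorating it with register_backend("xpu") ("xpu" being PyTorch's device type for Intel GPUs), without changing any code that calls get_backend.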
This is really exciting!
GPU runners now appear to be available on GitHub Actions, and I can't think of a more useful project to begin trialling them on. There is a lot of complexity around the number of different CUDA versions, hardware, and other dependencies. For now, one goal could be to get the test suite automated and running cleanly for a limited combination of CUDA versions and hardware, and then build out the build/deployment process from there.
Nice to see some initiative on this again. Here are a few points from me.
As for some of the details and tech choices on portability, please see the prior discussion in #252 and #485, as there was already a lot of good discussion there. @Titus-von-Koeller I understand that this issue might be specific to cuda_setup, but I think that if the decision is made to build true binary wheels (see bullet 2/2b above), a lot of the complexity of the CUDA setup goes away. So my proposal would be to start by agreeing that this library will be distributed this way, and to load the CUDA kernels for the CUDA runtime supplied by PyTorch (querying PyTorch for the details; IIRC this is possible, but my knowledge here is 6 months old, so I might be wrong). I'm happy to spend some time reviving/rebasing/refactoring any of the work in PR #257 that is of interest to the community, but I would like to get some commitment from a maintainer that this actually has a chance of getting merged, so it's not just a dead end.
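Picking the kernel library from the CUDA runtime that PyTorch was built against could be sketched as follows. torch.version.cuda is a real PyTorch attribute (a version string such as "11.8", or None for CPU-only builds); the libbitsandbytes_<tag>.so naming and all helper names are assumptions for illustration, and the sketch only covers Linux-style file names.

```python
from typing import Optional


def detect_torch_cuda_version() -> Optional[str]:
    """Return the CUDA runtime version the installed PyTorch was built with, if any."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # e.g. "11.8"; None for CPU-only builds


def cuda_tag(cuda_version: Optional[str]) -> str:
    """Turn "11.8" into "cuda118"; fall back to "cpu" when there is no CUDA runtime."""
    if not cuda_version:
        return "cpu"
    parts = cuda_version.split(".")
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else "0"
    return f"cuda{major}{minor}"


def kernel_library_name(cuda_version: Optional[str]) -> str:
    """Hypothetical Linux file name of the kernel library matching cuda_version."""
    return f"libbitsandbytes_{cuda_tag(cuda_version)}.so"
```

Calling kernel_library_name(detect_torch_cuda_version()) would then select the library that matches the user's PyTorch install. Because pip-installed PyTorch wheels typically ship their own CUDA runtime, this approach would make the user's separately installed CUDA toolkit and LD_LIBRARY_PATH matter less.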
Yes, getting build and testing automated is quite high up on our list. I also saw the GitHub blog post about GPU runners (still in beta) and already signed up for the beta in December, but we didn't get selected. At the moment, the only way to get GPU runners is to self-host them, which in our case would mean spinning them up in the cloud on demand. However, we decided that the engineering effort to get that working is currently better spent on more pressing, high-impact matters. Hugging Face is willing to support us with compute costs once we decide to move ahead with this. If anyone is willing to contribute or collaborate on this topic, please let me know and we can figure out how and when to move forward.
@Titus-von-Koeller, I would like to find time to contribute. It's awesome that HF has capacity to support this! Do let me know how the project would like to proceed - once there's a plan we can start to chip away at elements of it 🙂
Ah, I wasn't even aware of this conversation before opening #996 :)
Ok, after merging #1041 (thanks @akx, this really brings us a step forward!), we should reassess where we would like to head with this. It seems @matthewdouglas and @rickardp also have quite a few opinions on these topics. If everyone could spell out what they think is important going forward, that would be quite helpful in distilling things down to something concrete. Please let me know what you think.
Archiving this, because it's out of date and we ended up favoring other modes of interaction to coordinate. |
Summary
This RFC aims to discuss and gather community input on refactoring the bitsandbytes/cuda_setup module. The goal is to enhance its functionality, simplify the user experience across different hardware and operating systems, and prepare it for upcoming expansions of device support.
Background
bitsandbytes has become instrumental in democratizing AI, thanks to its deep integration with hardware. Across millions of monthly downloads, a fraction of users still encounter issues, such as those detailed in #914. Our objective is to make bitsandbytes as easy to use as possible (e.g. as easy as running pip install bitsandbytes and passing load_in_4bit=True), mostly hiding the complexities of the software-hardware boundary under the hood while maintaining ease of installation and use, and improving error reporting and handling.
Current Challenges
Setup Module Issues
python -m bitsandbytes: This feature, intended to simplify debugging, sometimes shows similar tracebacks for different underlying issues, causing confusion in the issue threads.
Many reported problems do not originate from bitsandbytes itself, but from user-side issues with CUDA installations, environment settings (e.g. LD_LIBRARY_PATH), or hardware configurations.
Diverse Hardware Landscape
Operating System Variability
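Shared-library naming and CPU architecture reporting are two concrete places where operating systems differ. The sketch below normalizes both; the helper names and the example stem are hypothetical, while the sys.platform values ("linux", "darwin", "win32") and the platform.machine() spellings ("AMD64" on 64-bit Windows, "arm64" on Apple silicon, "aarch64" on 64-bit Linux ARM) are standard Python behavior.

```python
import platform
import sys
from typing import Optional

# (prefix, suffix) for shared libraries, keyed by sys.platform prefix.
_LIBRARY_CONVENTIONS = {
    "linux": ("lib", ".so"),
    "darwin": ("lib", ".dylib"),
    "win32": ("", ".dll"),
}

# Different OSes spell the same architecture differently in platform.machine().
_ARCH_ALIASES = {
    "amd64": "x86_64",
    "x86_64": "x86_64",
    "arm64": "aarch64",
    "aarch64": "aarch64",
}


def shared_library_filename(stem: str, platform_name: Optional[str] = None) -> str:
    """Build an OS-appropriate file name, e.g. libfoo.so, libfoo.dylib or foo.dll."""
    name = platform_name or sys.platform
    for key, (prefix, suffix) in _LIBRARY_CONVENTIONS.items():
        if name.startswith(key):
            return f"{prefix}{stem}{suffix}"
    # Fail early with a specific message instead of a generic load error later.
    raise OSError(f"unsupported operating system: {name!r}")


def normalized_machine(machine: Optional[str] = None) -> str:
    """Map platform.machine() output to one canonical architecture name."""
    raw = (machine or platform.machine()).lower()
    return _ARCH_ALIASES.get(raw, raw)
```

Raising a distinct OSError for an unsupported platform, rather than letting a later library load fail with a generic traceback, is in the spirit of the error-reporting concerns raised under Setup Module Issues.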
Proposed Improvements
cuda_setup: Enhancing code quality and clarity to better handle the diverse hardware and OS scenarios.
Call to Action
We invite the community to provide feedback and suggestions on the following:
cuda_setup module.
Timeline and Milestones
We'll take an incremental approach to improving the setup module. The more actionable and widely agreed upon a proposal is, the sooner we can implement it.
Contribution and Feedback Mechanism
Please share your thoughts, suggestions, and feedback in the thread below.
Summary and Next Steps
This RFC serves as a starting point to gather feedback and coordinate the collaborative effort to refine bitsandbytes's setup process. We aim to address the current challenges, embrace the diversity of hardware and operating systems, and build a robust, user-friendly setup. Your participation in this process is crucial, and we look forward to your input.