[REQUEST] Support for XLA/TPU #6901
Comments
@radna0, this sounds exciting. It is awesome that you are willing to lead the integration. We will be glad to collaborate with you to make this happen.
Thank you @tjruwase. I currently have a setup somewhat running. Here are some things I did and some roadblocks I hit.

Basics: I added an xla_accelerator implementation.

Roadblock 1: The DeepSpeed launcher. XLA only allows a single parent process when doing a multi-process spawn, but when using the DeepSpeed launcher, ... For example, run with ..., and it would just act/behave like the following for ... (the single-parent spawn model XLA expects is sketched below).

Roadblock 2: After getting to launcher.py, the processes are spawned with ... I haven't tested out other methods more thoroughly, like ...

Updates: I'm currently testing a training script to see whether multi-process, pipeline parallelism, etc. work properly; I will update if I have any findings.
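For reference, a minimal sketch of the single-parent spawn model PyTorch/XLA expects, using the standard `xmp.spawn` entry point; this is purely illustrative and not part of the DeepSpeed integration being described:

```python
# Minimal PyTorch/XLA multi-process example: one parent process calls
# xmp.spawn(), which creates all TPU worker processes itself. This is the
# single-parent constraint behind Roadblock 1, in contrast to a launcher
# that starts one independent process per rank.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    # Each spawned worker gets its own XLA (TPU) device.
    device = xm.xla_device()
    t = torch.ones(2, 2, device=device)
    print(f"worker {index} on {device}: sum={t.sum().item()}")


if __name__ == "__main__":
    # nprocs=None lets torch_xla decide how many local devices to use.
    xmp.spawn(_mp_fn, args=(), nprocs=None)
```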
I am curious about the implementation of the above. Are you handling the case when the accelerator is none? I assume you have seen these tutorials: ...
Does it make sense to add the spawn method into the accelerator abstraction class?
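If the spawn method does move into the accelerator abstraction, one possible shape is a hook that each accelerator implements with its native mechanism; the class and method names below are assumptions for illustration, not DeepSpeed's actual API:

```python
# Illustrative sketch only: a hypothetical spawn hook on the accelerator
# abstraction. The generic version uses torch.multiprocessing; an XLA-aware
# accelerator overrides it with xmp.spawn so the single-parent constraint holds.
import torch.multiprocessing as mp


class AcceleratorSpawnMixin:
    def spawn_processes(self, fn, args=(), nprocs=1):
        # Generic fall-back: standard torch multiprocessing spawn.
        mp.spawn(fn, args=args, nprocs=nprocs)


class XLASpawnMixin(AcceleratorSpawnMixin):
    def spawn_processes(self, fn, args=(), nprocs=None):
        # XLA path: let torch_xla own process creation from a single parent.
        import torch_xla.distributed.xla_multiprocessing as xmp
        xmp.spawn(fn, args=args, nprocs=nprocs)
```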
Another idea for roadblock 1 is to add ...
No, I'm not handling when the accelerator is none, and yes, when it is not available it defaults to CPU or another accelerator. I will add a check for xla. I hadn't seen those tutorials, but my implementation of the xla_accelerator is based on the CPU and CUDA accelerators, so it is actually compatible and working just as the tutorial suggests (a rough sketch of the shape is below). As for the setup guide for PyTorch/XLA, I can document it as I open my PR to merge the changes, or I can do that right away if needed. How can I do so?

Yes, I think this would be ideal. I'm currently just checking the accelerator on the fly within launcher.py to handle spawning processes accordingly.

I'm not sure if I'm doing it correctly; I created ...
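For illustration only, a partial sketch of what an xla_accelerator modeled on the CPU and CUDA accelerators could look like; the method names follow DeepSpeed's abstract accelerator interface, but this is not the actual work-in-progress code, and the `xla` communication backend name is an assumption:

```python
# Partial, illustrative sketch of an XLA accelerator. Only a handful of methods
# are shown; a real implementation must cover the full DeepSpeedAccelerator
# interface before it can be instantiated.
import torch_xla.core.xla_model as xm
from deepspeed.accelerator.abstract_accelerator import DeepSpeedAccelerator


class XLA_Accelerator(DeepSpeedAccelerator):
    def __init__(self):
        self._name = "xla"
        # Assumption: the backend name deepspeed.comm would use for XLA.
        self._communication_backend_name = "xla"

    def device_name(self, device_index=None):
        return "xla" if device_index is None else f"xla:{device_index}"

    def device(self, device_index=None):
        return xm.xla_device(device_index)

    def device_count(self):
        return len(xm.get_xla_supported_devices())

    def is_available(self):
        try:
            return len(xm.get_xla_supported_devices()) > 0
        except RuntimeError:
            return False

    def communication_backend_name(self):
        return self._communication_backend_name
```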
Hmm, that is strange. Can you examine this code to see what is wrong? DeepSpeed/deepspeed/launcher/runner.py, lines 593 to 600 at eea5304.
Try adding ...
@tjruwase I tried all kinds of commands; all of them just use CPU, even with ... Also, for now I'm leaving out the xla check.
Agreed. Will fix.
CPU should be the fall-through if no accelerator is detected or specified. So, it seems your use case is exposing an issue here. Is it possible to share a stack trace of the failure?
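For context, accelerator auto-detection generally probes for each backend's Python package and falls through to CPU when nothing is found. Below is a hedged sketch of what an added XLA probe might look like; the function name and structure are illustrative, not the actual logic in real_accelerator.py:

```python
# Illustrative sketch of adding an XLA probe to accelerator auto-detection.
# The real detection code in deepspeed/accelerator/real_accelerator.py covers
# several backends; this only shows the shape of an extra XLA check.
def _detect_accelerator_name():
    try:
        import torch_xla.core.xla_model as xm
        # Assumption: an enumerable XLA device means we should prefer it.
        if len(xm.get_xla_supported_devices()) > 0:
            return "xla"
    except (ImportError, RuntimeError):
        pass

    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass

    # Fall-through when nothing else is detected or specified.
    return "cpu"
```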
There are two ways we can try setting accelerator_name for xla (existing override mechanisms are sketched below): ...

With ...
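For reference, two existing mechanisms for overriding the detected accelerator, either of which could point at an XLA accelerator once one is registered; the `xla` name and the XLA_Accelerator import below are assumptions, since no such accelerator ships with DeepSpeed today:

```python
# Option 1: environment variable read by DeepSpeed's accelerator detection,
# e.g. `DS_ACCELERATOR=xla deepspeed train.py ...` (only meaningful once an
# 'xla' accelerator is known to real_accelerator.py).

# Option 2: set the accelerator programmatically before DeepSpeed queries it.
from deepspeed.accelerator import get_accelerator, set_accelerator

# Hypothetical user-provided module containing an XLA accelerator, e.g. the
# sketch shown earlier in this thread.
from my_xla_accelerator import XLA_Accelerator  # assumption, not a real module

set_accelerator(XLA_Accelerator())
assert get_accelerator().device_name() == "xla"
```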
Is your feature request related to a problem? Please describe.
Currently, DeepSpeed lacks support for TPUs via the XLA backend. This limits the use of DeepSpeed's advanced parallelism techniques, such as pipeline parallelism and ZeRO optimizations, for TPU users. Frameworks like PyTorch/XLA and Accelerate offer TPU support, but they lack the comprehensive optimization features that DeepSpeed provides.
This is particularly frustrating for users who want to scale models efficiently on TPUs across multiple nodes while leveraging DeepSpeed's features.
Describe the solution you'd like
I propose integrating XLA as a backend for DeepSpeed, enabling TPU users to take advantage of DeepSpeed's optimizations, including pipeline parallelism, ZeRO, and advanced scheduling mechanisms.
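To make the proposal concrete, here is a hypothetical sketch of what a TPU user might write if XLA became a supported backend; `deepspeed.initialize` and the config keys are DeepSpeed's existing API, while running them against an XLA device is exactly the capability being requested:

```python
# Hypothetical usage if DeepSpeed gained an XLA/TPU backend. The initialize()
# call and config are real DeepSpeed API; the desired new behavior is that the
# engine places parameters and runs collectives on an XLA device.
import deepspeed
import torch

model = torch.nn.Linear(1024, 1024)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "zero_optimization": {"stage": 1},  # ZeRO on TPU is the goal here
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Desired outcome: engine.device resolves to an XLA/TPU device rather than CUDA.
print(engine.device)
```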
Describe alternatives you've considered
Additional context
There is growing interest in having TPU support in DeepSpeed, as evidenced by multiple community requests. Adding XLA as a backend would make DeepSpeed accessible to a wider audience, particularly researchers and engineers working with TPUs.
I'm willing to lead this integration if needed, and I hope this request sparks discussion and collaboration within the DeepSpeed team.
Link to Pytorch XLA Feature Request: pytorch/xla#8514 (comment)