Diagnose Why Sunspot Doesn't Run PyTorch Lightning #91

Open
3 of 5 tasks
WardLT opened this issue Apr 3, 2024 · 0 comments

WardLT (Member) commented Apr 3, 2024

I have yet to get our training component test to run on Sunspot. A few things that could be the cause or a solution:

  • We are using PyTorch 2 on Sunspot, whereas Difflinker was built using PyTorch 1.13. Not the answer: training works on CPU on Sunspot with PyTorch 2, and on CUDA on my desktop with PyTorch 2.
  • My (feeble) attempt at an XPU wrapper is insufficient. I got the same error with my version and with Corey's (better) implementation.
  • Intel's or Corey's fork of Lightning could work.
  • There is a missing .to(device) somewhere in the workflow.
  • Something is wrong with XPU PyTorch.
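
One way to test the missing .to(device) hypothesis is to walk a batch (or a dict of parameters) and report anything that is not on the expected device. The following is a minimal sketch, not code from this project: the helper names `device_type` and `find_off_device` are hypothetical, and it assumes only that tensor-like objects expose a `.device` attribute, as PyTorch tensors do.

```python
from collections.abc import Mapping, Sequence


def device_type(obj):
    """Return the device type ('cpu', 'cuda', 'xpu', ...) of a tensor-like object, or None."""
    dev = getattr(obj, "device", None)
    if dev is None:
        return None
    # torch.device exposes .type; fall back to parsing strings such as "xpu:0"
    return getattr(dev, "type", str(dev).split(":")[0])


def find_off_device(obj, expected, path="root"):
    """Recursively list (path, device_type) for tensor-like leaves not on `expected`."""
    found = device_type(obj)
    if found is not None:
        return [] if found == expected else [(path, found)]
    if isinstance(obj, Mapping):
        items = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)):
        items = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    else:
        # Not a container and not tensor-like: nothing to report
        return []
    out = []
    for child_path, value in items:
        out.extend(find_off_device(value, expected, child_path))
    return out
```

With PyTorch, this could be called as `find_off_device(dict(model.named_parameters()), "xpu", "params")`, likewise on `dict(model.named_buffers())`, and on a batch taken from the dataloader just before the forward pass. An empty list would suggest the problem lies elsewhere, for example in tensors created inside `forward` without a device argument.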