Unconditional image generation with accelerate and train_unconditional.py crashes with MPS but works on CPU #6568
-
I am playing around with the code examples from https://huggingface.co/docs/diffusers, and most of them work fine in my local Jupyter test environment. Except this one: I created a config file for accelerate:
And then I am trying to train on some images, just for hello-world purposes:
It works, but apparently pretty slowly on my local machine. So I try to switch to MPS support:
But now the process crashes:
At this point:
Any guesses?
Replies: 4 comments 1 reply
-
Hmm, maybe cc @pcuenca here.
-
If you check the error stack, you see it comes from `utils/torch_utils.py`, `def randn_tensor`, and this line:

`latents = torch.randn(shape, generator=generator, device=rand_device, dtype=dtype, layout=layout).to(device)`

Debugging shows that `device` enters the function as `"cpu"`, which is indeed wrong. The following line in `pipelines/ddpm/pipeline_ddpm.py` calls `randn_tensor`:

`image = randn_tensor(image_shape, generator=generator)`

with the following comment:

`# randn does not work reproducibly on mps`

Hm, well. When I pass the device in the function call, it works:

`image = randn_tensor(image_shape, generator=generator, device=self.device)`

Long story short: setting the device argument works. I don't know if the argument was left out on purpose, so I'd better keep this change local.
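For reference, a minimal sketch of the behaviour described above, assuming a recent diffusers release where `randn_tensor` lives in `diffusers.utils.torch_utils` (the import path differs in older versions) and a machine where MPS is available:

```python
import torch
from diffusers.utils.torch_utils import randn_tensor  # older releases: from diffusers.utils import randn_tensor

shape = (1, 3, 64, 64)
generator = torch.Generator(device="mps").manual_seed(42)

# Without an explicit device, randn_tensor falls back to CPU internally,
# which clashes with the MPS generator:
# latents = randn_tensor(shape, generator=generator)  # raises the device-mismatch error

# Passing the target device explicitly works:
latents = randn_tensor(shape, generator=generator, device=torch.device("mps"))
print(latents.device)  # mps:0
```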
-
I found another thing that does not really make sense to me and that is probably the actual root cause: in most tutorials the generator object is instantiated from `torch.Generator()`. Passing this as a generator to `torch.randn` works fine on MPS, as long as you pass the device correctly, which is a little confusing. This works fine:
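Presumably something like the following (a sketch, assuming an Apple Silicon machine where the MPS backend is available):

```python
import torch

# Generator and output tensor are created on the same device, so this works:
generator = torch.Generator(device="mps")
generator.manual_seed(42)

print(torch.randn((3, 3), generator=generator, device="mps"))
```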
If you do not pass the device to either the generator or the `randn` function, you get one of those errors, apparently because CPU is the default device:
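A sketch of the mismatched combinations (assuming MPS is available); the exact error text depends on the torch version, but both raise a device-mismatch `RuntimeError`:

```python
import torch

mps_generator = torch.Generator(device="mps").manual_seed(42)
cpu_generator = torch.Generator().manual_seed(42)  # torch.Generator() defaults to the CPU device

for kwargs in (
    {"generator": mps_generator},                   # tensor defaults to CPU, generator lives on MPS
    {"generator": cpu_generator, "device": "mps"},  # tensor on MPS, generator on CPU
):
    try:
        torch.randn((3, 3), **kwargs)
    except RuntimeError as err:
        print(err)
```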
(The one I am struggling with!) The thing to consider when reading through some articles is that you cannot do the exact same thing with `manual_seed`, because `torch.manual_seed(42)` returns the default CPU generator: `print(torch.randn((3,3), generator=torch.manual_seed(42), device="mps"))`. This also leads to the same error (apparently for the reason explained above):
But how do you get reproducible random numbers on your MPS device? Some articles cover that: you just combine both:
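Presumably along these lines (a sketch, assuming MPS is available): create the generator on the target device, seed it there, and sample on the same device; re-running it reproduces the same tensor.

```python
import torch

def sample() -> torch.Tensor:
    # Create the generator on the target device and seed it there.
    generator = torch.Generator(device="mps").manual_seed(42)
    return torch.randn((3, 3), generator=generator, device="mps")

print(torch.equal(sample().cpu(), sample().cpu()))  # True -> reproducible on MPS
```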
To wrap it up:
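A short recap of the combinations discussed above, as a sketch (assuming MPS is available; "works"/"fails" reflects the behaviour described in this thread):

```python
import torch

mps_generator = torch.Generator(device="mps").manual_seed(42)

torch.randn((3, 3), generator=mps_generator, device="mps")            # works: generator and tensor on MPS
# torch.randn((3, 3), generator=mps_generator)                        # fails: tensor defaults to CPU
# torch.randn((3, 3), generator=torch.manual_seed(42), device="mps")  # fails: manual_seed returns the CPU generator
```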
So... the line that causes the error, `latents = torch.randn(shape, generator=generator, device=rand_device, dtype=dtype, layout=layout).to(device)`, just does not take into consideration the several ways in which the generator can and will be passed to it.
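As a purely hypothetical sketch (not the diffusers implementation), the call site could fall back to the generator's own device when no device is passed, e.g.:

```python
import torch

def randn_tensor_sketch(shape, generator=None, device=None, dtype=None):
    # Hypothetical fallback: derive the device from the generator instead of
    # silently defaulting to CPU when no device is given.
    if device is None and generator is not None:
        device = generator.device
    device = device or torch.device("cpu")
    return torch.randn(shape, generator=generator, device=device, dtype=dtype)
```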
-
Thanks a lot @nickyreinert, RNG in …