Unconditional image generation with accelerate and train_unconditional.py crashes with MPS but works on CPU #6568
-
I am playing around with the code examples from https://huggingface.co/docs/diffusers, and most of them work fine in my local Jupyter test environment. Except this one: I created a config file for accelerate:
And then I am trying to train on some images, just for hello-world purposes:
It works, but apparently pretty slowly on my local machine. So I try to switch to MPS support:
But now the process crashes:
At this point:
Any guesses?
Replies: 4 comments 1 reply
-
Hmm, maybe cc @pcuenca here.
-
If you check the error stack, you see it comes from `utils/torch_utils.py`, `def randn_tensor`, and this line:

`latents = torch.randn(shape, generator=generator, device=rand_device, dtype=dtype, layout=layout).to(device)`

Debugging shows that `device` enters the function as `"cpu"`, which is indeed wrong. The following line in `pipelines/ddpm/pipeline_ddpm.py` calls `randn_tensor`:

`image = randn_tensor(image_shape, generator=generator)`

with the following comment:

`# randn does not work reproducibly on mps`

Hm, well. When I pass the device in the function call, it works:

`image = randn_tensor(image_shape, generator=generator, device=self.device)`

Long story short: setting the device argument works. I don't know if the argument was left out on purpose, so I'd better keep this change local.
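For reference, a minimal sketch of the behaviour described above, assuming a recent diffusers release where `randn_tensor` lives in `diffusers.utils.torch_utils` (the import path differs in older versions) and a machine where MPS is available:

```python
import torch
from diffusers.utils.torch_utils import randn_tensor  # older releases: from diffusers.utils import randn_tensor

shape = (1, 3, 64, 64)
generator = torch.Generator(device="mps").manual_seed(42)

# Without an explicit device, randn_tensor falls back to CPU internally,
# which clashes with the MPS generator:
# latents = randn_tensor(shape, generator=generator)  # raises the device-mismatch error

# Passing the target device explicitly works:
latents = randn_tensor(shape, generator=generator, device=torch.device("mps"))
print(latents.device)  # mps:0
```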
-
I found another thing that does not really make sense to me and that is probably the actual root cause: in most tutorials the generator object is instantiated from `torch.Generator()`. Passing this as a generator to `torch.randn` works fine on MPS, as long as you pass the device correctly, which is a little confusing. This works fine:
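Presumably something like the following (a sketch, assuming an Apple Silicon machine where the MPS backend is available):

```python
import torch

# Generator and output tensor are created on the same device, so this works:
generator = torch.Generator(device="mps")
generator.manual_seed(42)

print(torch.randn((3, 3), generator=generator, device="mps"))
```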
If you do not pass the device to either the generator or the `randn` function, you get one of those errors, apparently because CPU is the default device:
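A sketch of the mismatched combinations (assuming MPS is available); the exact error text depends on the torch version, but both raise a device-mismatch `RuntimeError`:

```python
import torch

mps_generator = torch.Generator(device="mps").manual_seed(42)
cpu_generator = torch.Generator().manual_seed(42)  # torch.Generator() defaults to the CPU device

for kwargs in (
    {"generator": mps_generator},                   # tensor defaults to CPU, generator lives on MPS
    {"generator": cpu_generator, "device": "mps"},  # tensor on MPS, generator on CPU
):
    try:
        torch.randn((3, 3), **kwargs)
    except RuntimeError as err:
        print(err)
```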
(The one I am struggling with!) The thing to consider when reading through some articles is that you cannot do the exact same thing with `manual_seed`, because `torch.manual_seed(42)` returns the default CPU generator: `print(torch.randn((3,3), generator=torch.manual_seed(42), device="mps"))`. This also leads to the same error (apparently for the reason explained above):
But how do you get reproducible random numbers on your MPS device? Some articles cover that: you just combine both:
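Presumably along these lines (a sketch, assuming MPS is available): create the generator on the target device, seed it there, and sample on the same device; re-running it reproduces the same tensor.

```python
import torch

def sample() -> torch.Tensor:
    # Create the generator on the target device and seed it there.
    generator = torch.Generator(device="mps").manual_seed(42)
    return torch.randn((3, 3), generator=generator, device="mps")

print(torch.equal(sample().cpu(), sample().cpu()))  # True -> reproducible on MPS
```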
To wrap it up:
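A short recap of the combinations discussed above, as a sketch (assuming MPS is available; "works"/"fails" reflects the behaviour described in this thread):

```python
import torch

mps_generator = torch.Generator(device="mps").manual_seed(42)

torch.randn((3, 3), generator=mps_generator, device="mps")            # works: generator and tensor on MPS
# torch.randn((3, 3), generator=mps_generator)                        # fails: tensor defaults to CPU
# torch.randn((3, 3), generator=torch.manual_seed(42), device="mps")  # fails: manual_seed returns the CPU generator
```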
So... the line that causes the error, `latents = torch.randn(shape, generator=generator, device=rand_device, dtype=dtype, layout=layout).to(device)`, just does not take into consideration the several ways in which the generator can and will be passed to it.
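As a purely hypothetical sketch (not the diffusers implementation), the call site could fall back to the generator's own device when no device is passed, e.g.:

```python
import torch

def randn_tensor_sketch(shape, generator=None, device=None, dtype=None):
    # Hypothetical fallback: derive the device from the generator instead of
    # silently defaulting to CPU when no device is given.
    if device is None and generator is not None:
        device = generator.device
    device = device or torch.device("cpu")
    return torch.randn(shape, generator=generator, device=device, dtype=dtype)
```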
-
Thanks a lot @nickyreinert, RNG in …