CUDA OOM when Running validation(Dreambooth_Lora_SDXL) #9014

ZwlongSir · 2024-07-30T08:45:41Z

Hi, I tried to run the example script train_dreambooth_lora_sdxl_advanced.py on my own dataset of 74 images, but it failed during the "Running validation" step. I get the same error when I use the same parameters with the "3d_icon" dataset from the 3D icon example provided by diffusers.
I'm fairly new to this and I'm not sure what to do next.
Any help is appreciated :)

PARAMS FOR CUSTOM TRAINING:
accelerate launch train_dreambooth_lora_sdxl_advanced.py \
  --pretrained_model_name_or_path="${MODEL_NAME}" \
  --pretrained_vae_model_name_or_path="${VAE_PATH}" \
  --dataset_name="${DATASET_NAME}" \
  --instance_prompt="A cute pokemon design in the style of TOK" \
  --validation_prompt="A TOK style of cute pokeman design, featuring a single bird, wind type pokemon, cute, facing in the same direction on a white background." \
  --output_dir="${OUTPUT_DIR}" \
  --caption_column="prompt" \
  --mixed_precision="fp16" \
  --resolution=1024 \
  --train_batch_size=1 \
  --sample_batch_size=1 \
  --repeats=1 \
  --report_to="wandb" \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --learning_rate=1.0 \
  --text_encoder_lr=1.0 \
  --optimizer="prodigy" \
  --train_text_encoder_ti \
  --train_text_encoder_ti_frac=0.5 \
  --snr_gamma=5.0 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --rank=32 \
  --max_train_steps=74 \
  --checkpointing_steps=2000 \
  --seed="0" \
  --push_to_hub
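
For reference, the step-related flags above can be turned into an epoch count with a little arithmetic. This is only a sketch under stated assumptions: a single GPU process, every one of the 74 images yielding exactly one training sample, and the usual ceil-division used for dataloader length. The function name `steps_per_epoch` is made up for this example and is not taken from the training script.

```python
import math


def steps_per_epoch(
    num_images: int,
    repeats: int = 1,
    train_batch_size: int = 1,
    gradient_accumulation_steps: int = 1,
    num_processes: int = 1,  # assumption: one GPU process
) -> int:
    """Optimizer steps needed to see the whole dataset once (hypothetical helper)."""
    samples = num_images * repeats
    # Batches seen by each process in one pass over the data.
    batches = math.ceil(samples / (train_batch_size * num_processes))
    # One optimizer step per `gradient_accumulation_steps` batches.
    return math.ceil(batches / gradient_accumulation_steps)


per_epoch = steps_per_epoch(num_images=74)
epochs = math.ceil(74 / per_epoch)  # --max_train_steps=74
print(per_epoch, epochs)
```

Under those assumptions, `--max_train_steps=74` amounts to a single pass over the 74 images.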

07/30/2024 16:25:50 - INFO - __main__ - Running validation...
Generating 1 images with prompt: A style of cute pokeman design, featuring a single bird, wind type pokemon, cute, facing in the same direction on a white background..
{'lambda_min_clipped', 'thresholding', 'euler_at_final', 'final_sigmas_type', 'solver_order', 'use_lu_lambdas', 'variance_type', 'solver_type', 'algorithm_type', 'lower_order_final', 'dynamic_thresholding_ratio', 'rescale_betas_zero_snr'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/data/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py", line 246$, in <module>
    main(args)
  File "/data/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py", line 241$, in main
    images = log_validation(
  File "/data/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py", line 281, in log_validation
    images = [pipeline(**pipeline_args, generator=generator).images[0] for _ in range(args.num_validation_images)]
  File "/data/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py", line 281, in <listcomp>
    images = [pipeline(**pipeline_args, generator=generator).images[0] for _ in range(args.num_validation_images)]
  File "/root/mambaforge/envs/pivotal/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data/diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", [54/1492], in __call__
    image = self.vae.decode(latents, return_dict=False)[0]
  File "/data/diffusers/src/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/data/diffusers/src/diffusers/models/autoencoders/autoencoder_kl.py", line 321, in decode
    decoded = self._decode(z).sample
  File "/data/diffusers/src/diffusers/models/autoencoders/autoencoder_kl.py", line 292, in _decode
    dec = self.decoder(z)
  File "/root/mambaforge/envs/pivotal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/mambaforge/envs/pivotal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/diffusers/src/diffusers/models/autoencoders/vae.py", line 337, in forward
    sample = up_block(sample, latent_embeds)
  File "/root/mambaforge/envs/pivotal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/mambaforge/envs/pivotal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/diffusers/src/diffusers/models/unets/unet_2d_blocks.py", line 2750, in forward
    hidden_states = upsampler(hidden_states)
  File "/root/mambaforge/envs/pivotal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/mambaforge/envs/pivotal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/diffusers/src/diffusers/models/upsampling.py", line 169, in forward
    hidden_states = F.interpolate(hidden_states, scale_factor=2.0, mode="nearest")
  File "/root/mambaforge/envs/pivotal/lib/python3.10/site-packages/torch/nn/functional.py", line 4050, in interpolate
    return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU 0 has a total capacity of 21.98 GiB of which 396.44 MiB is free. Including non-PyTorch memory, this process has 21.58 GiB memory in use. Of the allocated memory 20.73 GiB is allocated by PyTorch, and 564.22 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
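
The figures in the error message can be checked with a little arithmetic. The sketch below is only an illustration: it pulls the requested and free amounts out of the message text with regular expressions and reports how far short the allocation was. The helper name `oom_shortfall_mib` is made up for this example and is not part of PyTorch.

```python
import re

# Excerpt of the error text quoted above.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 512.00 MiB. "
    "GPU 0 has a total capacity of 21.98 GiB of which 396.44 MiB is free."
)

# Size of each unit expressed in MiB.
UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def _to_mib(value: str, unit: str) -> float:
    """Convert a (number, unit) pair captured by the regexes to MiB."""
    return float(value) * UNIT_TO_MIB[unit]


def oom_shortfall_mib(message: str) -> float:
    """Return requested minus free memory, in MiB (hypothetical helper)."""
    requested = re.search(r"Tried to allocate ([\d.]+) (MiB|GiB)", message)
    free = re.search(r"of which ([\d.]+) (MiB|GiB) is free", message)
    if requested is None or free is None:
        raise ValueError("message does not look like a CUDA OOM report")
    return _to_mib(*requested.groups()) - _to_mib(*free.groups())


shortfall = oom_shortfall_mib(MESSAGE)
print(f"Short by {shortfall:.2f} MiB")
```

The 512 MiB buffer requested during VAE upsampling missed by a little over 100 MiB. The process already held 21.58 GiB at that point, so the card was nearly full before decoding began.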

System info
GPU: A10, VRAM: 24 GB

asomoza (Maintainer) · 2024-07-30T22:56:10Z

Hi, I ran the same command and it worked without a problem. Validation used 19.4 GB of VRAM. Do you have that much free when you run the training?
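
One way to answer that question is to check free VRAM right before launching. This is a minimal sketch, assuming PyTorch is installed: `torch.cuda.mem_get_info()` returns the free and total bytes for the current device. The threshold is the 19.4 GB figure reported above, treated here as GiB for simplicity, and the function names are made up for this example.

```python
from typing import Optional

GIB = 1024 ** 3
# Validation footprint reported in the reply above; reading "GB" as GiB
# is an assumption of this sketch.
VALIDATION_NEED_GIB = 19.4


def has_headroom(free_bytes: int, needed_gib: float = VALIDATION_NEED_GIB) -> bool:
    """True if the free memory covers the reported validation footprint."""
    return free_bytes >= needed_gib * GIB


def query_free_bytes() -> Optional[int]:
    """Free bytes on the current CUDA device, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes


if __name__ == "__main__":
    free = query_free_bytes()
    if free is None:
        print("No CUDA device visible; nothing to check.")
    else:
        verdict = "enough" if has_headroom(free) else "not enough"
        print(f"{free / GIB:.2f} GiB free: {verdict} for validation")
```

Running this before `accelerate launch` shows whether another process is already holding part of the card.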

Also, I use the fixed VAE that doesn't need upcasting.
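
A sketch of how the launch environment might be prepared with that in mind. It assumes that "the fixed VAE" means the `madebyollin/sdxl-vae-fp16-fix` checkpoint, which the thread does not name, and that the command reads the VAE location from `VAE_PATH` as in the original post. It also sets the allocator option suggested by the error message. `training_env` is a hypothetical helper, not part of diffusers or accelerate.

```python
import os
from typing import Dict, Optional

# Assumption: the fp16-safe SDXL VAE the reply refers to.
FIXED_VAE = "madebyollin/sdxl-vae-fp16-fix"


def training_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with the VAE path and allocator hint set."""
    env = dict(os.environ if base is None else base)
    # Overwrite on purpose: the point is to switch to the fixed VAE.
    env["VAE_PATH"] = FIXED_VAE
    # Hint taken from the OOM message; keep any value the user already set.
    env.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    return env


env = training_env({})
print(env["VAE_PATH"])
print(env["PYTORCH_CUDA_ALLOC_CONF"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when starting the same `accelerate launch` command through a shell, so that `${VAE_PATH}` expands to the fixed checkpoint.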
