-
Hi, guys. CUDA 12.0. I was editing an image using the Instruct Pix2Pix model.
I tried several ways from googling to fix it, but the issue still remains. Here are the common suggestions from googling:
Also, I'm not sure how to reduce the batch size in my code, or what other options there are. Please help me if you know an exact way. Here is my source code:
-
Hi @alexd725. You can also try these methods to reduce memory usage.
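For example, a minimal sketch of the usual diffusers memory savers (exact availability depends on your diffusers version):

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix",
    torch_dtype=torch.float16,  # fp16 weights cut model memory roughly in half
    safety_checker=None,
)

# Compute attention in slices: a bit slower, but a lower peak VRAM.
pipe.enable_attention_slicing()

# Decode the VAE in slices as well.
pipe.vae.enable_slicing()

# Heavier option: keep weights on the CPU and move each submodule to the
# GPU only while it runs (requires the `accelerate` package). Do not call
# pipe.to("cuda") when this is enabled; the pipeline handles placement.
pipe.enable_model_cpu_offload()
```

If that is still not enough, recent diffusers versions also offer `pipe.enable_sequential_cpu_offload()`, which trades much more speed for an even lower peak.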
-
To answer your questions:
- This at inference is when you're using `num_images_per_prompt`, which is not your case.
- Not your case, because the model is just 3.2 GB.
- This is not for inference; it's for when you train a model.
- These two are probably your case. You only provide a class and not how you're using it; you also don't provide the images, the prompts, or what is in `effects`.
I'm on Windows right now, so I can't use `torch.compile`, but if I use just the generation code and run it in a loop:

```python
import torch
from diffusers import EulerAncestralDiscreteScheduler, StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, safety_checker=None
)
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = load_image("https://raw.githubusercontent.com/timothybrooks/instruct-pix2pix/main/imgs/example.jpg")
prompt = "turn him into cyborg"

while True:
    images = pipe(prompt, image=image, num_inference_steps=10, image_guidance_scale=1).images
    images[0].save("generated.png")
```

The VRAM stays fixed at 4.1 GB and never goes over it. So the problem is not with diffusers but in your code: you're probably loading the model multiple times, or using really big images that make the process use over 12 GB more VRAM than the 512x512 example.
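If oversized inputs are the suspect, one quick check (a sketch, assuming `image` is a PIL image as returned by `load_image` above) is to cap the resolution before calling the pipeline:

```python
# Downscale in place to at most 512x512 while keeping the aspect ratio,
# so peak VRAM stays close to the 512x512 example above.
image.thumbnail((512, 512))
```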
-
Hi @alexd725: Good luck!