
RuntimeError: std::bad_alloc occurs at other pixel ratios (non-square generation) #153

Open
Suprhimp opened this issue Jul 12, 2024 · 4 comments

@Suprhimp

Hi, first of all, thanks for the cool diffusers optimization module.

I compiled my SDXL Lightning model with a ControlNet (canny) and the compel module for longer prompts.

I run it with StableDiffusionXLControlNetInpaintPipeline.

It works well at 1:1 ratios such as 512x512 and 1024x1024.

But when I run it at other pixel ratios such as (1360, 768) or (1176, 880), which work fine without compilation,

it fails with RuntimeError: std::bad_alloc.

Is it not allowed to use ratios other than 1:1?

Thanks!

@chengzeyi
Owner

This should not happen🤔

@Suprhimp
Author

import time
import torch
from diffusers import (StableDiffusionXLPipeline,
                       EulerAncestralDiscreteScheduler)
from sfast.compilers.diffusion_pipeline_compiler import (compile,
                                                         CompilationConfig)

def load_model():
    model = StableDiffusionXLPipeline.from_pretrained(
        'SG161222/RealVisXL_V4.0_Lightning',
        variant="fp16",
        torch_dtype=torch.float16)

    model.scheduler = EulerAncestralDiscreteScheduler.from_config(
        model.scheduler.config)
    # model.safety_checker = None
    model.to(torch.device('cuda'))
    return model

model = load_model()

config = CompilationConfig.Default()
# xformers and Triton are suggested for achieving best performance.
try:
    import xformers
    config.enable_xformers = True
except ImportError:
    print('xformers not installed, skip')
try:
    import triton
    config.enable_triton = True
except ImportError:
    print('Triton not installed, skip')
config.enable_cuda_graph = True
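# NOTE: CUDA graphs are captured for specific input shapes, which may be
# relevant to the shape-dependent crash described below.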

model = compile(model, config)

kwarg_inputs = dict(
    prompt=
    '(masterpiece:1,2), best quality, masterpiece, best detailed face, a beautiful girl',
    height=1024,
    width=1024,
    num_inference_steps=15,
    num_images_per_prompt=1,
)
kwarg_inputs2 = dict(
    prompt=
    '(masterpiece:1,2), best quality, masterpiece, best detailed face, a beautiful girl',
    height=1176,
    width=880,
    num_inference_steps=15,
    num_images_per_prompt=1,
)


# NOTE: Warm it up.
output_image = model(**kwarg_inputs).images[0]

begin = time.time()
# output_image = model(**kwarg_inputs).images[0]
output_image = model(**kwarg_inputs2).images[0]
print(f'Inference time: {time.time() - begin:.3f}s')

output_image.save('test_out.png')

This is the simple test code I tried. The 1024x1024 generation works well, but generation with kwarg_inputs2 ends with Segmentation fault (core dumped).

It dies with this dmesg output:

[   21.457411] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[  765.435334] pt_main_thread[2886]: segfault at 20 ip 00007f17921d7024 sp 00007ffe65cee710 error 4 in libtorch_cpu.so[7f178dcac000+129b4000]
[  765.435347] Code: f8 01 0f 84 98 00 00 00 66 0f 1f 44 00 00 48 8b 84 24 80 00 00 00 48 8d 78 e8 4c 39 ef 0f 85 1b 02 00 00 48 8b 7b 60 48 8b 07 <ff> 50 20 48 8d 70 28 4c 89 e7 e8 7d 98 ae fb 48 83 c3 68 48 39 5c
[  906.383858] pt_main_thread[3069]: segfault at 20 ip 00007ffaf5c5c024 sp 00007ffe0283f0b0 error 4 in libtorch_cpu.so[7ffaf1731000+129b4000]
[  906.383871] Code: f8 01 0f 84 98 00 00 00 66 0f 1f 44 00 00 48 8b 84 24 80 00 00 00 48 8d 78 e8 4c 39 ef 0f 85 1b 02 00 00 48 8b 7b 60 48 8b 07 <ff> 50 20 48 8d 70 28 4c 89 e7 e8 7d 98 ae fb 48 83 c3 68 48 39 5c
[ 1994.323293] pt_main_thread[3874]: segfault at e52 ip 00007f61f9a27024 sp 00007ffd20ac4720 error 4 in libtorch_cpu.so[7f61f54fc000+129b4000]
[ 1994.323307] Code: f8 01 0f 84 98 00 00 00 66 0f 1f 44 00 00 48 8b 84 24 80 00 00 00 48 8d 78 e8 4c 39 ef 0f 85 1b 02 00 00 48 8b 7b 60 48 8b 07 <ff> 50 20 48 8d 70 28 4c 89 e7 e8 7d 98 ae fb 48 83 c3 68 48 39 5c
[ 2259.761472] pt_main_thread[4186]: segfault at 0 ip 0000000000000000 sp 00007ffff3848e38 error 14
[ 2259.761481] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.

I can see RuntimeError: std::bad_alloc when I run the test code with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 nohup python3 test.py.

I'm running on an AWS Ubuntu instance with an A10 GPU, CUDA 12.1, torch 2.3.0+cu121, and stable_fast-1.0.5+torch230cu121-cp310-cp310-manylinux2014_x86_64.

@Suprhimp
Author

@chengzeyi Is there anything I can do to help solve this problem?

@CallmeZhangChenchen

CallmeZhangChenchen commented Jul 17, 2024

@Suprhimp Active development on stable-fast has been paused, so you will need to try to work through issues like this yourself.
Setting the input size to height=1000, width=1000 during warm-up should solve your problem; see the sketch below.
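Something like this (an untested sketch that reuses the pipeline, model, and compile call from the test script above; whether a 1000x1000 warm-up actually avoids the crash has not been verified here):

import torch
from diffusers import (StableDiffusionXLPipeline,
                       EulerAncestralDiscreteScheduler)
from sfast.compilers.diffusion_pipeline_compiler import (compile,
                                                         CompilationConfig)

# Load and compile exactly as in the test script above
# (xformers/Triton flags omitted for brevity).
model = StableDiffusionXLPipeline.from_pretrained(
    'SG161222/RealVisXL_V4.0_Lightning',
    variant="fp16",
    torch_dtype=torch.float16)
model.scheduler = EulerAncestralDiscreteScheduler.from_config(
    model.scheduler.config)
model.to(torch.device('cuda'))

config = CompilationConfig.Default()
config.enable_cuda_graph = True
model = compile(model, config)

prompt = '(masterpiece:1,2), best quality, masterpiece, best detailed face, a beautiful girl'

# Suggested workaround: warm up at 1000x1000 instead of 1024x1024 so the
# compiled model is not exercised only with a square 1024x1024 shape.
model(prompt=prompt, height=1000, width=1000, num_inference_steps=15)

# Then generate at the non-square resolution that previously crashed.
image = model(prompt=prompt, height=1176, width=880,
              num_inference_steps=15).images[0]
image.save('test_out.png')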
