
RuntimeError: std::bad_alloc occurs at other pixel ratios (non-square generation) #153

Open
Suprhimp opened this issue Jul 12, 2024 · 4 comments

@Suprhimp

Hi, first of all, thanks for the cool diffusers optimization module.

I compiled my SDXL Lightning model with a ControlNet (canny) and the compel module for longer prompts.

I run it with StableDiffusionXLControlNetInpaintPipeline.

It works well at 1:1 ratios such as 512x512 and 1024x1024.

But when I run it at other pixel ratios such as (1360, 768) or (1176, 880), which work fine without compilation,

it fails with RuntimeError: std::bad_alloc.

Is it not allowed to use ratios other than 1:1?

Thanks!

@chengzeyi
Owner

This should not happen🤔

@Suprhimp
Author

import time
import torch
from diffusers import (StableDiffusionXLPipeline,
                       EulerAncestralDiscreteScheduler)
from sfast.compilers.diffusion_pipeline_compiler import (compile,
                                                         CompilationConfig)

def load_model():
    model = StableDiffusionXLPipeline.from_pretrained(
        'SG161222/RealVisXL_V4.0_Lightning',
        variant="fp16",
        torch_dtype=torch.float16)

    model.scheduler = EulerAncestralDiscreteScheduler.from_config(
        model.scheduler.config)
    # model.safety_checker = None
    model.to(torch.device('cuda'))
    return model

model = load_model()

config = CompilationConfig.Default()
# xformers and Triton are suggested for achieving best performance.
try:
    import xformers
    config.enable_xformers = True
except ImportError:
    print('xformers not installed, skip')
try:
    import triton
    config.enable_triton = True
except ImportError:
    print('Triton not installed, skip')
config.enable_cuda_graph = True
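# NOTE: CUDA graphs are captured for specific input shapes, which may be
# relevant to the shape-dependent crash described below.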

model = compile(model, config)

kwarg_inputs = dict(
    prompt=
    '(masterpiece:1,2), best quality, masterpiece, best detailed face, a beautiful girl',
    height=1024,
    width=1024,
    num_inference_steps=15,
    num_images_per_prompt=1,
)
kwarg_inputs2 = dict(
    prompt=
    '(masterpiece:1,2), best quality, masterpiece, best detailed face, a beautiful girl',
    height=1176,
    width=880,
    num_inference_steps=15,
    num_images_per_prompt=1,
)


# NOTE: Warm it up.
output_image = model(**kwarg_inputs).images[0]

begin = time.time()
# output_image = model(**kwarg_inputs).images[0]
output_image = model(**kwarg_inputs2).images[0]
print(f'Inference time: {time.time() - begin:.3f}s')

output_image.save('test_out.png')

This is the simple test code I tried. The 1024x1024 generation works well, but generation with kwarg_inputs2 ends with Segmentation fault (core dumped).

It dies with this dmesg output:

[   21.457411] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[  765.435334] pt_main_thread[2886]: segfault at 20 ip 00007f17921d7024 sp 00007ffe65cee710 error 4 in libtorch_cpu.so[7f178dcac000+129b4000]
[  765.435347] Code: f8 01 0f 84 98 00 00 00 66 0f 1f 44 00 00 48 8b 84 24 80 00 00 00 48 8d 78 e8 4c 39 ef 0f 85 1b 02 00 00 48 8b 7b 60 48 8b 07 <ff> 50 20 48 8d 70 28 4c 89 e7 e8 7d 98 ae fb 48 83 c3 68 48 39 5c
[  906.383858] pt_main_thread[3069]: segfault at 20 ip 00007ffaf5c5c024 sp 00007ffe0283f0b0 error 4 in libtorch_cpu.so[7ffaf1731000+129b4000]
[  906.383871] Code: f8 01 0f 84 98 00 00 00 66 0f 1f 44 00 00 48 8b 84 24 80 00 00 00 48 8d 78 e8 4c 39 ef 0f 85 1b 02 00 00 48 8b 7b 60 48 8b 07 <ff> 50 20 48 8d 70 28 4c 89 e7 e8 7d 98 ae fb 48 83 c3 68 48 39 5c
[ 1994.323293] pt_main_thread[3874]: segfault at e52 ip 00007f61f9a27024 sp 00007ffd20ac4720 error 4 in libtorch_cpu.so[7f61f54fc000+129b4000]
[ 1994.323307] Code: f8 01 0f 84 98 00 00 00 66 0f 1f 44 00 00 48 8b 84 24 80 00 00 00 48 8d 78 e8 4c 39 ef 0f 85 1b 02 00 00 48 8b 7b 60 48 8b 07 <ff> 50 20 48 8d 70 28 4c 89 e7 e8 7d 98 ae fb 48 83 c3 68 48 39 5c
[ 2259.761472] pt_main_thread[4186]: segfault at 0 ip 0000000000000000 sp 00007ffff3848e38 error 14
[ 2259.761481] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.

I can see RuntimeError: std::bad_alloc when I run the test code with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 nohup python3 test.py.

I'm running on an AWS Ubuntu instance with an A10 GPU, CUDA 12.1, torch 2.3.0+cu121, and stable_fast-1.0.5+torch230cu121-cp310-cp310-manylinux2014_x86_64.

@Suprhimp
Author

@chengzeyi Is there anything I can do to help solve this problem?

@CallmeZhangChenchen

CallmeZhangChenchen commented Jul 17, 2024

@Suprhimp Active development on stable-fast has been paused, so you will need to try to work through issues like this yourself.
Setting the input size to height=1000, width=1000 during warm-up should solve your problem; see the sketch below.
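Something like this (an untested sketch that reuses the pipeline, model, and compile call from the test script above; whether a 1000x1000 warm-up actually avoids the crash has not been verified here):

import torch
from diffusers import (StableDiffusionXLPipeline,
                       EulerAncestralDiscreteScheduler)
from sfast.compilers.diffusion_pipeline_compiler import (compile,
                                                         CompilationConfig)

# Load and compile exactly as in the test script above
# (xformers/Triton flags omitted for brevity).
model = StableDiffusionXLPipeline.from_pretrained(
    'SG161222/RealVisXL_V4.0_Lightning',
    variant="fp16",
    torch_dtype=torch.float16)
model.scheduler = EulerAncestralDiscreteScheduler.from_config(
    model.scheduler.config)
model.to(torch.device('cuda'))

config = CompilationConfig.Default()
config.enable_cuda_graph = True
model = compile(model, config)

prompt = '(masterpiece:1,2), best quality, masterpiece, best detailed face, a beautiful girl'

# Suggested workaround: warm up at 1000x1000 instead of 1024x1024 so the
# compiled model is not exercised only with a square 1024x1024 shape.
model(prompt=prompt, height=1000, width=1000, num_inference_steps=15)

# Then generate at the non-square resolution that previously crashed.
image = model(prompt=prompt, height=1176, width=880,
              num_inference_steps=15).images[0]
image.save('test_out.png')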
