---
This issue has been resolved by using CLIP-ViT-H-14-laion2B-s32B-b79K as the image encoder. However, I have run into a new problem: when I call the model through this diffusers pipeline, the generated images look bad, while the images generated by the ControlNet IP-Adapter in sd-webui (A1111) are much better. I don't understand where the difference between the two lies.

A1111's output, for comparison: [image]
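For context, here is a minimal sketch of how I wire the ViT-H encoder into the pipeline (the base model id and the IP-Adapter weight name are placeholders for the checkpoints actually used; `models/image_encoder` in `h94/IP-Adapter` is a converted copy of CLIP-ViT-H-14-laion2B-s32B-b79K):

```python
import torch
from diffusers import StableDiffusionXLPipeline
from transformers import CLIPVisionModelWithProjection

# Converted copy of CLIP-ViT-H-14-laion2B-s32B-b79K shipped with the IP-Adapter repo
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)

# Base checkpoint is a placeholder; it should be the same model A1111 runs
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    image_encoder=image_encoder,
    torch_dtype=torch.float16,
).to("cuda")

# The "vit-h" adapters are the ones trained against this encoder; the exact
# weight name should match whatever is selected in A1111's ControlNet unit
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.safetensors",
)
```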
---
I noticed that ControlNet has a cross-attention weight setting, and I am wondering whether I made a mistake when writing the equivalent in diffusers:

`pipeline.unet.encoder_hid_proj.image_projection_layers[0].clip_embeds = clip_embeds.to(dtype=torch.float16)`

Do you have any more suggestions?
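In case the weight setting itself is the difference: as far as I can tell, the closest diffusers equivalent of ControlNet's weight slider is the IP-Adapter scale. A minimal sketch, with `pipeline` being the same pipeline as above:

```python
# ControlNet's "weight" roughly corresponds to the IP-Adapter scale in diffusers.
# A1111 defaults to 1.0; lower values weaken the influence of the image prompt.
pipeline.set_ip_adapter_scale(1.0)
```

Recent diffusers releases also accept a nested dict here to scale individual attention layers, if finer control turns out to be needed.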
---
I did it this way, but there were errors.
```
clip_embeds shape: torch.Size([2, 1, 257, 1664])
id_embeds shape: torch.Size([2, 1, 1, 512])
  0%|          | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/wlk/x04.py", line 80, in <module>
    images = pipeline(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 1209, in __call__
    noise_pred = self.unet(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_condition.py", line 1157, in forward
    encoder_hidden_states = self.process_encoder_hidden_states(
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_condition.py", line 1028, in process_encoder_hidden_states
    image_embeds = self.encoder_hid_proj(image_embeds)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 1268, in forward
    image_embed = image_projection_layer(image_embed)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 1151, in forward
    x = self.proj_in(x)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x512 and 1280x1280)
```
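Reading the last frame against the shapes printed above: `F.linear` receives a 2x512 matrix, i.e. the flattened 512-dim `id_embeds`, but the layer weight is 1280x1280, so the projection layer being called expects 1280-dim CLIP hidden states rather than face embeddings. That suggests either the loaded adapter's projection is a plain "plus" resampler instead of a FaceID one, or the embeds are being passed to the wrong place. (The printed `clip_embeds` are also 1664-wide, which looks like SDXL's default bigG encoder rather than the 1280-wide ViT-H these adapters are trained against.) For comparison, a sketch of the call pattern I believe a FaceID Plus adapter expects, reusing `pipeline`, `clip_embeds`, and `id_embeds` from above (the weight name and prompt are assumptions):

```python
# Sketch assuming the SDXL FaceID Plus v2 adapter; weight name is an assumption
pipeline.load_ip_adapter(
    "h94/IP-Adapter-FaceID",
    subfolder=None,
    weight_name="ip-adapter-faceid-plusv2_sdxl.bin",
    image_encoder_folder=None,  # CLIP features are attached manually below
)

# CLIP hidden states (e.g. [2, 257, 1280] from the ViT-H encoder: negative +
# positive for CFG) are attached to the projection layer, not passed to the call
proj = pipeline.unet.encoder_hid_proj.image_projection_layers[0]
proj.clip_embeds = clip_embeds.to(dtype=torch.float16)
proj.shortcut = True  # True for "plusv2" weights, False for plain "plus"

# Only the 512-dim insightface embeds go through ip_adapter_image_embeds,
# stacked as [negative, positive] for classifier-free guidance
images = pipeline(
    prompt="a photo of a person",  # placeholder prompt
    ip_adapter_image_embeds=[id_embeds.to(dtype=torch.float16, device="cuda")],
    num_inference_steps=20,
).images
```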