
[Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-solver and so on. #9982

Open · wants to merge 85 commits into main

Conversation

lawrence-cj (Contributor)

What does this PR do?

This PR adds the official Sana (SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer) to the diffusers library. Sana is the first to make text-to-image generation work on a 32x compressed latent space, powered by DC-AE (https://arxiv.org/abs/2410.10733v1), without performance degradation. Sana also incorporates several popular efficiency techniques, such as a DiT with a linear attention processor, and uses a decoder-only LLM (Gemma-2B-IT) as the text encoder for low GPU memory requirements and fast inference.
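For intuition on why the 32x compression matters, here is some back-of-the-envelope token arithmetic (an illustrative sketch; the `latent_tokens` helper and the patch sizes are hypothetical, not Sana's actual code):

```python
# Token-count arithmetic for a 1024x1024 image (illustrative only):
# a conventional 8x VAE with 2x2 patchification vs. a 32x DC-AE.

def latent_tokens(image_size: int, compression: int, patch_size: int = 1) -> int:
    """Number of transformer tokens after VAE compression and patchification."""
    latent_size = image_size // compression
    return (latent_size // patch_size) ** 2

vae_8x = latent_tokens(1024, 8, patch_size=2)    # 128x128 latent, 2x2 patches -> 64*64 = 4096 tokens
dc_ae_32x = latent_tokens(1024, 32, patch_size=1)  # 32x32 latent -> 1024 tokens

print(vae_8x, dc_ae_32x)
```

Fewer tokens means quadratically less attention work per layer, which is where much of the claimed speedup comes from.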

Paper: https://arxiv.org/abs/2410.10629
Original code repo: https://github.com/NVlabs/Sana
Project: https://nvlabs.github.io/Sana

Core contributor of DC-AE:
work with @[email protected]

We want to collaborate on this PR together with friends from HF. Feel free to contact me here. Cc: @sayakpaul, @yiyixuxu

Images generated by SanaPAGPipeline with FlowDPMSolverMultistepScheduler:


lawrence-cj and others added 17 commits November 27, 2024 00:44
* skip nan lora tests on PyTorch 2.5.1 CPU.

* cog

* use xfail

* correct xfail

* add condition

* tests
* enable on xpu

* add 1 more

* add one more

* enable more

* add 1 more

* add more

* enable 1

* enable more cases

* enable

* enable

* update comment

* one more

* enable 1

* add more cases

* enable xpu

* add one more case

* add more cases

* add 1

* add more

* add more cases

* add case

* enable

* add more

* add more

* add more

* enable more

* add more

* update code

* update test marker

* add skip back

* update comment

* remove single files

* remove

* style

* add

* revert

* reformat

* update decorator

* update

* update

* update

* Update tests/pipelines/deepfloyd_if/test_if.py

Co-authored-by: Dhruv Nair <[email protected]>

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: Dhruv Nair <[email protected]>

* Update tests/pipelines/animatediff/test_animatediff_controlnet.py

Co-authored-by: Dhruv Nair <[email protected]>

* Update tests/pipelines/animatediff/test_animatediff.py

Co-authored-by: Dhruv Nair <[email protected]>

* Update tests/pipelines/animatediff/test_animatediff_controlnet.py

Co-authored-by: Dhruv Nair <[email protected]>

* update float16

* no unittest.skip

* update

* apply style check

* reapply format

---------

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Dhruv Nair <[email protected]>
* update

---------

Co-authored-by: yiyixuxu <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
* smol change to fix checkpoint saving & resuming (as done in train_dreambooth_sd3.py)

* style

* modify comment to explain reasoning behind hidden size check
lawrence-cj (Contributor, Author) commented Nov 27, 2024

> so you're licensing this code to fit into the Diffusers project? because the original Sana codebase is non-commercial. why is that NC but this is being opened as Apache 2.0?

The license of Sana's codebase has been changed to Apache 2.0, @bghira. Refer to: https://github.com/NVlabs/Sana?tab=Apache-2.0-1-ov-file

@@ -407,6 +409,11 @@ def set_timesteps(
sigmas = np.flip(sigmas).copy()
sigmas = self._convert_to_beta(in_sigmas=sigmas, num_inference_steps=num_inference_steps)
timesteps = np.array([self._sigma_to_t(sigma, log_sigmas) for sigma in sigmas])
elif self.config.use_flow_sigmas:
Collaborator:

@lawrence-cj can we use karras_sigmas/exponential_sigmas/beta_sigmas with flow-matching? (i.e. use_beta_sigmas=True and prediction_type="flow_prediction")

Contributor (Author):

Suggested change:

```diff
-elif self.config.use_flow_sigmas:
+elif self.config.use_beta_sigmas:
+    sigmas = np.flip(sigmas).copy()
+    sigmas = self._convert_to_beta(in_sigmas=sigmas, num_inference_steps=num_inference_steps)
+    timesteps = np.array([self._sigma_to_t(sigma, log_sigmas) for sigma in sigmas])
+    if self.config.use_flow_sigmas:
+        alphas = np.linspace(1, 1 / self.config.num_train_timesteps, num_inference_steps + 1)
+        sigmas = 1.0 - alphas
+        sigmas = np.flip(self.config.flow_shift * sigmas / (1 + (self.config.flow_shift - 1) * sigmas))[:-1]
+        timesteps = (sigmas * self.config.num_train_timesteps).copy()
```

Do you mean logic like this, but with some changes to the code?
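For reference, the flow-sigma computation in the suggested change can be exercised standalone (a sketch with the `self.config.*` values pulled out as plain arguments; the default values are illustrative, not necessarily the scheduler's):

```python
import numpy as np

def flow_sigmas(num_inference_steps: int,
                num_train_timesteps: int = 1000,
                flow_shift: float = 3.0):
    """Standalone version of the flow-sigma schedule from the snippet above."""
    alphas = np.linspace(1, 1 / num_train_timesteps, num_inference_steps + 1)
    sigmas = 1.0 - alphas
    # Apply the timestep shift, then flip so sigmas run from high noise to low.
    sigmas = np.flip(flow_shift * sigmas / (1 + (flow_shift - 1) * sigmas))[:-1]
    timesteps = sigmas * num_train_timesteps
    return sigmas, timesteps

sigmas, timesteps = flow_sigmas(20)
print(sigmas[0], sigmas[-1])  # strictly decreasing, all in (0, 1)
```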

Collaborator:

umm not sure what you mean, here in your suggested code change you have this:

```python
if self.config.use_beta_sigmas:
    if self.config.use_flow_sigmas:
        ...
```

I don't think we can configure `use_beta_sigmas` and `use_flow_sigmas` to be True at the same time. However, we should be able to configure `use_beta_sigmas=True` and `prediction_type="flow_prediction"` at the same time, no? Basically, you would still be doing flow matching, but with a "noise schedule" that's not uniform.

Collaborator:

I just wonder what happens if the user does that with the current implementation. Would it work?

Contributor (Author):

> you will still be doing flow match but use a "noise schedule" that's not uniform

Then it will not work, I think; the noise schedule has to be uniform, as it's defined in SD3, Flux and Sana.
If I understand correctly, the code would work, like:

```python
if self.config.use_beta_sigmas:
    if prediction_type == "flow_prediction":
        ...
```

But I'm not sure whether the sigmas here can be uniform:

```python
sigmas = np.array(
    [
        sigma_min + (ppf * (sigma_max - sigma_min))
        for ppf in [
            scipy.stats.beta.ppf(timestep, alpha, beta)
            for timestep in 1 - np.linspace(0, 1, num_inference_steps)
        ]
    ]
)
return sigmas
```

If the sigmas can't be uniform, I don't think it's reasonable to allow `prediction_type="flow_prediction"` under `use_beta_sigmas=True`.
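To check concretely whether that snippet yields uniform spacing, here is a self-contained version one can run (a sketch; `sigma_min`, `sigma_max`, and the `alpha=0.6, beta=0.6` defaults are illustrative, not necessarily the scheduler's actual values):

```python
import numpy as np
import scipy.stats

def beta_sigmas(num_inference_steps: int,
                sigma_min: float = 0.01,
                sigma_max: float = 1.0,
                alpha: float = 0.6,
                beta: float = 0.6) -> np.ndarray:
    """Self-contained version of the beta-sigma schedule quoted above."""
    return np.array(
        [
            sigma_min + ppf * (sigma_max - sigma_min)
            for ppf in [
                scipy.stats.beta.ppf(t, alpha, beta)
                for t in 1 - np.linspace(0, 1, num_inference_steps)
            ]
        ]
    )

sigmas = beta_sigmas(10)
steps = np.diff(sigmas)
# The step sizes differ substantially, so the schedule is indeed not uniform.
print(steps.min(), steps.max())
```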

Collaborator:

ohh I was under the impression that we can use other sigma distributions for flow matching, see #10001 (comment)

but we don't have to worry about it for this PR. If you have time and are interested in investigating this, it would be great! :) If not, we can just make sure users are only able to use `use_flow_sigmas=True` when `prediction_type="flow_prediction"`, i.e. throw an error otherwise
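The guard suggested here could look like the following (a sketch; the function name is hypothetical, and in practice the check would live in the scheduler's `__init__` against the config):

```python
# Hypothetical validation: reject use_flow_sigmas without flow prediction.
def validate_flow_config(use_flow_sigmas: bool, prediction_type: str) -> None:
    if use_flow_sigmas and prediction_type != "flow_prediction":
        raise ValueError(
            "`use_flow_sigmas=True` requires `prediction_type='flow_prediction'`, "
            f"but got prediction_type={prediction_type!r}."
        )

validate_flow_config(False, "epsilon")         # fine: flow sigmas not requested
validate_flow_config(True, "flow_prediction")  # fine: consistent configuration
try:
    validate_flow_config(True, "epsilon")      # inconsistent: raises ValueError
except ValueError as e:
    print("rejected:", e)
```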

Collaborator:

given the code change is so minimal, I think we can just keep the change in the DPM scheduler for now (we can remove that new scheduler file)

this creates some inconsistency across the library (for euler and heun we have a separate flow match scheduler); but since the change is tiny and we are going through some scheduler refactoring soon, I think it is ok!

Contributor (Author):

Yeah, my only concern is that if we maintain the flow prediction in the original DPM scheduler, there is a lot of unrelated code that exists only for the original dpm-solver. But if you can refactor and integrate flow into the original file nicely, I'm very OK with it! :)

lawrence-cj (Contributor, Author), Nov 27, 2024:

> ohh I was under the impression that we can use other sigma distributions for flow matching, see #10001 (comment)
>
> but we don't have to worry about it for this PR. If you have time and are interested in investigating this, it would be great! :) If not, we can just make sure users are only able to use `use_flow_sigmas=True` when `prediction_type="flow_prediction"`, i.e. throw an error otherwise

Interesting, and it makes sense. I ran an experiment before: if I train the model with `timestep_shift=3` and use `timestep_shift=4` at inference, it still works well. This may explain why the model keeps working when the sigmas change a little, especially for large models like FLUX. I'll figure it out in a later update to this PR.
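The observation about mismatched shifts can be quantified with a quick numeric check (a sketch; `shift` here plays the role of the `timestep_shift`/`flow_shift` discussed above, and the step count is arbitrary):

```python
import numpy as np

def shifted_sigmas(shift: float, num_steps: int = 50) -> np.ndarray:
    """Uniform flow sigmas warped by the timestep shift used in SD3/Flux/Sana."""
    s = np.linspace(1.0, 0.0, num_steps)
    return shift * s / (1 + (shift - 1) * s)

# Largest pointwise gap between the shift=3 and shift=4 schedules.
diff = np.abs(shifted_sigmas(3.0) - shifted_sigmas(4.0)).max()
print(diff)  # ~0.07: the two schedules stay close everywhere
```

The two schedules never differ by more than about 0.07 in sigma, which may help explain why sampling with a slightly different shift than the one used in training still works.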

lawrence-cj (Contributor, Author) commented Nov 30, 2024

Hi, we have prepared two model repos for you to test the correctness of both SanaPipeline and SanaPAGPipeline:

https://huggingface.co/Efficient-Large-Model/Sana_pag_1600M_1024px_diffusers
https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_diffusers
Gentle ping @bghira @yiyixuxu @stevhliu @a-r-r-o-w @sayakpaul
