[Open-Sora-Plan v1.3.0]: inference and training #717

Open: wtomin wants to merge 133 commits into master from op-v1.3-update

Conversation

@wtomin (Collaborator) commented Oct 29, 2024

  • VAE inference and training under PyNative mode;
  • DiT inference and training under PyNative mode;
  • dynamic-resolution dataloader;
  • prompt refiner inference.

@wtomin force-pushed the op-v1.3-update branch 4 times, most recently from d2bfac6 to 4f29b01, on November 14, 2024 07:57
@wtomin requested a review from vigo999 as a code owner on November 21, 2024 11:11
@wtomin force-pushed the op-v1.3-update branch 2 times, most recently from c068a2f to 1b79586, on November 28, 2024 09:22
@wtomin force-pushed the op-v1.3-update branch 13 times, most recently from 2e7783b to de77902, on December 23, 2024 02:23
@wtomin changed the title from "[Draft]: Open-Sora-Plan v1.3.0 inference and training" to "[Open-Sora-Plan v1.3.0]: inference and training" on Dec 23, 2024

#### Performance

The training performance are tested on ascend 910* with mindspore 2.3.1 graph mode. The results are as follows.
We evaluated the training performance on Ascend NPUs. All experiments are running in PYNATIVE mode. The results are as follows.
Collaborator

describe MS version.

Collaborator Author

Fixed.

| OpenSoraT2V-ROPE-L-122 | 8 | 3 | 1 | 29 | 640x480 | 6mins | zero2 | ON | ON | O0 | 3.68 | 63.04 |
| OpenSoraT2V-ROPE-L-122 | 8 | 4 | 1 | 29 | 1280x720 | 10mins | zero2 + SP(sp_size=8) | OFF | ON | O0 | 4.32 | 6.71 |
| OpenSoraT2V-ROPE-L-122 | 8 | 5 | 1 | 93 | 1280x720 | 15mins | zero2 + SP(sp_size=8) | ON | ON | O0 | 24.40 | 3.81 |
| model name | cards | stage | batch size (global) | video size | parallelism | recompute | data sink | jit level | step time | train imgs/s |
Collaborator

Explain what train imgs/s is and how it's computed.

Collaborator Author

Fixed.
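
For readers of this thread, one reading of `train imgs/s` that reproduces the three table rows quoted in this conversation is: frames processed per second, summed over the data-parallel groups. The sketch below is an assumption inferred from those numbers, not the definition the PR finally documents. It assumes the fourth and fifth columns of the quoted rows are the per-card batch size and the number of frames, that step time is in seconds, and that with sequence parallelism (SP) all `sp_size` cards share one sample. The function name and its parameters are hypothetical.

```python
def train_imgs_per_sec(cards, local_batch_size, num_frames, step_time_s, sp_size=1):
    """Estimate frames trained per second (assumed meaning of 'train imgs/s').

    Cards inside one sequence-parallel group work on the same sample,
    so only cards // sp_size groups contribute independent samples.
    """
    if sp_size < 1 or cards % sp_size != 0:
        raise ValueError("cards must be a positive multiple of sp_size")
    if step_time_s <= 0:
        raise ValueError("step_time_s must be positive")
    data_parallel_groups = cards // sp_size
    return data_parallel_groups * local_batch_size * num_frames / step_time_s


# Values copied from the three rows quoted in this conversation:
# (cards, local_batch_size, num_frames, step_time_s, sp_size, reported imgs/s)
quoted_rows = [
    (8, 1, 29, 3.68, 1, 63.04),   # stage 3, zero2
    (8, 1, 29, 4.32, 8, 6.71),    # stage 4, zero2 + SP(sp_size=8)
    (8, 1, 93, 24.40, 8, 3.81),   # stage 5, zero2 + SP(sp_size=8)
]

for cards, bs, frames, step_time, sp_size, reported in quoted_rows:
    estimate = train_imgs_per_sec(cards, bs, frames, step_time, sp_size)
    print(f"estimated {estimate:.2f} vs reported {reported:.2f}")
```

Under these assumptions the estimate matches the reported value to two decimals for each quoted row, which is why the SP rows show much lower throughput than the stage 3 row despite using the same number of cards.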

3 participants