Feature: Mixture of experts with token dropping #44

Open
hatanp opened this issue Jul 11, 2024 · 0 comments
hatanp (Collaborator) commented Jul 11, 2024

Megatron-DeepSpeed supports MoE, as seen in examples_deepspeed/MoE, and some support for pipeline and tensor parallelism (PP and TP) was introduced recently in microsoft#373. However, I could not get this running easily; maybe I was missing some recent DeepSpeed updates that it requires.

Megatron-LM also has some MoE support, but the older version that can easily be ported to any accelerator lacks token dropping. Maybe that could still be ported without too much trouble?
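
For reference, token dropping usually means enforcing a fixed per-expert capacity during routing and skipping any tokens that overflow it, which keeps per-expert workloads bounded and static-shaped. Below is a minimal sketch of capacity-based dropping for a top-1 router; the function name, signature, and capacity formula are illustrative assumptions, not the actual Megatron-LM or Megatron-DeepSpeed implementation.

```python
# Minimal sketch of capacity-based token dropping in a top-1 MoE router.
# Illustrative only; names and the capacity formula are assumptions.
import torch


def top1_route_with_drop(logits: torch.Tensor, capacity_factor: float = 1.25):
    """logits: (num_tokens, num_experts) router scores.

    Returns each token's chosen expert and a boolean mask that is False
    for tokens dropped because their expert is over capacity.
    """
    num_tokens, num_experts = logits.shape
    # Each expert processes at most `capacity` tokens per batch.
    capacity = int(capacity_factor * num_tokens / num_experts)

    expert_idx = logits.argmax(dim=-1)                       # (num_tokens,)
    one_hot = torch.nn.functional.one_hot(expert_idx, num_experts)
    # Exclusive running count per expert = each token's 0-based position
    # in its expert's queue, in token order.
    position_in_expert = one_hot.cumsum(dim=0) - one_hot
    position = (position_in_expert * one_hot).sum(dim=-1)

    # Tokens past capacity are dropped (they bypass the expert entirely).
    keep_mask = position < capacity
    return expert_idx, keep_mask
```

Dropped tokens are typically not discarded outright; they pass through the layer unchanged via the residual connection, so only the expert computation is skipped.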

hatanp self-assigned this on Jul 11, 2024