
Qwen2_moe: Avoid zero-token forwarding for some experts #32283

Closed
2 of 4 tasks
Coco58323 opened this issue Jul 29, 2024 · 4 comments · May be fixed by #32429
@Coco58323

System Info

transformers=4.43.3
python=3.8
Linux

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Under Auto-GPTQ with the Triton kernel, the quantized linear layer calls the math.log2() function at Line 96 (screenshot of the kernel code omitted). Meanwhile, in the MoE implementation:

```python
for expert_idx in range(self.num_experts):
    expert_layer = self.experts[expert_idx]
    idx, top_x = torch.where(expert_mask[expert_idx])

    # Index the correct hidden states and compute the expert hidden state for
    # the current expert. We need to make sure to multiply the output hidden
    # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
    current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
    current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
```

some experts are called with zero tokens, so their input has a shape like [0, seq_length, hidden_states], and the log2() call fails.
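For illustration, here is a minimal, self-contained sketch of the failure mode (not the transformers or Auto-GPTQ code itself; the tensor sizes and the forced router logits are made up for the example). With top-k routing nothing guarantees that every expert receives at least one token, so `top_x` can be empty and the expert's input ends up with zero rows, at which point a kernel that calls math.log2() on that dimension breaks:

```python
# Hypothetical repro sketch: top-k routing can leave an expert with an
# empty input, which breaks math.log2() on the token count.
import math
import torch

num_experts, top_k, num_tokens, hidden_dim = 4, 2, 3, 8
hidden_states = torch.randn(num_tokens, hidden_dim)

# Fake router output in which expert 3 is never among the top-k choices.
router_logits = torch.randn(num_tokens, num_experts)
router_logits[:, 3] = float("-inf")
routing_weights, selected_experts = torch.topk(
    router_logits.softmax(dim=-1), top_k, dim=-1
)
expert_mask = torch.nn.functional.one_hot(
    selected_experts, num_classes=num_experts
).permute(2, 1, 0)

# Same indexing pattern as the loop above, applied to the starved expert.
idx, top_x = torch.where(expert_mask[3])
current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
print(current_state.shape)            # torch.Size([0, 8])
# math.log2(current_state.shape[0])   # ValueError: math domain error
```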

The issue could be solved by checking the number of tokens before Line 675 (the sketch just below shows where the guard fits in the expert loop):
`if current_state.shape[0] == 0: continue`
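In context, the proposed guard would sit in the expert loop roughly like this (a sketch of the suggested change, not the exact patch from any PR; variable names follow the snippet quoted above):

```python
for expert_idx in range(self.num_experts):
    expert_layer = self.experts[expert_idx]
    idx, top_x = torch.where(expert_mask[expert_idx])

    current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
    # Proposed guard: skip experts that were routed zero tokens, so the
    # (possibly quantized) expert layer never sees an empty input.
    if current_state.shape[0] == 0:
        continue

    current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
```

Checking `top_x.numel() == 0` right after `torch.where` would be equivalent and avoids building the empty `current_state` at all, though as noted later in this thread, this kind of data-dependent control flow may interfere with torch.fx export.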

Expected behavior

No more forwarding with zero tokens.

Coco58323 added the bug label Jul 29, 2024
@ArthurZucker
Collaborator

Would you like to open a PR for a fix? #31173 could be a nice addition as well.

github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@ArthurZucker
Collaborator

Not completed, but we need to make sure it is worth it, as this might break torch.fx export!

github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Nov 1, 2024