
Qwen2_moe: Avoid zero-token forwarding for some experts #32283

Closed
2 of 4 tasks
Coco58323 opened this issue Jul 29, 2024 · 4 comments · May be fixed by #32429
@Coco58323

System Info

transformers=4.43.3
python=3.8
Linux

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Under Auto-GPTQ with the Triton kernel, the quantized linear layer calls the math.log2() function at Line 96 (screenshot of the kernel code omitted). Meanwhile, in the MoE implementation:

```python
for expert_idx in range(self.num_experts):
    expert_layer = self.experts[expert_idx]
    idx, top_x = torch.where(expert_mask[expert_idx])

    # Index the correct hidden states and compute the expert hidden state for
    # the current expert. We need to make sure to multiply the output hidden
    # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
    current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
    current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
```

some experts are called with zero tokens, so their input has a shape like [0, seq_length, hidden_states], and the log2() call fails.
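For illustration, here is a minimal, self-contained sketch of the failure mode (not the transformers or Auto-GPTQ code itself; the tensor sizes and the forced router logits are made up for the example). With top-k routing nothing guarantees that every expert receives at least one token, so `top_x` can be empty and the expert's input ends up with zero rows, at which point a kernel that calls math.log2() on that dimension breaks:

```python
# Hypothetical repro sketch: top-k routing can leave an expert with an
# empty input, which breaks math.log2() on the token count.
import math
import torch

num_experts, top_k, num_tokens, hidden_dim = 4, 2, 3, 8
hidden_states = torch.randn(num_tokens, hidden_dim)

# Fake router output in which expert 3 is never among the top-k choices.
router_logits = torch.randn(num_tokens, num_experts)
router_logits[:, 3] = float("-inf")
routing_weights, selected_experts = torch.topk(
    router_logits.softmax(dim=-1), top_k, dim=-1
)
expert_mask = torch.nn.functional.one_hot(
    selected_experts, num_classes=num_experts
).permute(2, 1, 0)

# Same indexing pattern as the loop above, applied to the starved expert.
idx, top_x = torch.where(expert_mask[3])
current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
print(current_state.shape)            # torch.Size([0, 8])
# math.log2(current_state.shape[0])   # ValueError: math domain error
```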

The issue could be solved by checking the number of tokens before Line 675 (the sketch just below shows where the guard fits in the expert loop):
`if current_state.shape[0] == 0: continue`
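In context, the proposed guard would sit in the expert loop roughly like this (a sketch of the suggested change, not the exact patch from any PR; variable names follow the snippet quoted above):

```python
for expert_idx in range(self.num_experts):
    expert_layer = self.experts[expert_idx]
    idx, top_x = torch.where(expert_mask[expert_idx])

    current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
    # Proposed guard: skip experts that were routed zero tokens, so the
    # (possibly quantized) expert layer never sees an empty input.
    if current_state.shape[0] == 0:
        continue

    current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
```

Checking `top_x.numel() == 0` right after `torch.where` would be equivalent and avoids building the empty `current_state` at all, though as noted later in this thread, this kind of data-dependent control flow may interfere with torch.fx export.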

Expected behavior

No more forwarding with zero tokens.

Coco58323 added the bug label Jul 29, 2024
@ArthurZucker
Collaborator

Would you like to open a PR for a fix? #31173 could be a nice addition as well.

github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@ArthurZucker
Collaborator

Not completed, but we need to make sure it is worth it, as this might break torch.fx export!

github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Nov 1, 2024