_prepare_4d_attention_mask_for_sdpa is not for causal attention but claims... #30095
Comments
cc @fxmarty
Hi @minostauros, thank you for the report.
Yes, good catch, I'll fix that! This is a somewhat unlikely case though, where one would use past key values with typically encoder-type models. How did you run into this case?
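For illustration, a hedged sketch of the kind of call that reaches this situation. The shapes are hypothetical: past key values make the key/value length (6) larger than the query length (2) while the padding mask is all ones, which is the only path into the else: branch discussed further down.

```python
import torch
from transformers.modeling_attn_mask_utils import _prepare_4d_attention_mask_for_sdpa

# Hypothetical shapes: 4 cached (past) positions plus 2 new query tokens give
# key_value_length = 6 while tgt_len (query_length) = 2, with an all-ones
# padding mask. This mismatch is what triggers the branch questioned in this issue.
attention_mask = torch.ones(1, 6)  # (batch_size, key_value_length)
expanded = _prepare_4d_attention_mask_for_sdpa(attention_mask, dtype=torch.float32, tgt_len=2)
print(expanded)  # an all-zero additive 4D mask at the time of this report; may become None after the fix
```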
I didn't run into the specific section, but I was just reviewing #28802 and was trying to add flash-attention-2 to BERT (the BLIP-2 variant of BERT, to be exact).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This issue will be closed by #30138
The comment in the mask generation below says "... SDPA causal mask generation may be wrong ...", even though _prepare_4d_attention_mask_for_sdpa is not for causal attention:
transformers/src/transformers/modeling_attn_mask_utils.py
Lines 421 to 433 in 76fa17c
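For readers without the file open, here is a paraphrased sketch of the branch in question. It is not a verbatim copy of lines 421 to 433 at 76fa17c; the function name suffix is mine and the tracing special case is omitted.

```python
from typing import Optional

import torch
from transformers.modeling_attn_mask_utils import AttentionMaskConverter


def _prepare_4d_attention_mask_for_sdpa_sketch(
    mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None
) -> Optional[torch.Tensor]:
    """Expands a 2D padding mask (batch_size, key_value_length) to a 4D additive
    mask for SDPA, or returns None when the mask can safely be dropped."""
    _, key_value_length = mask.shape
    tgt_len = tgt_len if tgt_len is not None else key_value_length

    if torch.all(mask == 1):
        if tgt_len == 1 or key_value_length == tgt_len:
            # All-ones mask and no length mismatch: SDPA can run without a mask.
            return None
        # Otherwise the current code keeps the expanded mask, citing that
        # "SDPA causal mask generation may be wrong" -- the wording this issue
        # objects to, since this helper builds a non-causal mask.
        return AttentionMaskConverter._expand_mask(mask=mask, dtype=dtype, tgt_len=tgt_len)

    return AttentionMaskConverter._expand_mask(mask=mask, dtype=dtype, tgt_len=tgt_len)
```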
Will it be safe to just return None for the else: case? For causal attention, we can just use _prepare_4d_causal_attention_mask_for_sdpa.
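A minimal sketch of the simplification this question suggests (the function name is hypothetical, and this is not necessarily the fix that landed in #30138): since this helper builds a bi-directional mask, an all-ones 2D mask masks nothing even when key_value_length != query_length, so the else: case could return None. Causal decoders would keep going through _prepare_4d_causal_attention_mask_for_sdpa, which handles query/key length mismatches from past key values.

```python
from typing import Optional

import torch
from transformers.modeling_attn_mask_utils import AttentionMaskConverter


def _prepare_4d_attention_mask_for_sdpa_proposed(
    mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None
) -> Optional[torch.Tensor]:
    # Hypothetical rewrite: for non-causal attention an all-ones padding mask
    # attends to every position anyway, so it can always be dropped
    # (the tracing special case is again omitted for brevity).
    _, key_value_length = mask.shape
    tgt_len = tgt_len if tgt_len is not None else key_value_length

    if torch.all(mask == 1):
        return None
    return AttentionMaskConverter._expand_mask(mask=mask, dtype=dtype, tgt_len=tgt_len)
```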
Related issues:
pytorch/pytorch#108108
Dao-AILab/flash-attention@9e5e8bc
#28802