
Fix: 'Gemma2Attention' object has no attribute '_flash_attn_uses_top_left_mask' #35285

Closed

Conversation

jp1924 (Contributor) commented Dec 16, 2024

What does this PR do?

This PR fixes the "'Gemma2Attention' object has no attribute '_flash_attn_uses_top_left_mask'" error that occurs in every model whose attention classes go through the flash attention code path, including Gemma2.
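For context, here is a minimal sketch of how this flag is commonly initialized in transformers attention classes that support FlashAttention-2. The class name is a hypothetical stand-in, and whether this matches the exact change in this PR is an assumption; the error in the title comes from code reading the flag on an attention instance that never set it.

from transformers.utils import is_flash_attn_greater_or_equal_2_10


class Gemma2FlashAttention2Sketch:
    """Hypothetical stand-in for an attention class that sets the flag."""

    def __init__(self) -> None:
        # flash_attn < 2.1 generates a top-left aligned causal mask, while
        # bottom-right alignment is needed, hence this compatibility flag.
        self._flash_attn_uses_top_left_mask = not is_flash_attn_greater_or_equal_2_10()


if __name__ == "__main__":
    print(Gemma2FlashAttention2Sketch()._flash_attn_uses_top_left_mask)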

Reproduction Code

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer


def main() -> None:
    repo_id = "google/gemma-2-2b-it"
    # Request FlashAttention-2 through the config rather than via from_pretrained.
    config = AutoConfig.from_pretrained(repo_id, _attn_implementation="flash_attention_2")
    model = AutoModelForCausalLM.from_pretrained(repo_id, config=config, device_map="cpu")
    tokenizer = AutoTokenizer.from_pretrained(repo_id)

    text = tokenizer.apply_chat_template([{"role": "user", "content": "Hello!"}], tokenize=False)
    input_param = tokenizer(text, return_tensors="pt", return_attention_mask=True)
    input_param["labels"] = input_param["input_ids"].clone()
    # The forward pass raises:
    # AttributeError: 'Gemma2Attention' object has no attribute '_flash_attn_uses_top_left_mask'
    output = model(**input_param)


if __name__ == "__main__":
    main()

pip install git+https://github.com/huggingface/transformers.git@5615a393691c81e00251e420c73e4d04c6fe22e5
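As a hypothetical stopgap (not this PR's fix), the missing flag can be backfilled on the model loaded by the reproduction script above before calling forward. Defaulting the flag to False is an assumption for illustration only.

# Assumes `model` from the reproduction script above.
for module in model.modules():
    is_attention = "Attention" in type(module).__name__
    if is_attention and not hasattr(module, "_flash_attn_uses_top_left_mask"):
        module._flash_attn_uses_top_left_mask = False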

Env

- `transformers` version: 4.48.0.dev0
- Platform: Linux-5.15.0-124-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.26.2
- Safetensors version: 0.4.5
- Accelerate version: 1.1.1

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

jp1924 (Contributor, Author) commented Dec 16, 2024

@ArthurZucker
the attention implementation setting ends up included in the config. Is this working as you intended?
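A quick way to check this (a sketch assuming the standard PretrainedConfig attributes; the printed values are not asserted here):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-2-2b-it", _attn_implementation="flash_attention_2")
print(config._attn_implementation)                 # expected: "flash_attention_2"
print("_attn_implementation" in config.to_dict())  # whether it gets serialized into the config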

jp1924 (Contributor, Author) commented Dec 18, 2024

This pull request is being closed because the issue has been resolved in pull request #35235.

jp1924 closed this on Dec 18, 2024
jp1924 deleted the fix_config_flash-attn_attr_error branch on Dec 18, 2024
ArthurZucker (Collaborator) commented

Thanks! And sorry for the delay! 🤗
