
FlexAttention Output Differs from SDPA #62

Open
chayut-t opened this issue Oct 22, 2024 · 4 comments

@chayut-t

I ran python examples/benchmark.py and hit the following error: the output from FlexAttention differs from SDPA.

Traceback (most recent call last):
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 256, in <module>
    main(**vars(args))
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 234, in main
    available_examples[ex]()
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 214, in <lambda>
    "causal": lambda: test_mask(mask_mod=causal_mask),
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 140, in test_mask
    torch.testing.assert_close(flex, sdpa_mask, atol=1e-1, rtol=1e-2)
  File "/home/chayut/.local/share/mise/installs/python/3.12.7/lib/python3.12/site-packages/torch/testing/_comparison.py", line 1530, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 6176101 / 134217728 (4.6%)
Greatest absolute difference: 4.546875 at index (10, 0, 2, 63) (up to 0.1 allowed)
Greatest relative difference: inf at index (0, 3, 5186, 54) (up to 0.01 allowed)

I'm using an AWS p3.2xlarge instance (1 V100 GPU) with NVIDIA driver version 550.127.05 and CUDA version 12.4.1.
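For context, the check that fails here boils down to computing attention two different ways and asserting the results agree within atol=1e-1 / rtol=1e-2. A standalone NumPy sketch of that kind of comparison (this is illustrative, not the repository's code; the function name and shapes are made up) shows that ordinary float32-vs-float64 rounding drift passes easily, so a greatest absolute difference of ~4.5 points at a real divergence, not numerical noise:

```python
import numpy as np

def causal_sdpa(q, k, v):
    """Reference causal scaled-dot-product attention."""
    seq_len = q.shape[-2]
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = q @ k.swapaxes(-1, -2) * scale
    # Mask out future positions: query i may only attend to keys j <= i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((2, 4, 64, 16)) for _ in range(3))

ref = causal_sdpa(q, k, v)                        # float64 reference
lowp = causal_sdpa(q.astype(np.float32),
                   k.astype(np.float32),
                   v.astype(np.float32))          # float32 "second kernel"

# Rounding drift between the two precisions is tiny and passes the same
# tolerances the benchmark uses; a ~4.5 absolute difference would not.
np.testing.assert_allclose(lowp, ref, atol=1e-1, rtol=1e-2)
```

The `inf` relative difference in the report typically means the reference tensor is exactly zero at that index while the other output is not, which is why the absolute tolerance is the more telling number here.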

@drisspg
Contributor

drisspg commented Oct 28, 2024

Which example is this?

@chayut-t
Author

I ran python examples/benchmark.py (from the attention-gym directory) and got the following output:

Using the default sparsity block size: 128
╔═════════════════════════════════════════════════════════════════════════════════════════╗
║                                       Causal Mask                                       ║
╚═════════════════════════════════════════════════════════════════════════════════════════╝
Traceback (most recent call last):
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 256, in <module>
    main(**vars(args))
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 234, in main
    available_examples[ex]()
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 214, in <lambda>
    "causal": lambda: test_mask(mask_mod=causal_mask),
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 140, in test_mask
    torch.testing.assert_close(flex, sdpa_mask, atol=1e-1, rtol=1e-2)
  File "/home/chayut/.local/share/mise/installs/python/3.12.7/lib/python3.12/site-packages/torch/testing/_comparison.py", line 1530, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 6176101 / 134217728 (4.6%)
Greatest absolute difference: 4.546875 at index (10, 0, 2, 63) (up to 0.1 allowed)
Greatest relative difference: inf at index (0, 3, 5186, 54) (up to 0.01 allowed)

I'm not sure what you mean by "which example is this?"

@drisspg
Contributor

drisspg commented Oct 31, 2024

What version of PyTorch are you using? At least locally, I am unable to reproduce this.

@chayut-t
Author

chayut-t commented Nov 1, 2024

PyTorch 2.5.0
