
FlexAttention Output Differs from SDPA #62

Open
chayut-t opened this issue Oct 22, 2024 · 4 comments

@chayut-t

I ran python examples/benchmark.py and hit the following error: the output from FlexAttention differs from SDPA.

Traceback (most recent call last):
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 256, in <module>
    main(**vars(args))
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 234, in main
    available_examples[ex]()
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 214, in <lambda>
    "causal": lambda: test_mask(mask_mod=causal_mask),
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 140, in test_mask
    torch.testing.assert_close(flex, sdpa_mask, atol=1e-1, rtol=1e-2)
  File "/home/chayut/.local/share/mise/installs/python/3.12.7/lib/python3.12/site-packages/torch/testing/_comparison.py", line 1530, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 6176101 / 134217728 (4.6%)
Greatest absolute difference: 4.546875 at index (10, 0, 2, 63) (up to 0.1 allowed)
Greatest relative difference: inf at index (0, 3, 5186, 54) (up to 0.01 allowed)

I'm using an AWS p3.2xlarge instance (1 V100 GPU) with NVIDIA driver version 550.127.05 and CUDA version 12.4.1.
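For context, the check that fails here boils down to computing attention two different ways and asserting the results agree within atol=1e-1 / rtol=1e-2. A standalone NumPy sketch of that kind of comparison (this is illustrative, not the repository's code; the function name and shapes are made up) shows that ordinary float32-vs-float64 rounding drift passes easily, so a greatest absolute difference of ~4.5 points at a real divergence, not numerical noise:

```python
import numpy as np

def causal_sdpa(q, k, v):
    """Reference causal scaled-dot-product attention."""
    seq_len = q.shape[-2]
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = q @ k.swapaxes(-1, -2) * scale
    # Mask out future positions: query i may only attend to keys j <= i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((2, 4, 64, 16)) for _ in range(3))

ref = causal_sdpa(q, k, v)                        # float64 reference
lowp = causal_sdpa(q.astype(np.float32),
                   k.astype(np.float32),
                   v.astype(np.float32))          # float32 "second kernel"

# Rounding drift between the two precisions is tiny and passes the same
# tolerances the benchmark uses; a ~4.5 absolute difference would not.
np.testing.assert_allclose(lowp, ref, atol=1e-1, rtol=1e-2)
```

The `inf` relative difference in the report typically means the reference tensor is exactly zero at that index while the other output is not, which is why the absolute tolerance is the more telling number here.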

@drisspg
Contributor

drisspg commented Oct 28, 2024

Which example is this?

@chayut-t
Author

I ran python examples/benchmark.py (from the attention-gym directory) and got the following output:

Using the default sparsity block size: 128
╔═════════════════════════════════════════════════════════════════════════════════════════╗
║                                       Causal Mask                                       ║
╚═════════════════════════════════════════════════════════════════════════════════════════╝
Traceback (most recent call last):
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 256, in <module>
    main(**vars(args))
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 234, in main
    available_examples[ex]()
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 214, in <lambda>
    "causal": lambda: test_mask(mask_mod=causal_mask),
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local/home/chayut/workplace/attention-gym/examples/benchmark.py", line 140, in test_mask
    torch.testing.assert_close(flex, sdpa_mask, atol=1e-1, rtol=1e-2)
  File "/home/chayut/.local/share/mise/installs/python/3.12.7/lib/python3.12/site-packages/torch/testing/_comparison.py", line 1530, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 6176101 / 134217728 (4.6%)
Greatest absolute difference: 4.546875 at index (10, 0, 2, 63) (up to 0.1 allowed)
Greatest relative difference: inf at index (0, 3, 5186, 54) (up to 0.01 allowed)

I'm not sure what you mean by "which example is this?"

@drisspg
Contributor

drisspg commented Oct 31, 2024

What version of PyTorch are you using? At least locally, I am unable to reproduce this.

@chayut-t
Author

chayut-t commented Nov 1, 2024

PyTorch 2.5.0
