When cross-compiling for Android using the NDK toolchain, Flash Attention fails to build in CPU-only mode but succeeds when the Vulkan backend is enabled, despite being documented as a CPU-only feature.
Environment:

- Android NDK: 28.0.12433566
- Target: arm64-v8a (Android 28)
- Build system: CMake with Ninja
- Host OS: Windows

Build command that fails:

cmake .. -G "Ninja" -DCMAKE_TOOLCHAIN_FILE=D:\Android_Studio_SDK\ndk\28.0.12433566\build\cmake\android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 -DCMAKE_MAKE_PROGRAM=D:\Android_Studio_SDK\cmake\3.6.4111459\bin\ninja.exe -DSD_BUILD_SHARED_LIBS=ON -DSD_FLASH_ATTN=ON
Error:

D:/Building_test/stable-diffusion.cpp/ggml_extend.hpp:679:31: error: use of undeclared identifier 'ggml_flash_attn'; did you mean 'ggml_hash_set'?
  679 | struct ggml_tensor* kqv = ggml_flash_attn(ctx, q, k, v, false);
      |                           ^
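For context, one plausible reading of this error (hedged; based on older ggml headers, not verified against this checkout): `ggml_flash_attn` used to be part of ggml's public API before upstream replaced it with `ggml_flash_attn_ext`, so a build that pairs a current ggml with an older call site would no longer see the declaration. The legacy entry point looked roughly like this:

```cpp
// Sketch of the legacy ggml API as it appeared in older ggml.h
// (assumption: the failing call site in ggml_extend.hpp still
// targets this since-removed function).
struct ggml_tensor * ggml_flash_attn(
        struct ggml_context * ctx,
        struct ggml_tensor  * q,
        struct ggml_tensor  * k,
        struct ggml_tensor  * v,
        bool                  masked);
```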
The same build command succeeds when adding `-DSD_VULKAN=ON`.
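For reference, the working configuration is presumably the same invocation with the Vulkan flag appended:

cmake .. -G "Ninja" -DCMAKE_TOOLCHAIN_FILE=D:\Android_Studio_SDK\ndk\28.0.12433566\build\cmake\android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 -DCMAKE_MAKE_PROGRAM=D:\Android_Studio_SDK\cmake\3.6.4111459\bin\ninja.exe -DSD_BUILD_SHARED_LIBS=ON -DSD_FLASH_ATTN=ON -DSD_VULKAN=ON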
Expected behavior: Flash Attention should build successfully in CPU-only mode since it's documented as a CPU-only feature.
Actual behavior: Flash Attention only builds when the Vulkan backend is enabled, suggesting the implementation may be incorrectly tied to GPU backend definitions (see the sketch below).
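A minimal sketch of the kind of preprocessor guard that would produce exactly this behavior (macro names are assumptions for illustration, not quoted from the repo): with no GPU backend defined, the build takes the `ggml_flash_attn` branch and fails once ggml drops that symbol; defining the Vulkan macro flips it onto the fallback path, which compiles fine.

```cpp
// Illustrative guard, NOT the actual ggml_extend.hpp source.
// SD_USE_FLASH_ATTENTION / SD_USE_VULKAN are assumed macro names;
// ctx, q, k, v, and d_head are assumed in scope at the call site.
#if defined(SD_USE_FLASH_ATTENTION) && !defined(SD_USE_VULKAN)
    // CPU-only path: references the legacy symbol, so the build breaks
    // if the linked ggml no longer declares ggml_flash_attn.
    struct ggml_tensor * kqv = ggml_flash_attn(ctx, q, k, v, false);
#else
    // GPU-backend path: plain scaled-dot-product attention built from
    // ggml primitives; compiles regardless of the flash-attention symbol.
    struct ggml_tensor * kq = ggml_mul_mat(ctx, k, q);
    kq = ggml_scale_inplace(ctx, kq, 1.0f / sqrtf((float)d_head));
    kq = ggml_soft_max_inplace(ctx, kq);
    struct ggml_tensor * kqv = ggml_mul_mat(ctx, v, kq);
#endif
```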
EDIT: Never mind, I just came across PR #386 (comment).