
Update building for Android #9672

Merged
merged 4 commits into from
Oct 7, 2024
Conversation

@amqdn (Contributor) commented Sep 27, 2024

This PR includes:

  1. Updates to the docs for building Android on Termux
  2. Updates to the docs for cross-compiling for Android
  3. Changes to CMake configuration specific to Android

All changes have been tested (at least on aarch64 arm64-v8a) on both:

  1. Termux on Android
  2. adb shell on Android

Caveat: If -c is not provided, the default context size can cause memory to be over-allocated, killing the app (Termux) or crashing the system (adb shell). Since fixing this would require a lower-level change with wider scope, I have split it out into #9671.
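To make the caveat concrete, here is a back-of-envelope sketch of how the KV cache grows with -c. Every number below is an illustrative assumption (a Llama-8B-like shape: 32 layers, 4096 embedding dimension, f16 cache, grouped-query attention ignored), not llama.cpp's actual accounting:

```shell
# Rough KV-cache size estimate; all parameters are illustrative assumptions
# (Llama-8B-like shape: 32 layers, 4096 embedding dim, f16 cache, GQA ignored).
kv_mib () {
    local n_ctx=$1 n_layer=32 n_embd=4096
    # K and V per layer per token, 2 bytes per f16 element
    echo $(( 2 * n_layer * n_ctx * n_embd * 2 / 1024 / 1024 ))
}
echo "n_ctx=4096 -> KV cache ~ $(kv_mib 4096) MiB"
echo "n_ctx=512  -> KV cache ~ $(kv_mib 512) MiB"
```

Even this crude estimate puts a large default context into gigabyte territory, which is consistent with memory-constrained Termux sessions being killed; passing a smaller -c bounds the allocation.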

Thanks.

@github-actions bot added the "documentation" label (improvements or additions to documentation) Sep 27, 2024
@ggerganov (Owner)

Since we don't get many reports for llama.cpp on Android, I'll use the opportunity to ask whether you (or anyone else) have tried to run the Vulkan backend on Android devices. I'm wondering if the Vulkan backend is already capable of utilizing the mobile GPU or if more work is needed there. Any feedback in that regard is appreciated.

@ngxson (Collaborator) commented Sep 28, 2024

This PR probably resolves #8705

AFAIU from @jsamol's comment, libllama.so can be compiled with Vulkan support. We can then use it inside an Android app via a JNI binding.

@amqdn (Contributor, Author) commented Sep 28, 2024

Unfortunately, I cannot speak to Vulkan on Android. It could be a matter of proper configuration to get it working, but I will refrain from more speculation. If I manage to come up with answers, I will report.

@amqdn amqdn force-pushed the master branch 2 times, most recently from af64e60 to 98376b5 Compare September 28, 2024 19:40
@amqdn amqdn requested a review from ggerganov September 28, 2024 20:08
@ggerganov added the "merge ready" label (indicates that this may be ready to merge soon and is just holding out in case of objections) Sep 29, 2024
@AndrewNLauder

> Unfortunately, I cannot speak to Vulkan on Android. It could be a matter of proper configuration to get it working, but I will refrain from more speculation. If I manage to come up with answers, I will report.

I successfully compiled llama.android with Vulkan support on Android, but performance was much worse than running on CPU. If I loaded more than 2 layers onto GPU, it would OOM.

@ggerganov (Owner)

One of the CI workflows is still failing: CI / windows-latest-cmake-sycl (pull_request)

@gustrd (Contributor) commented Oct 1, 2024

> > Unfortunately, I cannot speak to Vulkan on Android. It could be a matter of proper configuration to get it working, but I will refrain from more speculation. If I manage to come up with answers, I will report.

> I successfully compiled llama.android with Vulkan support on Android, but performance was much worse than running on CPU. If I loaded more than 2 layers onto GPU, it would OOM.

Notably, Q4_0_4_4 provided significantly better performance than any GPU build I've tested on Snapdragon devices running Android; even the prompt processing was better than with CLBlast (which already showed some gain over CPU).

@amqdn (Contributor, Author) commented Oct 1, 2024

> One of the CI workflows is still failing: CI / windows-latest-cmake-sycl (pull_request)

Yes, I saw that. I have been unsure why.

Seems the CMake change is making SYCL attempt to link m.dll again.

Investigating...

@amqdn (Contributor, Author) commented Oct 1, 2024

I think I figured it out. Fixing.

@amqdn amqdn force-pushed the master branch 2 times, most recently from 41f5a29 to 44b6851 Compare October 1, 2024 18:23
@max-krasnyansky (Collaborator) commented Oct 2, 2024

I think it's safe to bump the Android API level to 31 at this point: https://apilevels.com/

The following build works with no additional changes with the Android NDK r26b (the previous LTS) and r27b releases.

cmake -D CMAKE_TOOLCHAIN_FILE="$NDK/build/cmake/android.toolchain.cmake" \
      -D ANDROID_ABI="arm64-v8a" \
      -D ANDROID_PLATFORM="android-31" \
      -D CMAKE_C_FLAGS="-march=armv8.7a" \
      -D CMAKE_CXX_FLAGS="-march=armv8.7a" \
      -G Ninja -B build-android-arm64
...
cmake --build build-android-arm64

Those CFLAGS should be good for all Android ARM64 devices from 2023/24 and enable Q4_0_4_8 support which is the most performant on the current gen CPUs.

NDK r26 and newer definitely include OpenMP:

cmake-command-above
...
-- Found OpenMP_C: -fopenmp=libomp
-- Found OpenMP_CXX: -fopenmp=libomp
-- Found OpenMP: TRUE
-- OpenMP found
...

However, our threadpool implementation is more efficient at this point, so it makes sense to include GGML_OPENMP=OFF.

In other words, I don't think the CMakeLists.txt changes are needed; we should just update the README to recommend NDK r27b and API Level 31.

@amqdn (Contributor, Author) commented Oct 3, 2024

Hi, @max-krasnyansky --

I think whichever direction is chosen depends on what kind of (best-effort) support llama.cpp intends to provide for Android: broader device support (with some kind of cut-off), or the latest and most powerful devices. I don't have a strong opinion about that.

As far as the CMakeLists.txt changes specifically, those have to do with linking subtleties re: Bionic; see https://developer.android.com/ndk/guides/stable_apis#c_library.
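For context, the Android-specific linking tweak under discussion amounts to something like the following sketch (the target name `ggml` and the exact guard are assumptions for illustration; the PR's actual CMakeLists.txt may differ). Bionic folds the libm symbols into libc, while static builds still need libdl for dladdr():

```cmake
# Hypothetical sketch of an Android-specific link tweak, not the PR's exact
# diff. On Bionic, libm is provided via libc, but dladdr() still requires
# linking libdl explicitly.
if (ANDROID)
    target_link_libraries(ggml PRIVATE dl)
endif()
```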

I'm happy to adjust the README to reflect any recommendations required to steer users of the project.

@slaren (Collaborator) commented Oct 3, 2024

> I think it's safe to bump the Android API level to 31 at this point: https://apilevels.com/

66.5% coverage seems low for the kind of hardware that we usually support; e.g., we have x86 builds for processors without AVX, which was introduced in 2011. Older phones are perfectly capable of running small LLMs.

@max-krasnyansky (Collaborator)

> > I think it's safe to bump the Android API level to 31 at this point: https://apilevels.com/

> 66.5% coverage seems low for the kind of hardware that we usually support, eg. we have builds for x86 for processors without AVX, which was introduced in 2011. Older phones are perfectly capable of running small LLMs.

That data is a couple of years old, but fair point.
We could go with API Level 28, which is sufficient to expose all the APIs we're using.

This builds/works just as well (tested with NDK r26b and r27b, on Galaxy S24).

cmake -D CMAKE_TOOLCHAIN_FILE="$NDK/build/cmake/android.toolchain.cmake" \
      -D ANDROID_ABI="arm64-v8a" \
      -D ANDROID_PLATFORM="android-28" \
      -D CMAKE_C_FLAGS="-march=armv8.7a" \
      -D CMAKE_CXX_FLAGS="-march=armv8.7a" \
      -G Ninja -D GGML_OPENMP=OFF -B build-android-arm64

@max-krasnyansky (Collaborator)

> Hi, @max-krasnyansky --

> I think whichever direction is chosen depends on what kind of (best-effort) support llama.cpp is intended to have for Android, either towards broader device support (with some kind of cut-off) or towards the most powerful and latest. I don't have a strong opinion about that.

We recently merged a PR for runtime detection of CPU capabilities, so it makes sense to enable all the latest CPU features at build time and let the CPU backend check what's available at runtime.
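As an illustration of the kind of runtime probe involved (a sketch only, not the backend's actual code): on Android/Linux aarch64, the kernel reports CPU features in /proc/cpuinfo, where asimddp corresponds to dotprod and i8mm to the int8 matrix-multiply extension:

```shell
# Sketch only: on an aarch64 Android/Linux device, /proc/cpuinfo has a
# "Features" line listing CPU extensions; on other architectures (or where
# the file is absent) the probe simply reports "no".
has_feature () {
    grep -m1 '^Features' /proc/cpuinfo 2>/dev/null | grep -qw "$1" \
        && echo yes || echo no
}
echo "dotprod: $(has_feature asimddp)"
echo "i8mm:    $(has_feature i8mm)"
```

A binary built with -march=armv8.7a can then keep the optimized kernels behind such checks and fall back to baseline code paths on older cores.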

> As far as the CMakeLists.txt changes specifically, those have to do with linking subtleties re: Bionic; see https://developer.android.com/ndk/guides/stable_apis#c_library.

The CMake command I provided already links in everything we need.
Here it is again:

cmake -D CMAKE_TOOLCHAIN_FILE="$NDK/build/cmake/android.toolchain.cmake" \
      -D ANDROID_ABI="arm64-v8a" \
      -D ANDROID_PLATFORM="android-28" \
      -D CMAKE_C_FLAGS="-march=armv8.7a" \
      -D CMAKE_CXX_FLAGS="-march=armv8.7a" \
      -G Ninja -D GGML_OPENMP=OFF -B build-android-arm64

If you run a verbose build, you'll see that it links libm explicitly:

cmake --build build-android-arm64 --verbose
...
/home/maxk/src/android-ndk-r27b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --target=aarch64-none-linux-android28 --sysroot=/home/maxk/src/android-ndk-r27b/toolchains/llvm/prebuilt/linux-x86_64/sysroot -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security  -march=armv8.7a -O3 -DNDEBUG -static-libstdc++ -Wl,--build-id=sha1 -Wl,--no-rosegment -Wl,--no-undefined-version -Wl,--fatal-warnings -Wl,--no-undefined -Qunused-arguments   -Wl,--gc-sections examples/server/CMakeFiles/llama-server.dir/server.cpp.o -o bin/llama-server  common/libcommon.a  -pthread  src/libllama.so  ggml/src/libggml.so  -pthread  -latomic -lm

libdl is not linked explicitly, but the linker is happy (i.e. no link errors), so that symbol must be getting resolved, and the resulting binaries run on the device without errors.

@max-krasnyansky (Collaborator)

Ah, OK. GGML_STATIC=ON is broken on Android without linking libdl (missing dladdr).
So the change for libm is not needed (it's already linked properly), but the libdl change looks good.

@amqdn (Contributor, Author) commented Oct 3, 2024

I am considering the discussion thus far and will make edits accordingly.

> We recently merged PR for runtime detection of the CPU capabilities.

I saw that. If indeed the CPU features are detected at runtime, then I can see why we would include all the features during build (though I don't necessarily understand all the intricacies of -march).

> -lm

I appreciate seeing that output. Re-reading the Bionic and Android NDK documentation, my understanding is that libm "is automatically linked by the build systems," in which case explicitly linking m privately, as llama.cpp does right now, is at least redundant. If we are truly averse to that logic in the CMake file, I will change it.

@max-krasnyansky (Collaborator)

I'd say let's remove the libm change. No need for redundancy.

I'd also update the CMake command with:

  • ANDROID_PLATFORM=android-28
  • CMAKE_C_FLAGS="-march=armv8.7a"
  • CMAKE_CXX_FLAGS="-march=armv8.7a"

(btw, ideally we should just add a CMake preset for this; not a blocker, just a thought)

The rest looks good to me.

@amqdn (Contributor, Author) commented Oct 3, 2024

I will remove the libm change; though, to be clear, master currently links the lib explicitly, which is what I meant.

> btw ideally we should just add cmake preset for this

I agree. There is already CMake logic for Android -march, though when I started this PR, I didn't have a good idea about the direction to take with it. I will leave that out of this PR.

@amqdn (Contributor, Author) commented Oct 5, 2024

I have completed changes reflecting this discussion and have tested the builds myself using Android NDK r25c, r26b, and r27b.

One thing I will note: some time in the last week, the tokens/s performance (using llama-simple) in adb shell dropped by about half. Very striking, and no change related to this PR seems to have made a difference either way (NDK, API level, -march).

@ggerganov (Owner)

> One thing I will note is that, some time in the last week, the tokens/s performance (using llama-simple) in adb shell has dropped by about half.

Can you pinpoint the commit that introduced the regression? What is the exact llama-simple command that you use?

@amqdn (Contributor, Author) commented Oct 7, 2024

I have checked out many commits (all before mine) and have yet to pinpoint it. It could be something else.

The exact command I have been using is LD_LIBRARY_PATH=android/lib ./android/bin/llama-simple -m Q2_K-Meta-Llama-3.1-8B-Instruct.gguf -c 4096.

@amqdn (Contributor, Author) commented Oct 7, 2024

FYI, I tried this on Termux (both with and without my commits) and did not observe the same regression.
Something weird is afoot. If I discover something, I will report back.

@ggerganov (Owner)

Thanks. I think we are good to merge this. Agree?

@dcale commented Oct 7, 2024

As a heads-up, armv8.7a will not work on older devices, e.g. the Pixel 6 Pro (a three-year-old device, 2021), even though these devices run recent Android versions (Android 14).

@max-krasnyansky (Collaborator)

> as a heads-up armv8.7a will not work with older devices i.e. Pixel 6 pro devices (3 year's old device, 2021), even though these devices are running recent Android versions (Android 14)

It should, because of the runtime detection of CPU capabilities (such as MATMUL_INT8, etc.).
In theory, the compiler could use one of those instructions elsewhere, but it's quite unlikely.

Can you try the latest on Pixel 6 Pro?
Let us know if it doesn't work and we'll iterate further if needed.

@max-krasnyansky max-krasnyansky merged commit f1af42f into ggerganov:master Oct 7, 2024
53 checks passed
@amqdn (Contributor, Author) commented Oct 7, 2024

Thanks, all.
@dcale, thanks for your report. Please follow up with more info so we can continue to clarify any issues.

@dcale commented Oct 8, 2024

> > as a heads-up armv8.7a will not work with older devices i.e. Pixel 6 pro devices (3 year's old device, 2021), even though these devices are running recent Android versions (Android 14)

> It should because of the runtime detection of the CPU capabilities (such as MATMUL_INT8, etc). In theory, it's possible that the compiler will use one of those instructions elsewhere but it's quite unlikely.

> Can you try the latest on Pixel 6 Pro? Let us know if it doesn't work and we'll iterate further if needed.

I'll do that and report back.

dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
* docs : clarify building Android on Termux

* docs : update building Android on Termux

* docs : add cross-compiling for Android

* cmake : link dl explicitly for Android
@jsamol commented Nov 6, 2024

@max-krasnyansky @amqdn Running llama compiled with the -march=armv8.7a flag results in SIGILL (ILL_ILLOPC) on the Pixel 4a (2020). To be fair, the old target, armv8.4a+dotprod, wasn't much better. Only targeting armv8.2-a, which I believe is the highest compatible version for that model, makes it work.
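A defensive pattern for such SIGILL reports (purely illustrative; the -march values mirror the ones discussed in this thread, and the feature-to-arch mapping here is an assumption) is to derive the -march value from what the device's kernel actually reports, falling back to the conservative armv8.2-a:

```shell
# Hypothetical helper: pick a -march value from reported CPU features.
# The mapping below is an illustrative assumption, not an authoritative
# feature-to-architecture table. On hosts without an aarch64 "Features"
# line in /proc/cpuinfo it falls back to armv8.2-a.
pick_march () {
    local feats
    feats=$(grep -m1 '^Features' /proc/cpuinfo 2>/dev/null)
    if echo "$feats" | grep -qw i8mm; then
        echo armv8.7-a
    elif echo "$feats" | grep -qw asimddp; then
        echo armv8.4-a
    else
        echo armv8.2-a
    fi
}
echo "-march=$(pick_march)"
```

The chosen value could then be passed to CMAKE_C_FLAGS/CMAKE_CXX_FLAGS when building on-device (e.g. under Termux); for cross-compilation the target device's features would have to be queried over adb instead.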

@amqdn (Contributor, Author) commented Nov 6, 2024

@jsamol Thanks for the report. I will defer to the others about what to do here.

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* docs : clarify building Android on Termux

* docs : update building Android on Termux

* docs : add cross-compiling for Android

* cmake : link dl explicitly for Android
@rmatif commented Nov 16, 2024

> Since we don't get much reports for llama.cpp on Android, I'll use the opportunity to ask if you (or anyone else) have tried to run the Vulkan backend on Android devices? Wondering if the Vulkan backend is already capable of utilizing the mobile GPU or if more work is needed there. Any feedback in that regard is appreciated.

I did try it. I built a host tool for shader generation and obtained the correct Vulkan HPP headers (they're missing in NDK >26 and incomplete in NDK ≤26). Compilation works fine, but performance is 2x worse than CPU-only. It also seems to use more RAM, even though the logs show the same amount with and without Vulkan.

Clearly, some optimizations are needed here.

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
* docs : clarify building Android on Termux

* docs : update building Android on Termux

* docs : add cross-compiling for Android

* cmake : link dl explicitly for Android