Update building for Android #9672

amqdn · 2024-09-27T22:10:54Z

This PR includes:

Updates to the docs for building Android on Termux
Updates to the docs for cross-compiling for Android
Changes to CMake configuration specific to Android

All changes have been tested (at least on aarch64 arm64-v8a) on both:

Termux on Android
adb shell on Android

Caveat: If -c is not provided, the default context can end up over-initializing memory and killing the app (Termux) or crashing the system (adb shell). Since this would require a potentially lower-level fix which affects a wider scope, I have separated the issue into #9671.

Thanks.

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

ggerganov · 2024-09-28T10:23:26Z

Since we don't get many reports for llama.cpp on Android, I'll use the opportunity to ask whether you (or anyone else) have tried to run the Vulkan backend on Android devices. I'm wondering if the Vulkan backend is already capable of utilizing the mobile GPU or if more work is needed there. Any feedback in that regard is appreciated.

ngxson · 2024-09-28T14:51:31Z

This PR probably resolves #8705

AFAIU, from @jsamol's comment, libllama.so can be compiled with Vulkan support. We can then use it inside an Android app via a JNI binding.

amqdn · 2024-09-28T18:39:33Z

Unfortunately, I cannot speak to Vulkan on Android. It could be a matter of proper configuration to get it working, but I will refrain from more speculation. If I manage to come up with answers, I will report.

AndrewNLauder · 2024-10-01T03:34:04Z

> Unfortunately, I cannot speak to Vulkan on Android. It could be a matter of proper configuration to get it working, but I will refrain from more speculation. If I manage to come up with answers, I will report.

I successfully compiled llama.android with Vulkan support on Android, but performance was much worse than running on CPU. If I loaded more than 2 layers onto GPU, it would OOM.

ggerganov · 2024-10-01T06:27:37Z

One of the CI workflows is still failing: CI / windows-latest-cmake-sycl (pull_request)

gustrd · 2024-10-01T12:43:48Z

>> Unfortunately, I cannot speak to Vulkan on Android. It could be a matter of proper configuration to get it working, but I will refrain from more speculation. If I manage to come up with answers, I will report.

> I successfully compiled llama.android with Vulkan support on Android, but performance was much worse than running on CPU. If I loaded more than 2 layers onto GPU, it would OOM.

Notably, Q4_0_4_4 provided significantly better performance than any GPU build I've tested on Snapdragon devices running Android. Even prompt processing was faster than with CLBlast, which had already shown some gain over CPU.

amqdn · 2024-10-01T17:27:40Z

> One of the CI workflows is still failing: CI / windows-latest-cmake-sycl (pull_request)

Yes, I saw that. I have been unsure why.

It seems the CMake change is making the SYCL build attempt to link m.dll again.

Investigating...

amqdn · 2024-10-01T17:47:31Z

I think I figured it out. Fixing.

max-krasnyansky · 2024-10-02T22:48:03Z

I think it's safe to bump the Android API level to 31 at this point: https://apilevels.com/

The following build works without additional changes with the Android NDK r26b (previous LTS) and r27b releases.

cmake -D CMAKE_TOOLCHAIN_FILE="$NDK/build/cmake/android.toolchain.cmake" -D ANDROID_ABI="arm64-v8a" -D ANDROID_PLATFORM="android-31" -D CMAKE_C_FLAGS="-march=armv8.7a" -D CMAKE_CXX_FLAGS="-march=armv8.7a" -G Ninja -B build-android-arm64
...
cmake --build build-android-arm64

Those CFLAGS should be good for all Android ARM64 devices from 2023/24 and enable Q4_0_4_8 support, which is the most performant on current-gen CPUs.

NDK r26 and newer definitely include OpenMP:

cmake-command-above
...
-- Found OpenMP_C: -fopenmp=libomp
-- Found OpenMP_CXX: -fopenmp=libomp
-- Found OpenMP: TRUE
-- OpenMP found
...

However, our threadpool implementation is more efficient at this point, so it makes sense to pass -D GGML_OPENMP=OFF.
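You can check which OpenMP setting an existing build tree actually ended up with by reading its `CMakeCache.txt`, whose entries have the form `NAME:TYPE=VALUE`. Below is a minimal POSIX shell sketch. It assumes the `build-android-arm64` directory from the command above, and `cmake_cache_value` is a hypothetical helper name, not part of llama.cpp.

```sh
# Print the cached value of a CMake variable from an existing build tree.
#   $1  build directory (e.g. build-android-arm64)
#   $2  cache variable name (e.g. GGML_OPENMP)
cmake_cache_value() {
    local cache="$1/CMakeCache.txt"
    if [ ! -f "$cache" ]; then
        echo "no CMakeCache.txt in $1" >&2
        return 1
    fi
    # Entries look like NAME:TYPE=VALUE; strip "NAME:TYPE=" and print the rest.
    sed -n "s/^$2:[^=]*=//p" "$cache"
}

# Usage: cmake_cache_value build-android-arm64 GGML_OPENMP
```

If CMake is on the PATH, `cmake -N -L build-android-arm64` lists the same non-advanced cache entries without a helper.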

In other words, I don't think the CMakeLists.txt changes are needed; we should just update the README to recommend NDK r27b and API level 31.

amqdn · 2024-10-03T00:26:50Z

Hi, @max-krasnyansky --

I think whichever direction is chosen depends on what kind of (best-effort) support llama.cpp is intended to have for Android, either towards broader device support (with some kind of cut-off) or towards the most powerful and latest. I don't have a strong opinion about that.

As far as the CMakeLists.txt changes specifically, those have to do with linking subtleties re: Bionic; see https://developer.android.com/ndk/guides/stable_apis#c_library.

I'm happy to adjust the README to reflect any recommendations required to steer users of the project.

slaren · 2024-10-03T00:53:59Z

> I think it's safe to bump the Android API level to 31 at this point : https://apilevels.com/

66.5% coverage seems low for the kind of hardware that we usually support. For example, we have x86 builds for processors without AVX, which was introduced in 2011. Older phones are perfectly capable of running small LLMs.

max-krasnyansky · 2024-10-03T01:43:43Z

>> I think it's safe to bump the Android API level to 31 at this point : https://apilevels.com/

> 66.5% coverage seems low for the kind of hardware that we usually support, eg. we have builds for x86 for processors without AVX, which was introduced in 2011. Older phones are perfectly capable of running small LLMs.

That data is a couple of years old, but fair point.
We could do API level 28, which is sufficient to expose all the APIs we're using.

This builds/works just as well (tested with NDK r26b and r27b, on Galaxy S24).

cmake -D CMAKE_TOOLCHAIN_FILE="$NDK/build/cmake/android.toolchain.cmake" -D ANDROID_ABI="arm64-v8a" -D ANDROID_PLATFORM="android-28" -D CMAKE_C_FLAGS="-march=armv8.7a" -D CMAKE_CXX_FLAGS="-march=armv8.7a" -G Ninja -D GGML_OPENMP=OFF -B build-android-arm64

max-krasnyansky · 2024-10-03T02:21:53Z

> Hi, @max-krasnyansky --

> I think whichever direction is chosen depends on what kind of (best-effort) support llama.cpp is intended to have for Android, either towards broader device support (with some kind of cut-off) or towards the most powerful and latest. I don't have a strong opinion about that.

We recently merged a PR for runtime detection of the CPU capabilities. So it makes sense to enable all the latest CPU features at build time and let the CPU backend check what's available at runtime.

> As far as the CMakeLists.txt changes specifically, those have to do with linking subtleties re: Bionic; see https://developer.android.com/ndk/guides/stable_apis#c_library.

The CMake command I provided already links in everything we need.
Here it is again:

cmake -D CMAKE_TOOLCHAIN_FILE="$NDK/build/cmake/android.toolchain.cmake" -D ANDROID_ABI="arm64-v8a" -D ANDROID_PLATFORM="android-28" -D CMAKE_C_FLAGS="-march=armv8.7a" -D CMAKE_CXX_FLAGS="-march=armv8.7a" -G Ninja -D GGML_OPENMP=OFF -B build-android-arm64

If you run verbose build you'll see that it's linking libm explicitly

cmake --build build-android-arm64 --verbose
...
/home/maxk/src/android-ndk-r27b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --target=aarch64-none-linux-android28 --sysroot=/home/maxk/src/android-ndk-r27b/toolchains/llvm/prebuilt/linux-x86_64/sysroot -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security  -march=armv8.7a -O3 -DNDEBUG -static-libstdc++ -Wl,--build-id=sha1 -Wl,--no-rosegment -Wl,--no-undefined-version -Wl,--fatal-warnings -Wl,--no-undefined -Qunused-arguments   -Wl,--gc-sections examples/server/CMakeFiles/llama-server.dir/server.cpp.o -o bin/llama-server  common/libcommon.a  -pthread  src/libllama.so  ggml/src/libggml.so  -pthread  -latomic -lm

libdl is not being linked explicitly, but the linker is happy (i.e., no link errors), so that symbol must be getting resolved,
and the resulting binaries run on the device without errors.
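Spotting the libraries in a long verbose link line by eye is error-prone. This small POSIX shell sketch prints every library passed as `-l<name>` on a single link command. The function name `linked_libs` is hypothetical. On the `llama-server` line above it would print `atomic` and `m`, and no `dl`.

```sh
# Print each library passed to the linker as -l<name>, one per line,
# from a single link command line (e.g. one line of a --verbose build).
# The body runs in a subshell so that `set -f` (no globbing while the
# line is word-split) does not leak into the caller.
linked_libs() (
    set -f
    for arg in $1; do
        case $arg in
            -l?*) printf '%s\n' "${arg#-l}" ;;
        esac
    done
)

# Usage (on a fresh build tree, since Ninja only prints steps it actually runs):
#   cmake --build build-android-arm64 --verbose | grep -- '-o bin/llama-server' |
#       while IFS= read -r l; do linked_libs "$l"; done
```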

max-krasnyansky · 2024-10-03T02:29:29Z

Ah. Ok. GGML_STATIC=ON is busted on Android without linking libdl (missing dladdr).
So the change for libm is not needed (it's already linked properly), and libdl change looks good.

amqdn · 2024-10-03T18:24:00Z

I am incorporating and considering the discussion thus far and will make edits.

> We recently merged PR for runtime detection of the CPU capabilities.

I saw that. If indeed the CPU features are detected at runtime, then I can see why we would include all the features during build (though I don't necessarily understand all the intricacies of -march).

> -lm

I appreciate seeing that output. Re-reading the Bionic and Android NDK documentation, I have the understanding that libm "is automatically linked by the build systems," in which case, explicitly linking m privately the way llama.cpp does right now is at least redundant. If we are truly averse to that logic in the CMake file, I will change it.

max-krasnyansky · 2024-10-03T21:26:23Z

I'd say let's remove the libm change. No need for redundancy.

I'd also update the CMake command with:

ANDROID_PLATFORM=android-28
CMAKE_C_FLAGS="-march=armv8.7a"
CMAKE_CXX_FLAGS="-march=armv8.7a"

(btw, ideally we should just add a CMake preset for this; not a blocker, just a thought)
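Until a preset exists, a small shell wrapper could bundle the options recommended above. This is a hypothetical sketch; `configure_android` is an illustrative name, not part of llama.cpp. API level 28 and `-march=armv8.7a` are the defaults only because this thread recommends them. Both can be overridden, for example with `armv8.2-a` for older cores. Setting `CMAKE=echo` prints the arguments instead of running cmake.

```sh
# Hypothetical helper: configure an arm64-v8a build with the options
# recommended in this thread.
#   $1  path to the Android NDK (required)
#   $2  Android API level           (default: 28)
#   $3  build directory             (default: build-android-arm64)
#   $4  -march value for C and C++  (default: armv8.7a)
# Set CMAKE=echo to print the cmake arguments instead of running cmake.
configure_android() {
    if [ $# -lt 1 ]; then
        echo "usage: configure_android <ndk-dir> [api-level] [build-dir] [march]" >&2
        return 2
    fi
    local ndk="$1" api="${2:-28}" build_dir="${3:-build-android-arm64}" march="${4:-armv8.7a}"
    "${CMAKE:-cmake}" \
        -D CMAKE_TOOLCHAIN_FILE="$ndk/build/cmake/android.toolchain.cmake" \
        -D ANDROID_ABI=arm64-v8a \
        -D ANDROID_PLATFORM="android-$api" \
        -D CMAKE_C_FLAGS="-march=$march" \
        -D CMAKE_CXX_FLAGS="-march=$march" \
        -D GGML_OPENMP=OFF \
        -G Ninja \
        -B "$build_dir"
}

# Usage:
#   configure_android "$NDK" && cmake --build build-android-arm64
```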

The rest looks good to me.

amqdn · 2024-10-03T23:05:48Z

I will remove the libm change; though, to be clear, master currently links the lib explicitly, which is what I meant.

> btw ideally we should just add cmake preset for this

I agree. There is already CMake logic for Android -march, though when I started this PR, I didn't have a good idea about the direction to take with it. I will leave that out of this PR.

amqdn · 2024-10-05T19:59:02Z

I have completed changes reflecting this discussion and have tested the builds myself using Android NDK r25c, r26b, and r27b.

One thing I will note is that, some time in the last week, the tokens/s performance (using llama-simple) in adb shell has dropped by about half. Very striking, and no change related to this PR seems to have made a difference either way (NDK, API, -march).

ggerganov · 2024-10-06T11:19:28Z

> One thing I will note is that, some time in the last week, the tokens/s performance (using llama-simple) in adb shell has dropped by about half.

Can you pinpoint the commit that introduced the regression? What is the exact llama-simple command that you use?

amqdn · 2024-10-07T01:12:58Z

I have checked out many commits (all before mine) and have yet to pinpoint it. It could be something else.

The exact command I have been using is LD_LIBRARY_PATH=android/lib ./android/bin/llama-simple -m Q2_K-Meta-Llama-3.1-8B-Instruct.gguf -c 4096.
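Here is a sketch of the same run driven from the host with `adb`. It assumes the install tree sits in a local `android/` directory, as the `LD_LIBRARY_PATH=android/lib` path suggests. It also assumes `/data/local/tmp` is writable from `adb shell`, which it is on stock devices. `run_on_device` is a hypothetical name. Set `ADB=echo` for a dry run.

```sh
# Hypothetical helper: push ./android (bin/ and lib/) and a model to the
# device, then run llama-simple there with an explicit context size.
#   $1  local model file (.gguf)
#   $2  context size passed as -c (default: 4096)
# Set ADB=echo to print the adb invocations instead of running them.
run_on_device() {
    if [ $# -lt 1 ]; then
        echo "usage: run_on_device <model.gguf> [ctx-size]" >&2
        return 2
    fi
    local model="$1" ctx="${2:-4096}"
    local adb="${ADB:-adb}"
    local dest=/data/local/tmp/llama
    "$adb" shell mkdir -p "$dest" || return
    # $dest now exists, so the directory is copied to $dest/android.
    "$adb" push android "$dest" || return
    "$adb" push "$model" "$dest" || return
    # Pass -c explicitly: without it the default context can exhaust memory (#9671).
    "$adb" shell "cd $dest && LD_LIBRARY_PATH=android/lib ./android/bin/llama-simple -m $(basename "$model") -c $ctx"
}
```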

amqdn · 2024-10-07T04:33:01Z

FYI, I tried this on Termux (both w/ and w/o my commits) and I did not observe the same regression.
Something weird is afoot. If I discover something, I will report.

ggerganov · 2024-10-07T08:56:01Z

Thanks. I think we are good to merge this. Agree?

dcale · 2024-10-07T09:26:42Z

As a heads-up, armv8.7a will not work with older devices, e.g., the Pixel 6 Pro (a 3-year-old device from 2021), even though these devices are running recent Android versions (Android 14).

max-krasnyansky · 2024-10-07T16:37:26Z

> as a heads-up armv8.7a will not work with older devices i.e. Pixel 6 pro devices (3 year's old device, 2021), even though these devices are running recent Android versions (Android 14)

It should, because of the runtime detection of the CPU capabilities (such as MATMUL_INT8, etc.).
In theory, it's possible that the compiler will use one of those instructions elsewhere, but it's quite unlikely.

Can you try the latest on Pixel 6 Pro?
Let us know if it doesn't work and we'll iterate further if needed.

amqdn · 2024-10-07T17:46:51Z

Thanks, all.
@dcale, thanks for your report. Please follow up with more info so we can continue to clarify any issues.

dcale · 2024-10-08T06:21:56Z

>> as a heads-up armv8.7a will not work with older devices i.e. Pixel 6 pro devices (3 year's old device, 2021), even though these devices are running recent Android versions (Android 14)

> It should because of the runtime detection of the CPU capabilities (such as MATMUL_INT8, etc). In theory, it's possible that the compiler will use one of those instructions elsewhere but it's quite unlikely.

> Can you try the latest on Pixel 6 Pro? Let us know if it doesn't work and we'll iterate further if needed.

I'll do so and report back.

* docs : clarify building Android on Termux * docs : update building Android on Termux * docs : add cross-compiling for Android * cmake : link dl explicitly for Android

jsamol · 2024-11-06T13:45:40Z

@max-krasnyansky @amqdn Running llama compiled with the -march=armv8.7a flag results in SIGILL (ILL_ILLOPC) on the Pixel 4a (2020). To be fair, the old target, armv8.4a+dotprod, wasn't much better. Only targeting armv8.2-a, which I believe is the highest compatible version for that model, makes it work.
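Before you pick a `-march` target for a particular phone, it helps to see which relevant extensions its CPU advertises. This minimal sketch reads the Linux arm64 `Features` line in `/proc/cpuinfo`. There, `asimddp` is the dot-product extension and `i8mm` the int8 matrix-multiply extension. The function names are hypothetical. The sketch only reports what the hardware advertises. It cannot tell whether the compiler emitted instructions outside that set, which is the failure mode mentioned earlier in the thread.

```sh
# Succeed if the arm64 "Features" line of a cpuinfo file lists $1.
#   $1  feature name as printed by the kernel, e.g. asimddp or i8mm
#   $2  cpuinfo file (default: /proc/cpuinfo)
cpu_has_feature() {
    local feature="$1" cpuinfo="${2:-/proc/cpuinfo}"
    # -w matches whole words only, so "asimd" is not matched inside "asimddp".
    grep -E '^Features[[:space:]]*:' "$cpuinfo" | grep -qw -- "$feature"
}

# Print yes/no for the dot-product and int8 matmul extensions.
report_arm64_features() {
    local f
    for f in asimddp i8mm; do
        if cpu_has_feature "$f" "$1"; then
            echo "$f: yes"
        else
            echo "$f: no"
        fi
    done
}

# Usage:
#   adb shell cat /proc/cpuinfo > cpuinfo.txt && report_arm64_features cpuinfo.txt
```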

amqdn · 2024-11-06T18:25:07Z

@jsamol Thanks for the report. I will defer to the others about what to do here.

rmatif · 2024-11-16T12:30:33Z

> Since we don't get much reports for llama.cpp on Android, I'll use the opportunity to ask if you (or anyone else) have tried to run the Vulkan backend on Android devices? Wondering if the Vulkan backend is already capable of utilizing the mobile GPU or if more work is needed there. Any feedback in that regard is appreciated.

I did try it. I built a host for shader generation and got the correct Vulkan HPP headers (they're missing in NDK >26 and incomplete in NDK ≤26). Compilation works fine, but performance is 2x worse than CPU-only. It also seems to use more RAM, even though the logs show the same amount with and without Vulkan.

Clearly some optimizations are needed here.

github-actions bot added the documentation label (Improvements or additions to documentation) Sep 27, 2024

ggerganov approved these changes Sep 28, 2024

amqdn force-pushed the master branch 2 times, most recently from af64e60 to 98376b5 September 28, 2024 19:40

amqdn requested a review from ggerganov September 28, 2024 20:08

ggerganov approved these changes Sep 29, 2024

ggerganov added the merge ready label (indicates that this may be ready to merge soon and is just holding out in case of objections) Sep 29, 2024

amqdn force-pushed the master branch from 98376b5 to 769caf8 September 30, 2024 18:57

amqdn force-pushed the master branch from 769caf8 to 2910218 October 1, 2024 17:38

amqdn force-pushed the master branch 2 times, most recently from 41f5a29 to 44b6851 October 1, 2024 18:23

amqdn added 2 commits October 5, 2024 12:44

docs : clarify building Android on Termux

2f86523

docs : update building Android on Termux

49a2fd0

amqdn added 2 commits October 5, 2024 14:48

docs : add cross-compiling for Android

fa049cd

cmake : link dl explicitly for Android

e179dd4

amqdn force-pushed the master branch from 44b6851 to e179dd4 October 5, 2024 19:52

max-krasnyansky merged commit f1af42f into ggerganov:master Oct 7, 2024
53 checks passed
