-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update building for Android #9672
Conversation
Since we don't get much reports for |
Unfortunately, I cannot speak to Vulkan on Android. It could be a matter of proper configuration to get it working, but I will refrain from more speculation. If I manage to come up with answers, I will report. |
af64e60
to
98376b5
Compare
I successfully compiled llama.android with Vulkan support on Android, but performance was much worse than running on CPU. If I loaded more than 2 layers onto GPU, it would OOM. |
One of the CI workflows is still failing: |
Notably, Q4_0_4_4 provided significantly better performance than any GPU build I've tested on Snapdragon devices running Android, even the prompt processing was better than using CLBlast (that already showed some gain over CPU). |
Yes, I saw that. Have been unsure why. Seems the CMake change is making Investigating... |
I think I figured it out. Fixing. |
41f5a29
to
44b6851
Compare
I think it's safe to bump the Android API level to 31 at this point : https://apilevels.com/ The following build works with no additional changes with Android NDK r26b (prev LTS) and r27b releases.
Those CFLAGS should be good for all Android ARM64 devices from 2023/24 and enable Q4_0_4_8 support which is the most performant on the current gen CPUs. NDK r26 and newer definitely includes OpenMP
However, our threadpool implementation is more efficient at this point so it makes sense to include GGML_OPENMP=OFF. In other words, I don't think the CMakeFile.txt changes are needed, we should just update the README to recommend NDK r27b and API Level 31. |
Hi, @max-krasnyansky -- I think whichever direction is chosen depends on what kind of (best-effort) support As far as the I'm happy to adjust the README to reflect any recommendations required to steer users of the project. |
66.5% coverage seems low for the kind of hardware that we usually support, eg. we have builds for x86 for processors without AVX, which was introduced in 2011. Older phones are perfectly capable of running small LLMs. |
That data is a couple of years old but fair point. This builds/works just as well (tested with NDK r26b and r27b, on Galaxy S24).
|
We recently merged PR for runtime detection of the CPU capabilities. So it makes sense to enable all latest CPU features at build time and let the CPU backend check what's available at runtime.
The CMake command I provided already links in everything we need.
If you run verbose build you'll see that it's linking libm explicitly
libdl is not being linked explicitly but the linker is happy (ie no link errors) so that symbol must be getting resolved, |
Ah. Ok. |
I am incorporating and considering the discussion thus far and will make edits.
I saw that. If indeed the CPU features are detected at runtime, then I can see why we would include all the features during build (though I don't necessarily understand all the intricacies of
I appreciate seeing that output. Re-reading the Bionic and Android NDK documentation, I have the understanding that |
I'd say let's remote the libm change. No need for redundancy. I'd also update the CMake command with:
(btw ideally we should just add cmake preset for this, not a blocker just a thought) The rest looks good to me. |
I will remove the
I agree. There is already CMake logic for Android |
I have completed changes reflecting this discussion and have tested the builds myself using Android NDK r25c, r26b, and r27b. One thing I will note is that, some time in the last week, the tokens/s performance (using |
Can you pinpoint the commit that introduced the regression? What is the exact |
I have checked out many commits (all before mine) and have yet to pinpoint it. It could be something else. The exact command I have been using is |
FYI, I tried this on Termux (both w/ and w/o my commits) and I did not observe the same regression. |
Thanks. I think we are good to merge this. Agree? |
as a heads-up armv8.7a will not work with older devices i.e. Pixel 6 pro devices (3 year's old device, 2021), even though these devices are running recent Android versions (Android 14) |
It should because of the runtime detection of the CPU capabilities (such as MATMUL_INT8, etc). Can you try the latest on Pixel 6 Pro? |
Thanks, all. |
I'll do and report back. |
* docs : clarify building Android on Termux * docs : update building Android on Termux * docs : add cross-compiling for Android * cmake : link dl explicitly for Android
@max-krasnyansky @amqdn Running llama compiled with the |
@jsamol Thanks for the report. I will defer to the others about what to do here. |
* docs : clarify building Android on Termux * docs : update building Android on Termux * docs : add cross-compiling for Android * cmake : link dl explicitly for Android
I did try it. Built a host for shader generation and got the correct Vulkan HPP headers (they're missing in NDK >26 and incomplete in NDK ≤26). Compilation works fine but performance is 2x worse than CPU-only. It also seems to use more RAM even though logs show the same amount with/without Vulkan Clearly some optimizations are needed here |
* docs : clarify building Android on Termux * docs : update building Android on Termux * docs : add cross-compiling for Android * cmake : link dl explicitly for Android
This PR includes:
All changes have been tested (at least on
aarch64
arm64-v8a
) on both:adb shell
on AndroidCaveat: If
-c
is not provided, the default context can end up over-initializing memory and killing the app (Termux) or crashing the system (adb shell
). Since this would require a potentially lower-level fix which affects a wider scope, I have separated the issue into #9671.Thanks.