[pull] master from ggerganov:master #151

Closed. Wanted to merge 46 commits into master from ggerganov:master.

Commits (46):

1607a5e  backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)  (chaxu01, Nov 15, 2024)
5a54af4  sycl: Use syclcompat::dp4a (#10267)  (Rbiessy, Nov 15, 2024)
4802ad3  scripts : fix regex in sync [no ci]  (ggerganov, Nov 15, 2024)
231f936  cann: dockerfile and doc adjustment (#10302)  (noemotiovon, Nov 15, 2024)
9901068  server : (web UI) add copy button for code block, fix api key (#10242)  (ngxson, Nov 15, 2024)
57f8355  sycl: Update Intel docker images to use DPC++ 2025.0 (#10305)  (Rbiessy, Nov 15, 2024)
f0204a0  ci: build test musa with cmake (#10298)  (yeahdongcn, Nov 15, 2024)
1842922  AVX BF16 and single scale quant optimizations (#10212)  (netrunnereve, Nov 15, 2024)
cbf5541  sync : ggml  (ggerganov, Nov 15, 2024)
3225008  ggml : vulkan logs (whisper/2547)  (thewh1teagle, Nov 15, 2024)
09ecbcb  cmake : fix ppc64 check (whisper/0)  (ggerganov, Nov 15, 2024)
883d206  ggml : fix some build issues  (slaren, Nov 15, 2024)
4047be7  scripts: update compare-llama-bench.py (#10319)  (JohannesGaessler, Nov 15, 2024)
74d73dc  Make updates to fix issues with clang-cl builds while using AVX512 fl…  (Srihari-mcw, Nov 15, 2024)
89e4caa  llama : save number of parameters and the size in llama_model (#10286)  (FirstTimeEZ, Nov 16, 2024)
1e58ee1  ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324)  (eddnjjn, Nov 16, 2024)
dd3a6ce  vulkan : add cmake preset debug/release (#10306)  (FirstTimeEZ, Nov 16, 2024)
772703c  vulkan: Optimize some mat-vec mul quant shaders (#10296)  (jeffbolznv, Nov 16, 2024)
f245cc2  scripts : fix missing key in compare-llama-bench.py (#10332)  (ggerganov, Nov 16, 2024)
bcdb7a2  server: (web UI) Add samplers sequence customization (#10255)  (MaggotHATE, Nov 16, 2024)
8ee0d09  make : auto-determine dependencies (#0)  (ggerganov, Nov 16, 2024)
db4cfd5  llamafile : fix include path (#0)  (ggerganov, Nov 16, 2024)
4e54be0  llama/ex: remove --logdir argument (#10339)  (JohannesGaessler, Nov 16, 2024)
0fff7fd  docs : vulkan build instructions to use git bash mingw64 (#10303)  (FirstTimeEZ, Nov 16, 2024)
5c9a8b2  scripts : update sync  (ggerganov, Nov 16, 2024)
8a43e94  ggml: new optimization interface (ggml/988)  (JohannesGaessler, Nov 16, 2024)
68fcb47  ggml : fix compile warnings (#0)  (ggerganov, Nov 16, 2024)
84274a1  tests : remove test-grad0  (ggerganov, Nov 16, 2024)
a4200ca  make : add ggml-opt (#0)  (ggerganov, Nov 16, 2024)
5d9e599  ggml : adapt AMX to tensor->grad removal (#0)  (ggerganov, Nov 16, 2024)
24203e9  ggml : inttypes.h -> cinttypes (#0)  (ggerganov, Nov 16, 2024)
eda7e1d  ggml : fix possible buffer use after free in sched reserve (#9930)  (slaren, Nov 17, 2024)
467576b  CMake: default to -arch=native for CUDA build (#10320)  (JohannesGaessler, Nov 17, 2024)
c3ea58a  CUDA: remove DMMV, consolidate F16 mult mat vec (#10318)  (JohannesGaessler, Nov 17, 2024)
a431782  ggml : fix undefined reference to 'getcpu' (#10354)  (FirstTimeEZ, Nov 17, 2024)
cf32a9b  metal : refactor kernel args into structs (#10238)  (ggerganov, Nov 17, 2024)
20a780c  gitignore : ignore local run scripts [no ci]  (ggerganov, Nov 17, 2024)
be5cacc  llama : only use default buffer types for the KV cache (#10358)  (slaren, Nov 17, 2024)
ce2e59b  CMake: fix typo in comment [no ci] (#10360)  (JohannesGaessler, Nov 17, 2024)
76e9e58  CUDA: fix MMV kernel being used for FP16 src1 (#10357)  (JohannesGaessler, Nov 17, 2024)
75207b3  docker: use GGML_NATIVE=OFF (#10368)  (JohannesGaessler, Nov 17, 2024)
9b75f03  Vulkan: Fix device info output format specifiers (#10366)  (0cc4m, Nov 18, 2024)
2eb76b2  flake.lock: Update (#10346)  (ggerganov, Nov 18, 2024)
f139d2e  vulkan: remove use of null initializer (#10372)  (jeffbolznv, Nov 18, 2024)
531cb1c  Skip searching root path for cross-compile builds (#10383)  (bandoti, Nov 18, 2024)
d3481e6  cuda : only use native when supported by cmake (#10389)  (slaren, Nov 18, 2024)

Files changed:

.devops/full-cuda.Dockerfile (1 addition, 1 deletion)

@@ -26,7 +26,7 @@ COPY . .
 RUN if [ "${CUDA_DOCKER_ARCH}" != "default" ]; then \
     export CMAKE_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_DOCKER_ARCH}"; \
     fi && \
-    cmake -B build -DGGML_CUDA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
+    cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
     cmake --build build --config Release -j$(nproc) && \
     cp build/bin/* .

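The same one-line change repeats across all of the .devops Dockerfiles below: -DGGML_NATIVE=OFF stops the compiler from targeting the build host's CPU (the -march=native style tuning that GGML_NATIVE enables), so the published image does not depend on instruction-set extensions that happen to exist on the CI builder. A minimal usage sketch for the patched Dockerfile; the image tag and model path are placeholders, and the --run entrypoint flag is the one the project's full image documents, not something added in this PR:

    # Build the full CUDA image from the Dockerfile patched above,
    # then run it against a local model directory (paths illustrative).
    docker build -t local/llama.cpp:full-cuda -f .devops/full-cuda.Dockerfile .
    docker run --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda \
        --run -m /models/model-q4_0.gguf -p "Hello" -n 64
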
.devops/full-musa.Dockerfile (1 addition, 1 deletion)

@@ -19,7 +19,7 @@ WORKDIR /app

 COPY . .

-RUN cmake -B build -DGGML_MUSA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
+RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
     cmake --build build --config Release -j$(nproc) && \
     cp build/bin/* .

.devops/llama-cli-cann.Dockerfile (3 additions, 3 deletions)

@@ -1,6 +1,6 @@
 ARG ASCEND_VERSION=8.0.rc2.alpha003-910b-openeuler22.03-py3.8

-FROM cosdt/cann:$ASCEND_VERSION AS build
+FROM ascendai/cann:$ASCEND_VERSION AS build

 WORKDIR /app

@@ -22,11 +22,11 @@ ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/runtime/lib64/stub:$LD_LIBRARY_PATH

 RUN echo "Building with static libs" && \
     source /usr/local/Ascend/ascend-toolkit/set_env.sh --force && \
-    cmake -B build -DGGML_CANN=ON -DBUILD_SHARED_LIBS=OFF && \
+    cmake -B build -DGGML_NATIVE=OFF -DGGML_CANN=ON -DBUILD_SHARED_LIBS=OFF && \
     cmake --build build --config Release --target llama-cli

 # TODO: use image with NNRT
-FROM cosdt/cann:$ASCEND_VERSION AS runtime
+FROM ascendai/cann:$ASCEND_VERSION AS runtime
 COPY --from=build /app/build/bin/llama-cli /llama-cli

 ENV LC_ALL=C.utf8

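Besides the GGML_NATIVE change, this file swaps the base image from cosdt/cann to ascendai/cann in both stages. A sketch of building the image, assuming Docker is available; the ASCEND_VERSION value is just the Dockerfile's existing default made explicit:

    # Build the CANN-enabled llama-cli image; --build-arg overrides the ARG
    # declared at the top of the Dockerfile (default shown).
    docker build -f .devops/llama-cli-cann.Dockerfile \
        --build-arg ASCEND_VERSION=8.0.rc2.alpha003-910b-openeuler22.03-py3.8 \
        -t local/llama.cpp:cli-cann .
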
.devops/llama-cli-cuda.Dockerfile (1 addition, 1 deletion)

@@ -22,7 +22,7 @@ COPY . .
 RUN if [ "${CUDA_DOCKER_ARCH}" != "default" ]; then \
     export CMAKE_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_DOCKER_ARCH}"; \
     fi && \
-    cmake -B build -DGGML_CUDA=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
+    cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
     cmake --build build --config Release --target llama-cli -j$(nproc) && \
     mkdir -p /app/lib && \
     find build -name "*.so" -exec cp {} /app/lib \;

.devops/llama-cli-intel.Dockerfile (2 additions, 2 deletions)

@@ -1,4 +1,4 @@
-ARG ONEAPI_VERSION=2024.1.1-devel-ubuntu22.04
+ARG ONEAPI_VERSION=2025.0.0-0-devel-ubuntu22.04

 FROM intel/oneapi-basekit:$ONEAPI_VERSION AS build

@@ -15,7 +15,7 @@ RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
     export OPT_SYCL_F16="-DGGML_SYCL_F16=ON"; \
     fi && \
     echo "Building with static libs" && \
-    cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
+    cmake -B build -DGGML_NATIVE=OFF -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
     ${OPT_SYCL_F16} -DBUILD_SHARED_LIBS=OFF && \
     cmake --build build --config Release --target llama-cli

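The Intel images also move from oneAPI 2024.1.1 to 2025.0.0 base images, matching the DPC++ 2025.0 update in commit 57f8355. For reference, a sketch of the equivalent host build outside Docker, assuming the oneAPI Base Toolkit is installed under its default /opt/intel/oneapi prefix:

    # Mirror the container build on a bare host: load the oneAPI environment,
    # then configure the SYCL backend with the icx/icpx compilers.
    source /opt/intel/oneapi/setvars.sh
    cmake -B build -DGGML_NATIVE=OFF -DGGML_SYCL=ON \
        -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
        -DBUILD_SHARED_LIBS=OFF
    cmake --build build --config Release --target llama-cli
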
.devops/llama-cli-musa.Dockerfile (1 addition, 1 deletion)

@@ -15,7 +15,7 @@ WORKDIR /app

 COPY . .

-RUN cmake -B build -DGGML_MUSA=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
+RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
     cmake --build build --config Release --target llama-cli -j$(nproc) && \
     mkdir -p /app/lib && \
     find build -name "*.so" -exec cp {} /app/lib \;

.devops/llama-cli-vulkan.Dockerfile (1 addition, 1 deletion)

@@ -14,7 +14,7 @@ RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key
 # Build it
 WORKDIR /app
 COPY . .
-RUN cmake -B build -DGGML_VULKAN=1 && \
+RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_VULKAN=1 && \
     cmake --build build --config Release --target llama-cli

 # Clean up

.devops/llama-server-cuda.Dockerfile (1 addition, 1 deletion)

@@ -22,7 +22,7 @@ COPY . .
 RUN if [ "${CUDA_DOCKER_ARCH}" != "default" ]; then \
     export CMAKE_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_DOCKER_ARCH}"; \
     fi && \
-    cmake -B build -DGGML_CUDA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
+    cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
     cmake --build build --config Release --target llama-server -j$(nproc) && \
     mkdir -p /app/lib && \
     find build -name "*.so" -exec cp {} /app/lib \;

.devops/llama-server-intel.Dockerfile (2 additions, 2 deletions)

@@ -1,4 +1,4 @@
-ARG ONEAPI_VERSION=2024.1.1-devel-ubuntu22.04
+ARG ONEAPI_VERSION=2025.0.0-0-devel-ubuntu22.04

 FROM intel/oneapi-basekit:$ONEAPI_VERSION AS build

@@ -15,7 +15,7 @@ RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
     export OPT_SYCL_F16="-DGGML_SYCL_F16=ON"; \
     fi && \
     echo "Building with dynamic libs" && \
-    cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
+    cmake -B build -DGGML_NATIVE=OFF -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
     cmake --build build --config Release --target llama-server

 FROM intel/oneapi-basekit:$ONEAPI_VERSION AS runtime

.devops/llama-server-musa.Dockerfile (1 addition, 1 deletion)

@@ -15,7 +15,7 @@ WORKDIR /app

 COPY . .

-RUN cmake -B build -DGGML_MUSA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
+RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
     cmake --build build --config Release --target llama-server -j$(nproc) && \
     mkdir -p /app/lib && \
     find build -name "*.so" -exec cp {} /app/lib \;

.devops/llama-server-vulkan.Dockerfile (1 addition, 1 deletion)

@@ -14,7 +14,7 @@ RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key
 # Build it
 WORKDIR /app
 COPY . .
-RUN cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1 && \
+RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_VULKAN=1 -DLLAMA_CURL=1 && \
     cmake --build build --config Release --target llama-server

 # Clean up

.github/workflows/build.yml (22 additions, 1 deletion)

@@ -414,6 +414,27 @@ jobs:
           cmake -B build2 -S . -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc -DGGML_HIP=ON
           cmake --build build2 --config Release -j $(nproc)

+  ubuntu-22-cmake-musa:
+    runs-on: ubuntu-22.04
+    container: mthreads/musa:rc3.1.0-devel-ubuntu22.04
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Dependencies
+        id: depends
+        run: |
+          apt-get update
+          apt-get install -y build-essential git cmake libcurl4-openssl-dev
+
+      - name: Build with native CMake MUSA support
+        id: cmake_build
+        run: |
+          cmake -B build -S . -DGGML_MUSA=ON
+          cmake --build build --config Release -j $(nproc)
+
   ubuntu-22-cmake-sycl:
     runs-on: ubuntu-22.04

@@ -930,7 +951,7 @@ jobs:
        shell: bash

    env:
-      WINDOWS_BASEKIT_URL: https://registrationcenter-download.intel.com/akdlm/IRC_NAS/7dff44ba-e3af-4448-841c-0d616c8da6e7/w_BaseKit_p_2024.1.0.595_offline.exe
+      WINDOWS_BASEKIT_URL: https://registrationcenter-download.intel.com/akdlm/IRC_NAS/b380d914-366b-4b77-a74a-05e3c38b3514/intel-oneapi-base-toolkit-2025.0.0.882_offline.exe
       WINDOWS_DPCPP_MKL: intel.oneapi.win.cpp-dpcpp-common:intel.oneapi.win.mkl.devel
       ONEAPI_ROOT: "C:/Program Files (x86)/Intel/oneAPI"
    steps:

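The new ubuntu-22-cmake-musa job runs the build inside the mthreads/musa container instead of installing a MUSA toolchain on the runner. A sketch of reproducing it locally, assuming Docker and a checkout of the repository in the current directory:

    # Run the same container the CI job uses and repeat its steps verbatim.
    docker run --rm -v "$PWD:/ws" -w /ws mthreads/musa:rc3.1.0-devel-ubuntu22.04 bash -c '
        apt-get update &&
        apt-get install -y build-essential git cmake libcurl4-openssl-dev &&
        cmake -B build -S . -DGGML_MUSA=ON &&
        cmake --build build --config Release -j $(nproc)'
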
.gitignore (5 additions, 0 deletions)

@@ -3,6 +3,7 @@
 *.a
 *.bat
 *.bin
+*.d
 *.dll
 *.dot
 *.etag

@@ -133,3 +134,7 @@ poetry.toml

 # Test models for lora adapters
 /lora-tests
+
+# Local scripts
+/run-vim.sh
+/run-chat.sh

CMakePresets.json (19 additions, 15 deletions)

@@ -24,11 +24,12 @@
         "CMAKE_INSTALL_RPATH": "$ORIGIN;$ORIGIN/.."
       }
     },
-    { "name": "debug", "hidden": true, "cacheVariables": { "CMAKE_BUILD_TYPE": "Debug" } },
-    { "name": "release", "hidden": true, "cacheVariables": { "CMAKE_BUILD_TYPE": "Release" } },
-    { "name": "reldbg", "hidden": true, "cacheVariables": { "CMAKE_BUILD_TYPE": "RelWithDebInfo" } },
-    { "name": "static", "hidden": true, "cacheVariables": { "GGML_STATIC": "ON" } },
-    { "name": "sycl_f16", "hidden": true, "cacheVariables": { "GGML_SYCL_F16": "ON" } },
+    { "name": "debug", "hidden": true, "cacheVariables": { "CMAKE_BUILD_TYPE": "Debug" } },
+    { "name": "release", "hidden": true, "cacheVariables": { "CMAKE_BUILD_TYPE": "Release" } },
+    { "name": "reldbg", "hidden": true, "cacheVariables": { "CMAKE_BUILD_TYPE": "RelWithDebInfo" } },
+    { "name": "static", "hidden": true, "cacheVariables": { "GGML_STATIC": "ON" } },
+    { "name": "sycl_f16", "hidden": true, "cacheVariables": { "GGML_SYCL_F16": "ON" } },
+    { "name": "vulkan", "hidden": true, "cacheVariables": { "GGML_VULKAN": "ON" } },

     {
       "name": "arm64-windows-msvc", "hidden": true,

@@ -57,25 +58,28 @@
       }
     },

-    { "name": "arm64-windows-llvm-debug" , "inherits": [ "base", "arm64-windows-llvm", "debug" ] },
-    { "name": "arm64-windows-llvm-release", "inherits": [ "base", "arm64-windows-llvm", "reldbg" ] },
-    { "name": "arm64-windows-llvm+static-release", "inherits": [ "base", "arm64-windows-llvm", "reldbg", "static" ] },
+    { "name": "arm64-windows-llvm-debug", "inherits": [ "base", "arm64-windows-llvm", "debug" ] },
+    { "name": "arm64-windows-llvm-release", "inherits": [ "base", "arm64-windows-llvm", "reldbg" ] },
+    { "name": "arm64-windows-llvm+static-release", "inherits": [ "base", "arm64-windows-llvm", "reldbg", "static" ] },

-    { "name": "arm64-apple-clang-debug" , "inherits": [ "base", "arm64-apple-clang", "debug" ] },
-    { "name": "arm64-apple-clang-release" , "inherits": [ "base", "arm64-apple-clang", "reldbg" ] },
-    { "name": "arm64-apple-clang+static-release" , "inherits": [ "base", "arm64-apple-clang", "reldbg", "static" ] },
+    { "name": "arm64-apple-clang-debug", "inherits": [ "base", "arm64-apple-clang", "debug" ] },
+    { "name": "arm64-apple-clang-release", "inherits": [ "base", "arm64-apple-clang", "reldbg" ] },
+    { "name": "arm64-apple-clang+static-release", "inherits": [ "base", "arm64-apple-clang", "reldbg", "static" ] },

-    { "name": "arm64-windows-msvc-debug" , "inherits": [ "base", "arm64-windows-msvc", "debug" ] },
+    { "name": "arm64-windows-msvc-debug", "inherits": [ "base", "arm64-windows-msvc", "debug" ] },
     { "name": "arm64-windows-msvc-release", "inherits": [ "base", "arm64-windows-msvc", "reldbg" ] },
     { "name": "arm64-windows-msvc+static-release", "inherits": [ "base", "arm64-windows-msvc", "reldbg", "static" ] },

-    { "name": "x64-windows-msvc-debug" , "inherits": [ "base", "debug" ] },
+    { "name": "x64-windows-msvc-debug", "inherits": [ "base", "debug" ] },
     { "name": "x64-windows-msvc-release", "inherits": [ "base", "reldbg" ] },
     { "name": "x64-windows-msvc+static-release", "inherits": [ "base", "reldbg", "static" ] },

-    { "name": "x64-windows-sycl-debug" , "inherits": [ "sycl-base", "debug" ] },
+    { "name": "x64-windows-sycl-debug", "inherits": [ "sycl-base", "debug" ] },
     { "name": "x64-windows-sycl-debug-f16", "inherits": [ "sycl-base", "debug", "sycl_f16" ] },
     { "name": "x64-windows-sycl-release", "inherits": [ "sycl-base", "release" ] },
-    { "name": "x64-windows-sycl-release-f16", "inherits": [ "sycl-base", "release", "sycl_f16" ] }
+    { "name": "x64-windows-sycl-release-f16", "inherits": [ "sycl-base", "release", "sycl_f16" ] },
+
+    { "name": "x64-windows-vulkan-debug", "inherits": [ "base", "vulkan", "debug" ] },
+    { "name": "x64-windows-vulkan-release", "inherits": [ "base", "vulkan", "release" ] }
   ]
 }

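The hidden vulkan preset only toggles GGML_VULKAN=ON; the concrete x64-windows-vulkan-* presets compose it with base and a build type (commit dd3a6ce). Apart from that, the paired -/+ lines in this diff differ only in alignment whitespace. A usage sketch, assuming the base preset's "binaryDir" (defined earlier in this file, outside the diff) puts the build tree in a per-preset directory named build-<preset-name>:

    # List the available configure presets, then configure and build one.
    cmake --list-presets
    cmake --preset x64-windows-vulkan-release
    cmake --build build-x64-windows-vulkan-release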