merge from upstream #7

Merged: 48 commits (Apr 6, 2024)

Commits (changes from all commits)
be55134
convert : refactor vocab selection logic (#6355)
cebtenzzre Mar 28, 2024
5106ef4
[SYCL] Revisited & updated SYCL build documentation (#6141)
OuadiElfarouki Mar 28, 2024
bfe7daf
readme : add notice for UI list
ggerganov Mar 28, 2024
b75c381
convert : allow conversion of Mistral HF models (#6144)
pcuenca Mar 29, 2024
057400a
llama : remove redundant reshape in build_kv_store (#6369)
danbev Mar 29, 2024
8093987
cmake : add explicit metal version options (#6370)
mattjcly Mar 29, 2024
b910287
readme : add project (#6356)
zhouwg Mar 29, 2024
cfde806
ci : fix BGE wget (#6383)
ggerganov Mar 29, 2024
0695747
[Model] Add support for xverse (#6301)
hxer7963 Mar 29, 2024
d48ccf3
sync : ggml (#6351)
ggerganov Mar 29, 2024
ba0c7c7
Vulkan k-quant mmq and ggml-backend offload functionality (#6155)
0cc4m Mar 29, 2024
f7fc5f6
split: allow --split-max-size option (#6343)
ngxson Mar 29, 2024
c342d07
Fedora build update (#6388)
Man2Dev Mar 29, 2024
37e7854
ci: bench: fix Resource not accessible by integration on PR event (#6…
phymbert Mar 30, 2024
c50a82c
readme : update hot topics
ggerganov Mar 31, 2024
226e819
ci: server: verify deps are coherent with the commit (#6409)
phymbert Apr 1, 2024
33a5244
compare-llama-bench.py: fix long hexsha args (#6424)
JohannesGaessler Apr 1, 2024
f87f7b8
flake.lock: Update (#6402)
ggerganov Apr 1, 2024
5260486
[SYCL] Disable iqx on windows as WA (#6435)
airMeng Apr 3, 2024
08a0c02
ggml : mul_mat_id use the same tensor for all the experts (#6387)
slaren Apr 3, 2024
076b086
readme : update hot topics
ggerganov Apr 3, 2024
1ff4d9f
Add OpenChat, Alpaca, Vicuna chat templates (#6397)
kaizau Apr 3, 2024
db214fa
Missing tokenizer.model error during gguf conversion (#6443)
overtunned Apr 3, 2024
e69945d
security : create policy (#6354)
joycebrum Apr 3, 2024
154d4ee
readme : add feature-rich rust bindings (#6465)
francis2tm Apr 3, 2024
5d4f12e
server: add cURL support to `server.Dockerfile` (#6461)
elepedus Apr 3, 2024
9f62c01
ci : update checkout, setup-python and upload-artifact to latest (#6456)
EwoutH Apr 3, 2024
bb43cf7
llama : add SEA-LION support (#6448)
bryanSwk Apr 3, 2024
60cdf40
server : handle exception on wrong type in request (#6452)
JH23X Apr 3, 2024
5fb1574
A few small fixes to server's README docs (#6428)
fat-tire Apr 3, 2024
72d73af
convert : fix for lint error complaining of bare except (#6470)
HanClinto Apr 4, 2024
1a43c72
server : add option to disable KV offload (#6468)
jxy Apr 4, 2024
4399f13
server : remove obsolete --memory-f32 option
ggerganov Apr 4, 2024
9b84ae1
examples : add GBNF validator program (#5948)
HanClinto Apr 4, 2024
4bcd6b9
common: remove duplicate check for curl (#6471)
danbev Apr 4, 2024
7a2c926
ci: bench: add more ftype, fix triggers and bot comment (#6466)
phymbert Apr 4, 2024
a74401f
Correct README link (#6458)
limitedAtonement Apr 4, 2024
8120efe
ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6…
phymbert Apr 4, 2024
2e66913
server: allow penalizing repetition of newlines on server webpage (#6…
sha224 Apr 4, 2024
c666ba2
build CI: Name artifacts (#6482)
EwoutH Apr 4, 2024
7dda1b7
ci: exempt master branch workflows from getting cancelled (#6486)
mscheong01 Apr 4, 2024
0a1d889
server: add cURL support to server Dockerfiles (#6474)
elepedus Apr 4, 2024
b660a57
readme : fix typo (#6481)
junnjiee16 Apr 4, 2024
a307375
readme : add Dot to UI list (#6487)
alexpinel Apr 4, 2024
1b496a7
[SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464)
OuadiElfarouki Apr 5, 2024
87e21bb
bench : make n_batch and n_ubatch configurable in Batched bench (#6500)
Sunt-ing Apr 5, 2024
d0f5dee
readme : update UI list (#6503)
hugo53 Apr 5, 2024
a8bd14d
gguf.py : add licence and version to gguf writer (#6504)
mofosyne Apr 5, 2024
4 changes: 3 additions & 1 deletion .devops/full-cuda.Dockerfile
@@ -12,7 +12,7 @@ FROM ${BASE_CUDA_DEV_CONTAINER} as build
 ARG CUDA_DOCKER_ARCH=all

 RUN apt-get update && \
-    apt-get install -y build-essential python3 python3-pip git
+    apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev

 COPY requirements.txt requirements.txt
 COPY requirements   requirements
@@ -28,6 +28,8 @@ COPY . .
 ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
 # Enable CUDA
 ENV LLAMA_CUDA=1
+# Enable cURL
+ENV LLAMA_CURL=1

 RUN make
5 changes: 5 additions & 0 deletions .devops/full-rocm.Dockerfile
@@ -40,6 +40,11 @@ ENV LLAMA_HIPBLAS=1
 ENV CC=/opt/rocm/llvm/bin/clang
 ENV CXX=/opt/rocm/llvm/bin/clang++

+# Enable cURL
+ENV LLAMA_CURL=1
+RUN apt-get update && \
+    apt-get install -y libcurl4-openssl-dev
+
 RUN make

 ENTRYPOINT ["/app/.devops/tools.sh"]
5 changes: 4 additions & 1 deletion .devops/full.Dockerfile
@@ -3,7 +3,7 @@ ARG UBUNTU_VERSION=22.04
 FROM ubuntu:$UBUNTU_VERSION as build

 RUN apt-get update && \
-    apt-get install -y build-essential python3 python3-pip git
+    apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev

 COPY requirements.txt requirements.txt
 COPY requirements   requirements
@@ -15,6 +15,9 @@ WORKDIR /app

 COPY . .

+ENV LLAMA_CURL=1
+
+
 RUN make

 ENV LC_ALL=C.utf8
2 changes: 1 addition & 1 deletion .devops/llama-cpp-clblast.srpm.spec
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# SRPM for building from source and packaging an RPM for RPM-based distros.
# https://fedoraproject.org/wiki/How_to_create_an_RPM_package
# https://docs.fedoraproject.org/en-US/quick-docs/creating-rpm-packages
# Built and maintained by John Boero - [email protected]
# In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal

Expand Down
2 changes: 1 addition & 1 deletion .devops/llama-cpp-cuda.srpm.spec
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# SRPM for building from source and packaging an RPM for RPM-based distros.
# https://fedoraproject.org/wiki/How_to_create_an_RPM_package
# https://docs.fedoraproject.org/en-US/quick-docs/creating-rpm-packages
# Built and maintained by John Boero - [email protected]
# In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal

Expand Down
2 changes: 1 addition & 1 deletion .devops/llama-cpp.srpm.spec
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# SRPM for building from source and packaging an RPM for RPM-based distros.
# https://fedoraproject.org/wiki/How_to_create_an_RPM_package
# https://docs.fedoraproject.org/en-US/quick-docs/creating-rpm-packages
# Built and maintained by John Boero - [email protected]
# In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal

Expand Down
7 changes: 6 additions & 1 deletion .devops/server-cuda.Dockerfile
@@ -12,7 +12,7 @@ FROM ${BASE_CUDA_DEV_CONTAINER} as build
 ARG CUDA_DOCKER_ARCH=all

 RUN apt-get update && \
-    apt-get install -y build-essential git
+    apt-get install -y build-essential git libcurl4-openssl-dev

 WORKDIR /app

@@ -22,11 +22,16 @@ COPY . .
 ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
 # Enable CUDA
 ENV LLAMA_CUDA=1
+# Enable cURL
+ENV LLAMA_CURL=1

 RUN make

 FROM ${BASE_CUDA_RUN_CONTAINER} as runtime

+RUN apt-get update && \
+    apt-get install -y libcurl4-openssl-dev
+
 COPY --from=build /app/server /server

 ENTRYPOINT [ "/server" ]
7 changes: 5 additions & 2 deletions .devops/server-intel.Dockerfile
@@ -4,7 +4,7 @@ FROM intel/oneapi-basekit:$ONEAPI_VERSION as build

 ARG LLAMA_SYCL_F16=OFF
 RUN apt-get update && \
-    apt-get install -y git
+    apt-get install -y git libcurl4-openssl-dev

 WORKDIR /app

@@ -16,11 +16,14 @@ RUN mkdir build && \
         echo "LLAMA_SYCL_F16 is set" && \
         export OPT_SYCL_F16="-DLLAMA_SYCL_F16=ON"; \
     fi && \
-    cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ${OPT_SYCL_F16} && \
+    cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
     cmake --build . --config Release --target server

 FROM intel/oneapi-basekit:$ONEAPI_VERSION as runtime

+RUN apt-get update && \
+    apt-get install -y libcurl4-openssl-dev
+
 COPY --from=build /app/build/bin/server /server

 ENV LC_ALL=C.utf8
5 changes: 5 additions & 0 deletions .devops/server-rocm.Dockerfile
@@ -40,6 +40,11 @@ ENV LLAMA_HIPBLAS=1
 ENV CC=/opt/rocm/llvm/bin/clang
 ENV CXX=/opt/rocm/llvm/bin/clang++

+# Enable cURL
+ENV LLAMA_CURL=1
+RUN apt-get update && \
+    apt-get install -y libcurl4-openssl-dev
+
 RUN make

 ENTRYPOINT [ "/app/server" ]
6 changes: 5 additions & 1 deletion .devops/server-vulkan.Dockerfile
@@ -11,12 +11,16 @@ RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key
     apt update -y && \
     apt-get install -y vulkan-sdk

+# Install cURL
+RUN apt-get update && \
+    apt-get install -y libcurl4-openssl-dev
+
 # Build it
 WORKDIR /app
 COPY . .
 RUN mkdir build && \
     cd build && \
-    cmake .. -DLLAMA_VULKAN=1 && \
+    cmake .. -DLLAMA_VULKAN=1 -DLLAMA_CURL=1 && \
     cmake --build . --config Release --target server

 # Clean up
7 changes: 6 additions & 1 deletion .devops/server.Dockerfile
@@ -3,16 +3,21 @@ ARG UBUNTU_VERSION=22.04
 FROM ubuntu:$UBUNTU_VERSION as build

 RUN apt-get update && \
-    apt-get install -y build-essential git
+    apt-get install -y build-essential git libcurl4-openssl-dev

 WORKDIR /app

 COPY . .

+ENV LLAMA_CURL=1
+
 RUN make

 FROM ubuntu:$UBUNTU_VERSION as runtime

+RUN apt-get update && \
+    apt-get install -y libcurl4-openssl-dev
+
 COPY --from=build /app/server /server

 ENV LC_ALL=C.utf8
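Not part of the diff — a hedged sketch of what the new cURL support enables at runtime. It assumes the server's model-download flag (`--model-url`, a.k.a. `-mu`) that `LLAMA_CURL` gates; the image tag and model URL are placeholders, so treat this as a usage sketch rather than a tested recipe:

```sh
# Build the server image (LLAMA_CURL=1 is set inside the Dockerfile)
docker build -f .devops/server.Dockerfile -t llamacpp-server .

# With libcurl compiled in, the server can fetch a GGUF over HTTP at startup
# instead of requiring a bind-mounted model file (placeholder URL below).
docker run -p 8080:8080 llamacpp-server \
    --host 0.0.0.0 --port 8080 \
    --model-url https://example.com/models/phi-2.Q4_0.gguf
```

Without `LLAMA_CURL` the download flags are compiled out, and the dynamically linked binary needs libcurl present at runtime — which is why the runtime stages above now install `libcurl4-openssl-dev` too.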
44 changes: 29 additions & 15 deletions .github/workflows/bench.yml
@@ -24,15 +24,15 @@ on:
   push:
     branches:
       - master
-    paths: ['.github/workflows/bench.yml', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m', 'examples/server/bench/**.*']
-  pull_request:
+    paths: ['llama.cpp', 'ggml.c', 'ggml-backend.c', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
+  pull_request_target:
     types: [opened, synchronize, reopened]
-    paths: ['.github/workflows/bench.yml', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m', 'examples/server/bench/**.*']
+    paths: ['llama.cpp', 'ggml.c', 'ggml-backend.c', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
   schedule:
     - cron: '04 2 * * *'

 concurrency:
-  group: ${{ github.workflow }}-${{ github.ref }}
+  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}-${{ github.event.inputs.sha }}
   cancel-in-progress: true
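For readers unfamiliar with GitHub expression short-circuiting, the new `group` key can be sketched in Python (`concurrency_group` is a hypothetical helper, not part of the PR): `head_ref && ref || run_id` evaluates to `github.ref` for PR-like runs, where `head_ref` is non-empty, and falls back to the unique `run_id` otherwise, so pushes to master no longer cancel each other.

```python
def concurrency_group(workflow: str, head_ref: str, ref: str,
                      run_id: str, input_sha: str = "") -> str:
    """Mimic `${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}-${{ github.event.inputs.sha }}`.

    GitHub Actions expressions short-circuit like JavaScript:
    `a && b` yields b when a is truthy, and `x || y` yields y when x is falsy.
    """
    middle = ref if head_ref else run_id
    return f"{workflow}-{middle}-{input_sha}"

# PR run: head_ref is set, so runs on the same ref share a group and supersede each other.
print(concurrency_group("bench", "my-feature", "refs/pull/7/merge", "8841", "deadbeef"))

# Push or schedule run: head_ref is empty, so every run gets its own group via run_id.
print(concurrency_group("bench", "", "refs/heads/master", "8842"))
```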

jobs:
@@ -42,11 +42,21 @@ jobs:
       RUNNER_LABEL: Standard_NC4as_T4_v3 # FIXME Do not find a way to not duplicate it
       N_USERS: 8
       DURATION: 10m
+
+    strategy:
+      matrix:
+        model: [phi-2]
+        ftype: [q4_0, q8_0, f16]
+        include:
+          - model: phi-2
+            ftype: q4_0
+            pr_comment_enabled: "true"
+
     if: ${{ github.event.inputs.gpu-series == 'Standard_NC4as_T4_v3' || github.event.schedule || github.event.pull_request || github.head_ref == 'master' || github.ref_name == 'master' || github.event.push.ref == 'refs/heads/master' }}
     steps:
       - name: Clone
         id: checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0
           ref: ${{ github.event.inputs.sha || github.event.pull_request.head.sha || github.sha || github.head_ref || github.ref_name }}
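The `strategy.matrix` added above fans the bench job out to one run per quantization type, with the `include` entry enabling the PR comment only for the `q4_0` variant. A rough Python model of how GitHub expands such a matrix (`expand_matrix` is a hypothetical helper, simplified to the rules this matrix actually uses):

```python
from itertools import product

def expand_matrix(matrix, include):
    """Expand a GitHub Actions-style build matrix: the cartesian product of
    the axis values, with each `include` entry merged into matching combos."""
    axes = {k: v for k, v in matrix.items() if k != "include"}
    keys = list(axes)
    combos = [dict(zip(keys, values)) for values in product(*axes.values())]
    for extra in include:
        for combo in combos:
            # Merge when every axis key the include entry mentions matches.
            if all(combo.get(k) == v for k, v in extra.items() if k in combo):
                combo.update(extra)
    return combos

matrix = {"model": ["phi-2"], "ftype": ["q4_0", "q8_0", "f16"]}
include = [{"model": "phi-2", "ftype": "q4_0", "pr_comment_enabled": "true"}]

jobs = expand_matrix(matrix, include)
print(len(jobs))  # 3 — one job per ftype; only the q4_0 job carries pr_comment_enabled
```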
@@ -116,7 +126,7 @@ jobs:
             --scenario script.js \
             --duration ${{ github.event.inputs.duration || env.DURATION }} \
             --hf-repo ggml-org/models \
-            --hf-file phi-2/ggml-model-q4_0.gguf \
+            --hf-file ${{ matrix.model }}/ggml-model-${{ matrix.ftype }}.gguf \
             --model-path-prefix /models \
             --parallel ${{ env.N_USERS }} \
             -ngl 33 \
@@ -134,7 +144,7 @@

       - uses: actions/upload-artifact@v4
         with:
-          name: benchmark-results
+          name: bench-server-${{ github.job }}-${{ env.RUNNER_LABEL }}-${{ matrix.model }}-${{ matrix.ftype }}
           compression-level: 9
           path: |
             examples/server/bench/*.jpg
@@ -143,11 +153,10 @@

       - name: Commit status
         uses: Sibz/github-status-action@v1
-        continue-on-error: true # If not authorized on external repo
         with:
           authToken: ${{secrets.GITHUB_TOKEN}}
           sha: ${{ inputs.sha || github.event.pull_request.head.sha || github.sha }}
-          context: bench-server-baseline
+          context: bench-server-${{ github.job }}-${{ env.RUNNER_LABEL }}-${{ matrix.model }}-${{ matrix.ftype }}
           description: |
             ${{ env.BENCH_RESULTS }}
           state: 'success'
@@ -204,21 +213,26 @@ jobs:
       - name: Comment PR
         uses: mshick/add-pr-comment@v2
         id: comment_pr
-        if: ${{ github.event.pull_request != '' }}
+        if: ${{ github.event.pull_request != '' && matrix.pr_comment_enabled == 'true' }}
         with:
-          message-id: bench-${{ github.job }}-${{ env.RUNNER_LABEL }}
+          message-id: bench-server-${{ github.job }}-${{ env.RUNNER_LABEL }}-${{ matrix.model }}-${{ matrix.ftype }}
           message: |
-            📈 **llama.cpp server** for _${{ github.job }}_ on _${{ env.RUNNER_LABEL }}_: **${{ env.BENCH_ITERATIONS}} iterations** 🚀
-            <p align="center">
+
+            📈 **llama.cpp server** for _${{ github.job }}_ on _${{ env.RUNNER_LABEL }}_ for `${{ matrix.model }}`-`${{ matrix.ftype }}`: **${{ env.BENCH_ITERATIONS}} iterations** 🚀
+
+            </p>
+
             <details>

             <summary>Expand details for performance related PR only</summary>

             - Concurrent users: ${{ env.N_USERS }}, duration: ${{ github.event.inputs.duration || env.DURATION }}
             - HTTP request : avg=${{ env.HTTP_REQ_DURATION_AVG }}ms p(90)=${{ env.HTTP_REQ_DURATION_P_90_ }}ms fails=${{ env.HTTP_REQ_FAILED_PASSES }}, finish reason: stop=${{ env.LLAMACPP_COMPLETIONS_STOP_RATE_PASSES }} truncated=${{ env.LLAMACPP_COMPLETIONS_TRUNCATED_RATE_PASSES }}
             - Prompt processing (pp): avg=${{ env.LLAMACPP_PROMPT_TOKENS_AVG }}tk/s p(90)=${{ env.LLAMACPP_PROMPT_TOKENS_P_90_ }}tk/s **total=${{ env.LLAMACPP_PROMPT_TOKENS_TOTAL_COUNTER_RATE }}tk/s**
             - Token generation (tg): avg=${{ env.LLAMACPP_TOKENS_SECOND_AVG }}tk/s p(90)=${{ env.LLAMACPP_TOKENS_SECOND_P_90_ }}tk/s **total=${{ env.LLAMACPP_COMPLETION_TOKENS_TOTAL_COUNTER_RATE }}tk/s**
             - ${{ env.BENCH_GRAPH_XLABEL }}

             <details>

             <summary>Time series</summary>

             <p align="center">