merged from upstream #26

Merged 122 commits on Jul 8, 2024

Commits (122)
f702a90  Update control vector help (#8104)  (HatsuneMikuUwU33, Jun 25, 2024)
3791ad2  SimpleChat v3.1: Boolean chat request options in Settings UI, cache_p…  (hanishkvc, Jun 25, 2024)
48e6b92  Add chat template support for llama-cli (#8068)  (ngxson, Jun 25, 2024)
49c03c7  cvector: better prompt handling, add "mean vector" method (#8069)  (ngxson, Jun 25, 2024)
c8ad359  Gguf dump start data offset via --data-offset and some extra refactor…  (mofosyne, Jun 25, 2024)
925c309  Add healthchecks to llama-server containers (#8081)  (codearranger, Jun 25, 2024)
dd047b4  disable docker CI on pull requests (#8110)  (slaren, Jun 25, 2024)
84631fe  `json`: support integer minimum, maximum, exclusiveMinimum, exclusive…  (ochafik, Jun 25, 2024)
e6bf007  llama : return nullptr from llama_grammar_init (#8093)  (danbev, Jun 25, 2024)
6fcbf68  llama : implement Unigram tokenizer needed by T5 and FLAN-T5 model fa…  (fairydreaming, Jun 25, 2024)
163d50a  fixes #7999 (adds control vectors to all `build_XXX()` functions in `…  (jukofyork, Jun 25, 2024)
6777c54  `json`: fix additionalProperties, allow space after enum/const (#7840)  (ochafik, Jun 26, 2024)
9b2f16f  `json`: better support for "type" unions (e.g. nullable arrays w/ typ…  (ochafik, Jun 26, 2024)
494165f  llama : extend llm_build_ffn() to support _scale tensors (#8103)  (Eddie-Wang1120, Jun 26, 2024)
c8771ab  CUDA: fix misaligned shared memory read (#8123)  (JohannesGaessler, Jun 26, 2024)
8854044  Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag (#8115)  (isaac-mcfadyen, Jun 26, 2024)
f3f6542  llama : reorganize source code + improve CMake (#8006)  (ggerganov, Jun 26, 2024)
a95631e  readme : update API notes  (ggerganov, Jun 26, 2024)
0e814df  devops : remove clblast + LLAMA_CUDA -> GGML_CUDA (#8139)  (ggerganov, Jun 26, 2024)
4713bf3  authors : regen  (ggerganov, Jun 26, 2024)
f2d48ff  sync : ggml  (ggerganov, Jun 26, 2024)
c7ab7b6  make : fix missing -O3 (#8143)  (slaren, Jun 26, 2024)
31ec399  ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLA…  (slaren, Jun 26, 2024)
ae5d0f4  ci : publish new docker images only when the files change (#8142)  (slaren, Jun 26, 2024)
c70d117  scripts : fix filename sync  (ggerganov, Jun 26, 2024)
9b31a40  clip : suppress unused variable warnings (#8105)  (danbev, Jun 26, 2024)
ac14662  Fix llama-android.cpp for error - "common/common.h not found" (#8145)  (criminact, Jun 27, 2024)
911e35b  llama : fix CodeLlama FIM token checks (#8144)  (CISC, Jun 27, 2024)
f675b20  Added support for Viking pre-tokenizer (#8135)  (kustaaya, Jun 27, 2024)
85a267d  CUDA: fix MMQ stream-k for --split-mode row (#8167)  (JohannesGaessler, Jun 27, 2024)
6030c61  Add Qwen2MoE 57B-A14B model identifier (#8158)  (CISC, Jun 27, 2024)
3879526  Delete examples/llama.android/llama/CMakeLists.txt (#8165)  (criminact, Jun 27, 2024)
97877eb  Control vector loading fixes (#8137)  (jukofyork, Jun 27, 2024)
ab36791  flake.lock: Update (#8071)  (ggerganov, Jun 27, 2024)
16791b8  Add chatml fallback for cpp `llama_chat_apply_template` (#8160)  (ngxson, Jun 27, 2024)
8172ee9  cmake : fix deprecated option names not working (#8171)  (slaren, Jun 27, 2024)
558f44b  CI: fix release build (Ubuntu+Mac) (#8170)  (loonerin, Jun 27, 2024)
cb0b06a  `json`: update grammars/README w/ examples & note about additionalPro…  (ochafik, Jun 27, 2024)
a27aa50  Add missing items in makefile (#8177)  (ngxson, Jun 28, 2024)
e57dc62  llama: Add support for Gemma2ForCausalLM (#8156)  (pculliton, Jun 28, 2024)
139cc62  `json`: restore default additionalProperties to false, fix some patte…  (ochafik, Jun 28, 2024)
b851b3f  cmake : allow user to override default options (#8178)  (slaren, Jun 28, 2024)
38373cf  Add SPM infill support (#8016)  (CISC, Jun 28, 2024)
26a39bb  Add MiniCPM, Deepseek V2 chat template + clean up `llama_chat_apply_t…  (ngxson, Jun 28, 2024)
8748d8a  json: attempt to skip slow tests when running under emulator (#8189)  (ochafik, Jun 28, 2024)
72272b8  fix code typo in llama-cli (#8198)  (ngxson, Jun 28, 2024)
1c5eba6  llama: Add attention and final logit soft-capping, update scaling fac…  (abetlen, Jun 30, 2024)
9ef0780  Fix new line issue with chat template, disable template when in-prefi…  (ngxson, Jun 30, 2024)
d0a7145  flake.lock: Update (#8218)  (ggerganov, Jun 30, 2024)
197fe6c  [SYCL] Update SYCL-Rope op and Refactor (#8157)  (zhentaoyu, Jul 1, 2024)
694c59c  Document BERT support. (#8205)  (iacore, Jul 1, 2024)
257f8e4  nix : remove OpenCL remnants (#8235)  (ggerganov, Jul 1, 2024)
3840b6f  nix : enable curl (#8043)  (edude03, Jul 1, 2024)
0ddeff1  readme : update tool list (#8209)  (crashr, Jul 1, 2024)
49122a8  gemma2: add sliding window mask (#8227)  (ngxson, Jul 1, 2024)
dae57a1  readme: add Paddler to the list of projects (#8239)  (mcharytoniuk, Jul 1, 2024)
cb5fad4  CUDA: refactor and optimize IQ MMVQ (#8215)  (JohannesGaessler, Jul 1, 2024)
5fac350  Fix gemma2 tokenizer convert (#8244)  (ngxson, Jul 1, 2024)
d08c20e  [SYCL] Fix the sub group size of Intel (#8106)  (luoyu-intel, Jul 2, 2024)
a9f3b10  [SYCL] Fix win build conflict of math library (#8230)  (luoyu-intel, Jul 2, 2024)
0e0590a  cuda : update supports_op for matrix multiplication (#8245)  (slaren, Jul 2, 2024)
023b880  convert-hf : print output file name when completed (#8181)  (danbev, Jul 2, 2024)
9689673  Add `JAIS` model(s) (#8118)  (fmz, Jul 2, 2024)
07a3fc0  Removes multiple newlines at the end of files that is breaking the ed…  (HanClinto, Jul 2, 2024)
3e2618b  Adding step to `clean` target to remove legacy binary names to reduce…  (HanClinto, Jul 2, 2024)
a27152b  fix: add missing short command line argument -mli for multiline-input…  (MistApproach, Jul 2, 2024)
fadde67  Dequant improvements rebase (#8255)  (Jul 3, 2024)
f8d6a23  fix typo (#8267)  (foldl, Jul 3, 2024)
916248a  fix phi 3 conversion (#8262)  (ngxson, Jul 3, 2024)
5f2d4e6  ppl : fix n_seq_max for perplexity (#8277)  (slaren, Jul 3, 2024)
d23287f  Define and optimize RDNA1 (#8085)  (daniandtheweb, Jul 3, 2024)
f619024  [SYCL] Remove unneeded semicolons (#8280)  (Jul 4, 2024)
20fc380  convert : fix gemma v1 tokenizer convert (#8248)  (ggerganov, Jul 4, 2024)
402d6fe  llama : suppress unref var in Windows MSVC (#8150)  (danbev, Jul 4, 2024)
f8c4c07  tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231)  (danbev, Jul 4, 2024)
807b0c4  Inference support for T5 and FLAN-T5 model families (#5763)  (fairydreaming, Jul 4, 2024)
b0a4699  build(python): Package scripts with pip-0517 compliance  (ditsuke, Feb 27, 2024)
b1c3f26  fix: Actually include scripts in build  (ditsuke, Feb 28, 2024)
8219229  fix: Update script paths in CI scripts  (ditsuke, Mar 10, 2024)
de14e2e  chore: ignore all __pychache__  (ditsuke, Jul 2, 2024)
07786a6  chore: Fixup requirements and build  (ditsuke, Jul 2, 2024)
01a5f06  chore: Remove rebase artifacts  (ditsuke, Jul 2, 2024)
1e92001  doc: Add context for why we add an explicit pytorch source  (ditsuke, Jul 2, 2024)
51d2eba  build: Export hf-to-gguf as snakecase  (ditsuke, Jul 4, 2024)
6f63d64  tokenize : add --show-count (token) option (#8299)  (danbev, Jul 4, 2024)
d7fd29f  llama : add OpenELM support (#7359)  (icecream95, Jul 4, 2024)
a38b884  cli: add EOT when user hit Ctrl+C (#8296)  (ngxson, Jul 4, 2024)
f09b7cb  rm get_work_group_size() by local cache for performance (#8286)  (NeoZhangJianyu, Jul 5, 2024)
e235b26  py : switch to snake_case (#8305)  (ggerganov, Jul 5, 2024)
a9554e2  [SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266)  (luoyu-intel, Jul 5, 2024)
6c05752  contributing : update guidelines (#8316)  (ggerganov, Jul 5, 2024)
aa5898d  llama : prefer n_ over num_ prefix (#8308)  (ggerganov, Jul 5, 2024)
61ecafa  passkey : add short intro to README.md [no-ci] (#8317)  (danbev, Jul 5, 2024)
5a7447c  readme : fix minor typos [no ci] (#8314)  (pouwerkerk, Jul 5, 2024)
bcefa03  CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311)  (JohannesGaessler, Jul 5, 2024)
d12f781  llama : streamline embeddings from "non-embedding" models (#8087)  (iamlemec, Jul 5, 2024)
0a42380  CUDA: revert part of the RDNA1 optimizations (#8309)  (daniandtheweb, Jul 5, 2024)
8e55830  CUDA: MMQ support for iq4_nl, iq4_xs (#8278)  (JohannesGaessler, Jul 5, 2024)
2cccbaa  llama : minor indentation during tensor loading (#8304)  (ggerganov, Jul 5, 2024)
148ec97  convert : remove AWQ remnants (#8320)  (ggerganov, Jul 5, 2024)
1f3e1b6  Enabled more data types for oneMKL gemm_batch (#8236)  (OuadiElfarouki, Jul 5, 2024)
1d894a7  cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281)  (akemimadoka, Jul 5, 2024)
7ed03b8  llama : fix compile warning (#8304)  (ggerganov, Jul 5, 2024)
be20e7f  Reorganize documentation pages (#8325)  (ngxson, Jul 5, 2024)
213701b  Detokenizer fixes (#8039)  (jaime-m-p, Jul 5, 2024)
87e25a1  llama : add early return for empty range (#8327)  (danbev, Jul 6, 2024)
60d83a0  update main readme (#8333)  (ngxson, Jul 6, 2024)
86e7299  added support for Authorization Bearer tokens when downloading model …  (dwoolworth, Jul 6, 2024)
cb4d86c  server: Retrieve prompt template in /props (#8337)  (bviksoe, Jul 7, 2024)
210eb9e  finetune: Rename an old command name in finetune.sh (#8344)  (standby24x7, Jul 7, 2024)
b81ba1f  finetune: Rename command name in README.md (#8343)  (standby24x7, Jul 7, 2024)
d39130a  py : use cpu-only torch in requirements.txt (#8335)  (compilade, Jul 7, 2024)
b504008  llama : fix n_rot default (#8348)  (ggerganov, Jul 7, 2024)
905942a  llama : support glm3 and glm4 (#8031)  (youth123, Jul 7, 2024)
f7cab35  gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#…  (mofosyne, Jul 7, 2024)
f1948f1  readme : update bindings list (#8222)  (andy-tai, Jul 7, 2024)
4090ea5  ci : add checks for cmake,make and ctest in ci/run.sh (#8200)  (AlexsCode, Jul 7, 2024)
a8db2a9  Update llama-cli documentation (#8315)  (dspasyuk, Jul 7, 2024)
3fd62a6  py : type-check all Python scripts with Pyright (#8341)  (compilade, Jul 7, 2024)
04ce3a8  readme : add supported glm models (#8360)  (youth123, Jul 8, 2024)
ffd0079  common : avoid unnecessary logits fetch (#8358)  (kevmo314, Jul 8, 2024)
6f0dbf6  infill : assert prefix/suffix tokens + remove old space logic (#8351)  (ggerganov, Jul 8, 2024)

Files changed

2 changes: 1 addition & 1 deletion .devops/full-cuda.Dockerfile
@@ -27,7 +27,7 @@ COPY . .
 # Set nvcc architecture
 ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
 # Enable CUDA
-ENV LLAMA_CUDA=1
+ENV GGML_CUDA=1
 # Enable cURL
 ENV LLAMA_CURL=1

2 changes: 1 addition & 1 deletion .devops/full-rocm.Dockerfile
@@ -36,7 +36,7 @@ COPY . .
 # Set nvcc architecture
 ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
 # Enable ROCm
-ENV LLAMA_HIPBLAS=1
+ENV GGML_HIPBLAS=1
 ENV CC=/opt/rocm/llvm/bin/clang
 ENV CXX=/opt/rocm/llvm/bin/clang++

2 changes: 1 addition & 1 deletion .devops/llama-cli-cuda.Dockerfile
@@ -21,7 +21,7 @@ COPY . .
 # Set nvcc architecture
 ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
 # Enable CUDA
-ENV LLAMA_CUDA=1
+ENV GGML_CUDA=1

 RUN make -j$(nproc) llama-cli

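For reference, the diffs above track the repo-wide LLAMA_* to GGML_* build-flag rename (commit 0e814df), and the same rename applies outside Docker. A minimal sketch, assuming a working CUDA toolchain; the flag and target names come from the diffs in this PR, everything else is illustrative:

# Makefile build with the renamed flag (previously LLAMA_CUDA=1):
make -j$(nproc) GGML_CUDA=1 llama-cli

# CMake equivalent (previously -DLLAMA_CUDA=ON):
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release --target llama-cli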
10 changes: 5 additions & 5 deletions .devops/llama-cli-intel.Dockerfile
@@ -2,19 +2,19 @@ ARG ONEAPI_VERSION=2024.1.1-devel-ubuntu22.04

 FROM intel/oneapi-basekit:$ONEAPI_VERSION as build

-ARG LLAMA_SYCL_F16=OFF
+ARG GGML_SYCL_F16=OFF
 RUN apt-get update && \
 apt-get install -y git

 WORKDIR /app

 COPY . .

-RUN if [ "${LLAMA_SYCL_F16}" = "ON" ]; then \
-echo "LLAMA_SYCL_F16 is set" && \
-export OPT_SYCL_F16="-DLLAMA_SYCL_F16=ON"; \
+RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
+echo "GGML_SYCL_F16 is set" && \
+export OPT_SYCL_F16="-DGGML_SYCL_F16=ON"; \
 fi && \
-cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ${OPT_SYCL_F16} && \
+cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ${OPT_SYCL_F16} && \
 cmake --build build --config Release --target llama-cli

 FROM intel/oneapi-basekit:$ONEAPI_VERSION as runtime
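With the ARG renamed to GGML_SYCL_F16, the FP16 path would be toggled at image build time rather than by editing the Dockerfile. A minimal sketch; the Dockerfile path and build-arg name come from the diff above, the image tag is a placeholder:

# Build the SYCL CLI image with FP16 enabled (tag is illustrative):
docker build -f .devops/llama-cli-intel.Dockerfile \
  --build-arg GGML_SYCL_F16=ON \
  -t llama-cli-intel-f16 .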
2 changes: 1 addition & 1 deletion .devops/llama-cli-rocm.Dockerfile
@@ -36,7 +36,7 @@ COPY . .
 # Set nvcc architecture
 ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
 # Enable ROCm
-ENV LLAMA_HIPBLAS=1
+ENV GGML_HIPBLAS=1
 ENV CC=/opt/rocm/llvm/bin/clang
 ENV CXX=/opt/rocm/llvm/bin/clang++

2 changes: 1 addition & 1 deletion .devops/llama-cli-vulkan.Dockerfile
@@ -14,7 +14,7 @@ RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key
 # Build it
 WORKDIR /app
 COPY . .
-RUN cmake -B build -DLLAMA_VULKAN=1 && \
+RUN cmake -B build -DGGML_VULKAN=1 && \
 cmake --build build --config Release --target llama-cli

 # Clean up
84 changes: 0 additions & 84 deletions .devops/llama-cpp-clblast.srpm.spec

This file was deleted.

2 changes: 1 addition & 1 deletion .devops/llama-cpp-cuda.srpm.spec
@@ -32,7 +32,7 @@ CPU inference for Meta's Lllama2 models using default options.
 %setup -n llama.cpp-master

 %build
-make -j LLAMA_CUDA=1
+make -j GGML_CUDA=1

 %install
 mkdir -p %{buildroot}%{_bindir}/
6 changes: 4 additions & 2 deletions .devops/llama-server-cuda.Dockerfile
@@ -21,7 +21,7 @@ COPY . .
 # Set nvcc architecture
 ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
 # Enable CUDA
-ENV LLAMA_CUDA=1
+ENV GGML_CUDA=1
 # Enable cURL
 ENV LLAMA_CURL=1

@@ -30,8 +30,10 @@ RUN make -j$(nproc) llama-server
 FROM ${BASE_CUDA_RUN_CONTAINER} as runtime

 RUN apt-get update && \
-apt-get install -y libcurl4-openssl-dev libgomp1
+apt-get install -y libcurl4-openssl-dev libgomp1 curl

 COPY --from=build /app/llama-server /llama-server

+HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
+
 ENTRYPOINT [ "/llama-server" ]
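The HEALTHCHECK added here (and to the other server images below) probes the server's /health endpoint with curl, which is why curl is now installed in the runtime stage. A minimal sketch of checking the reported status once a container is up; the image tag, model path, and container name are placeholders:

# Start the server image (placeholders for tag, model, and mounts):
docker run -d --gpus all --name llama-server -p 8080:8080 \
  -v "$(pwd)/models:/models" local/llama-server-cuda \
  -m /models/model.gguf --host 0.0.0.0 --port 8080

# Docker runs the HEALTHCHECK periodically; the result is reported by inspect:
docker inspect --format '{{ .State.Health.Status }}' llama-server   # starting | healthy | unhealthy

# The same endpoint can be probed manually from the host:
curl -f http://localhost:8080/health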
14 changes: 8 additions & 6 deletions .devops/llama-server-intel.Dockerfile
@@ -2,28 +2,30 @@ ARG ONEAPI_VERSION=2024.1.1-devel-ubuntu22.04

 FROM intel/oneapi-basekit:$ONEAPI_VERSION as build

-ARG LLAMA_SYCL_F16=OFF
+ARG GGML_SYCL_F16=OFF
 RUN apt-get update && \
 apt-get install -y git libcurl4-openssl-dev

 WORKDIR /app

 COPY . .

-RUN if [ "${LLAMA_SYCL_F16}" = "ON" ]; then \
-echo "LLAMA_SYCL_F16 is set" && \
-export OPT_SYCL_F16="-DLLAMA_SYCL_F16=ON"; \
+RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
+echo "GGML_SYCL_F16 is set" && \
+export OPT_SYCL_F16="-DGGML_SYCL_F16=ON"; \
 fi && \
-cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
+cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
 cmake --build build --config Release --target llama-server

 FROM intel/oneapi-basekit:$ONEAPI_VERSION as runtime

 RUN apt-get update && \
-apt-get install -y libcurl4-openssl-dev
+apt-get install -y libcurl4-openssl-dev curl

 COPY --from=build /app/build/bin/llama-server /llama-server

 ENV LC_ALL=C.utf8

+HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
+
 ENTRYPOINT [ "/llama-server" ]
6 changes: 4 additions & 2 deletions .devops/llama-server-rocm.Dockerfile
@@ -36,15 +36,17 @@ COPY . .
 # Set nvcc architecture
 ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
 # Enable ROCm
-ENV LLAMA_HIPBLAS=1
+ENV GGML_HIPBLAS=1
 ENV CC=/opt/rocm/llvm/bin/clang
 ENV CXX=/opt/rocm/llvm/bin/clang++

 # Enable cURL
 ENV LLAMA_CURL=1
 RUN apt-get update && \
-apt-get install -y libcurl4-openssl-dev
+apt-get install -y libcurl4-openssl-dev curl

 RUN make -j$(nproc) llama-server

+HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
+
 ENTRYPOINT [ "/app/llama-server" ]
12 changes: 5 additions & 7 deletions .devops/llama-server-vulkan.Dockerfile
@@ -5,20 +5,16 @@ FROM ubuntu:$UBUNTU_VERSION as build
 # Install build tools
 RUN apt update && apt install -y git build-essential cmake wget

-# Install Vulkan SDK
+# Install Vulkan SDK and cURL
 RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
 wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
 apt update -y && \
-apt-get install -y vulkan-sdk
-
-# Install cURL
-RUN apt-get update && \
-apt-get install -y libcurl4-openssl-dev
+apt-get install -y vulkan-sdk libcurl4-openssl-dev curl

 # Build it
 WORKDIR /app
 COPY . .
-RUN cmake -B build -DLLAMA_VULKAN=1 -DLLAMA_CURL=1 && \
+RUN cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1 && \
 cmake --build build --config Release --target llama-server

 # Clean up
@@ -28,4 +24,6 @@ RUN cp /app/build/bin/llama-server /llama-server && \

 ENV LC_ALL=C.utf8

+HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
+
 ENTRYPOINT [ "/llama-server" ]
4 changes: 3 additions & 1 deletion .devops/llama-server.Dockerfile
@@ -3,7 +3,7 @@ ARG UBUNTU_VERSION=22.04
 FROM ubuntu:$UBUNTU_VERSION as build

 RUN apt-get update && \
-apt-get install -y build-essential git libcurl4-openssl-dev
+apt-get install -y build-essential git libcurl4-openssl-dev curl

 WORKDIR /app

@@ -22,4 +22,6 @@ COPY --from=build /app/llama-server /llama-server

 ENV LC_ALL=C.utf8

+HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
+
 ENTRYPOINT [ "/llama-server" ]