
merge upstream #46

Merged · 103 commits · Nov 27, 2024 (showing changes from 1 commit)
Commits
f245cc2
scripts : fix missing key in compare-llama-bench.py (#10332)
ggerganov Nov 16, 2024
bcdb7a2
server: (web UI) Add samplers sequence customization (#10255)
MaggotHATE Nov 16, 2024
8ee0d09
make : auto-determine dependencies (#0)
ggerganov Nov 16, 2024
db4cfd5
llamafile : fix include path (#0)
ggerganov Nov 16, 2024
4e54be0
llama/ex: remove --logdir argument (#10339)
JohannesGaessler Nov 16, 2024
0fff7fd
docs : vulkan build instructions to use git bash mingw64 (#10303)
FirstTimeEZ Nov 16, 2024
5c9a8b2
scripts : update sync
ggerganov Nov 16, 2024
8a43e94
ggml: new optimization interface (ggml/988)
JohannesGaessler Nov 16, 2024
68fcb47
ggml : fix compile warnings (#0)
ggerganov Nov 16, 2024
84274a1
tests : remove test-grad0
ggerganov Nov 16, 2024
a4200ca
make : add ggml-opt (#0)
ggerganov Nov 16, 2024
5d9e599
ggml : adapt AMX to tensor->grad removal (#0)
ggerganov Nov 16, 2024
24203e9
ggml : inttypes.h -> cinttypes (#0)
ggerganov Nov 16, 2024
eda7e1d
ggml : fix possible buffer use after free in sched reserve (#9930)
slaren Nov 17, 2024
467576b
CMake: default to -arch=native for CUDA build (#10320)
JohannesGaessler Nov 17, 2024
c3ea58a
CUDA: remove DMMV, consolidate F16 mult mat vec (#10318)
JohannesGaessler Nov 17, 2024
a431782
ggml : fix undefined reference to 'getcpu' (#10354)
FirstTimeEZ Nov 17, 2024
cf32a9b
metal : refactor kernel args into structs (#10238)
ggerganov Nov 17, 2024
20a780c
gitignore : ignore local run scripts [no ci]
ggerganov Nov 17, 2024
be5cacc
llama : only use default buffer types for the KV cache (#10358)
slaren Nov 17, 2024
ce2e59b
CMake: fix typo in comment [no ci] (#10360)
JohannesGaessler Nov 17, 2024
76e9e58
CUDA: fix MMV kernel being used for FP16 src1 (#10357)
JohannesGaessler Nov 17, 2024
75207b3
docker: use GGML_NATIVE=OFF (#10368)
JohannesGaessler Nov 17, 2024
9b75f03
Vulkan: Fix device info output format specifiers (#10366)
0cc4m Nov 18, 2024
2eb76b2
flake.lock: Update (#10346)
ggerganov Nov 18, 2024
f139d2e
vulkan: remove use of null initializer (#10372)
jeffbolznv Nov 18, 2024
531cb1c
Skip searching root path for cross-compile builds (#10383)
bandoti Nov 18, 2024
d3481e6
cuda : only use native when supported by cmake (#10389)
slaren Nov 18, 2024
557924f
sycl: Revert MUL_MAT_OP support changes (#10385)
Alcpz Nov 19, 2024
b3e5859
vulkan: Optimize soft_max (#10301)
jeffbolznv Nov 19, 2024
2a1507c
sycl : Add option to set the SYCL architecture for all targets (#10266)
Rbiessy Nov 19, 2024
a88ad00
llama : add OLMo November 2024 support (#10394)
2015aroras Nov 19, 2024
8e752a7
llama : add check for KV cache shifts (#10401)
ggerganov Nov 19, 2024
3ee6382
cuda : fix CUDA_FLAGS not being applied (#10403)
slaren Nov 19, 2024
2a11b6b
Add required ggml-base and backend libs to cmake pkg (#10407)
bandoti Nov 19, 2024
342397d
cmake: force MSVC compiler charset to utf-8 (#9989)
shou692199 Nov 19, 2024
12b0ad9
metal : add `GGML_UNARY_OP_ELU` kernel (ggml/1018)
PABannier Nov 18, 2024
611fabd
metal : fix offset integer overflows in im2col (ggml/1015)
pminev Nov 18, 2024
9fe0fb0
sync : ggml
ggerganov Nov 19, 2024
42ae10b
add cmake rvv support (#10411)
lhpqaq Nov 19, 2024
3952a22
Fix missing file renames in Makefile due to changes in commit ae8de6d…
avdg Nov 19, 2024
ad21c9e
update rel to 4040 (#10395)
NeoZhangJianyu Nov 20, 2024
1bacb9f
vulkan: further optimize mul_mat_vec using larger loads (#10387)
jeffbolznv Nov 20, 2024
8fd4b7f
vulkan: copy iq4_nl LUT into shared memory (#10409)
jeffbolznv Nov 20, 2024
fab5d30
llama : add .clang-format file (#10415)
slaren Nov 20, 2024
f95caa7
cmake: add link dependencies to cmake find pkg (#10433)
bandoti Nov 20, 2024
9abe9ee
vulkan: predicate max operation in soft_max shaders/soft_max (#10437)
jeffbolznv Nov 20, 2024
02e4eaf
ggml-opt: fix data corruption (ggml/1022)
JohannesGaessler Nov 20, 2024
59b9172
ggml/sched : do not skip views in pre-assignments
slaren Nov 20, 2024
87a533b
sync : ggml
ggerganov Nov 21, 2024
1bb30bf
llama : handle KV shift for recurrent models (#10402)
ggerganov Nov 21, 2024
a5e4759
cuda : optimize argmax (#10441)
slaren Nov 21, 2024
c18610b
CANN: Support Ascend310P to accelerate F32 and F16 Model (#10216)
leo-pony Nov 22, 2024
599b3e0
GitHub: ask for more info in issue templates (#10426)
JohannesGaessler Nov 22, 2024
6dfcfef
ci: Update oneAPI runtime dll packaging (#10428)
shou692199 Nov 22, 2024
55ed008
ggml : do not use ARM features not included in the build (#10457)
slaren Nov 23, 2024
96fa2c5
fix gguf-py: Conversion error when multiple licenses are configured …
mmngays Nov 24, 2024
9336db4
convert : XLMRoberta Type Vocab Size (#10458)
gabe-l-hart Nov 24, 2024
dc39012
llama : fix op mul check with command-r-plus (#10476)
slaren Nov 24, 2024
cce5a90
flake.lock: Update (#10470)
ggerganov Nov 24, 2024
d9d54e4
speculative : refactor and add a simpler example (#10362)
ggerganov Nov 25, 2024
5a89877
[SYCL] Fix building Win package for oneAPI 2025.0 update (#10483)
NeoZhangJianyu Nov 25, 2024
b756441
metal : minor code formatting
ggerganov Nov 25, 2024
f6d12e7
tests : fix compile warning
ggerganov Nov 25, 2024
5931c1f
ggml : add support for dynamic loading of backends (#10469)
slaren Nov 25, 2024
9ca2e67
server : add speculative decoding support (#10455)
ggerganov Nov 25, 2024
a9a678a
Add download chat feature to server chat (#10481)
brucepro Nov 25, 2024
1f92225
Github: update issue templates [no ci] (#10489)
JohannesGaessler Nov 25, 2024
10bce04
llama : accept a list of devices to use to offload a model (#10497)
slaren Nov 25, 2024
80acb7b
Rename Olmo1124 to Olmo2 (#10500)
2015aroras Nov 25, 2024
106964e
metal : enable mat-vec kernels for bs <= 4 (#10491)
ggerganov Nov 25, 2024
47f931c
server : enable cache_prompt by default (#10501)
ggerganov Nov 25, 2024
9fd8c26
server : add more information about error (#10455)
ggerganov Nov 25, 2024
50d5cec
ci : build docker images only once daily (#10503)
slaren Nov 25, 2024
0cc6375
Introduce llama-run (#10291)
ericcurtin Nov 25, 2024
0eb4e12
vulkan: Fix a vulkan-shaders-gen argument parsing error (#10484)
sparkleholic Nov 26, 2024
7066b4c
CANN: RoPE and CONCAT operator optimization (#10488)
noemotiovon Nov 26, 2024
9a4b79b
CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454)
shen-shanshan Nov 26, 2024
811872a
speculative : simplify the implementation (#10504)
ggerganov Nov 26, 2024
84e1c33
server : fix parallel speculative decoding (#10513)
ggerganov Nov 26, 2024
25669aa
ggml-cpu: cmake add arm64 cpu feature check for macos (#10487)
chaxu01 Nov 26, 2024
c6807b3
ci : add ubuntu cuda build, build with one arch on windows (#10456)
slaren Nov 26, 2024
7db3846
ci : publish the docker images created during scheduled runs (#10515)
slaren Nov 26, 2024
ab96610
cmake : enable warnings in llama (#10474)
ggerganov Nov 26, 2024
0bbd226
restore the condition to build & update package on merge (#10507)
NeoZhangJianyu Nov 26, 2024
45abe0f
server : replace behave with pytest (#10416)
ngxson Nov 26, 2024
904109e
vulkan: fix group_norm (#10496)
jeffbolznv Nov 26, 2024
249cd93
mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (…
yeahdongcn Nov 26, 2024
be0e350
Fix HIP flag inconsistency & build docs (#10524)
tristandruyen Nov 26, 2024
30ec398
llama : disable warnings for 3rd party sha1 dependency (#10527)
slaren Nov 26, 2024
5a349f2
ci : remove nix workflows (#10526)
slaren Nov 26, 2024
de50973
Add OLMo 2 model in docs (#10530)
2015aroras Nov 26, 2024
c9b00a7
ci : fix cuda releases (#10532)
slaren Nov 26, 2024
4a57d36
vulkan: optimize Q2_K and Q3_K mul_mat_vec (#10459)
jeffbolznv Nov 27, 2024
71a6498
vulkan: skip integer div/mod in get_offsets for batch_idx==0 (#10506)
jeffbolznv Nov 27, 2024
249a790
vulkan: further optimize q5_k mul_mat_vec (#10479)
jeffbolznv Nov 27, 2024
5b3466b
vulkan: Handle GPUs with less shared memory (#10468)
jeffbolznv Nov 27, 2024
c31ed2a
vulkan: define all quant data structures in types.comp (#10440)
jeffbolznv Nov 27, 2024
9150f8f
Do not include arm_neon.h when compiling CUDA code (ggml/1028)
frankier Nov 26, 2024
fee824a
sync : ggml
ggerganov Nov 27, 2024
9e2301f
metal : fix group_norm support condition (#0)
ggerganov Nov 27, 2024
46c69e0
ci : faster CUDA toolkit installation method and use ccache (#10537)
slaren Nov 27, 2024
289e208
Merge branch 'layla-build' into merge
l3utterfly Nov 27, 2024
update rel to 4040 (ggerganov#10395)
Co-authored-by: arthw <[email protected]>
NeoZhangJianyu and arthw authored Nov 20, 2024

Verified: this commit was created on GitHub.com and signed with GitHub’s verified signature.
commit ad21c9e1f14d82b8c15ae369a8839019e3d498b4
7 changes: 4 additions & 3 deletions docs/backend/SYCL.md
@@ -34,9 +34,10 @@ The SYCL backend would be broken by some PRs due to no online CI.

The following release is verified with good quality:

-|Commit ID|Tag|Release|Verified Platform|
-|-|-|-|-|
-|fb76ec31a9914b7761c1727303ab30380fd4f05c|b3038 |[llama-b3038-bin-win-sycl-x64.zip](https://github.com/ggerganov/llama.cpp/releases/download/b3038/llama-b3038-bin-win-sycl-x64.zip) |Arc770/Linux/oneAPI 2024.1<br>MTL Arc GPU/Windows 11/oneAPI 2024.1|
+|Commit ID|Tag|Release|Verified Platform| Update date|
+|-|-|-|-|-|
+|3bcd40b3c593d14261fb2abfabad3c0fb5b9e318|b4040 |[llama-b4040-bin-win-sycl-x64.zip](https://github.com/ggerganov/llama.cpp/releases/download/b4040/llama-b4040-bin-win-sycl-x64.zip) |Arc770/Linux/oneAPI 2024.1<br>MTL Arc GPU/Windows 11/oneAPI 2024.1| 2024-11-19|
+|fb76ec31a9914b7761c1727303ab30380fd4f05c|b3038 |[llama-b3038-bin-win-sycl-x64.zip](https://github.com/ggerganov/llama.cpp/releases/download/b3038/llama-b3038-bin-win-sycl-x64.zip) |Arc770/Linux/oneAPI 2024.1<br>MTL Arc GPU/Windows 11/oneAPI 2024.1||


## News