[pull] master from ggerganov:master #151

pull · 2024-11-16T10:29:50Z

See Commits and Changes for more details.

* sycl: Use syclcompat::dp4a
* Using the syclcompat version allows the compiler to optimize the operation with a native function
* Update news section
* Update CI Windows oneAPI version to 2025.0
* Reword doc
* Call syclcompat::dp4a inside dpct::dp4a

This reverts commit 90cb61d.
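
For readers unfamiliar with `dp4a`: it multiplies the four packed 8-bit lanes of two 32-bit operands pairwise and adds the four products to a 32-bit accumulator, which is the operation the commit routes through `syclcompat::dp4a` so the compiler can lower it to a native instruction. A minimal scalar sketch of the signed variant follows, written in plain C rather than SYCL; `dp4a_ref` and `pack_i8x4` are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack four signed bytes into one 32-bit word. */
static int32_t pack_i8x4(int8_t x0, int8_t x1, int8_t x2, int8_t x3) {
    int8_t v[4] = {x0, x1, x2, x3};
    int32_t r;
    memcpy(&r, v, sizeof r);
    return r;
}

/* Reference semantics of a signed dp4a: treat a and b as four signed
   8-bit lanes, multiply lane-wise, and add the products to c. Both
   operands are unpacked the same way, so byte order does not matter. */
static int32_t dp4a_ref(int32_t a, int32_t b, int32_t c) {
    int8_t va[4], vb[4];
    memcpy(va, &a, sizeof va);
    memcpy(vb, &b, sizeof vb);
    for (int i = 0; i < 4; ++i) {
        c += (int32_t)va[i] * (int32_t)vb[i];
    }
    return c;
}
```

For example, `dp4a_ref(pack_i8x4(1, 2, 3, 4), pack_i8x4(5, 6, 7, 8), 10)` adds 5 + 12 + 21 + 32 to 10.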

* use 128 bit loads (i've tried 256->128 to death and it's slower)
* double accumulator
* avx bf16 vec dot
* +3% q4_0 inference
* +7% tg +5% pp compared to master
* slower f16c version, kept for reference
* 256b version, also slow. i tried :)
* revert f16
* faster with madd
* split to functions
* Q8_0 and IQ4_NL, 5-7% faster
* fix potential overflow (performance reduced)
* 16 bit add for q4_0 only
* merge
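
The bf16 vec-dot work above builds on the fact that bf16 is the upper 16 bits of an IEEE-754 binary32 value, so widening it to float is a 16-bit left shift of the bit pattern. A scalar reference sketch of that conversion and of a bf16 dot product follows; it is not the AVX kernel, the function names are hypothetical, and it accumulates in double only to give a stable reference result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* bf16 keeps the top 16 bits of a binary32 float, so widening is a shift. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Truncating float -> bf16 conversion (no rounding); enough for exact inputs. */
static uint16_t f32_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

/* Scalar reference dot product over n bf16 pairs. */
static float vec_dot_bf16_ref(size_t n, const uint16_t *x, const uint16_t *y) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += (double)bf16_to_f32(x[i]) * (double)bf16_to_f32(y[i]);
    }
    return (float)sum;
}
```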

ggml-ci

Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses the B loads across the rows and also reuses some addressing calculations. This required manually partially unrolling the loop, since the compiler is less willing to unroll outer loops. Add bounds-checking on the last iteration of the loop. I think this was at least partly broken before. Optimize the Q4_K shader to vectorize most loads and reduce the number of bit twiddling instructions.
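
A CPU-side C analogue of the "two result elements per workgroup" idea: each pass over the columns produces two output rows, so every vector element is loaded once and used twice, and the two row base addresses are computed together. This is only an illustration under simplifying assumptions (float data, no workgroups, no quantized blocks); `matvec_two_rows` is a hypothetical name, and its leftover-row check stands in for, but is not the same as, the shader's last-iteration bounds check.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x for a row-major nrows x ncols matrix, two rows per step. */
static void matvec_two_rows(const float *A, const float *x, float *y,
                            size_t nrows, size_t ncols) {
    size_t r = 0;
    for (; r + 1 < nrows; r += 2) {
        const float *a0 = A + r * ncols;   /* shared addressing for both rows */
        const float *a1 = a0 + ncols;
        float s0 = 0.0f, s1 = 0.0f;
        for (size_t j = 0; j < ncols; ++j) {
            const float xj = x[j];         /* one load of x, two uses */
            s0 += a0[j] * xj;
            s1 += a1[j] * xj;
        }
        y[r] = s0;
        y[r + 1] = s1;
    }
    if (r < nrows) {                       /* leftover row when nrows is odd */
        const float *a0 = A + r * ncols;
        float s0 = 0.0f;
        for (size_t j = 0; j < ncols; ++j) {
            s0 += a0[j] * x[j];
        }
        y[r] = s0;
    }
}
```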

ggml-ci

* metal : add kernel arg structs (wip)
* metal : fattn args ggml-ci
* metal : cont + avoid potential int overflow [no ci]
* metal : mul mat struct (wip)
* cont : mul mat vec
* cont : pass by reference
* cont : args is first argument
* cont : use char ptr
* cont : shmem style
* cont : thread counters style
* cont : mul mm id ggml-ci
* cont : int safety + register optimizations ggml-ci
* metal : GGML_OP_CONCAT ggml-ci
* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV
* metal : GGML_OP_REPEAT
* metal : GGML_OP_CPY
* metal : GGML_OP_RMS_NORM
* metal : GGML_OP_NORM
* metal : add TODOs for rest of ops
* ggml : add ggml-metal-impl.h ggml-ci
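
Several items above (kernel arg structs, pass by reference, char pointers, avoiding int overflow) describe one pattern. A rough C sketch of it, not Metal code and not ggml's actual structs: shape and stride values travel in a single struct passed by pointer, rows are addressed through byte offsets on a `char` pointer, and those offsets are computed in 64 bits. The struct `kargs_scale` and the kernel `kernel_scale` are hypothetical; only the `ne` (element count) / `nb` (byte stride) naming follows ggml's convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block, passed by reference as the first argument. */
typedef struct {
    int32_t  ne00;   /* elements per row */
    int32_t  ne01;   /* number of rows */
    uint64_t nb01;   /* byte stride between rows */
} kargs_scale;

/* Scale every element; row offsets are computed in 64 bits so large
   tensors cannot overflow a 32-bit int. */
static void kernel_scale(const kargs_scale *args, const char *src, char *dst, float s) {
    for (int32_t i1 = 0; i1 < args->ne01; ++i1) {
        const float *x = (const float *)(src + (uint64_t)i1 * args->nb01);
        float       *y = (float *)(dst + (uint64_t)i1 * args->nb01);
        for (int32_t i0 = 0; i0 < args->ne00; ++i0) {
            y[i0] = x[i0] * s;
        }
    }
}
```

Bundling the arguments means adding a field changes one struct definition instead of every binding site.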

Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/4aa36568d413aca0ea84a1684d2d46f55dbabad7?narHash=sha256-Zwl8YgTVJTEum%2BL%2B0zVAWvXAGbWAuXHax3KzuejaDyo%3D' (2024-11-05)
  → 'github:NixOS/nixpkgs/5e4fbfb6b3de1aa2872b76d49fafc942626e2add?narHash=sha256-OZiZ3m8SCMfh3B6bfGC/Bm4x3qc1m2SVEAlkV6iY7Yg%3D' (2024-11-15)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

chaxu01 and others added 19 commits November 15, 2024 01:28

backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)

1607a5e

* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels

---------

Co-authored-by: Diego Devesa <[email protected]>
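
The aarch64 GEMV/GEMM kernels consume Q4_0 weights. As background, a minimal sketch of how one Q4_0 block decodes, assuming ggml's layout: 32 weights per block, one scale, two 4-bit quants per byte stored with an offset of 8, low nibbles holding the first 16 weights and high nibbles the last 16. The struct is a simplification (ggml stores the scale as fp16, this sketch uses float), and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define QK4_0 32

/* Simplified Q4_0 block with a float scale instead of fp16. */
typedef struct {
    float   d;               /* per-block scale */
    uint8_t qs[QK4_0 / 2];   /* 32 x 4-bit quants, two per byte */
} block_q4_0_sketch;

/* Element j (j < 16) is the low nibble of qs[j]; element j + 16 is the
   high nibble. Subtracting 8 maps the nibble range 0..15 to -8..7. */
static void dequantize_q4_0_block(const block_q4_0_sketch *b, float *out) {
    for (int j = 0; j < QK4_0 / 2; ++j) {
        const int lo = (b->qs[j] & 0x0F) - 8;
        const int hi = (b->qs[j] >> 4) - 8;
        out[j]             = lo * b->d;
        out[j + QK4_0 / 2] = hi * b->d;
    }
}
```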

scripts : fix regex in sync [no ci]

4802ad3

cann: dockerfile and doc adjustment (#10302)

231f936

Co-authored-by: noemotiovon <[email protected]>

server : (web UI) add copy button for code block, fix api key (#10242)

9901068

* server : (web ui) add copy btn for code blocks
* fix problem with api key
* use settings-modal-short-input component
* always show copy btn for code snippet

sycl: Update Intel docker images to use DPC++ 2025.0 (#10305)

57f8355

ci: build test musa with cmake (#10298)

f0204a0

Signed-off-by: Xiaodong Ye <[email protected]>

sync : ggml

cbf5541

ggml : vulkan logs (whisper/2547)

3225008

cmake : fix ppc64 check (whisper/0)

09ecbcb

ggml-ci

ggml : fix some build issues

883d206

scripts: update compare-llama-bench.py (#10319)

4047be7

Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314)

74d73dc

llama : save number of parameters and the size in llama_model (#10286)

89e4caa

fixes #10285

ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324)

1e58ee1
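
Repacking Q4_0 into a Q4_0_X_Y layout interleaves blocks from several rows, presumably so a kernel can process X rows from contiguous memory. A simplified sketch of the block-level interleave, assuming X is the number of rows grouped together; the byte-level interleaving inside each group that the real layouts also perform is omitted, and integers stand in for blocks. `interleave_rows` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* src holds nrows rows of nblocks blocks each (row-major). dst receives,
   for each group of x rows, block 0 of every row in the group, then
   block 1 of every row, and so on. */
static void interleave_rows(const int *src, int *dst,
                            size_t nrows, size_t nblocks, size_t x) {
    assert(x > 0 && nrows % x == 0);
    size_t out = 0;
    for (size_t r0 = 0; r0 < nrows; r0 += x) {        /* row group */
        for (size_t b = 0; b < nblocks; ++b) {        /* block column */
            for (size_t r = r0; r < r0 + x; ++r) {    /* rows in group */
                dst[out++] = src[r * nblocks + b];
            }
        }
    }
}
```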

vulkan : add cmake preset debug/release (#10306)

dd3a6ce

scripts : fix missing key in compare-llama-bench.py (#10332)

f245cc2

github-actions bot added the documentation, examples, devops, python, server, ggml, SYCL, build, and script labels Nov 16, 2024

pull bot removed the documentation and examples labels Nov 16, 2024

github-actions bot added the script label Nov 16, 2024

ggerganov and others added 11 commits November 16, 2024 20:36

make : auto-determine dependencies (#0)

8ee0d09

llamafile : fix include path (#0)

db4cfd5

ggml-ci

llama/ex: remove --logdir argument (#10339)

4e54be0

docs : vulkan build instructions to use git bash mingw64 (#10303)

0fff7fd

scripts : update sync

5c9a8b2

ggml: new optimization interface (ggml/988)

8a43e94

ggml : fix compile warnings (#0)

68fcb47

ggml-ci

tests : remove test-grad0

84274a1

make : add ggml-opt (#0)

a4200ca

ggml-ci

ggml : adapt AMX to tensor->grad removal (#0)

5d9e599

ggml-ci

ggml : inttypes.h -> cinttypes (#0)

24203e9

ggml-ci

github-actions bot added the Nvidia GPU and testing labels Nov 17, 2024

slaren and others added 15 commits November 17, 2024 08:31

ggml : fix possible buffer use after free in sched reserve (#9930)

eda7e1d

CMake: default to -arch=native for CUDA build (#10320)

467576b

CUDA: remove DMMV, consolidate F16 mult mat vec (#10318)

c3ea58a

ggml : fix undefined reference to 'getcpu' (#10354)

a431782

#10352

gitignore : ignore local run scripts [no ci]

20a780c

llama : only use default buffer types for the KV cache (#10358)

be5cacc

CMake: fix typo in comment [no ci] (#10360)

ce2e59b

CUDA: fix MMV kernel being used for FP16 src1 (#10357)

76e9e58

docker: use GGML_NATIVE=OFF (#10368)

75207b3

Vulkan: Fix device info output format specifiers (#10366)

9b75f03

* Vulkan: Fix device info output format specifiers
* Vulkan: Use zu printf specifier for size_t instead of ld
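
The `zu` change matters because `%ld` expects a `long`, which is 32 bits on LLP64 targets such as 64-bit Windows while `size_t` is 64 bits there; passing a mismatched type to `printf` is undefined behavior. `%zu` is the C99 length modifier for `size_t`. A small illustration with a hypothetical helper, not the Vulkan backend's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format a byte count portably: %zu matches size_t on every platform. */
static int format_size_bytes(char *buf, size_t len, size_t bytes) {
    return snprintf(buf, len, "%zu bytes", bytes);
}
```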

vulkan: remove use of null initializer (#10372)

f139d2e

Seems like this isn't working for vulkan-over-metal when the array is sized by a spec constant. Maybe a spirv-cross limitation?

Skip searching root path for cross-compile builds (#10383)

531cb1c

cuda : only use native when supported by cmake (#10389)

d3481e6

teleprint-me closed this Nov 18, 2024

pull bot commented Nov 16, 2024
