merge upstream #46

l3utterfly · 2024-11-27T12:01:36Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

* metal : add kernel arg structs (wip)
* metal : fattn args

  ggml-ci
* metal : cont + avoid potential int overflow [no ci]
* metal : mul mat struct (wip)
* cont : mul mat vec
* cont : pass by reference
* cont : args is first argument
* cont : use char ptr
* cont : shmem style
* cont : thread counters style
* cont : mul mm id

  ggml-ci
* cont : int safety + register optimizations

  ggml-ci
* metal : GGML_OP_CONCAT

  ggml-ci
* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV
* metal : GGML_OP_REPEAT
* metal : GGML_OP_CPY
* metal : GGML_OP_RMS_NORM
* metal : GGML_OP_NORM
* metal : add TODOs for rest of ops
* ggml : add ggml-metal-impl.h

  ggml-ci

Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/4aa36568d413aca0ea84a1684d2d46f55dbabad7?narHash=sha256-Zwl8YgTVJTEum%2BL%2B0zVAWvXAGbWAuXHax3KzuejaDyo%3D' (2024-11-05)
  → 'github:NixOS/nixpkgs/5e4fbfb6b3de1aa2872b76d49fafc942626e2add?narHash=sha256-OZiZ3m8SCMfh3B6bfGC/Bm4x3qc1m2SVEAlkV6iY7Yg%3D' (2024-11-15)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* vulkan: Optimize soft_max

  Large soft_max could already saturate memory, but small/medium sizes were pretty slow. The bulk of the gains for them comes from using a smaller workgroup size, and making the workgroup size match the subgroup size also makes the barriers much cheaper.

  Cache some values in locals to avoid refetching/recomputing. And stamp out a few "template instantiations" so smaller cases will fully unroll.

  Add a missing early return for OOB rows. This happens when there are more than 512 rows and the dispatch is 512 x H.

* vulkan: Further soft_max optimizations

  Restore the workgroup size of 512 case, use it for >1024. Use unrollable loops for more iteration counts.
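The out-of-bounds guard mentioned above can be illustrated on the CPU. This is only a sketch in plain C, not the shader itself: it assumes the grid is always 512 wide and that a workgroup's row index is `y * 512 + x`, and the names `soft_max_workgroup` and `simulate_dispatch` are hypothetical.

```c
#include <stddef.h>

#define DISPATCH_WIDTH 512  /* the dispatch is 512 x H, per the commit message */

/* Simulates one workgroup. Returns 1 if it handled a row, 0 if it exited early.
 * The row indexing (y * 512 + x) is an assumption made for this sketch. */
int soft_max_workgroup(size_t x, size_t y, size_t nrows) {
    size_t row = y * DISPATCH_WIDTH + x;
    if (row >= nrows) {
        return 0;  /* early return: this workgroup maps past the last row */
    }
    /* ... a real shader would compute soft_max over this row here ... */
    return 1;
}

/* Launches a 512 x H grid and counts the workgroups that did real work. */
size_t simulate_dispatch(size_t nrows) {
    size_t h = (nrows + DISPATCH_WIDTH - 1) / DISPATCH_WIDTH;  /* round up */
    size_t processed = 0;
    for (size_t y = 0; y < h; y++) {
        for (size_t x = 0; x < DISPATCH_WIDTH; x++) {
            processed += (size_t)soft_max_workgroup(x, y, nrows);
        }
    }
    return processed;
}
```

With 600 rows the simulated grid is 512 x 2, so 1024 workgroups launch and the guard keeps the extra 424 from touching memory past the last row.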

There have been reports of failure to compile on systems with <= 32KB of shared memory (e.g. ggerganov#10037). This change makes the large tile size fall back to a smaller size if necessary, and makes mul_mat_id fall back to CPU if there's only 16KB of shared memory.
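A rough sketch of the fallback logic described above, in plain C. Every name and every per-tile byte count here is an illustrative assumption rather than a value from the Vulkan backend. Only the 16KB cutoff for mul_mat_id comes from the commit message, and treating anything at or below 16KB the same way is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tile sizes, ordered from smallest to largest. */
enum tile_size { TILE_SMALL, TILE_MEDIUM, TILE_LARGE };

/* Hypothetical shared-memory requirement of each tile size, in bytes. */
size_t tile_shmem_bytes(enum tile_size t) {
    switch (t) {
        case TILE_LARGE:  return 48 * 1024;
        case TILE_MEDIUM: return 24 * 1024;
        default:          return 12 * 1024;
    }
}

/* Steps down from the requested tile size until it fits in shared memory. */
enum tile_size pick_tile(enum tile_size wanted, size_t device_shmem) {
    enum tile_size t = wanted;
    while (t != TILE_SMALL && tile_shmem_bytes(t) > device_shmem) {
        t = (enum tile_size)(t - 1);
    }
    return t;
}

/* mul_mat_id stays on the GPU only if the device has more than 16KB of
 * shared memory; otherwise the caller would fall back to the CPU. */
bool mul_mat_id_on_gpu(size_t device_shmem) {
    return device_shmem > 16 * 1024;
}
```

Under these made-up numbers, a device with 32KB of shared memory would drop from the large tile to the medium one and still run mul_mat_id on the GPU, while a 16KB device would use the small tile and hand mul_mat_id to the CPU.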

ggerganov and others added 30 commits November 16, 2024 10:32

scripts : fix missing key in compare-llama-bench.py (ggerganov#10332)

f245cc2

server: (web UI) Add samplers sequence customization (ggerganov#10255)

bcdb7a2

* Samplers sequence: simplified and input field.
* Removed unused function
* Modify and use `settings-modal-short-input`
* rename "name" --> "label"

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>

make : auto-determine dependencies (#0)

8ee0d09

llamafile : fix include path (#0)

db4cfd5

ggml-ci

llama/ex: remove --logdir argument (ggerganov#10339)

4e54be0

docs : vulkan build instructions to use git bash mingw64 (ggerganov#1…

0fff7fd

…0303)

scripts : update sync

5c9a8b2

ggml: new optimization interface (ggml/988)

8a43e94

ggml : fix compile warnings (#0)

68fcb47

ggml-ci

tests : remove test-grad0

84274a1

make : add ggml-opt (#0)

a4200ca

ggml-ci

ggml : adapt AMX to tensor->grad removal (#0)

5d9e599

ggml-ci

ggml : inttypes.h -> cinttypes (#0)

24203e9

ggml-ci

ggml : fix possible buffer use after free in sched reserve (ggerganov…

eda7e1d

…#9930)

CMake: default to -arch=native for CUDA build (ggerganov#10320)

467576b

CUDA: remove DMMV, consolidate F16 mult mat vec (ggerganov#10318)

c3ea58a

ggml : fix undefined reference to 'getcpu' (ggerganov#10354)

a431782

ggerganov#10352

gitignore : ignore local run scripts [no ci]

20a780c

llama : only use default buffer types for the KV cache (ggerganov#10358)

be5cacc

CMake: fix typo in comment [no ci] (ggerganov#10360)

ce2e59b

CUDA: fix MMV kernel being used for FP16 src1 (ggerganov#10357)

76e9e58

docker: use GGML_NATIVE=OFF (ggerganov#10368)

75207b3

Vulkan: Fix device info output format specifiers (ggerganov#10366)

9b75f03

* Vulkan: Fix device info output format specifiers
* Vulkan: Use zu printf specifier for size_t instead of ld
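For context, a minimal C sketch of the specifier change named in the second item: `%zu` is the `printf` conversion for `size_t`, whereas `%ld` expects a `long`, which is not the same width as `size_t` on every platform (64-bit Windows, for example). The helper `format_device_memory` and its message text are hypothetical and are not taken from the Vulkan backend.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the ggml Vulkan backend): writes a memory
 * size into buf using the size_t-specific %zu conversion.
 * Returns the number of characters snprintf would have written. */
int format_device_memory(char *buf, size_t buf_len, size_t bytes) {
    /* %zu matches size_t everywhere; %ld would need a cast to long. */
    return snprintf(buf, buf_len, "device memory: %zu bytes", bytes);
}
```

Calling it with a 64-byte buffer and `(size_t)1024` would produce the string `device memory: 1024 bytes`.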

vulkan: remove use of null initializer (ggerganov#10372)

f139d2e

Seems like this isn't working for vulkan-over-metal when the array is sized by a spec constant. Maybe a spirv-cross limitation?

Skip searching root path for cross-compile builds (ggerganov#10383)

531cb1c

cuda : only use native when supported by cmake (ggerganov#10389)

d3481e6

sycl: Revert MUL_MAT_OP support changes (ggerganov#10385)

557924f

slaren and others added 13 commits November 26, 2024 21:13

ci : remove nix workflows (ggerganov#10526)

5a349f2

Add OLMo 2 model in docs (ggerganov#10530)

de50973

* Add link to OLMo 2 model in docs
* Change link to landing page

ci : fix cuda releases (ggerganov#10532)

c9b00a7

vulkan: optimize Q2_K and Q3_K mul_mat_vec (ggerganov#10459)

4a57d36

vulkan: skip integer div/mod in get_offsets for batch_idx==0 (ggergan…

71a6498

…ov#10506)

vulkan: further optimize q5_k mul_mat_vec (ggerganov#10479)

249a790

vulkan: define all quant data structures in types.comp (ggerganov#10440)

c31ed2a

Do not include arm_neon.h when compiling CUDA code (ggml/1028)

9150f8f

sync : ggml

fee824a

metal : fix group_norm support condition (#0)

9e2301f

ci : faster CUDA toolkit installation method and use ccache (ggergano…

46c69e0

…v#10537)

* ci : faster CUDA toolkit installation method and use ccache
* remove fetch-depth
* only pack CUDA runtime on master

Merge branch 'layla-build' into merge

289e208

l3utterfly merged commit 9e9162f into layla-build Nov 27, 2024
3 of 4 checks passed

l3utterfly deleted the merge branch November 27, 2024 12:02

github-actions bot added the documentation, SYCL, Nvidia GPU, Vulkan, testing, build, examples, devops, python, server, ggml, Kompute, Apple Metal, script, and nix labels Nov 27, 2024
