[pull] master from ggerganov:master #160

pull · 2024-12-13T10:12:06Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

* Try to reduce some unused and typecast warnings * Reduce compiler warnings step 2 * add a newline at the end of the file * Initialize nreduce as size_t * [SYCL] Remove pragma directives from mmq.cpp * SYCL: mmq add condition to prevent blocks_per_tile_x_row variable from becoming 0 * SYCL softmax: Initialize nreduce as size_t * ggml-sycl.cpp: fix some trailing whitespaces * SYCL: remove the unused variables instead of commenting it out * SYCL poo2d kernel: set NAN for invalid pooling op * SYCL gemm.hpp: remove pragma directives * SYCL gemm.hpp: use const cast to properly support dnnl::memory * SYCL: wkv6 remove a comment * SYCL: clean comments step 2 * SYCL: clean comments and variables step 3 * SYCL: Use GGML_UNUSED for unused variables * SYCL: remove extra empty lines and a comment * Remove TODO * cleanup spaces * add a stdout for unsupported op * use sycl printf over fprintf * remove prints for CI * SYCL ggml-sycl: pool2D use sycl::nan and remove if-else block --------- Co-authored-by: Abhilash Majumder <[email protected]>

* double the number of rows per workgroup * Update ggml-vulkan.cpp * Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats * only increase the number of rows for amd and subgroup size 64 * fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested * use subgroup min and max to check for gcn (requires #10721) * manual merge ggml-vulkan.cpp * set min and max subgroup size in any case * Also double the number of rows for Intel GPUs

…ctivity (#10812) * Fix crash caused by ggml_backend_load_all when launching on AndroidActivity. Details: Calling ggml_backend_load_all during initialization in the AndroidActivity project leads to a crash with the error: terminating with uncaught exception of type std::__ndk1::__fs::filesystem::filesystem_error: filesystem error: in directory_iterator::directory_iterator(...): Permission denied [./]. This issue occurs because AndroidActivity restricts file access due to sandboxing. Reproduction: In the example folder, the LlamaAndroid project can reproduce the crash by calling ggml_backend_load_all first in Java_android_llama_cpp_LLamaAndroid_backend_1init. * Update ggml/src/ggml-backend-reg.cpp --------- Co-authored-by: Diego Devesa <[email protected]>

Added support for positional arguments `model` and `prompt`. Added functionality to download via strings like: llama-run llama3 llama-run ollama://granite-code llama-run ollama://granite-code:8b llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf llama-run https://example.com/some-file1.gguf llama-run some-file2.gguf llama-run file://some-file3.gguf Signed-off-by: Eric Curtin <[email protected]>

…eno GPUs (#10693) * [cl][adreno] Add Adreno GPU support Add new OpenCL backend to support Adreno GPUs --------- Co-authored-by: Skyler Szot <[email protected]> Co-authored-by: Shangqing Gu <[email protected]> Co-authored-by: Alexander Angus <[email protected]> Co-authored-by: Hongqiang Wang <[email protected]> Co-authored-by: Max Krasnyansky <[email protected]> * [cl][ci] Add workflow for CL * [cl][adreno] Fix memory leak for non SMALL_ALLOC path * opencl: integrate backend dyn.load interface and fix compiler and format warnings * opencl: remove small-alloc support and fix build errors for non-opencl platforms * opencl: fixed merge conflict (MUSA added twice in cmake) * opencl-ci: use RUNNER_TEMP instead of github.workspace * opencl: fix embed tool invocation with python3 * opencl: CI workflow fixes * opencl: Clean up small-alloc in CMake files * opencl: cleanup ggml-opencl2 header file * opencl: use ulong for offsets and strides in ADD kernel * opencl: use cl_ulong for all offsets * opencl: use cl_ulong for sizes and strides * opencl: use `GGML_LOG_xxx` instead of `fprintf(stderr, ...)` * opencl: rename backend `opencl2` -> `opencl` * opencl: rename kernel files `ggml-opencl2` -> `ggml-opencl` * opencl: make OpenCL required, remove redundant lib and inc directories * `ggml-base`, `..` and `.` are added by `ggml_add_backend_library` * opencl: rename backend - funcs, structs, etc `opencl2` -> `opencl` * opencl: remove copyright marker since main license already covers * opencl: replace some more OPENCL2 leftovers * opencl: remove limits on `tensor_extra` * opencl: use pools for `tensor_extra` * opencl: fix compiler warnings with GCC and Clang Still getting the warning about clCreateCmdQueue being obsolete. Will fix that separately. * opencl: fail gracefully if opencl devices are not available Also for unsupported GPUs. * opencl: fix MSVC builds (string length error) * opencl: check for various requirements, allow deprecated API * opencl: update log message for unsupported GPUs --------- Co-authored-by: Skyler Szot <[email protected]> Co-authored-by: Shangqing Gu <[email protected]> Co-authored-by: Alexander Angus <[email protected]> Co-authored-by: Hongqiang Wang <[email protected]> Co-authored-by: Max Krasnyansky <[email protected]>

…eat lines as binary and therefore hidden by default (#10771) Signed-off-by: Charles Darke <[email protected]> Co-authored-by: Charles Darke <[email protected]>

* Barebone Qwen2VL LLM convertor * Add Qwen2VL cli entrypoint * [WIP] add qwen2vl arch * Verify m-rope output * Add vl-rope/2d-rope support for qwen2vl ViT * update qwen2vl cli tool * update 5D tensor op workaround * [WIP] qwen2vl vision model * make batch and clip utils compatible with qwen2vl * [WIP] create inference workflow, gguf convert script but fix * correcting vision-rope behavior, add the missing last layer back to ViT * add arg parser to qwen2vl_surgery * replace variable size array with vector * cuda-gdb cmake preset * add fp32 mrope, vision rope kernel * add fp16 support for qwen2vl and m-rope * add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION` * fix rope op mode switching, out dated func args * update `llama_hparams` * update to keep up stream changes * resolve linter, test errors * add makefile entry, update speical image padding token * add mrope unit test, fix few compiler warnings * rename `mrope` related function, params * minor updates on debug util, bug fixs * add `m-rope` testcase to `test-backend-ops` * Apply suggestions from code review Co-authored-by: Georgi Gerganov <[email protected]> * fix traililng whitespce * store `llama_hparams.rope_sections` with fixed size array * update position id tensor size check in GGML_OP_ROPE * minor updates * update `ggml_backend_*_supports_op` of unsupported backends * remote old `rope_section` compare operator --------- Co-authored-by: Georgi Gerganov <[email protected]>

This allows to reduce compile time when you are building for a single GPU.

* Update server JSON response. * Add unit test to check `has_new_line` JSON response * Remove `has_new_line` unit test changes. * Address code review comment: type check for `has_new_line` in unit test

* add code highlighting and math formatting * code cleanup * build public/index.html * rebuild public/index.html * fixed coding style * fixed coding style * style fixes * highlight: smaller bundle size, fix light & dark theme * remove katex * add bundle size check * add more languages * add php * reuse some langs * use gzip * Revert "remove katex" This reverts commit c0e5046. * use better maintained @vscode/markdown-it-katex * fix gzip non deterministic * ability to add a demo conversation for dev * fix latex rendering * add comment * latex codeblock as code --------- Co-authored-by: Xuan Son Nguyen <[email protected]>

…10836)

* Add deepseek v1 arch & gigachat template * improve template code * add readme * delete comments * remove comment * fix format * lint llama.cpp * fix order of deepseek and deepseek2, move gigachat temlate to the end of func * fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need * remove comments * move deepseek above deepseek2 * change placement of gigachat chat template

* Allow locally downloaded models for QwenVL * Define model_path * rm trailing space --------- Co-authored-by: Xuan Son Nguyen <[email protected]>

qnixsynapse and others added 2 commits December 13, 2024 12:12

pull bot added the ⤵️ pull label Dec 13, 2024

github-actions bot added ggml Vulkan SYCL labels Dec 13, 2024

sienaiwun and others added 2 commits December 13, 2024 13:56

gguf-py : numpy 2 newbyteorder fix (#9772)

4601a8b

github-actions bot added the python label Dec 13, 2024

fix: graceful shutdown for Docker images (#10815)

11e07fd

github-actions bot added the devops label Dec 13, 2024

github-actions bot added the examples label Dec 13, 2024

lhez and others added 2 commits December 13, 2024 12:23

Removes spurious \r in output that causes logging in journalctl to tr…

56eea07

…eat lines as binary and therefore hidden by default (#10771) Signed-off-by: Charles Darke <[email protected]> Co-authored-by: Charles Darke <[email protected]>

github-actions bot added the server label Dec 13, 2024

github-actions bot added Nvidia GPU testing Apple Metal Kompute labels Dec 14, 2024

nix: allow to override rocm gpu targets (#10794)

e52aba5

This allows to reduce compile time when you are building for a single GPU.

github-actions bot added the nix label Dec 14, 2024

MichelleTanPY and others added 4 commits December 14, 2024 23:29

server: Fix has_next_line in JSON response (#10818)

89d604f

* Update server JSON response. * Add unit test to check `has_new_line` JSON response * Remove `has_new_line` unit test changes. * Address code review comment: type check for `has_new_line` in unit test

gguf-py : bump to v0.13.0

b5ae1dd

scripts : change build path to "build-bench" for compare-commits.sh (#…

87cf323

…10836)

github-actions bot added the script label Dec 15, 2024

Inf1delis and others added 2 commits December 15, 2024 19:02

llava : Allow locally downloaded models for QwenVL (#10833)

4ddd199

* Allow locally downloaded models for QwenVL * Define model_path * rm trailing space --------- Co-authored-by: Xuan Son Nguyen <[email protected]>

teleprint-me closed this Dec 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pull] master from ggerganov:master #160

[pull] master from ggerganov:master #160

pull bot commented Dec 13, 2024 •

edited

Loading

[pull] master from ggerganov:master #160

[pull] master from ggerganov:master #160

Conversation

pull bot commented Dec 13, 2024 • edited Loading

pull bot commented Dec 13, 2024 •

edited

Loading