[pull] master from ggerganov:master #163

pull · 2024-12-23T10:12:03Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

* more perf with llamafile tinyblas on x86_64.
  - add bf16 support
  - change dispatch strategy (thanks: ikawrakow/ik_llama.cpp#71)
  - reduce memory bandwidth; simple tinyblas dispatch and more cache friendly
* tinyblas dynamic dispatching
* sgemm: add M blocks.
* - git 2.47 uses short ids of length 9.
  - show-progress is not part of GNU Wget2
* remove unstable test

Warning types fixed (observed under MSYS2 GCC 14.2.0):
* format '%ld' expects argument of type 'long int', but argument has type 'size_t'
* llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers] (emitted for all struct fields except the first)
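For context on the first warning: on 64-bit Windows targets `long` is 32 bits wide while `size_t` is 64 bits, so `%ld` does not match a `size_t` argument. A minimal sketch of one common fix follows, using the standard `%zu` conversion. The helper name `format_size` is hypothetical, and the commit itself may have resolved the warning differently (for example with a cast).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, for illustration only (not taken from llama.cpp).
// %zu is the standard printf conversion for size_t; %ld is only correct
// on platforms where long and size_t happen to have the same width.
std::string format_size(std::size_t n) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%zu bytes", n);
    return std::string(buf);
}
```

Calling `format_size(v.size())` on any container then compiles without a format warning on both LP64 and LLP64 targets, provided the C runtime supports `%zu`.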

Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where the batch_strides are overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because it should cache better. Share some code for reducing the result values to memory in mul_mat_vec_base.
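The loop ordering described above can be sketched on the CPU. This is only an illustration of the idea, not the Vulkan shader: the function name, the data layout (row-major A, B and the result stored column after column) and the use of a template parameter in place of the `NUM_COLS` spec constant are all assumptions.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// CPU reference sketch: out[:, c] = A * B[:, c] for c in [0, NUM_COLS).
// Assumed layout: A is row-major (rows x k); B holds NUM_COLS columns of
// length k back to back; out holds NUM_COLS columns of length rows.
template <std::size_t NUM_COLS>
std::vector<float> mul_mat_vec(const std::vector<float>& a,
                               const std::vector<float>& b,
                               std::size_t rows, std::size_t k) {
    std::vector<float> out(rows * NUM_COLS, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        std::array<float, NUM_COLS> acc{};  // one partial sum per B column
        for (std::size_t i = 0; i < k; ++i) {
            const float a_val = a[r * k + i];  // loaded once, reused for every column
            for (std::size_t c = 0; c < NUM_COLS; ++c) {
                acc[c] += a_val * b[c * k + i];  // B loads sit in the innermost loop
            }
        }
        // Write the reduced values for this row, one per column.
        for (std::size_t c = 0; c < NUM_COLS; ++c) {
            out[c * rows + r] = acc[c];
        }
    }
    return out;
}
```

With `NUM_COLS` fixed at compile time the innermost loop can be fully unrolled, which is roughly the role a specialization constant plays in a shader.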

…hen building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013)

In common/common.cpp:
* Convert usage of stat() function call to check if file exists to standard library function std::filesystem::exists (error unable to match to correct function signature)
* Additional conditions to check if PATH_MAX is already defined in WIN32 environment (warning it is already defined in MSYS2)

In examples/run/run.cpp:
* Add io.h header inclusion (error cannot find function _get_osfhandle)
* Change initialisers for OVERLAPPED to empty struct (warning about uninitialised members)
* Add initialiser for hFile (warning it may be uninitialised)
* Add cast for curl_off_t percentage value to long int in generate_progress_prefix function (warning that curl_off_t is long long int)

In ggml/src/ggml-opencl/ggml-opencl.cpp:
* Initialise certain declared cl_mem variables to nullptr for greater safety (warning about B_d variable possibly used unassigned)
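The stat() to std::filesystem::exists change in common/common.cpp could look roughly like the following. This is a sketch assuming C++17; the helper name `file_exists` is hypothetical and the actual code in the commit may differ.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper, for illustration only.
// Returns true if `path` names an existing file or directory.
// The error_code overload does not throw; on error it returns false.
bool file_exists(const std::string& path) {
    std::error_code ec;
    return std::filesystem::exists(std::filesystem::path(path), ec);
}
```

Unlike stat(), this has a single signature on every platform, which sidesteps the overload mismatch reported under MSYS2.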

rpc-server : add support for the SYCL backend (#10934)

86bf31c

pull bot added the ⤵️ pull label Dec 23, 2024

github-actions bot added the examples label Dec 23, 2024

server : add system_fingerprint to chat/completion (#10917)

485dc01

* server : add system_fingerprint to chat/completion
* update README

github-actions bot added python server labels Dec 23, 2024

ngxson and others added 2 commits December 23, 2024 12:52

server : fix missing model id in /model endpoint (#10957)

14b699e

* server : fix missing model id in /model endpoint
* fix ci

ggml : fix const usage in SSE path (#10962)

32d6ee6

github-actions bot added the ggml label Dec 23, 2024

slaren and others added 5 commits December 24, 2024 04:05

ggml : fix arm enabled features check (#10961)

3327bb0

ggml : use wstring for backend search paths (#10960)

60cfa72

ggml-ci

llama : the WPM vocabs use the CLS token as BOS (#10930)

30caac3

* llama : the WPM vocabs use the CLS token as BOS
  ggml-ci
* llama : add comment

server: allow filtering llama server response fields (#10940)

09fe2e7

* llama_server_response_fields
* llama_server_response_fields_fix_issues
* params fixes
* fix
* clarify docs
* change to "response_fields"

Co-authored-by: Xuan Son Nguyen <[email protected]>

github-actions bot added the script label Dec 24, 2024

elk-cloner and others added 2 commits December 24, 2024 21:33

server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967)

9ba399d

* add support for base64
* fix base64 test
* improve test

Co-authored-by: Xuan Son Nguyen <[email protected]>
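A client that requests "encoding_format": "base64" has to turn the returned string back into floats. The sketch below assumes the payload is a packed array of 32-bit floats in the host's byte order, mirroring the OpenAI embeddings format; this page does not confirm the exact wire layout, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Decode standard (RFC 4648) base64 into raw bytes. A '=' ends the input.
std::vector<std::uint8_t> base64_decode(const std::string& in) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;  // holds the most recent 6-bit groups
    int bits = 0;              // number of not-yet-emitted bits in buffer
    for (char ch : in) {
        if (ch == '=') break;
        const std::size_t idx = alphabet.find(ch);
        if (idx == std::string::npos) {
            throw std::invalid_argument("invalid base64 character");
        }
        buffer = (buffer << 6) | static_cast<std::uint32_t>(idx);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}

// Assumption: the decoded bytes are float32 values in host byte order.
std::vector<float> decode_embedding(const std::string& b64) {
    const std::vector<std::uint8_t> bytes = base64_decode(b64);
    if (bytes.size() % sizeof(float) != 0) {
        throw std::invalid_argument("payload is not a whole number of floats");
    }
    std::vector<float> values(bytes.size() / sizeof(float));
    if (!values.empty()) {
        std::memcpy(values.data(), bytes.data(), bytes.size());
    }
    return values;
}
```

The memcpy avoids aliasing issues that a pointer cast from bytes to float would raise; a big-endian host would need a byte swap before the copy.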

github-actions bot added the Vulkan label Dec 26, 2024

netrunnereve and others added 5 commits December 26, 2024 16:54

vulkan: multi-row k quants (#10846)

d79d8f3

* multi row k quant shaders!
* better row selection
* more row choices
* readjust row selection
* rm_kq=2 by default

server : fix token duplication when streaming with stop strings (#10997)

16cdce7

server: added more docs for response_fields field (#10995)

f865ea1

vulkan: Use push constant offset to handle misaligned descriptors (#10987)

fdd2188

vulkan: im2col and matmul optimizations for stable diffusion (#10942)

a813bad

* tests: Add im2col perf tests
* vulkan: optimize im2col, more elements per thread
* vulkan: increase small tile size for NV_coopmat2
* vulkan: change im2col to 512 elements per workgroup

github-actions bot added the testing label Dec 29, 2024

android : fix llama_batch free (#11014)

c250ecb

github-actions bot added the android label Dec 30, 2024

jeffbolznv and others added 4 commits December 30, 2024 18:27

convert : fix Llama-3_1-Nemotron-51B rope settings (#11008)

bc7b1f8

* conflict resolution
* move comments after bracket to its own line
* DeciLMCausalModel now reads rope_theta from config.json properly

server : add OAI compat for /v1/completions (#10974)

5896c65

* server : add OAI compat for /v1/completions
* add test
* add docs
* better docs

ngxson and others added 2 commits December 31, 2024 15:22

server : clean up built-in template detection (#11026)

45095a6

* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition

ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027)

0827b2c

* Fixes for clang AVX VNNI
* enable AVX VNNI and alder lake build for MSVC
* Apply suggestions from code review

Co-authored-by: slaren <[email protected]>

teleprint-me closed this Jan 2, 2025
