forked from ggerganov/llama.cpp
[pull] master from ggerganov:master #162
Open
pull wants to merge 26 commits into teleprint-me:master from ggerganov:master
Conversation
…I. (#10868)
- Bump model_template to 16384 bytes to support larger chat templates.
- Use `model->gguf_kv` for efficiency.
Related to #10524 / be0e350, references to hipBLAS have been removed across the repository. This fixes the link in the repository's `README.md`. Signed-off-by: Brian 'redbeard' Harrington <[email protected]>
…0872)
- server : (embeddings) use the same format for "input" and "content"
- fix test case
- handle empty input case
- fix test
- server : add "tokens" output ggml-ci
- server : update readme ggml-ci
- server : return token ids only if requested ggml-ci
- tests : improve "tokens" type check
- server : remove "tokens" from the OAI endpoint ggml-ci

Co-authored-by: Xuan Son Nguyen <[email protected]>
- server : add "tokens" output ggml-ci
- server : output embeddings for all tokens when pooling = none ggml-ci
- server : update readme [no ci]
- server : fix spacing [no ci]
- server : be explicit about the pooling type in the tests ggml-ci
- server : update /embeddings and /v1/embeddings endpoints ggml-ci
- server : do not normalize embeddings when there is no pooling ggml-ci
- server : update readme ggml-ci
- server : fixes
- tests : update server tests ggml-ci
- server : update readme [no ci]
- server : remove rebase artifact

Co-authored-by: Xuan Son Nguyen <[email protected]>
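As a rough illustration of the "do not normalize embeddings when there is no pooling" rule above, here is a minimal C++ sketch; the names (`maybe_normalize`, `pooling_enabled`) are illustrative placeholders, not the server's actual identifiers:

```cpp
#include <cmath>
#include <vector>

// Illustrative only: L2-normalize a pooled embedding, but return per-token
// embeddings untouched when pooling = none, as the commit above describes.
static void maybe_normalize(std::vector<float> & emb, bool pooling_enabled) {
    if (!pooling_enabled) {
        return; // no pooling -> raw, unnormalized embeddings
    }
    double sum = 0.0;
    for (float v : emb) {
        sum += (double) v * v;
    }
    const float norm = sum > 0.0 ? (float) std::sqrt(sum) : 1.0f;
    for (float & v : emb) {
        v /= norm;
    }
}
```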
- server : avoid overwriting the Authorization header. If no API key is set, leave the Authorization header as is; it may be used by another part of the web stack, such as an authenticating proxy. Fixes #10854
- rebuild

Co-authored-by: Xuan Son Nguyen <[email protected]>
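A minimal sketch of the header-handling idea, assuming a hypothetical `request_allowed` helper and a plain header map; this is not the server's actual code:

```cpp
#include <map>
#include <string>

// Sketch: only look at (and therefore "claim") the Authorization header when
// an API key is configured; with no key set, the header is left for the rest
// of the web stack, e.g. an authenticating proxy.
static bool request_allowed(const std::map<std::string, std::string> & headers,
                            const std::string & api_key) {
    if (api_key.empty()) {
        return true; // no key configured: do not touch Authorization
    }
    const auto it = headers.find("Authorization");
    return it != headers.end() && it->second == "Bearer " + api_key;
}
```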
- server : add "tokens" output ggml-ci
- server : output embeddings for all tokens when pooling = none ggml-ci
- server : be explicit about the pooling type in the tests ggml-ci
- server : do not normalize embeddings when there is no pooling
- llama : add OuteTTS support (wip): extract features, first conv, group norm, resnet conv, resnet, attn, pos net, layer norm, convnext, head, hann window
- fix n_embd + remove llama.cpp hacks
- compute hann window, fft, spectrum processing, clean-up
- tts : receive input text and generate codes
- clip : fix new conv name
- tts : minor fix
- tts : add header + minor fixes ggml-ci
- tts : add mathematical constant ggml-ci
- tts : fix sampling + cut initial noise
- tts : fixes
- tts : update default samplers ggml-ci
- tts : text pre-processing
- tts : outetts-voc -> wavtokenizer-dec
- tts : remove hardcoded constants ggml-ci
- tts : fix tensor shapes
- llama : refactor wavtokenizer tensors ggml-ci
- cont ggml-ci
- cont [no ci]
- llama : update WavTokenizer to non-causal attn
- llama : handle no-vocab detokenization
- tts : add Python example for OuteTTS (wip)
- tts : extend Python example to generate spectrogram ggml-ci
- server : fix rebase artifacts
- tts : enable "return_tokens" in Python example ggml-ci
- tts : minor fixes
- common : support HF download for vocoder
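The "hann window" steps above refer to the standard window function applied before the FFT/spectrogram stages. A minimal sketch of a periodic Hann window, w[n] = 0.5 · (1 − cos(2πn/N)); this illustrates the formula, not the repository's exact implementation:

```cpp
#include <cmath>
#include <vector>

// Periodic Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / N)).
static std::vector<float> hann_window(int N) {
    const float pi = 3.14159265358979f;
    std::vector<float> w(N);
    for (int n = 0; n < N; ++n) {
        w[n] = 0.5f * (1.0f - std::cos(2.0f * pi * (float) n / (float) N));
    }
    return w;
}
```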
- ggml: GGML_NATIVE uses -mcpu=native on ARM
- ggml: show detected features with GGML_NATIVE
- remove msvc support, add GGML_CPU_ARM_ARCH option
- disable llamafile in android example
- march -> mcpu, skip adding feature macros ggml-ci

Signed-off-by: Adrien Gallouët <[email protected]>
Co-authored-by: Adrien Gallouët <[email protected]>
Set the default width to the terminal width. Also fixed a small bug around the default n_gpu_layers value. Signed-off-by: Eric Curtin <[email protected]>
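A rough POSIX-only sketch of "default width = whatever the terminal is", assuming an ioctl(TIOCGWINSZ) query with an 80-column fallback; the actual change may obtain the width differently:

```cpp
#include <sys/ioctl.h>
#include <unistd.h>

// Query the terminal width; fall back to 80 columns when not attached to a
// terminal (e.g. output is piped).
static int terminal_width() {
    winsize ws{};
    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == 0 && ws.ws_col > 0) {
        return ws.ws_col;
    }
    return 80;
}
```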
- convert : use the GPT2 vocab for the Phi-4 model
- convert : use a null value of sliding_window to distinguish Phi-4 from other Phi-3-based models
- llama : do not use a sliding window attention mask for the Phi-4 model

Co-authored-by: Stanisław Szymczyk <[email protected]>
- fix: use gpt2 tokenizer for roberta and add eos/bos tokens (Branch: RobertaTokenizer)
- fixes to position embeddings
- map roberta-bpe to gpt-2
- fix linting

Signed-off-by: Gabe Goodhart <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>
Signed-off-by: Adrien Gallouët <[email protected]>
- server : fix logprobs, make it OpenAI-compatible
- update docs
- add std::log
- return pre-sampling p
- sort before applying softmax
- add comment
- fix test
- set p for sampled token
- update docs
- add --multi-token-probs
- update docs
- add `post_sampling_probs` option
- update docs [no ci]
- remove --multi-token-probs
- "top_probs" with "post_sampling_probs"
- resolve review comments
- rename struct token_prob to prob_info
- correct comment placement
- fix setting prob for sampled token
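A condensed sketch of the logprob flow described above (sort candidates, keep the pre-sampling p, report std::log(p)); `prob_info` mirrors the renamed struct mentioned in the commit, but the fields and helper shown here are illustrative:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative: given (token, probability) pairs, sort by probability and
// return the top-n entries with an OpenAI-style logprob = std::log(p).
struct prob_info {
    int   token;
    float prob;    // pre-sampling probability
    float logprob; // what an OpenAI-compatible client expects
};

static std::vector<prob_info> top_logprobs(std::vector<std::pair<int, float>> cands, size_t n_top) {
    std::sort(cands.begin(), cands.end(),
              [](const auto & a, const auto & b) { return a.second > b.second; });
    std::vector<prob_info> out;
    for (size_t i = 0; i < cands.size() && i < n_top; ++i) {
        out.push_back({ cands[i].first, cands[i].second, std::log(cands[i].second) });
    }
    return out;
}
```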
- Enable --no-context-shift for the llama-perplexity example
- RWKV 6: fix error in ggml_cuda_op_bin_bcast

Signed-off-by: Molly Sophia <[email protected]>
- Migrate to tensor->buffer for checking backend buffer type: 1
- SYCL: common.cpp try to migrate away from tensor->backend
- SYCL: fix assertions and add proper comments
- SYCL: remove extra space
- SYCL: add back static to ggml_backend_buffer_is_sycl_split function
- SYCL: add pragma directive to suppress warning spam
- SYCL: integrate debug logs with GGML_LOG and other fixes
- Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes" (reverts commit 2607b7d; keep the current SYCL-specific logging mechanism for now)
- SYCL: use GGML_SYCL_DEBUG after reverting
- SYCL: reg_get_proc_address func, update to the current func signature
- SYCL: refactor SYCL buffer checks in ggml_sycl_cpy_tensor_2d
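The gist of the migration above is to decide where a tensor lives from tensor->buffer rather than the deprecated tensor->backend field. A hypothetical sketch; `buffer_is_sycl` stands in for the real predicate (such as the ggml_backend_buffer_is_sycl_split function mentioned in the commit) and is not an actual ggml API call:

```cpp
#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical: check the buffer a tensor is allocated in instead of the
// tensor's backend field.
static bool tensor_on_sycl(const ggml_tensor * t,
                           bool (*buffer_is_sycl)(ggml_backend_buffer_t)) {
    return t != nullptr && t->buffer != nullptr && buffer_is_sycl(t->buffer);
}
```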
…() (#10874)
- ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0()
- ggml-cpu: format code

Signed-off-by: Adrien Gallouët <[email protected]>
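As a toy illustration of the asm-to-intrinsics direction taken above, here is an int8 dot product written with NEON intrinsics (assuming the ARMv8.2 dotprod extension and n being a multiple of 16); it is not the actual GEMV kernel:

```cpp
#include <arm_neon.h>
#include <cstdint>

// Accumulate an int8 dot product with NEON intrinsics instead of inline asm.
static int32_t dot_q8(const int8_t * a, const int8_t * b, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    for (int i = 0; i < n; i += 16) {
        const int8x16_t va = vld1q_s8(a + i);
        const int8x16_t vb = vld1q_s8(b + i);
        acc = vdotq_s32(acc, va, vb); // requires -march=armv8.2-a+dotprod
    }
    return vaddvq_s32(acc);
}
```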
Change the code to do 16-bit loads when possible and extract the appropriate component late, so the code effectively decodes a pair of elements and then selects one. This allows more commoning in the compiler when neighboring elements are loaded.
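A host-side analogue of that pattern, assuming little-endian layout and hypothetical helper names: one 16-bit load yields a pair of 8-bit elements that are both decoded, and the needed one is selected afterwards, so neighboring lookups can share the load and decode:

```cpp
#include <cstdint>
#include <cstring>

// Load 16 bits once, decode both 8-bit elements in the pair, then select the
// one that was asked for. Adjacent calls can be commoned by the compiler.
static float dequant_element(const int8_t * data, float scale, int index) {
    uint16_t packed;
    std::memcpy(&packed, data + (index & ~1), sizeof(packed)); // 16-bit load
    const int8_t lo = (int8_t) (packed & 0xFF);
    const int8_t hi = (int8_t) (packed >> 8);
    const int8_t q  = (index & 1) ? hi : lo; // select the component late
    return scale * (float) q;
}
```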
- vulkan: build fixes for 32b. Should fix #10923
- vulkan: initialize some buffer/offset variables
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )