Releases · ngxson/llama.cpp

09 Mar 10:42

e1fa956

b2368

server : add SSL support (#5926)

* add cmake build toggle to enable ssl support in server

Signed-off-by: Gabe Goodhart <[email protected]>

* add flags for ssl key/cert files and use SSLServer if set

All SSL setup is hidden behind CPPHTTPLIB_OPENSSL_SUPPORT, in the same
way that the base httplib hides its SSL support

Signed-off-by: Gabe Goodhart <[email protected]>

* Update readme for SSL support in server

Signed-off-by: Gabe Goodhart <[email protected]>

* Add LLAMA_SERVER_SSL variable setup to top-level Makefile

Signed-off-by: Gabe Goodhart <[email protected]>

---------

Signed-off-by: Gabe Goodhart <[email protected]>
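The compile-time guard described above can be sketched as follows. This is an illustration only, not the server's code: `DemoServer`, `DemoSSLServer`, `make_server`, and the `DEMO_OPENSSL_SUPPORT` macro are hypothetical stand-ins (the macro plays the role of CPPHTTPLIB_OPENSSL_SUPPORT), so the sketch compiles without httplib or OpenSSL.

```cpp
#include <memory>
#include <string>

// Stand-in for a plain HTTP server type (hypothetical, for illustration only).
struct DemoServer {
    virtual ~DemoServer() = default;
    virtual std::string scheme() const { return "http"; }
};

#ifdef DEMO_OPENSSL_SUPPORT // hypothetical macro, standing in for CPPHTTPLIB_OPENSSL_SUPPORT
// Stand-in for an SSL-enabled server type; only compiled when SSL is enabled.
struct DemoSSLServer : DemoServer {
    DemoSSLServer(const std::string & cert, const std::string & key)
        : cert_file(cert), key_file(key) {}
    std::string scheme() const override { return "https"; }
    std::string cert_file;
    std::string key_file;
};
#endif

// Returns the SSL variant only if SSL support was compiled in AND both files are set;
// otherwise falls back to the plain server.
static std::unique_ptr<DemoServer> make_server(const std::string & cert_file,
                                               const std::string & key_file) {
#ifdef DEMO_OPENSSL_SUPPORT
    if (!cert_file.empty() && !key_file.empty()) {
        return std::make_unique<DemoSSLServer>(cert_file, key_file);
    }
#else
    (void) cert_file; // unused when SSL support is compiled out
    (void) key_file;
#endif
    return std::make_unique<DemoServer>();
}
```

Built without the macro, `make_server` always returns the plain server, so a build without SSL never references the SSL type at all.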

08 Mar 14:24

github-actions

b2364

76e8688

server: metrics: add llamacpp:prompt_seconds_total and llamacpp:token…

08 Mar 10:32

github-actions

b2361

581ed5c

log : fix MSVC compile errors (#5643)

MSVC gives the following error with the existing macros:
`Error C2059 : syntax error: ','`

This patch adds `##` as a prefix to `__VA_ARGS__` to address this error.
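The release note does not show the macros themselves, so the following is only a sketch of the general `##__VA_ARGS__` idiom it refers to. `demo_format` and `DEMO_LOG` are hypothetical names, not identifiers from the llama.cpp log header. The `##` prefix is a widely supported compiler extension that removes the preceding comma when the variadic argument list is empty.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper (not from llama.cpp): printf-style formatting into a std::string.
static std::string demo_format(const char * fmt, ...) {
    char buf[256];
    va_list args;
    va_start(args, fmt);
    vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);
    return std::string(buf);
}

// With a plain `__VA_ARGS__`, a strictly conforming pre-C++20 preprocessor would expand
// DEMO_LOG("hello") to demo_format("hello", ), leaving a stray comma.
// The `##` prefix drops that comma when no variadic arguments are passed.
#define DEMO_LOG(fmt, ...) demo_format(fmt, ##__VA_ARGS__)
```

Both `DEMO_LOG("hello")` and `DEMO_LOG("n=%d", 3)` then expand to valid calls.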

02 Mar 10:12

github-actions

b2311

9bf297a

workflows : remove nocleanup arg for check-requirements.sh (#5826)

Reduces peak tmpfs usage and should prevent the check from failing due to
running out of space.

Fixes the 'No space left on device' issue mentioned in #5703.

29 Feb 14:10

github-actions

b2296

d5ab297

llama : constified `llama_set_state_data`'s `src` (#5774)

28 Feb 21:10

github-actions

b2295

87c91c0

ci : reduce 3b ppl chunks to 1 to avoid timeout (#5771)

ggml-ci

27 Feb 20:58

github-actions

b2282

cb49e0f

Attempt to fix android build (#5752)

Co-authored-by: Iwan Kawrakow <[email protected]>

26 Feb 14:19

github-actions

b2271

67fd331

unicode : reuse iterator (#5726)

25 Feb 21:02

github-actions

b2264

bf08e00

llama : refactor k-shift implementation + KV defragmentation (#5691)

* llama : refactor k-shift implementation

ggml-ci

* llama : rename llama_kv_cache_seq_shift to llama_kv_cache_seq_add

* llama : cont k-shift refactoring + normalize type names

ggml-ci

* minor : fix MPI builds

* llama : reuse n_rot from the build context

ggml-ci

* llama : revert enum name changes from this PR

ggml-ci

* llama : update llama_rope_type

* llama : add comment about rope values

* llama : fix build

* passkey : apply kv cache updates explicitly

ggml-ci

* llama : change name to llama_kv_cache_update()

* llama : add llama_kv_cache_seq_pos_max()

* passkey : fix llama_kv_cache_seq_pos_max() usage

* llama : some llama_kv_cell simplifications

* llama : add llama_kv_cache_compress (EXPERIMENTAL)

* llama : add alternative KV cache merging (EXPERIMENTAL)

* llama : add llama_kv_cache_defrag

* llama : comments

* llama : remove llama_kv_cache_compress

will add in a separate PR

ggml-ci

* llama : defragment via non-overlapping moves

* llama : ggml_graph based defrag implementation

ggml-ci

* llama : switch the loop order in build_defrag

* llama : add comments
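The "defragment via non-overlapping moves" step listed above can be pictured with a small sketch. This is not the llama.cpp implementation (which, per the list, builds a ggml graph); `plan_defrag` and the integer cell model are hypothetical. The idea shown is to fill holes in the front of the cache with occupied cells taken from the tail, so that every source index lies at or beyond the final used size and every destination index lies below it.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical cell model: a value >= 0 is an occupied cell, -1 is a free cell.
// Compacts `cells` in place and returns the (src, dst) moves that were applied.
// All sources are >= n_used and all destinations are < n_used, so no move
// reads from a slot that another move writes to.
static std::vector<std::pair<size_t, size_t>> plan_defrag(std::vector<int> & cells) {
    size_t n_used = 0;
    for (int c : cells) {
        if (c >= 0) {
            n_used++;
        }
    }

    std::vector<std::pair<size_t, size_t>> moves;
    size_t src = cells.size();
    for (size_t dst = 0; dst < n_used; ++dst) {
        if (cells[dst] >= 0) {
            continue; // slot already occupied, nothing to do
        }
        // Scan backwards for the next occupied cell in the tail. One exists for
        // every hole in [0, n_used), so src never drops below n_used.
        do {
            --src;
        } while (cells[src] < 0);
        cells[dst] = cells[src];
        cells[src] = -1;
        moves.emplace_back(src, dst);
    }
    return moves;
}
```

Note that this sketch does not preserve the relative order of cells; it only guarantees that the occupied cells end up contiguous at the front.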

25 Feb 14:21

github-actions

b2259

930b178

server: logs - unified format and --log-format option (#5700)

* server: logs - always use JSON logger, add thread_id in message, log task_id and slot_id

* server : skip GH copilot requests from logging

* server : change message format of server_log()

* server : no need to repeat log in comment

* server : log style consistency

* server : fix compile warning

* server : fix tests regex patterns on M2 Ultra

* server: logs: PR feedback on log level

* server: logs: allow choosing the log format: JSON or plain text

* server: tests: output server logs in text

* server: logs switch init logs to server logs macro

* server: logs ensure JSON value does not raise an error

* server: logs shorten level VERBOSE to VERB so level names are at most 4 chars

* server: logs use lower case like other log messages

* server: logs avoid static in general

Co-authored-by: Georgi Gerganov <[email protected]>

* server: logs PR feedback: change text log format to: LEVEL [function_name] message | additional=data

---------

Co-authored-by: Georgi Gerganov <[email protected]>
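The plain-text layout named in the last bullet, `LEVEL [function_name] message | additional=data`, can be sketched as a small formatter. `format_text_log` is a hypothetical helper, not the server's `server_log()`; the exact spacing and the way extra fields are joined are assumptions.

```cpp
#include <map>
#include <string>

// Hypothetical formatter producing: LEVEL [function_name] message | key=value ...
// Extra fields are emitted in the map's sorted key order (an assumption of this sketch).
static std::string format_text_log(const std::string & level,
                                   const std::string & function_name,
                                   const std::string & message,
                                   const std::map<std::string, std::string> & extra) {
    std::string line = level + " [" + function_name + "] " + message;
    if (!extra.empty()) {
        line += " |";
        for (const auto & kv : extra) {
            line += " " + kv.first + "=" + kv.second;
        }
    }
    return line;
}
```

For example, `format_text_log("VERB", "update_slots", "slot released", {{"slot_id", "0"}, {"task_id", "7"}})` yields `VERB [update_slots] slot released | slot_id=0 task_id=7`.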
