Releases: ngxson/llama.cpp
Releases · ngxson/llama.cpp
b2368
server : add SSL support (#5926) * add cmake build toggle to enable ssl support in server Signed-off-by: Gabe Goodhart <[email protected]> * add flags for ssl key/cert files and use SSLServer if set All SSL setup is hidden behind CPPHTTPLIB_OPENSSL_SUPPORT in the same way that the base httlib hides the SSL support Signed-off-by: Gabe Goodhart <[email protected]> * Update readme for SSL support in server Signed-off-by: Gabe Goodhart <[email protected]> * Add LLAMA_SERVER_SSL variable setup to top-level Makefile Signed-off-by: Gabe Goodhart <[email protected]> --------- Signed-off-by: Gabe Goodhart <[email protected]>
b2364
server: metrics: add llamacpp:prompt_seconds_total and llamacpp:token…
b2361
log : fix MSVC compile errors (#5643) MSVC gives the following error with the existing macros: `Error C2059 : syntax error: ','` This patch adds `##` as a prefix to `__VA_ARGS__` to address this error.
b2311
workflows : remove nocleanup arg for check-requirements.sh (#5826) Reduces peak tmpfs usage and should prevent the check from failing from running out of space. Fixes the 'No space left on device' issue mentioned in #5703.
b2296
llama : constified `llama_set_state_data`'s `src` (#5774)
b2295
ci : reduce 3b ppl chunks to 1 to avoid timeout (#5771) ggml-ci
b2282
Attempt to fix android build (#5752) Co-authored-by: Iwan Kawrakow <[email protected]>
b2271
unicode : reuse iterator (#5726)
b2264
llama : refactor k-shift implementation + KV defragmentation (#5691) * llama : refactor k-shift implementation ggml-ci * llama : rename llama_kv_cache_seq_shift to llama_kv_cache_seq_add * llama : cont k-shift refactoring + normalize type names ggml-ci * minor : fix MPI builds * llama : reuse n_rot from the build context ggml-ci * llama : revert enum name changes from this PR ggml-ci * llama : update llama_rope_type * llama : add comment about rope values * llama : fix build * passkey : apply kv cache updates explicitly ggml-ci * llama : change name to llama_kv_cache_update() * llama : add llama_kv_cache_seq_pos_max() * passkey : fix llama_kv_cache_seq_pos_max() usage * llama : some llama_kv_cell simplifications * llama : add llama_kv_cache_compress (EXPERIMENTAL) * llama : add alternative KV cache merging (EXPERIMENTAL) * llama : add llama_kv_cache_defrag * llama : comments * llama : remove llama_kv_cache_compress will add in a separate PR ggml-ci * llama : defragment via non-overlapping moves * llama : ggml_graph based defrag implementation ggml-ci * llama : switch the loop order in build_defrag * llama : add comments
b2259
server: logs - unified format and --log-format option (#5700) * server: logs - always use JSON logger, add add thread_id in message, log task_id and slot_id * server : skip GH copilot requests from logging * server : change message format of server_log() * server : no need to repeat log in comment * server : log style consistency * server : fix compile warning * server : fix tests regex patterns on M2 Ultra * server: logs: PR feedback on log level * server: logs: allow to choose log format in json or plain text * server: tests: output server logs in text * server: logs switch init logs to server logs macro * server: logs ensure value json value does not raised error * server: logs reduce level VERBOSE to VERB to max 4 chars * server: logs lower case as other log messages * server: logs avoid static in general Co-authored-by: Georgi Gerganov <[email protected]> * server: logs PR feedback: change text log format to: LEVEL [function_name] message | additional=data --------- Co-authored-by: Georgi Gerganov <[email protected]>