update test 2 #8

kalomaze · 2023-12-23T23:25:35Z

No description provided.

* llama.swiftui : add bench button
* llama.swiftui : initial bench functionality
* force to use n_gpu_layers on simulator
* add download buttons & expose llamaState.loadModel
* update project.pbxproj
* comment #Preview & fix editorconfig check
* gitignore : xcode stuff
* llama.swiftui : UX improvements
* llama.swiftui : avoid data copy via "downloadTask"
* llama.swiftui : remove model from project
* llama : remove "mostly" from model infos
* llama.swiftui : improve bench

---------

Co-authored-by: jhen <[email protected]>

…4490)

* phi2 implementation
* fix breaking change
* phi-2 : various fixes
* phi-2 : use layer norm eps
* py : whitespaces
* llama : fix meta KV override bug
* convert : phi don't add BOS token
* convert : revert "added_tokens_decoder" change
* phi-2 : scale Q instead of KQ for better precision
* ggml : fix NeoX rope to rotate just first n_dims
* cuda : less diff in the rope_neox kernel
* ggml : add ggml_mul_mat_set_prec ggml-ci
* Update ggml-cuda.cu Co-authored-by: slaren <[email protected]>
* Update ggml-cuda.cu Co-authored-by: slaren <[email protected]>
* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision
* cuda : remove oboslete comment

---------

Co-authored-by: Ebey Abraham <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: slaren <[email protected]>

…4540) * allowed getting n_batch from llama_context in c api * changed to use `uint32_t` instead of `int` * changed to use `uint32_t` instead of `int` in `llama_n_ctx` * Update llama.h --------- Co-authored-by: Georgi Gerganov <[email protected]>

* llama : initial ggml-backend integration
* add ggml-metal
* cuda backend can be used though ggml-backend with LLAMA_GGML_BACKEND_CUDA_TEST access all tensor data with ggml_backend_tensor_get/set
* add ggml_backend_buffer_clear zero-init KV cache buffer
* add ggml_backend_buffer_is_hos, used to avoid copies if possible when accesing tensor data
* disable gpu backends with ngl 0
* more accurate mlock
* unmap offloaded part of the model
* use posix_fadvise64(.., POSIX_FADV_SEQUENTIAL) to improve performance with mmap
* update quantize and lora
* update session copy/set to use ggml-backend ggml-ci
* use posix_fadvise instead of posix_fadvise64
* ggml_backend_alloc_ctx_tensors_from_buft : remove old print
* llama_mmap::align_offset : use pointers instead of references for out parameters
* restore progress_callback behavior
* move final progress_callback call to load_all_data
* cuda : fix fprintf format string (minor)
* do not offload scales
* llama_mmap : avoid unmapping the same fragments again in the destructor
* remove unnecessary unmap
* metal : add default log function that prints to stderr, cleanup code ggml-ci

---------

Co-authored-by: Georgi Gerganov <[email protected]>

* llama : Add ability to cancel model load Updated llama_progress_callback so that if it returns false, the model loading is aborted.
* llama : Add test for model load cancellation
* Fix bool return in llama_model_load, remove std::ignore use
* Update llama.cpp Co-authored-by: Jared Van Bortel <[email protected]>
* Fail test if model file is missing
* Revert "Fail test if model file is missing" This reverts commit 32ebd52.
* Add test-model-load-cancel to Makefile
* Revert "Revert "Fail test if model file is missing"" This reverts commit 2796953.
* Simplify .gitignore for tests, clang-tidy fixes
* Label all ctest tests
* ci : ctest uses -L main
* Attempt at writing ctest_with_model
* ci : get ci/run.sh working with test-model-load-cancel
* ci : restrict .github/workflows/build.yml ctest to -L main
* update requirements.txt
* Disable test-model-load-cancel in make
* Remove venv before creation
* Restructure requirements.txt Top-level now imports the specific additional requirements for each python file. Using `pip install -r requirements.txt` will fail if versions become mismatched in the per-file requirements.
* Make per-python-script requirements work alone This doesn't break the main requirements.txt.
* Add comment
* Add convert-persimmon-to-gguf.py to new requirements.txt scheme
* Add check-requirements.sh script and GitHub workflow
* Remove shellcheck installation step from workflow
* Add nocleanup special arg
* Fix merge see: ggerganov#4462 (comment)
* reset to upstream/master
* Redo changes for cancelling model load

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Jared Van Bortel <[email protected]>
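The cancellation behavior described in the commit message above (a progress callback whose `false` return aborts the load) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code from this change: `progress_cb`, `cancel_state`, `cancel_at_threshold`, and `load_model_sketch` are hypothetical names invented for the sketch, and the loader only simulates reading chunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type for this sketch: return false to abort loading. */
typedef bool (*progress_cb)(float progress, void *user_data);

/* Hypothetical state shared between the caller and its callback. */
struct cancel_state {
    float cancel_at; /* abort once progress reaches this fraction */
    int   calls;     /* number of times the callback was invoked */
};

/* Example callback: keep loading until the requested fraction is reached. */
static bool cancel_at_threshold(float progress, void *user_data) {
    struct cancel_state *st = (struct cancel_state *)user_data;
    st->calls++;
    return progress < st->cancel_at;
}

/* Stand-in loader: "loads" n_steps chunks and reports progress after each.
 * Returns true on a complete load, false if the callback cancelled it. */
static bool load_model_sketch(int n_steps, progress_cb cb, void *user_data) {
    assert(n_steps > 0);
    for (int i = 1; i <= n_steps; i++) {
        /* ... a real loader would read one tensor or file chunk here ... */
        float progress = (float)i / (float)n_steps;
        if (cb != NULL && !cb(progress, user_data)) {
            return false; /* caller asked to stop; report an aborted load */
        }
    }
    return true;
}
```

In this sketch a `NULL` callback means "never cancel", and the loader reports cancellation through its return value so the caller can distinguish an aborted load from a completed one.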

* fix old jetson compile error
* Update Makefile
* update jetson detect and cuda version detect
* update cuda marco define
* update makefile and cuda,fix some issue
* Update README.md Co-authored-by: Georgi Gerganov <[email protected]>
* Update Makefile
* Update README.md

---------

Co-authored-by: Georgi Gerganov <[email protected]>

* initial commit, going through initializations
* main loop finished, starting to debug
* BUG: generates gibberish/repeating tokens after a while
* kv_cache management
* Added colors to distinguish drafted tokens (--color). Updated README
* lookup : fix token positions in the draft batch
* lookup : use n_draft from CLI params
* lookup : final touches

---------

Co-authored-by: Leon Ericsson <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

bullno1 and others added 30 commits December 17, 2023 11:57

Link to cublas dynamically on Windows even with LLAMA_STATIC (ggerganov#4506)

5daa5f5

server : allow requests larger than 8K (ggerganov#4500)

62bd52b

server : fix possible ambiguity in content type charset (ggerganov#4501)

eb16dae

server : fix grammar being ignored (ggerganov#4494)

8edd2b4

Fix bug in identifying the grammar.

server : disable llm logs if SERVER_VERBOSE is off (ggerganov#3792)

0ffc92d

finetune : keep allocs alive until all allocations are done (ggerganov#4486)

4566863

build : Check the ROCm installation location (ggerganov#4485)

919c406

* build : Check the ROCm installation location * more generic approach * fixup! It was returning the path instead of the command output * fixup! Trailing whitespace

gguf-py : fail fast on nonsensical special token IDs (ggerganov#4489)

f7f468a

readme : update hot topics

b1306c4

decode : fix logits_valid for legacy API (ggerganov#4516)

2994f0c

llama : fix try_override for bool_value which always return true (ggerganov#4519)

3c04bf6

llama.swiftui : add more models

6ff39b1

llama.swiftui : add tinyllama 1.1B F16

0e18b2e

ggml-cuda: Fix HIP build (ggerganov#4528)

a7aee47

regression of ggerganov#4490 Adds defines for two new datatypes cublasComputeType_t, cudaDataType_t. Currently using deprecated hipblasDatatype_t since newer ones very recent.

fix tools compilation

4c274dc

Fix for windows model unloading not releasing memory (LostRuins#569)

6948da5

* Add in model processes as a separate process so it can be killed when unloading to release memory on windows * Fix from Henky

move multiprocessing import into function scope

1f77d2a

Merge branch 'master' into concedo_experimental

49a5dfc

# Conflicts: # Makefile # README.md

Added support for ssl cert and key

da2db03

add presence penalty

3f863ee

ggml : fixed check for _MSC_VER (ggerganov#4535)

328b83d

Co-authored-by: Eric Sommerlade <[email protected]>

CUDA: Faster Mixtral prompt processing (ggerganov#4538)

799fc22

* CUDA: make MoE tensors contiguous for batch size>1 * Update ggml-cuda.cu Co-authored-by: slaren <[email protected]> --------- Co-authored-by: slaren <[email protected]>

Handle broken pipe error (LostRuins#572)

a787ebe

Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (ggerganov#4554)

1d7a191

testing workflow for windows cuda builds

e1f013b

Merge branch 'master' into concedo_experimental

96c12cf

testing workflow for windows cuda builds

ff4c2b1

(cherry picked from commit e1f013b)

Merge branch 'concedo' into concedo_experimental

c05d195

finnvoor and others added 27 commits December 21, 2023 21:55

metal : fix ggml_metal_log vargs (ggerganov#4373)

56fa508

ci : add jlumbroso/free-disk-space to docker workflow (ggerganov#4150)

4a5f9d6

* [github][workflows][docker]: removes hardcoded `ggerganov` from `ghcr` repo * [github][workflows][docker]: adds `jlumbroso/free-disk-space`

gguf : simplify example dependencies

32259b2

gguf-py : fix broken link

769a7bc

ggml : change ggml_scale to take a float instead of tensor (ggerganov#4573)

afefa31

* ggml : change ggml_scale to take a float instead of tensor
* ggml : fix CPU implementation
* tests : fix test-grad0 ggml-ci

always show reported arch

375003b

Merge branch 'master' into concedo_experimental

230a638

# Conflicts: # .github/workflows/docker.yml # CMakeLists.txt # Makefile # README.md # llama.cpp # tests/test-grad0.cpp

ggml : extend enum ggml_log_level with GGML_LOG_LEVEL_DEBUG (ggerganov#4579)

0137ef8

readme : add zig bindings (ggerganov#4581)

2bb9827

ci : tag docker image with build number (ggerganov#4584)

f31b984

batch size improvements

77463e0

make : add LLAMA_HIP_UMA option (ggerganov#4587)

28cb35a

NB: LLAMA_HIP_UMA=1 (or any value) adds MK_CPPFLAG -DGGML_HIP_UMA

ggml : add comment about backward GGML_OP_DIAG_MASK_INF (ggerganov#4203)

48b24b1

llama : fix platforms without mmap (ggerganov#4578)

48b7ff1

* llama : fix platforms without mmap * win32 : limit prefetch size to the file size * fix win32 error clobber, unnecessary std::string in std::runtime_error

cherrypicked the Hipblas fixed from PR LostRuins#571

852ca78

Fix CudaMemcpy direction (ggerganov#4599)

6724ef1

Merge branch 'master' into concedo_experimental

3bca03d

# Conflicts: # .github/workflows/docker.yml # Makefile # README.md # llama.cpp

sync : ggml (fix im2col) (ggerganov#4591)

ba66175

* cuda : fix im2col_f32_f16 (ggml/LostRuins#658) ggml-ci * ggml-alloc : fix ggml_tallocr_is_own --------- Co-authored-by: leejet <[email protected]>

Merge branch 'master' into concedo_experimental

b814bb2

# Conflicts: # Makefile # README.md

added presence penalty into lite ui

8823e8b

Merge branch 'master' into concedo_experimental

4a8308b

# Conflicts: # Makefile

fixed incorrect localflag

71a5afa

kalomaze closed this Dec 23, 2023

kalomaze reopened this Dec 23, 2023

Merge branch 'exp-dynatemp-minp-latest' into try-update-concedo

af0a669
