forked from ggerganov/llama.cpp
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
merge upstream #46
Merged
Merged
merge upstream #46
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Owner
l3utterfly
commented
Nov 27, 2024
•
edited
Loading
edited
- I have read the contributing guidelines
- Self-reported review complexity:
- Low
- Medium
- High
* Samplers sequence: simplified and input field. * Removed unused function * Modify and use `settings-modal-short-input` * rename "name" --> "label" --------- Co-authored-by: Xuan Son Nguyen <[email protected]>
ggml-ci
* metal : add kernel arg structs (wip) * metal : fattn args ggml-ci * metal : cont + avoid potential int overflow [no ci] * metal : mul mat struct (wip) * cont : mul mat vec * cont : pass by reference * cont : args is first argument * cont : use char ptr * cont : shmem style * cont : thread counters style * cont : mul mm id ggml-ci * cont : int safety + register optimizations ggml-ci * metal : GGML_OP_CONCAT ggml-ci * metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV * metal : GGML_OP_REPEAT * metal : GGML_OP_CPY * metal : GGML_OP_RMS_NORM * metal : GGML_OP_NORM * metal : add TODOs for rest of ops * ggml : add ggml-metal-impl.h ggml-ci
* Vulkan: Fix device info output format specifiers * Vulkan: Use zu printf specifier for size_t instead of ld
Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/4aa36568d413aca0ea84a1684d2d46f55dbabad7?narHash=sha256-Zwl8YgTVJTEum%2BL%2B0zVAWvXAGbWAuXHax3KzuejaDyo%3D' (2024-11-05) → 'github:NixOS/nixpkgs/5e4fbfb6b3de1aa2872b76d49fafc942626e2add?narHash=sha256-OZiZ3m8SCMfh3B6bfGC/Bm4x3qc1m2SVEAlkV6iY7Yg%3D' (2024-11-15) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Seems like this isn't working for vulkan-over-metal when the array is sized by a spec constant. Maybe a spirv-cross limitation?
* vulkan: Optimize soft_max Large soft_max could already saturate memory, but small/medium sizes were pretty slow. The bulk of the gains for them comes from using a smaller workgroup size, and making the workgroup size match the subgroup size also makes the barriers much cheaper. Cache some values in locals to avoid refetching/recomputing. And stamp out a few "template instantiations" so smaller cases will fully unroll. Add a missing early return for OOB rows. This happens when there are more than 512 rows and the dispatch is 512 x H. * vulkan: Further soft_max optimizations Restore the workgroup size of 512 case, use it for >1024. Use unrollable loops for more iteration counts.
* Add link to OLMo 2 model in docs * Change link to landing page
There have been reports of failure to compile on systems with <= 32KB of shared memory (e.g. ggerganov#10037). This change makes the large tile size fall back to a smaller size if necessary, and makes mul_mat_id fall back to CPU if there's only 16KB of shared memory.
…v#10537) * ci : faster CUDA toolkit installation method and use ccache * remove fetch-depth * only pack CUDA runtime on master
github-actions
bot
added
documentation
Improvements or additions to documentation
SYCL
Nvidia GPU
Vulkan
testing
build
examples
devops
python
server
ggml
Kompute
Apple Metal
script
nix
labels
Nov 27, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Apple Metal
build
devops
documentation
Improvements or additions to documentation
examples
ggml
Kompute
nix
Nvidia GPU
python
script
server
SYCL
testing
Vulkan
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.