[pull] master from ggerganov:master #157

Closed
pull[bot] wants to merge 14 commits

Conversation

pull[bot] commented Nov 29, 2024

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)


This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.

With these changes I get around 1-2% speedup on RTX 4070 combined with my
old Haswell-era CPU.
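
The ramp-up described above can be illustrated with a small sketch. It is not the Vulkan backend's code (which is C++): the function name `submit_schedule`, the starting batch of 20 nodes, and the doubling rule are assumptions chosen for illustration. Only the cap of 100 nodes per submit comes from the description.

```python
def submit_schedule(total_nodes, first_batch=20, max_batch=100):
    """Return the number of graph nodes recorded before each submit.

    first_batch and the doubling rule are illustrative assumptions;
    max_batch=100 mirrors the "100 nodes/submit" figure in the text.
    """
    if total_nodes < 0 or first_batch <= 0 or max_batch < first_batch:
        raise ValueError("invalid schedule parameters")

    batches = []
    remaining = total_nodes
    batch = first_batch
    while remaining > 0:
        size = min(batch, remaining)       # last batch may be partial
        batches.append(size)
        remaining -= size
        batch = min(batch * 2, max_batch)  # ramp up, then hold at the cap
    return batches


if __name__ == "__main__":
    # A hypothetical graph of 500 nodes: small early submits, then full ones.
    print(submit_schedule(500))
```

With a small first batch the GPU receives work after only a few nodes have been recorded, while later submits stay large enough to keep per-submit overhead low.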
noemotiovon and others added 2 commits November 29, 2024 14:46
* [CANN] RoPE operator optimization

* [CANN] Code Formatting

---------

Co-authored-by: noemotiovon
This PR fixes the failing MUL_MAT tests for the sycl backend.
github-actions bot added the SYCL label Nov 29, 2024
slaren and others added 3 commits November 29, 2024 17:45
* cleanup UI link list

* sort list alphabetically

* add missing licenses
* imatrix-combine-only idea

* ensured that behavior is consistent with log
* server : add split model test

* add test speculative

* add invalid cases
* ggml : move AMX to the CPU backend

---------

Co-authored-by: Georgi Gerganov
netrunnereve and others added 3 commits November 30, 2024 08:00
* subgroup 64 version with subgroup add. 15% faster

scalable version

tested for subgroup sizes 16-128

* check for subgroup multiple of 16 and greater than 16

* subgroup sizes are always a power of 2 (KhronosGroup/GLSL#45)

* force 16 sequential threads per block

* make 16 subgroup size a constant
* readme : refresh

* readme : move section [no ci]

* readme : clarify [no ci]

* readme : fixes [no ci]

* readme : more fixes [no ci]

* readme : simplify [no ci]

* readme : clarify GGUF