Releases: b4rtaz/distributed-llama
0.5.2
- feat: use AVX2 to speed up dotProduct
- feat: use AVX2 to speed up matmulF32
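Both changes vectorize the inner float loops with AVX2. Below is a minimal sketch of the idea behind an AVX2 dot product, not the project's actual dotProduct implementation; the function name and the multiple-of-8 length assumption are illustrative only.

```cpp
#include <immintrin.h>
#include <cassert>

// Sketch of an AVX2 float dot product: process 8 floats per iteration,
// then reduce the 8 partial sums horizontally.
// Assumes n is a multiple of 8; the real code also covers quantized types.
float dotProductAvx2(const float* a, const float* b, unsigned n) {
    assert(n % 8 == 0);
    __m256 acc = _mm256_setzero_ps();
    for (unsigned i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    // Horizontal reduction of the 8 partial sums.
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 sum = _mm_add_ps(lo, hi);
    sum = _mm_hadd_ps(sum, sum);
    sum = _mm_hadd_ps(sum, sum);
    return _mm_cvtss_f32(sum);
}
```

Compiled with `-mavx2`, each loop iteration replaces eight scalar multiply-adds with one vector multiply and one vector add.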
0.5.0
- feat: splitting attention layers across all nodes. 🎉 🎉 🎉 (see the sketch below)
- fix: convert-llama.py now supports different max_seq_len values.
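The general idea behind splitting attention across nodes can be sketched as follows; `sliceHeads` and the even-per-node partition are assumptions for illustration, not necessarily the project's exact scheme. Each node owns a contiguous range of attention heads and computes Q/K/V and attention only for that range, with results synchronized back to the root node.

```cpp
#include <cstdio>

// Sketch: partition attention heads evenly across nodes (assumed scheme,
// not necessarily the exact split used by distributed-llama).
struct HeadSlice { unsigned start; unsigned end; }; // [start, end)

HeadSlice sliceHeads(unsigned nHeads, unsigned nNodes, unsigned nodeIndex) {
    unsigned perNode = nHeads / nNodes; // assumes nHeads % nNodes == 0
    return { nodeIndex * perNode, (nodeIndex + 1) * perNode };
}

int main() {
    // Example: 32 heads split across 4 nodes -> 8 heads per node.
    for (unsigned node = 0; node < 4; node++) {
        HeadSlice s = sliceHeads(32, 4, node);
        std::printf("node %u handles heads [%u, %u)\n", node, s.start, s.end);
    }
}
```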
0.4.0
- feat: support for any number of threads.
- fix: support max KV cache length.
- feat: splitting RoPE across all nodes (see the sketch below).
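For context, RoPE rotates consecutive (even, odd) pairs of a query/key vector by a position-dependent angle. A minimal single-node sketch follows; the function name and the 10000.0f base are assumptions, and the distributed split (and the RoPE cache added in 0.3.1) is not shown.

```cpp
#include <cmath>

// Minimal RoPE sketch: rotate each (x[i], x[i+1]) pair by an angle that
// depends on the token position and the pair index.
void applyRope(float* x, unsigned dim, unsigned pos) {
    for (unsigned i = 0; i < dim; i += 2) {
        float freq = std::pow(10000.0f, -(float)i / (float)dim);
        float angle = (float)pos * freq;
        float c = std::cos(angle);
        float s = std::sin(angle);
        float x0 = x[i];
        float x1 = x[i + 1];
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}
```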
0.3.1
- Changed the order of QKV synchronization.
- All tasks of the Llama architecture are now executed in parallel.
- RoPE cache for the Llama architecture.
0.3.0
- New tokenizer format (old tokenizer files are no longer supported; please regenerate them).
- Added Llama 3 support.
- Simple-server mode; see the example in nodejs-example.cjs. You can now use Distributed Llama as a simple LLM server.
0.2.0
Added Grok-1 support!
Breaking change: you need to re-convert Llama 2 models to the new format.
0.1.1
This version introduces partial AVX2 optimization for x86_64 CPUs. Inference with Q40 weights and a Q80 buffer now runs with partial AVX2 acceleration.
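Q40 and Q80 are block quantization formats. The layout below is an assumption (the common llama.cpp-style 32-element block with one per-block scale; the real structs store the scale as float16), shown only to illustrate the arithmetic that the AVX2 path accelerates.

```cpp
#include <cstdint>

// Assumed block layout: 32 weights per block with one scale. The scale is
// kept as float here to keep the sketch self-contained.
constexpr int BLOCK_SIZE = 32;

struct BlockQ40 {               // 4-bit weights: two values per byte
    float d;                    // scale
    uint8_t qs[BLOCK_SIZE / 2]; // low nibble = value i, high nibble = value i+16
};

struct BlockQ80 {               // 8-bit activations
    float d;                    // scale
    int8_t qs[BLOCK_SIZE];
};

// Scalar dot product over nBlocks quantized blocks. The AVX2 path vectorizes
// this inner multiply-accumulate loop; this sketch only shows the arithmetic.
float dotQ40Q80(const BlockQ40* w, const BlockQ80* x, int nBlocks) {
    float sum = 0.0f;
    for (int b = 0; b < nBlocks; b++) {
        int32_t acc = 0;
        for (int i = 0; i < BLOCK_SIZE / 2; i++) {
            int v0 = (w[b].qs[i] & 0x0F) - 8; // dequantized 4-bit values
            int v1 = (w[b].qs[i] >> 4) - 8;
            acc += v0 * x[b].qs[i];
            acc += v1 * x[b].qs[i + BLOCK_SIZE / 2];
        }
        sum += (float)acc * w[b].d * x[b].d;
    }
    return sum;
}
```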