Releases: ngxson/wllama
1.14.2
Update to latest upstream llama.cpp source code:
- Fix support for Llama 3.1, Phi-3 and SmolLM
Full Changelog: 1.14.0...1.14.2
1.14.0
1.13.0
What's Changed
- Update README.md by @flatsiedatsie in #78
- sync with upstream llama.cpp source code (+gemma2 support) by @ngxson in #81
- Fix exit() function crash if model is not loaded by @flatsiedatsie in #84
- Improve cache API by @ngxson in #80
- v1.13.0 by @ngxson in #85
New Contributors
- @flatsiedatsie made their first contribution in #78
Full Changelog: 1.12.1...1.13.0
1.12.1
1.12.0
Important
In prior versions, if you initialized wllama with `embeddings: true`, you could still generate completions.
From v1.12.0, if you start wllama with `embeddings: true`, an error will be thrown when you try to use `createCompletion`. You must call `wllama.setOptions({ embeddings: false })` to turn off embeddings first.
More details: this feature was introduced in ggerganov/llama.cpp#7477, which allows models like GritLM to be used for both embeddings and text generation.
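A minimal sketch of the new behavior, assuming a typical wllama setup (the constructor wasm paths, the `createEmbedding` call and the model URL are assumptions for illustration, not part of this release note):

```ts
import { Wllama } from '@wllama/wllama';

// Assumed wasm paths; adjust to how you serve the package's binaries.
const wllama = new Wllama({
  'single-thread/wllama.wasm': './esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': './esm/multi-thread/wllama.wasm',
});

// Initialize with embeddings enabled (hypothetical model URL).
await wllama.loadModelFromUrl('https://example.com/model.gguf', {
  embeddings: true,
});
const vector = await wllama.createEmbedding('Hello world');

// From v1.12.0, createCompletion throws while embeddings are enabled,
// so turn them off first via the new setOptions().
await wllama.setOptions({ embeddings: false });
const text = await wllama.createCompletion('Hello', { nPredict: 16 });
console.log(vector.length, text);
```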
What's Changed
- Add `wllama.setOptions` by @ngxson in #73
- v1.12.0 by @ngxson in #74
- warn user if embeddings is incorrectly set by @ngxson in #75
Full Changelog: 1.11.0...1.12.0
1.11.0
What's Changed
- Internally generate the model URL array when the URL provided to the `loadModelFromUrl` method points to a single shard of a model split with the `gguf-split` tool, by @felladrin in #61 (example sketch below)
- Allow loading a model using a relative path by @felladrin in #64
- Git ignore also .DS_Store which are created by MacOS Finder by @felladrin in #65
- v1.11.0 by @ngxson in #68
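A rough sketch of both loading paths, assuming `wllama` is an already-constructed `Wllama` instance and the model URLs are placeholders:

```ts
// Passing the URL of one shard: wllama internally generates the URL array
// for the remaining shards produced by gguf-split and fetches them all.
await wllama.loadModelFromUrl(
  'https://example.com/models/my-model-00001-of-00003.gguf'
);

// Alternatively, a relative path (resolved against the current page) now works too:
// await wllama.loadModelFromUrl('./models/my-model.gguf');
```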
Full Changelog: 1.10.0...1.11.0
1.10.0
What's Changed
- `loadModel()` now also accepts `Blob` or `File` (example sketch below)
- Added `GGUFRemoteBlob` that can stream a Blob from a remote URL
- Added example for loading local gguf files
- Implement OPFS for cache
Note: Optionally, you can clear the `CacheStorage` used by previous versions.
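A sketch of loading a local gguf chosen by the user, assuming `wllama` is an existing `Wllama` instance; passing the files as an array to `loadModel()` is an assumption about the exact signature:

```ts
// <input type="file" id="model-file" accept=".gguf" multiple> somewhere in the page
const input = document.getElementById('model-file') as HTMLInputElement;

input.addEventListener('change', async () => {
  const files = Array.from(input.files ?? []); // File extends Blob
  // loadModel() now accepts Blob/File objects directly;
  // pass every shard if the model was split with gguf-split.
  await wllama.loadModel(files);
  console.log(await wllama.createCompletion('Hello', { nPredict: 8 }));
});
```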
Pull requests:
- fix small typo in README by @ngxson in #51
- sync with latest llama.cpp source code by @ngxson in #59
- add Blob support + OPFS + load from local file(s) by @ngxson in #52
- v1.10.0 by @ngxson in #60
Full Changelog: 1.9.0...1.10.0
1.9.0
1.8.1
What's Changed
HeapFS allows us to save more memory while loading a model. It also avoids a memcpy, so model loading is a bit faster.
- Make the `config` parameter of the `loadModelFromUrl` function optional by @felladrin in #32 (example sketch below)
- Remove prebuilt esm by @ngxson in #33
- Improve error handling on abort() by @ngxson in #34
- add tool for debugging memory by @ngxson in #37
- sync to upstream llama.cpp source code by @ngxson in #46
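A small sketch of the now-optional `config` parameter, assuming `wllama` and `modelUrl` are already defined:

```ts
// Before #32, a config object was required even when empty:
// await wllama.loadModelFromUrl(modelUrl, {});

// From this release, it can simply be omitted:
await wllama.loadModelFromUrl(modelUrl);
```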
Full Changelog: 1.8.0...1.8.1
1.8.0
What's Changed
- Docs & demo address changed from
ngxson.github.io
togithub.ngxson.com
. This allows adding COOP/COEP headers (required to run multi-thread examples) - Add download progress callback by @ngxson in #13
- Free buffer after uploaded to worker by @ngxson in #14
- Correct number of pthread pool size by @ngxson in #21
- Build docs on CI by @ngxson in #24
- fix OOM on iOS by @ngxson in #23
- Add `abortSignal` for `createCompletion` by @ngxson in #26 (example sketch below)
- Sync upstream llama.cpp source code by @ngxson in #27
- Better exception handling by @ngxson in #29
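A rough sketch combining the new download progress callback and the completion abort hook, assuming `wllama` and `modelUrl` are already defined; the exact option and callback names (`progressCallback`, `onNewToken`, `optionals.abortSignal`) are assumptions about the API of this era:

```ts
// Report download progress while the model is fetched.
await wllama.loadModelFromUrl(modelUrl, {
  progressCallback: ({ loaded, total }) => {
    console.log(`Model download: ${Math.round((100 * loaded) / total)}%`);
  },
});

// Stop generation early from the per-token callback.
const output = await wllama.createCompletion('Write a haiku:', {
  nPredict: 64,
  onNewToken: (token, piece, currentText, optionals) => {
    if (currentText.length > 200) {
      optionals.abortSignal(); // abort this completion
    }
  },
});
console.log(output);
```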
New Contributors
- @felladrin made their first contribution in #15
Full Changelog: https://github.com/ngxson/wllama/commits/1.8.0