
Releases: ngxson/wllama

1.14.2

28 Jul 11:39
d15748b

Update to latest upstream llama.cpp source code:

  • Fix support for Llama 3.1, Phi-3 and SmolLM

Full Changelog: 1.14.0...1.14.2

1.14.0

10 Jul 11:51
94ebb81

What's Changed

  • Save ETag metadata and add an allowOffline option in #90 (see the sketch after this list)
  • Add experimental support for the encoder-decoder architecture in #91
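
A rough sketch of how the new allowOffline option could be used when loading a model. The npm package name, the asset path map, the model URL, and the placement of allowOffline in the loadModelFromUrl options are assumptions for illustration only:

```ts
import { Wllama } from '@wllama/wllama';

// Placeholder asset map: point these at wherever your app serves the wllama wasm files.
const WASM_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

const wllama = new Wllama(WASM_PATHS);

// With allowOffline, a model already cached on a previous visit (matched via
// its saved ETag) can be reused even if the network request fails.
await wllama.loadModelFromUrl('https://example.com/models/tiny-model.gguf', {
  allowOffline: true, // assumed option placement
});
```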

Full Changelog: 1.13.0...1.14.0

1.13.0

03 Jul 15:13
44a4de5

What's Changed

New Contributors

Full Changelog: 1.12.1...1.13.0

1.12.1

27 Jun 20:49
b847495

What's Changed

  • Sync with latest upstream source code + adapt to project structure change by @ngxson in #77

Full Changelog: 1.12.0...1.12.1

1.12.0

24 Jun 15:29
896c160

Important

In prior versions, if you initialized wllama with embeddings: true, you were still able to generate completions.

From v1.12.0, if you start wllama with embeddings: true, calling createCompletion throws an error. You must call wllama.setOptions({ embeddings: false }) to turn off embeddings first.

More details: this feature was introduced in ggerganov/llama.cpp#7477, which allows models like GritLM to be used for both embeddings and text generation.
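
A minimal sketch of this behavior change. Everything except setOptions({ embeddings: false }) (the asset map, model URL, and option names such as embeddings and nPredict) is assumed for illustration:

```ts
import { Wllama } from '@wllama/wllama';

// Placeholder asset map for the wllama wasm files.
const WASM_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

const wllama = new Wllama(WASM_PATHS);

// Load the model with embeddings enabled.
await wllama.loadModelFromUrl('https://example.com/models/gritlm.gguf', {
  embeddings: true,
});

// Embedding calls work as before.
const vector = await wllama.createEmbedding('Hello world');

// From v1.12.0, createCompletion throws while embeddings are enabled,
// so turn them off first:
wllama.setOptions({ embeddings: false });
const story = await wllama.createCompletion('Tell me a story.', { nPredict: 64 });
```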

What's Changed

Full Changelog: 1.11.0...1.12.0

1.11.0

11 Jun 18:47
a5e919b

What's Changed

  • Internally generate the model URL array when the URL provided to the loadModelFromUrl method is a single shard of a model split with the gguf-split tool by @felladrin in #61 (see the sketch after this list)
  • Allow loading a model using relative path by @felladrin in #64
  • Git ignore also .DS_Store which are created by MacOS Finder by @felladrin in #65
  • v1.11.0 by @ngxson in #68
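
A hedged illustration of the two loading conveniences above: pointing loadModelFromUrl at a single shard of a gguf-split model (wllama derives the remaining shard URLs internally) and loading from a relative path. The URLs, shard naming, and asset map are placeholders:

```ts
import { Wllama } from '@wllama/wllama';

// Placeholder asset map for the wllama wasm files.
const WASM_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

const wllama = new Wllama(WASM_PATHS);

// Pass only the first shard of a split model; wllama expands this into the
// full list of shard URLs produced by the gguf-split tool.
await wllama.loadModelFromUrl(
  'https://example.com/models/big-model-00001-of-00003.gguf'
);

// Relative paths (resolved against the current page) are also accepted:
// await wllama.loadModelFromUrl('./models/small-model.gguf');
```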

Full Changelog: 1.10.0...1.11.0

1.10.0

01 Jun 16:34
bbaff9b

What's Changed

  • loadModel() now also accepts Blob or File (see the sketch below)
  • Added GGUFRemoteBlob that can stream Blob from a remote URL
  • Added example for loading local gguf files
  • Implement OPFS for cache

Note: Optionally, you can clear the CacheStorage used by previous versions.
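
A sketch of loading a local GGUF file chosen by the user. It assumes loadModel() accepts an array of Blob/File objects and that the page has a file input with the id gguf-file; both details are illustrative, not taken from these notes:

```ts
import { Wllama } from '@wllama/wllama';

// Placeholder asset map for the wllama wasm files.
const WASM_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

const wllama = new Wllama(WASM_PATHS);

// Let the user pick a local .gguf file and pass it straight to loadModel().
const input = document.querySelector<HTMLInputElement>('#gguf-file');
input?.addEventListener('change', async () => {
  const file = input.files?.[0];
  if (!file) return;
  await wllama.loadModel([file]); // assumed: loadModel takes an array of Blobs/Files
  console.log('Model loaded from local file');
});
```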

Pull requests:

Full Changelog: 1.9.0...1.10.0

1.9.0

18 May 10:17
454c5ed

What's Changed

  • Add support for EOT (end of turn) and stopTokens by @ngxson in #47 (see the sketch after this list)
  • Ability to get model metadata by @ngxson in #48
  • Add custom logger by @ngxson in #49
  • sync to upstream llama.cpp source code (+ release v1.9.0) by @ngxson in #50
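
A hedged sketch combining the three additions above (stop tokens, model metadata, a custom logger). The option and accessor names (stopTokens, getModelMetadata, logger) mirror the PR titles, but their exact shapes are assumptions, and the stop token ID is a placeholder:

```ts
import { Wllama } from '@wllama/wllama';

// Placeholder asset map for the wllama wasm files.
const WASM_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

// Custom logger: route wllama's internal logs wherever you like (assumed shape).
const wllama = new Wllama(WASM_PATHS, {
  logger: {
    debug: () => {}, // silence debug output
    log: console.log,
    warn: console.warn,
    error: console.error,
  },
});

await wllama.loadModelFromUrl('https://example.com/models/chat-model.gguf');

// Read the GGUF metadata of the loaded model (assumed accessor name).
console.log(wllama.getModelMetadata());

// Stop generation when a custom stop token (e.g. the model's EOT token) appears.
const answer = await wllama.createCompletion('Q: What is WebAssembly?\nA:', {
  nPredict: 128,
  stopTokens: [128009], // placeholder token ID
});
```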

Full Changelog: 1.8.1...1.9.0

1.8.1

16 May 11:58
50aecda

What's Changed

HeapFS allows us to save more memory while loading the model. It also avoids an extra memcpy, so model loading is a bit faster.

  • Make the config parameter of the loadModelFromUrl function optional by @felladrin in #32
  • Remove prebuilt esm by @ngxson in #33
  • Improve error handling on abort() by @ngxson in #34
  • add tool for debugging memory by @ngxson in #37
  • sync to upstream llama.cpp source code by @ngxson in #46

Full Changelog: 1.8.0...1.8.1

1.8.0

12 May 22:33
c6419de

What's Changed

  • Docs & demo address changed from ngxson.github.io to github.ngxson.com. This allows adding COOP/COEP headers (required to run multi-thread examples)
  • Add download progress callback by @ngxson in #13 (see the sketch after this list)
  • Free buffer after uploaded to worker by @ngxson in #14
  • Correct number of pthread pool size by @ngxson in #21
  • Build docs on CI by @ngxson in #24
  • fix OOM on iOS by @ngxson in #23
  • Add abortSignal for createCompletion by @ngxson in #26
  • Sync upstream llama.cpp source code by @ngxson in #27
  • Better exception handling by @ngxson in #29
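
A rough sketch of the download progress callback from #13; the callback name progressCallback and its { loaded, total } argument shape are assumptions for illustration:

```ts
import { Wllama } from '@wllama/wllama';

// Placeholder asset map for the wllama wasm files.
const WASM_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

const wllama = new Wllama(WASM_PATHS);

await wllama.loadModelFromUrl('https://example.com/models/chat-model.gguf', {
  // Report download progress, e.g. to drive a progress bar in the UI.
  progressCallback: ({ loaded, total }) =>
    console.log(`Downloaded ${Math.round((100 * loaded) / total)}%`),
});
```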

New Contributors

Full Changelog: https://github.com/ngxson/wllama/commits/1.8.0