Releases: ngxson/wllama
1.15.0
New features
downloadModel()
Download a model to the cache without loading it. The use case is to allow an application to have a "model manager" screen that lets the user (see the sketch after this list):
- Download a model via `downloadModel()`
- List all downloaded models using `CacheManager.list()`
- Delete a downloaded model using `CacheManager.delete()`
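A minimal sketch of how these calls might fit together. It assumes `downloadModel()` takes a model URL and that the cache is reachable via `wllama.cacheManager`; the model URL, wasm paths, and cache-entry fields below are illustrative and the exact signatures may differ:

```ts
import { Wllama } from '@wllama/wllama';

// Paths to the wllama wasm binaries (assumed layout; adjust to your bundler setup)
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};
const wllama = new Wllama(CONFIG_PATHS);

// Hypothetical model URL, for illustration only
const MODEL_URL = 'https://example.com/tinyllama-q4_k_m.gguf';

// 1. Download the model to the cache without loading it into memory
await wllama.downloadModel(MODEL_URL);

// 2. List everything currently in the cache
const entries = await wllama.cacheManager.list();
console.log(entries);

// 3. Delete one cached entry (the key used here is an assumption)
await wllama.cacheManager.delete(entries[0].name);
```

Later, the same URL can be passed to `loadModelFromUrl()`, which should pick the file up from the cache instead of re-downloading it.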
KV cache reuse in createCompletion
When calling `createCompletion`, you can pass `useCache: true` as an option. It will reuse the KV cache from the last `createCompletion` call. This is equivalent to the `cache_prompt` option on the llama.cpp server.
```js
wllama.createCompletion(input, {
  useCache: true,
  ...
});
```
For example:
- On the first call, you have 2 messages: `user: hello`, `assistant: hi`
- On the second call, you add one message: `user: hello`, `assistant: hi`, `user: who are you?`

Then, only the added message `user: who are you?` will need to be evaluated.
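A rough sketch of this two-call pattern. It assumes the chat template has already been applied so that `createCompletion` receives plain prompt strings; the template tokens and `nPredict` value are illustrative:

```ts
// First call: the whole prompt is evaluated and fills the KV cache
const firstPrompt = '<|user|>hello<|assistant|>hi';
await wllama.createCompletion(firstPrompt, { useCache: true, nPredict: 64 });

// Second call: shares its prefix with the first prompt, so only the
// appended "who are you?" turn needs to be evaluated
const secondPrompt = firstPrompt + '<|user|>who are you?<|assistant|>';
const answer = await wllama.createCompletion(secondPrompt, {
  useCache: true,
  nPredict: 64,
});
console.log(answer);
```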
What's Changed
- Add `downloadModel` function by @ngxson in #95
- fix log print and `downloadModel` by @ngxson in #100
- Add `main` example (chat UI) by @ngxson in #99
- Improve main UI example by @ngxson in #102
- implement KV cache reuse by @ngxson in #103
Full Changelog: 1.14.2...1.15.0
1.14.2
Update to latest upstream llama.cpp source code:
- Fix support for llama-3.1, phi 3 and SmolLM
Full Changelog: 1.14.0...1.14.2
1.14.0
1.13.0
What's Changed
- Update README.md by @flatsiedatsie in #78
- sync with upstream llama.cpp source code (+gemma2 support) by @ngxson in #81
- Fix exit() function crash if model is not loaded by @flatsiedatsie in #84
- Improve cache API by @ngxson in #80
- v1.13.0 by @ngxson in #85
New Contributors
- @flatsiedatsie made their first contribution in #78
Full Changelog: 1.12.1...1.13.0
1.12.1
1.12.0
Important
In prior versions, if you initialized wllama with `embeddings: true`, you were still able to generate completions.
From v1.12.0, if you start wllama with `embeddings: true`, an error will be thrown when you try to use `createCompletion`. You must call `wllama.setOptions({ embeddings: false })` to turn off embeddings first.
More details: This feature was introduced in ggerganov/llama.cpp#7477, which allows models like GritLM to be used for both embeddings and text generation.
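A short sketch of the new flow, assuming the model is loaded with `embeddings: true` and that `createEmbedding` is available; the model URL and `nPredict` value are illustrative:

```ts
// Hypothetical URL for a model that supports both embeddings and generation
const MODEL_URL = 'https://example.com/gritlm-7b-q4_k_m.gguf';

// Load with embeddings enabled (option taken from the notes above)
await wllama.loadModelFromUrl(MODEL_URL, { embeddings: true });

// Embeddings work as usual
const vector = await wllama.createEmbedding('hello world');

// Since v1.12.0, createCompletion would throw here unless embeddings
// mode is switched off first
await wllama.setOptions({ embeddings: false });
const reply = await wllama.createCompletion('Who are you?', { nPredict: 32 });
```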
What's Changed
- Add `wllama.setOptions` by @ngxson in #73
- v1.12.0 by @ngxson in #74
- warn user if embeddings is incorrectly set by @ngxson in #75
Full Changelog: 1.11.0...1.12.0
1.11.0
What's Changed
- Internally generate the model URL array when the URL provided to the `loadModelFromUrl` method is a single shard of a model split with the `gguf-split` tool by @felladrin in #61 (see the sketch after this list)
- Allow loading a model using a relative path by @felladrin in #64
- Git ignore also .DS_Store which are created by MacOS Finder by @felladrin in #65
- v1.11.0 by @ngxson in #68
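A minimal sketch of the shard behavior from #61, using a hypothetical URL that follows the `gguf-split` naming convention; wllama is expected to derive the remaining shard URLs from it:

```ts
// Point at one shard of a split model; the other shard URLs are generated
// internally from the -0000X-of-0000Y pattern (URL is hypothetical)
await wllama.loadModelFromUrl(
  'https://example.com/models/mistral-7b-q4_k_m-00001-of-00003.gguf'
);
```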
Full Changelog: 1.10.0...1.11.0
1.10.0
What's Changed
- `loadModel()` now also accepts `Blob` or `File` (see the sketch below)
- Added `GGUFRemoteBlob` that can stream a Blob from a remote URL
- Added example for loading local gguf files
- Implement OPFS for cache

Note: Optionally, you can clear the `CacheStorage` used by the previous version.
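A rough sketch of loading a local gguf file picked by the user, assuming `loadModel()` accepts an array of `Blob`/`File` objects (the element id and array-wrapping are assumptions):

```ts
// Load a GGUF file chosen via an <input type="file"> element
const input = document.querySelector('#gguf-file') as HTMLInputElement;
const file = input.files?.[0];
if (file) {
  // loadModel() accepts Blob/File since 1.10.0; an array is used here to
  // cover multi-shard models as well
  await wllama.loadModel([file]);
}
```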
Pull requests:
- fix small typo in README by @ngxson in #51
- sync with latest llama.cpp source code by @ngxson in #59
- add Blob support + OPFS + load from local file(s) by @ngxson in #52
- v1.10.0 by @ngxson in #60
Full Changelog: 1.9.0...1.10.0
1.9.0
1.8.1
What's Changed
HeapFS allows us to save more memory while loading the model. It also avoids doing a memcpy, so loading the model will be a bit faster.
- Make the `config` parameter of the `loadModelFromUrl` function optional by @felladrin in #32 (see the sketch after this list)
- Remove prebuilt esm by @ngxson in #33
- Improve error handling on abort() by @ngxson in #34
- add tool for debugging memory by @ngxson in #37
- sync to upstream llama.cpp source code by @ngxson in #46
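A minimal sketch of the now-optional `config` parameter (the model URL is hypothetical):

```ts
// Since #32, the second (config) argument can simply be omitted
await wllama.loadModelFromUrl('https://example.com/tinyllama-q4_k_m.gguf');
```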
Full Changelog: 1.8.0...1.8.1