Releases: ngxson/wllama
2.0.1
Important
Migration guide from 1.x to 2.0: https://github.com/ngxson/wllama/releases/tag/2.0.0
What's Changed
- Update available types for `cache_type_k` and `cache_type_v` by @felladrin in #134
- Fix nextjs build by @ngxson in #138
- Fix load file order by @ngxson in #139
Full Changelog: 2.0.0...2.0.1
Version 2.0 is out 🎉
Introducing Wllama V2.0
What's new
V2.0 introduces significant improvements in model management and caching. Key features include:
- Completely rewritten model downloader with service worker
- New `ModelManager` class providing comprehensive model handling and caching capabilities
- Enhanced testing system built on the `vitest` framework
Added ModelManager
The new `ModelManager` class provides a robust interface for handling model files:
```js
// Example usage
const modelManager = new ModelManager();

// List all models in cache
const cachedModels = await modelManager.getModels();

// Add a new model
const model = await modelManager.downloadModel('https://example.com/model.gguf');

// Check if model is valid (i.e. it is not corrupted)
// If status === ModelValidationStatus.VALID, you can use the model
// Otherwise, call model.refresh() to re-download it
const status = await model.validate();

// Re-download if needed (useful when the remote model file has changed)
await model.refresh();

// Remove model from cache
await model.remove();

// Load the selected model into llama.cpp
const wllama = new Wllama(CONFIG_PATHS);
await wllama.loadModel(model);

// Alternatively, you can also pass a model URL directly, like in v1.x
// This will automatically download the model to cache
await wllama.loadModelFromUrl('https://example.com/model.gguf');
```
Key features of `ModelManager` (see the sketch after this list):
- Automatic handling of split GGUF models
- Built-in model validation
- Parallel downloads of model shards
- Cache management with refresh and removal options
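As a concrete illustration of validation and refresh, here is a minimal sketch of a cache maintenance routine built from the calls shown above; the assumption is that `ModelManager` and `ModelValidationStatus` are both importable from the package root.

```js
import { ModelManager, ModelValidationStatus } from '@wllama/wllama';

// Sketch: validate every cached model and re-download any broken ones
const manager = new ModelManager();

for (const model of await manager.getModels()) {
  const status = await model.validate();
  if (status !== ModelValidationStatus.VALID) {
    // The cached copy is corrupted or incomplete; fetch it again
    await model.refresh();
  }
}
```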
Added loadModelFromHF
A new helper function to load models directly from Hugging Face Hub. This is a convenient wrapper over `loadModelFromUrl` that handles HF repository URLs.
```js
await wllama.loadModelFromHF(
  'ggml-org/models',
  'tinyllamas/stories260K.gguf'
);
```
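For context, a minimal end-to-end sketch combining `loadModelFromHF` with text generation might look like the following; `CONFIG_PATHS` is the same object described in the migration section below, and the `nPredict` option name is an assumption.

```js
import { Wllama } from '@wllama/wllama';

// Sketch: load a GGUF straight from the Hugging Face Hub, then generate text
const wllama = new Wllama(CONFIG_PATHS);

await wllama.loadModelFromHF(
  'ggml-org/models',
  'tinyllamas/stories260K.gguf'
);

const output = await wllama.createCompletion('Once upon a time,', {
  nPredict: 50, // assumed option for limiting the number of generated tokens
});
console.log(output);
```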
Migration to v2.0
Simplified `new Wllama()` constructor
In v2.0, the configuration paths have been simplified. You now only need to specify the `*.wasm` files, as the `*.js` files are no longer required.
Previously in v1.x:
```js
const CONFIG_PATHS = {
  'single-thread/wllama.js'       : '../../esm/single-thread/wllama.js',
  'single-thread/wllama.wasm'     : '../../esm/single-thread/wllama.wasm',
  'multi-thread/wllama.js'        : '../../esm/multi-thread/wllama.js',
  'multi-thread/wllama.wasm'      : '../../esm/multi-thread/wllama.wasm',
  'multi-thread/wllama.worker.mjs': '../../esm/multi-thread/wllama.worker.mjs',
};

const wllama = new Wllama(CONFIG_PATHS);
```
From v2.0:
```js
// You only need to specify 2 files
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '../../esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm' : '../../esm/multi-thread/wllama.wasm',
};

const wllama = new Wllama(CONFIG_PATHS);
```
Alternatively, you can use the `*.wasm` files from CDN:
```js
import WasmFromCDN from '@wllama/wllama/esm/wasm-from-cdn.js';
const wllama = new Wllama(WasmFromCDN);

// NOTE: this is not recommended
// only use this when you can't embed wasm files in your project
```
The `Wllama` constructor now accepts an optional second parameter of type `WllamaConfig` for configuration options:
Important
Most configuration options previously available in `DownloadModelConfig` used with `loadModelFromUrl()` have been moved to this constructor config.
```js
const wllama = new Wllama(CONFIG_PATHS, {
  parallelDownloads: 5, // maximum concurrent downloads
  allowOffline: false,  // whether to allow offline model loading
});
```
Wllama.loadModelFromUrl
As mentioned earlier, some options have been moved to the `Wllama` constructor (see the sketch after this list), including:
- `parallelDownloads`
- `allowOffline`
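A rough before/after sketch of this migration, under the assumption that the v1.x code passed these options through the `DownloadModelConfig` object of `loadModelFromUrl`:

```js
// v1.x (per-call download options, reconstructed):
// await wllama.loadModelFromUrl('https://example.com/model.gguf', {
//   parallelDownloads: 5,
//   allowOffline: false,
// });

// v2.0: set the same options once on the constructor
const wllama = new Wllama(CONFIG_PATHS, {
  parallelDownloads: 5,
  allowOffline: false,
});
await wllama.loadModelFromUrl('https://example.com/model.gguf');
```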
Other changes
- `Wllama.downloadModel` is removed. Please use `ModelManager.downloadModel` instead (see the sketch after this list).
- `loadModelFromUrl` won't check if the cached model is up-to-date. You may need to manually call `Model.refresh()` to re-download the model.
- Changes in `CacheManager`:
  - Added `CacheManager.download` function
  - `CacheManager.open(nameOrURL)` now accepts both a file name and the original URL. It now returns a `Blob` instead of a `ReadableStream`
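For code that previously called `Wllama.downloadModel`, a minimal migration sketch based on the `ModelManager` example above could look like this:

```js
// Before (v1.x):
// await wllama.downloadModel('https://example.com/model.gguf');

// After (v2.0): download through ModelManager, then load the cached model
const wllama = new Wllama(CONFIG_PATHS);
const modelManager = new ModelManager();
const model = await modelManager.downloadModel('https://example.com/model.gguf');

// loadModelFromUrl no longer checks freshness, so call refresh explicitly
// when you know the remote file has changed:
// await model.refresh();

await wllama.loadModel(model);
```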
Internal Changes
Notable internal improvements made to the codebase:
- Comprehensive test coverage using `vitest`, with browser testing for Chrome and Firefox (Safari support planned for the future)
- Enhanced CI pipeline including validation for example builds, ESM compilation and lint checks
1.17.1
1.17.0
1.16.4
1.16.3
What's Changed
Thanks to a small refactoring on llama.cpp, the binary size is now reduced from 1.78MB to 1.52MB
Full Changelog: 1.16.2...1.16.3
1.16.2
1.16.1
1.16.0
SmolLM-360m is added as a model in the `main` example. Try it now --> https://huggingface.co/spaces/ngxson/wllama
Special thanks to the @huggingface team for providing such a powerful model in a very small size!
What's Changed
Full Changelog: 1.15.0...1.16.0
1.15.0
New features
downloadModel()
Download a model to the cache without loading it. The use case is to let an application provide a "model manager" screen (sketched after the list below) that allows the user to:
- Download a model via `downloadModel()`
- List all downloaded models using `CacheManager.list()`
- Delete a downloaded model using `CacheManager.delete()`
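A rough sketch of the data layer behind such a screen, using only the calls named above; how `CacheManager` is imported and the exact shape of the entries returned by `CacheManager.list()` are assumptions here.

```js
import { Wllama, CacheManager } from '@wllama/wllama'; // named exports are an assumption

const wllama = new Wllama(CONFIG_PATHS); // CONFIG_PATHS as in the other examples

// 1. Download a model into the cache without loading it
await wllama.downloadModel('https://example.com/model.gguf');

// 2. List all downloaded models so the UI can render them
const cachedFiles = await CacheManager.list();

// 3. Delete a downloaded model chosen by the user
//    (assuming each entry exposes a `name` key usable as the cache identifier)
if (cachedFiles.length > 0) {
  await CacheManager.delete(cachedFiles[0].name);
}
```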
KV cache reuse in createCompletion
When calling `createCompletion`, you can pass `useCache: true` as an option. It will reuse the KV cache from the last `createCompletion` call. It is equivalent to the `cache_prompt` option on the llama.cpp server.
```js
wllama.createCompletion(input, {
  useCache: true,
  ...
});
```
For example:
- On the first call, you have 2 messages: `user: hello`, `assistant: hi`
- On the second call, you add one message: `user: hello`, `assistant: hi`, `user: who are you?`

Then, only the added message `user: who are you?` will need to be evaluated.
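Concretely, with plain-string prompts that scenario looks roughly like the sketch below; the prompt formatting is illustrative, the point being that the second prompt extends the first, so only the new suffix is evaluated.

```js
// First call: the full prompt is evaluated and its KV cache is kept around
const first = await wllama.createCompletion(
  'user: hello\nassistant: hi\n',
  { useCache: true }
);

// Second call: the prompt shares the previous prefix, so only the newly
// added "user: who are you?" part needs to be evaluated
const second = await wllama.createCompletion(
  'user: hello\nassistant: hi\nuser: who are you?\nassistant:',
  { useCache: true }
);
```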
What's Changed
- Add `downloadModel` function by @ngxson in #95
- fix log print and `downloadModel` by @ngxson in #100
- Add `main` example (chat UI) by @ngxson in #99
- Improve main UI example by @ngxson in #102
- implement KV cache reuse by @ngxson in #103
Full Changelog: 1.14.2...1.15.0