
Releases: ngxson/wllama

2.0.1

03 Dec 17:59
48e36d1

Important

Migration guide from 1.x to 2.0: https://github.com/ngxson/wllama/releases/tag/2.0.0

What's Changed

Full Changelog: 2.0.0...2.0.1

Version 2.0 is out 🎉

01 Dec 15:09
c9e1af2

Introducing Wllama V2.0

What's new

V2.0 introduces significant improvements in model management and caching. Key features include:

  • Completely rewritten model downloader with service worker
  • New ModelManager class providing comprehensive model handling and caching capabilities
  • Enhanced testing system built on the vitest framework

Added ModelManager

The new ModelManager class provides a robust interface for handling model files:

// Example usage (import path assumed to be the package entry point; adjust to your setup)
import { ModelManager, ModelValidationStatus, Wllama } from '@wllama/wllama';

const modelManager = new ModelManager();

// List all models in cache
const cachedModels = await modelManager.getModels();

// Add a new model
const model = await modelManager.downloadModel('https://example.com/model.gguf');

// Check if model is valid (i.e. it is not corrupted)
// If status === ModelValidationStatus.VALID, you can use the model
// Otherwise, call model.refresh() to re-download it
const status = await model.validate();

// Re-download if needed (useful when remote model file has changed)
await model.refresh();

// Remove model from cache
await model.remove();

// Load the selected model into llama.cpp
const wllama = new Wllama(CONFIG_PATHS);
await wllama.loadModel(model);

// Alternatively, you can pass the model URL directly, as in v1.x
// This will automatically download the model to cache
await wllama.loadModelFromUrl('https://example.com/model.gguf');

Key features of ModelManager (see the sketch after this list):

  • Automatic handling of split GGUF models
  • Built-in model validation
  • Parallel downloads of model shards
  • Cache management with refresh and removal options
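
Continuing the example above, split GGUF models go through the same downloadModel call. A minimal sketch with a hypothetical shard URL, assuming that passing the first shard is enough for the manager to resolve and fetch the remaining shards:

// Hypothetical URL of the first shard of a split GGUF model
const firstShardUrl = 'https://example.com/my-model-00001-of-00003.gguf';

// Assumption: giving the first shard is enough; the remaining shards
// are resolved and downloaded in parallel by the ModelManager
const splitModel = await modelManager.downloadModel(firstShardUrl);

// Validate the cached shards before loading; re-download if corrupted
if ((await splitModel.validate()) !== ModelValidationStatus.VALID) {
  await splitModel.refresh();
}
await wllama.loadModel(splitModel);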

Added loadModelFromHF

A new helper function to load models directly from Hugging Face Hub. This is a convenient wrapper over loadModelFromUrl that handles HF repository URLs.

await wllama.loadModelFromHF(
  'ggml-org/models',
  'tinyllamas/stories260K.gguf'
);

Migration to v2.0

Simplified new Wllama() constructor

In v2.0, the configuration paths have been simplified. You now only need to specify the *.wasm files, as the *.js files are no longer required.

Previously in v1.x:

const CONFIG_PATHS = {
  'single-thread/wllama.js'       : '../../esm/single-thread/wllama.js',
  'single-thread/wllama.wasm'     : '../../esm/single-thread/wllama.wasm',
  'multi-thread/wllama.js'        : '../../esm/multi-thread/wllama.js',
  'multi-thread/wllama.wasm'      : '../../esm/multi-thread/wllama.wasm',
  'multi-thread/wllama.worker.mjs': '../../esm/multi-thread/wllama.worker.mjs',
};
const wllama = new Wllama(CONFIG_PATHS);

From v2.0:

// You only need to specify 2 files
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '../../esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm' : '../../esm/multi-thread/wllama.wasm',
};
const wllama = new Wllama(CONFIG_PATHS);

Alternatively, you can use the *.wasm files from a CDN:

import WasmFromCDN from '@wllama/wllama/esm/wasm-from-cdn.js';
const wllama = new Wllama(WasmFromCDN);
// NOTE: this is not recommended
// Only use it when you can't embed the wasm files in your project

The Wllama constructor now accepts an optional second parameter of type WllamaConfig for configuration options:

Important

Most configuration options previously available in the DownloadModelConfig passed to loadModelFromUrl() have been moved to this constructor config.

const wllama = new Wllama(CONFIG_PATHS, {
  parallelDownloads: 5, // maximum concurrent downloads
  allowOffline: false, // whether to allow offline model loading
});

Wllama.loadModelFromUrl

As mentioned earlier, some options have been moved from loadModelFromUrl to the Wllama constructor, including the following (see the migration sketch after this list):

  • parallelDownloads
  • allowOffline
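
A sketch of the migration, assuming a v1.x call that passed these options per download (modelUrl is a placeholder):

// v1.x: options were passed on each call
// await wllama.loadModelFromUrl(modelUrl, {
//   parallelDownloads: 5,
//   allowOffline: false,
// });

// v2.0: pass the same options once, to the constructor
const wllama = new Wllama(CONFIG_PATHS, {
  parallelDownloads: 5,
  allowOffline: false,
});
await wllama.loadModelFromUrl(modelUrl);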

Other changes

  • Wllama.downloadModel is removed. Please use ModelManager.downloadModel instead.
  • loadModelFromUrl no longer checks whether the cached model is up-to-date. You may need to call Model.refresh() manually to re-download the model.
  • Changes in CacheManager (see the sketch after this list):
    • Added a CacheManager.download function
    • CacheManager.open(nameOrURL) now accepts either a file name or the original URL, and returns a Blob instead of a ReadableStream
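
A hedged sketch of the new CacheManager behaviour. The way the instance is created (new CacheManager()) and the single-URL signature of download() are assumptions for illustration; only the open() behaviour is described above:

import { CacheManager } from '@wllama/wllama';

// Assumption: CacheManager can be instantiated directly like this
const cacheManager = new CacheManager();

// Assumption: download() takes the URL of the file to cache
await cacheManager.download('https://example.com/model.gguf');

// open() accepts either the file name or the original URL,
// and now resolves to a Blob instead of a ReadableStream
const blob = await cacheManager.open('https://example.com/model.gguf');
console.log(blob.size);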

Internal Changes

Notable internal improvements made to the codebase:

  • Comprehensive test coverage using vitest, with browser testing for Chrome and Firefox (Safari support planned for the future)
  • Enhanced CI pipeline including validation for example builds, ESM compilation and lint checks

1.17.1

21 Nov 21:08
dc9c917

What's Changed

  • sync to latest upstream source code by @ngxson in #132

Full Changelog: 1.17.0...1.17.1

1.17.0

31 Oct 12:56

What's Changed

  • Add WllamaError class, fix llama_decode hangs on long input text by @ngxson in #130

This release fixes an issue where long input text could cause the app to hang.
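
A minimal sketch of catching the new error class; the import path, CONFIG_PATHS, and the options shown are assumptions for illustration:

import { Wllama, WllamaError } from '@wllama/wllama';

const wllama = new Wllama(CONFIG_PATHS); // CONFIG_PATHS as defined in your project
await wllama.loadModelFromUrl('https://example.com/model.gguf');

const veryLongInput = 'lorem ipsum dolor sit amet '.repeat(500);
try {
  await wllama.createCompletion(veryLongInput, { nPredict: 64 });
} catch (err) {
  if (err instanceof WllamaError) {
    // Errors raised by wllama itself can now be distinguished
    // from other runtime errors
    console.error('wllama error:', err.message);
  } else {
    throw err;
  }
}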

Full Changelog: 1.16.4...1.17.0

1.16.4

24 Oct 15:17
b727c3c

What's Changed

  • sync to latest upstream source code by @ngxson in #129

Full Changelog: 1.16.3...1.16.4

1.16.3

07 Oct 09:49
ffcd98a

What's Changed

  • sync to latest upstream source code by @ngxson in #125

Thanks to a small refactoring in llama.cpp, the binary size is reduced from 1.78 MB to 1.52 MB.

Full Changelog: 1.16.2...1.16.3

1.16.2

23 Sep 16:15
d9b849e

What's Changed

  • decode/encode : do not fail on empty batch by @ngxson in #118
  • Update to latest llama.cpp source code by @ngxson in #119

Full Changelog: 1.16.1...1.16.2

1.16.1

06 Sep 14:29
7beefeb

What's Changed

Full Changelog: 1.16.0...1.16.1

1.16.0

19 Aug 10:04

SmolLM-360M has been added as a model in the main example. Try it now: https://huggingface.co/spaces/ngxson/wllama

Special thanks to the @huggingface team for providing such a powerful model at such a small size!


What's Changed

  • ability to use custom cacheManager by @ngxson in #109

Full Changelog: 1.15.0...1.16.0

1.15.0

03 Aug 20:34
667dd91

New features

downloadModel()

Download a model to the cache without loading it. The use case is an application with a "model manager" screen that lets the user (see the sketch after this list):

  • Download a model via downloadModel()
  • List all downloaded models using CacheManager.list()
  • Delete a downloaded model using CacheManager.delete()
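
A hedged sketch of that screen's logic. The import path, the static-style CacheManager calls, the URL argument to downloadModel(), and the name field on cache entries are assumptions based on the notation above:

import { Wllama, CacheManager } from '@wllama/wllama';

const wllama = new Wllama(CONFIG_PATHS); // CONFIG_PATHS as defined in your project

// "Download" button: fetch the model into the cache without loading it
await wllama.downloadModel('https://example.com/model.gguf');

// Populate the list view with everything currently cached
const entries = await CacheManager.list();
for (const entry of entries) {
  console.log(entry.name);
}

// "Delete" button: remove one cached entry by name
await CacheManager.delete(entries[0].name);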

KV cache reuse in createCompletion

When calling createCompletion, you can pass useCache: true as an option. It will reuse the KV cache from the last createCompletion call. This is equivalent to the cache_prompt option on the llama.cpp server.

wllama.createCompletion(input, {
  useCache: true,
  ...
});

For example:

  • On the first call, you have 2 messages: user: hello, assistant: hi
  • On the second call, you add one message: user: hello, assistant: hi, user: who are you?

Then, only the added message user: who are you? will need to be evaluated.
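
A sketch of that flow. The formatChat helper and the nPredict value are illustrative only; the important part is that both calls pass useCache: true and share a common prompt prefix:

// Assumes `wllama` is a Wllama instance with a model already loaded,
// e.g. via wllama.loadModelFromUrl(...)

// Illustrative helper that flattens chat messages into a single prompt string
const formatChat = (messages) =>
  messages.map((m) => `${m.role}: ${m.content}`).join('\n');

// First call: the whole prompt is evaluated and its KV cache is kept
const firstPrompt = formatChat([
  { role: 'user', content: 'hello' },
  { role: 'assistant', content: 'hi' },
]);
await wllama.createCompletion(firstPrompt, { useCache: true, nPredict: 64 });

// Second call: the prompt shares its prefix with the first one,
// so only the appended "user: who are you?" turn needs to be evaluated
const secondPrompt = formatChat([
  { role: 'user', content: 'hello' },
  { role: 'assistant', content: 'hi' },
  { role: 'user', content: 'who are you?' },
]);
await wllama.createCompletion(secondPrompt, { useCache: true, nPredict: 64 });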

What's Changed

Full Changelog: 1.14.2...1.15.0