Version 2.0 (#133)
* add model manager
* add generated web worker code
* sync
* get rid of remote-blob
* get rid of import wllama.js
* add vitest
* fix ci
* add firefox test
* fix firefox deps
* enable generate docs on pr
* add intro
* add tests for ModelManager
* small clarification
* update docs
* add wasm-from-cdn
* update readme
* add test for progress callback
* remove reactjs example
* add embd test
* ci: bump to node 22, add lint
* fix ci
* fix npm ci
* update readme
* add loadModelFromHF
* fix remove model from cache
* fix main example
* update docs
* add allowOffline test
* fix model validation, add tests
* bump upstream source code
ngxson authored Dec 1, 2024
1 parent dc9c917 commit c9e1af2
Showing 66 changed files with 8,338 additions and 5,530 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build-hf-space.yml
```diff
@@ -25,7 +25,7 @@ jobs:
       - name: Setup Node.js
         uses: actions/setup-node@v4
         with:
-          node-version: '18'
+          node-version: '22'

       - name: Build Hugging Face space
         shell: bash
```
80 changes: 80 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,80 @@
```yaml
name: CI

on:
  push:
  workflow_dispatch:

concurrency:
  group: ci-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

env:
  PLAYWRIGHT_BROWSERS_PATH: ${{ github.workspace }}/.cache/ms-playwright

jobs:
  test:
    runs-on: ${{ matrix.os }}

    timeout-minutes: 10

    strategy:
      matrix:
        os: [ubuntu-latest]
        node_version: [22]
        # include:
        #   - os: macos-14
        #     node_version: 22
        #   - os: windows-latest
        #     node_version: 22
      fail-fast: false

    steps:
      - uses: actions/checkout@v4

      - name: Set node version to ${{ inputs.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}

      - name: Install
        run: npm ci --include=dev

      - name: Install Playwright Dependencies
        run: npx playwright install --with-deps

      - name: Build
        run: npm run build

      - name: Test (Chrome)
        run: npm run test

      - name: Test (Firefox)
        run: npm run test:firefox

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup NodeJS
        uses: actions/setup-node@v4
        with:
          node-version: '22'

      - name: Install
        run: npm ci --include=dev

      - name: Check format
        run: |
          git config --global --add safe.directory $(realpath .)
          git status
          npm run format
          git status
          modified_files="$(git status -s)"
          echo "Modified files: ${modified_files}"
          if [ -n "${modified_files}" ]; then
            echo "Detect unformatted files"
            echo "You may need to run: npm run format"
            echo "${modified_files}"
            exit 1
          fi
```
9 changes: 5 additions & 4 deletions .github/workflows/generate-docs.yml
```diff
@@ -1,9 +1,8 @@
-name: Deploy docs and demo to GitHub Pages
+name: Build docs and demo

 on:
-  # Runs on pushes targeting the default branch
+  # Runs on pushes
   push:
-    branches: ["master"]

   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
@@ -31,7 +30,7 @@ jobs:
       - name: Setup Node.js
         uses: actions/setup-node@v4
         with:
-          node-version: '18'
+          node-version: '22'

       - name: Install Dependencies
         run: npm ci
@@ -53,10 +52,12 @@ jobs:
           rm -rf node_modules
       - name: Upload artifact
+        if: github.ref == 'refs/heads/master'
         uses: actions/upload-pages-artifact@v3
         with:
           path: "./"

       - name: Deploy to GitHub Pages
         id: deployment
+        if: github.ref == 'refs/heads/master'
         uses: actions/deploy-pages@v4
```
34 changes: 34 additions & 0 deletions .github/workflows/verify-generated-code.yml
@@ -0,0 +1,34 @@
```yaml
name: Verify generated worker code is up-to-date

on:
  # Runs on pushes
  push:
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'

      - name: Verify generated code
        run: |
          git config --global --add safe.directory $(realpath .)
          git status
          npm run build:worker
          git status
          modified_files="$(git status -s)"
          echo "Modified files: ${modified_files}"
          if [ -n "${modified_files}" ]; then
            echo "Generated code file is not up-to-date"
            echo "Hint: You may need to run: npm run build:worker"
            echo "${modified_files}"
            exit 1
          fi
```
1 change: 1 addition & 0 deletions .gitignore
```diff
@@ -1,5 +1,6 @@
 node_modules
 .DS_Store
+.vscode
 /cache
 /docs
 /dist
```
2 changes: 2 additions & 0 deletions .prettierignore
```diff
@@ -20,6 +20,8 @@

 /src/multi-thread
 /src/single-thread
+/src/workers-code/generated.ts
+/src/wasm-from-cdn.ts

 *.md
 *.mdx
```
5 changes: 0 additions & 5 deletions .vscode/settings.json

This file was deleted.

56 changes: 36 additions & 20 deletions README.md
```diff
@@ -10,6 +10,9 @@ WebAssembly binding for [llama.cpp](https://github.com/ggerganov/llama.cpp)

 For changelog, please visit [releases page](https://github.com/ngxson/wllama/releases)

+> [!IMPORTANT]
+> Version 2.0 is released 👉 [read more](./guides/intro-v2.md)
+
 ![](./assets/screenshot_0.png)

 ## Features
```
```diff
@@ -35,8 +38,8 @@ Limitations:

 Demo:
 - Basic usages with completions and embeddings: https://github.ngxson.com/wllama/examples/basic/
-- Advanced example using low-level API: https://github.ngxson.com/wllama/examples/advanced/
 - Embedding and cosine distance: https://github.ngxson.com/wllama/examples/embeddings/
+- For more advanced example using low-level API, have a look at test file: [wllama.test.ts](./src/wllama.test.ts)
```
## How to use

````diff
@@ -48,7 +51,15 @@ Install it:
 npm i @wllama/wllama
 ```

-For complete code, see [examples/reactjs](./examples/reactjs)
+Then, import the module:
+
+```ts
+import { Wllama } from '@wllama/wllama';
+let wllamaInstance = new Wllama(WLLAMA_CONFIG_PATHS, ...);
+// (the rest is the same with earlier example)
+```
+
+For complete code example, see [examples/main/utils/wllama.context.tsx](./examples/main/utils/wllama.context.tsx)

 NOTE: this example only covers completions usage. For embeddings, please see [examples/embeddings/index.html](./examples/embeddings/index.html)
````
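For orientation, here is a minimal sketch of what embedding extraction looks like with the 2.0 API. This is not the example's exact code: the model repo/path and the `WLLAMA_CONFIG_PATHS` constant are assumptions carried over from the snippets above.

```ts
import { Wllama } from '@wllama/wllama';

(async () => {
  // WLLAMA_CONFIG_PATHS is assumed to be defined as in the snippet above
  const wllama = new Wllama(WLLAMA_CONFIG_PATHS);
  // Hypothetical embedding-capable GGUF model hosted on Hugging Face
  await wllama.loadModelFromHF('ggml-org/models', 'bert-bge-small/ggml-model-f16.gguf');
  // createEmbedding returns the embedding vector for the given text
  const vec = await wllama.createEmbedding('Hello world');
  console.log(vec.length); // dimensionality of the embedding
})();
```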

Expand All @@ -67,11 +78,8 @@ import { Wllama } from './esm/index.js';

(async () => {
const CONFIG_PATHS = {
'single-thread/wllama.js' : './esm/single-thread/wllama.js',
'single-thread/wllama.wasm' : './esm/single-thread/wllama.wasm',
'multi-thread/wllama.js' : './esm/multi-thread/wllama.js',
'multi-thread/wllama.wasm' : './esm/multi-thread/wllama.wasm',
'multi-thread/wllama.worker.mjs': './esm/multi-thread/wllama.worker.mjs',
'single-thread/wllama.wasm': './esm/single-thread/wllama.wasm',
'multi-thread/wllama.wasm' : './esm/multi-thread/wllama.wasm',
};
// Automatically switch between single-thread and multi-thread version based on browser support
// If you want to enforce single-thread, add { "n_threads": 1 } to LoadModelConfig
```diff
@@ -83,8 +91,11 @@ import { Wllama } from './esm/index.js';
     // Log the progress in a user-friendly format
     console.log(`Downloading... ${progressPercentage}%`);
   };
-  await wllama.loadModelFromUrl(
-    "https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K.gguf",
+  // Load GGUF from Hugging Face hub
+  // (alternatively, you can use loadModelFromUrl if the model is not from HF hub)
+  await wllama.loadModelFromHF(
+    'ggml-org/models',
+    'tinyllamas/stories260K.gguf',
     {
       progressCallback,
     }
```
````diff
@@ -101,6 +112,14 @@ import { Wllama } from './esm/index.js';
 })();
 ```

+Alternatively, you can use the `*.wasm` files from CDN:
+
+```js
+import WasmFromCDN from '@wllama/wllama/esm/wasm-from-cdn.js';
+const wllama = new Wllama(WasmFromCDN);
+// NOTE: this is not recommended, only use when you can't embed wasm files in your project
+```
````
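The middle of the main example above (the actual text generation) is collapsed in this diff. For completeness, here is a hedged sketch of what that part roughly looks like, following the `createCompletion` API documented elsewhere in the README; the prompt and sampling values are illustrative only:

```ts
// Continuing from the `wllama` instance loaded in the example above
const outputText = await wllama.createCompletion('Once upon a time,', {
  nPredict: 50, // maximum number of tokens to generate
  sampling: {
    temp: 0.5,
    top_k: 40,
    top_p: 0.9,
  },
});
console.log(outputText); // generated continuation of the prompt
```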

### Split model

Cases where we want to split the model:
````diff
@@ -116,14 +135,15 @@ We use `llama-gguf-split` to split a big gguf file into smaller files. You can d…

 This will output files ending with `-00001-of-00003.gguf`, `-00002-of-00003.gguf`, and so on.

-You can then pass to `loadModelFromUrl` the URL of the first file and it will automatically load all the chunks:
+You can then pass to `loadModelFromUrl` or `loadModelFromHF` the URL of the first file and it will automatically load all the chunks:

 ```js
-await wllama.loadModelFromUrl(
-  'https://huggingface.co/ngxson/tinyllama_split_test/resolve/main/stories15M-q8_0-00001-of-00003.gguf',
-  {
-    parallelDownloads: 5, // optional: maximum files to download in parallel (default: 3)
-  },
+const wllama = new Wllama(CONFIG_PATHS, {
+  parallelDownloads: 5, // optional: maximum files to download in parallel (default: 3)
+});
+await wllama.loadModelFromHF(
+  'ngxson/tinyllama_split_test',
+  'stories15M-q8_0-00001-of-00003.gguf'
 );
 ```
````

```diff
@@ -184,11 +204,7 @@ npm run build

 ## TODO

-Short term:
-- Add a more pratical embedding example (using a better model)
-- Maybe doing a full RAG-in-browser example using tinyllama?
-
-Long term:
 - Add support for LoRA adapter
 - Support GPU inference via WebGL
 - Support multi-sequences: knowing the resource limitation when using WASM, I don't think having multi-sequences is a good idea
 - Multi-modal: Waiting for refactoring LLaVA implementation from llama.cpp
```
2 changes: 1 addition & 1 deletion actions.hpp
```diff
@@ -287,7 +287,7 @@ json action_set_options(app_t &app, json &body)
 json action_sampling_init(app_t &app, json &body)
 {
   // sampling
-  common_sampler_params sparams;
+  common_params_sampling sparams;
   sparams.seed = app.seed;
   if (sparams.seed == LLAMA_DEFAULT_SEED)
     sparams.seed = time(NULL);
```