Update available types for cache_type_k and cache_type_v (#134)
Based on [kv_cache_type_from_str](https://github.com/ngxson/wllama/blob/ac7dc45c2d4a99867eea589e9a30650015f8f52d/actions.hpp#L61-L78), I tested each of these types and confirmed that they all work.

Tested with the SmolLM2 360M Instruct model.
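The commit message points to the C++ helper `kv_cache_type_from_str`, which, as its name suggests, maps a cache type name string to an internal type. On the TypeScript side, a caller that reads the cache type from user input may want to check the string before putting it into `LoadModelConfig`. The sketch below does that under stated assumptions: `KV_CACHE_TYPES`, `KvCacheType`, `isKvCacheType` and `pickCacheType` are hypothetical names (not part of wllama), and the list is copied from the updated unions in this commit rather than from `actions.hpp` itself.

```typescript
// The seven cache type names accepted after this commit, copied from the
// updated `cache_type_k` / `cache_type_v` unions in src/wllama.ts.
const KV_CACHE_TYPES = ['f32', 'f16', 'q8_0', 'q5_1', 'q5_0', 'q4_1', 'q4_0'] as const;

// Hypothetical alias: derives the same union type from the array above,
// so the runtime list and the compile-time type cannot drift apart.
type KvCacheType = (typeof KV_CACHE_TYPES)[number];

// Hypothetical type guard: narrows an arbitrary string (e.g. from a settings UI)
// to one of the accepted names before it is used as cache_type_k / cache_type_v.
function isKvCacheType(value: string): value is KvCacheType {
  return (KV_CACHE_TYPES as readonly string[]).includes(value);
}

// Hypothetical helper: falls back to 'f16' when the requested name is not accepted.
function pickCacheType(requested: string): KvCacheType {
  return isKvCacheType(requested) ? requested : 'f16';
}
```

Deriving the type from a `const` array keeps a single source of truth: adding a name to the array widens `KvCacheType` automatically.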
felladrin authored Dec 3, 2024
1 parent ac7dc45 commit 4ff6b5f
Showing 1 changed file with 2 additions and 2 deletions.
src/wllama.ts
@@ -88,8 +88,8 @@ export interface LoadModelConfig {
   yarn_orig_ctx?: number;
   // TODO: add group attention
   // optimizations
-  cache_type_k?: 'f16' | 'q8_0' | 'q4_0';
-  cache_type_v?: 'f16';
+  cache_type_k?: 'f32' | 'f16' | 'q8_0' | 'q5_1' | 'q5_0' | 'q4_1' | 'q4_0';
+  cache_type_v?: 'f32' | 'f16' | 'q8_0' | 'q5_1' | 'q5_0' | 'q4_1' | 'q4_0';
 }
 
 export interface SamplingConfig {
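The new options sit under the `// optimizations` comment, and their main effect is on memory: a quantized K or V cache stores fewer bytes per element than `f16`. As a rough sketch, assuming ggml's standard block layouts (32 values per block for the quantized types: 34 bytes for `q8_0`, 24 for `q5_1`, 22 for `q5_0`, 20 for `q4_1`, 18 for `q4_0`) and ignoring padding and alignment, the cache size can be estimated as below. `BLOCK_LAYOUT`, `rowBytes`, `kvCacheBytes` and the model dimensions in the usage note are hypothetical, not taken from wllama.

```typescript
// Hypothetical alias mirroring the updated union in src/wllama.ts.
type KvCacheType = 'f32' | 'f16' | 'q8_0' | 'q5_1' | 'q5_0' | 'q4_1' | 'q4_0';

// Assumed ggml storage layout: elements per block and bytes per block.
// Quantized types pack 32 values plus per-block scale (and min / high-bit) data.
const BLOCK_LAYOUT: Record<KvCacheType, { elems: number; bytes: number }> = {
  f32: { elems: 1, bytes: 4 },
  f16: { elems: 1, bytes: 2 },
  q8_0: { elems: 32, bytes: 34 },
  q5_1: { elems: 32, bytes: 24 },
  q5_0: { elems: 32, bytes: 22 },
  q4_1: { elems: 32, bytes: 20 },
  q4_0: { elems: 32, bytes: 18 },
};

// Bytes needed to store one row of `n` elements of the given type.
// Quantized rows must be a whole number of blocks.
function rowBytes(type: KvCacheType, n: number): number {
  const { elems, bytes } = BLOCK_LAYOUT[type];
  if (n % elems !== 0) {
    throw new Error(`${type} needs a row length divisible by ${elems}, got ${n}`);
  }
  return (n / elems) * bytes;
}

// Rough KV cache size: each layer stores one K row and one V row per context
// position. nEmbdKv is the per-layer K/V width (number of KV heads * head size).
function kvCacheBytes(
  nLayer: number,
  nCtx: number,
  nEmbdKv: number,
  k: KvCacheType,
  v: KvCacheType,
): number {
  return nLayer * nCtx * (rowBytes(k, nEmbdKv) + rowBytes(v, nEmbdKv));
}
```

For a hypothetical model with 32 layers, a K/V width of 1024 and a 4096-token context, `kvCacheBytes(32, 4096, 1024, 'f16', 'f16')` gives 536,870,912 bytes (512 MiB), while `'q8_0'` for both K and V gives 285,212,672 bytes (272 MiB). Because K and V are typed separately, the two can also be mixed, e.g. `'f16'` for K and `'q8_0'` for V.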
