Commit

Update README.md and modify default value
msy-kato committed May 28, 2024
1 parent 834e4a8 commit 86f042f
Showing 2 changed files with 5 additions and 5 deletions.
6 changes: 3 additions & 3 deletions examples/batched-bench/README.md
@@ -13,13 +13,13 @@ There are 2 modes of operation:
 ./batched-bench MODEL_PATH [N_KV_MAX] [N_BATCH] [N_UBATCH] [FATTN] [IS_PP_SHARED] [NGL] [NT] [NTB] <PP> <TG> <PL>
 
 # LLaMA 7B, F16, N_KV_MAX = 16384 (8GB), prompt not shared
-./batched-bench ./models/llama-7b/ggml-model-f16.gguf 16384 2048 512 0 99
+./batched-bench ./models/llama-7b/ggml-model-f16.gguf 16384 2048 512 0 0 99
 
 # LLaMA 7B, Q8_0, N_KV_MAX = 16384 (8GB), prompt is shared
-./batched-bench ./models/llama-7b/ggml-model-q8_0.gguf 16384 2048 512 1 99
+./batched-bench ./models/llama-7b/ggml-model-q8_0.gguf 16384 2048 512 0 1 99
 
 # custom set of batches
-./batched-bench ./models/llama-7b/ggml-model-q8_0.gguf 2048 512 512 0 0 999 8 8 128,256,512 128,256 1,2,4,8,16,32
+./batched-bench ./models/llama-7b/ggml-model-q8_0.gguf 16384 2048 512 0 0 999 8 8 128,256,512 128,256 1,2,4,8,16,32
 ```
 
 ## Sample results
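The "(8GB)" in the README comments can be checked with back-of-the-envelope arithmetic. This is a minimal sketch, assuming LLaMA 7B's shape (32 layers, hidden size 4096, no grouped-query attention) and a KV cache stored in F16 (2 bytes per element), which we believe is llama.cpp's default cache type. `kv_cache_bytes` is a hypothetical helper, not part of batched-bench.

```cpp
#include <cstdint>

// Hypothetical helper: estimated KV-cache size in bytes. Each layer stores one
// K and one V tensor, each holding n_kv_max * n_embd elements. Assumes standard
// multi-head attention (no grouped-query attention), as in LLaMA 7B.
constexpr std::uint64_t kv_cache_bytes(std::uint64_t n_layer,
                                       std::uint64_t n_embd,
                                       std::uint64_t n_kv_max,
                                       std::uint64_t bytes_per_elem) {
    return 2 * n_layer * n_kv_max * n_embd * bytes_per_elem;
}

// LLaMA 7B (assumed): 32 layers, n_embd = 4096; F16 cache = 2 bytes/element.
static_assert(kv_cache_bytes(32, 4096, 16384, 2) == (8ULL << 30),
              "N_KV_MAX = 16384 -> 8 GiB");
static_assert(kv_cache_bytes(32, 4096, 2048, 2) == (1ULL << 30),
              "N_KV_MAX = 2048 -> 1 GiB");
```

Under these assumptions the new default of 16384 reserves eight times the KV memory of the old default of 2048 (8 GiB versus 1 GiB for this model), and the README's "8GB" is more precisely 8 GiB.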
4 changes: 2 additions & 2 deletions examples/batched-bench/batched-bench.cpp
@@ -34,11 +34,11 @@ int main(int argc, char ** argv) {
     if (argc == 1 || argv[1][0] == '-') {
         printf("usage: %s MODEL_PATH [N_KV_MAX] [N_BATCH] [N_UBATCH] [FATTN] [IS_PP_SHARED] [NGL] [NT] [NTB] <PP> <TG> <PL>\n", argv[0]);
         printf(" <PP>, <TG> and PL are comma-separated lists of numbers without spaces\n\n");
-        printf(" example: %s ggml-model-f16.gguf 2048 512 512 0 0 999 8 8 128,256,512 128,256 1,2,4,8,16,32\n\n", argv[0]);
+        printf(" example: %s ggml-model-f16.gguf 16384 2048 512 0 0 999 8 8 128,256,512 128,256 1,2,4,8,16,32\n\n", argv[0]);
         return 1 ;
     }
 
-    int n_kv_max = 2048;
+    int n_kv_max = 16384;
     int n_batch = 2048;
     int n_ubatch = 512;
     bool flash_attn = false;
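The usage text says `<PP>`, `<TG>` and `<PL>` are comma-separated lists, and the benchmark runs over their combinations. The sketch below illustrates why the larger `N_KV_MAX` default matters for the custom-batch example. It assumes, as we understand batched-bench to behave, that a run with `pl` parallel sequences needs `pp + pl*tg` KV cells when the prompt is shared and `pl*(pp + tg)` otherwise, and that combinations exceeding `N_KV_MAX` are skipped. `parse_int_list`, `required_kv` and `count_fitting` are hypothetical helpers, not the tool's own functions.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a list such as "128,256,512" (no spaces) into ints.
std::vector<int> parse_int_list(const std::string & s) {
    std::vector<int> out;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, ',')) {
        out.push_back(std::stoi(item));
    }
    return out;
}

// Assumed KV requirement of one (pp, tg, pl) run: a shared prompt is stored
// once; otherwise every one of the pl sequences keeps its own prompt copy.
int required_kv(int pp, int tg, int pl, bool is_pp_shared) {
    return is_pp_shared ? pp + pl * tg : pl * (pp + tg);
}

// Count the PP x TG x PL combinations whose requirement fits within n_kv_max.
int count_fitting(const std::vector<int> & pps, const std::vector<int> & tgs,
                  const std::vector<int> & pls, bool is_pp_shared, int n_kv_max) {
    int n = 0;
    for (int pp : pps) {
        for (int tg : tgs) {
            for (int pl : pls) {
                if (required_kv(pp, tg, pl, is_pp_shared) <= n_kv_max) {
                    ++n;
                }
            }
        }
    }
    return n;
}
```

For the README's custom grid (PP = 128,256,512; TG = 128,256; PL = 1,2,4,8,16,32; prompt not shared), 17 of the 36 combinations fit in 2048 cells, while 34 fit in 16384. Only the two largest (pl = 32 with pp + tg of 640 or 768) still exceed the new default.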
