Benchmark scripts are provided to quickly measure model inference performance.
Please refer to Installation. This example can run directly from source: you do not need to install xFasterTransformer with pip. Building the xFasterTransformer library is enough, and the example will search for the library in the src directory.
Please refer to Prepare model.
- Please refer to Prepare Environment to install oneCCL.
- Python dependencies.
# requirements.txt is in the root directory.
pip install -r requirements.txt
Enter the folder corresponding to the model and run run_benchmark.sh -m <model_name>.
Please choose <model_name> as follows:
- llama-2 (-7b,-13b,-70b)
- llama (-7b,-13b,-30b,-65b)
- chatglm2-6b
- chatglm3-6b
- chatglm-6b
- baichuan2 (-7b,-13b)
Please choose the data type with -d or --dtype as follows:
- bf16 (default)
- bf16_fp16
- int8
- bf16_int8
- fp16
- bf16_int4
- int4
- bf16_nf4
- nf4
- bf16_w8a8
- w8a8
- w8a8_int8
- w8a8_int4
- w8a8_nf4
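A mistyped data type is easy to miss before a long run, so a wrapper can check the value first. The following is a minimal sketch under stated assumptions: the function name is_supported_dtype is hypothetical and not part of run_benchmark.sh, and the accepted values are simply copied from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of xFasterTransformer: returns 0 when the
# given data type appears in the list documented above, 1 otherwise.
is_supported_dtype() {
    case "$1" in
        bf16|bf16_fp16|bf16_int8|bf16_int4|bf16_nf4|bf16_w8a8) return 0 ;;
        fp16|int8|int4|nf4) return 0 ;;
        w8a8|w8a8_int8|w8a8_int4|w8a8_nf4) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: fall back to the documented default (bf16) when the check fails.
dtype="${1:-bf16}"
if ! is_supported_dtype "$dtype"; then
    echo "Unsupported dtype '$dtype', using bf16 instead" >&2
    dtype="bf16"
fi
echo "dtype=$dtype"
```

The validated value can then be passed to run_benchmark.sh with -d.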
Please choose the number of sockets with -s or --sockets as follows:
- 1 (default, benchmarking on single socket)
- 2 (benchmarking on 2 sockets)
Specify the batch size using -bs or --batch_size (default 1).
Specify the number of input tokens using -in or --input_tokens (default 32).
Specify the number of output tokens using -out or --output_tokens (default 32).
Specify the beam width using -b or --beam_width (default 1).
Specify the number of inference iterations using -i or --iter (default 10).
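The options above combine naturally into a sweep over several configurations. The sketch below is one possible way to do that, not something the repository ships: build_cmd and DRY_RUN are illustrative names, and the batch sizes and token counts are arbitrary example values. It uses only flags documented in this section and, by default, prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of a parameter sweep around run_benchmark.sh. Only flags listed in
# this document are used; build_cmd and DRY_RUN are illustrative names.

# Print the benchmark command for one configuration.
build_cmd() {
    local model="$1" dtype="$2" sockets="$3" bs="$4" in_tok="$5" out_tok="$6"
    echo "bash run_benchmark.sh -m $model -d $dtype -s $sockets" \
         "-bs $bs -in $in_tok -out $out_tok -i 10"
}

# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

for bs in 1 4; do
    for in_tok in 32 1024; do
        cmd="$(build_cmd llama-2-7b bf16 1 "$bs" "$in_tok" 32)"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            eval "$cmd"
        fi
    done
done
```

Printing the commands first makes it easy to review the sweep before committing machine time to it.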
# Example of llama-2-7b with precision bf16, batch size 1, 1024 input tokens and 128 output tokens on single socket.
cd benchmark
# Set up the mpirun environment.
source ../3rdparty/oneccl/build/_install/env/setvars.sh
bash run_benchmark.sh -m llama-2-7b -d bf16 -s 1 -bs 1 -in 1024 -out 128 -i 10
- The shell script automatically checks the number of NUMA nodes.
- If the system configuration needs modification, please edit run_benchmark.sh.
- If you want to use custom input, please modify the prompt.json file.
Note: System and CPU configurations differ between machines. For the best performance, try adjusting OMP_NUM_THREADS, the data type, and the number of memory nodes (check the memory nodes using numactl -H) according to your test environment.
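As a starting point for that tuning, the machine's topology can be inspected from the shell. The snippet below is a rough sketch: count_numa_nodes is a hypothetical helper, and it assumes the usual first line of numactl -H output, which has the form "available: 2 nodes (0-1)". Whether run_benchmark.sh honors an exported OMP_NUM_THREADS or sets its own value is not stated here, so the snippet only reports information.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `numactl -H` output on stdin and print the node
# count taken from the line of the form "available: 2 nodes (0-1)".
count_numa_nodes() {
    awk '/^available:/ { print $2; exit }'
}

if command -v numactl >/dev/null 2>&1; then
    nodes="$(numactl -H 2>/dev/null | count_numa_nodes)" || nodes=""
    echo "NUMA nodes: ${nodes:-unknown}"
else
    echo "numactl not found; install it to inspect memory nodes" >&2
fi

# Number of CPUs available to this process, as a reference when choosing
# OMP_NUM_THREADS. The best value should be found by measurement.
echo "CPUs available: $(nproc)"
```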