Llama-2-7B-QServe model doesn't give the expected output #11

MuYu-zhi · 2024-05-21T02:35:18Z

I run qserve_e2e_generation.py with my own prompts, rather than the original WildChat dataset, the outputs seem just to repeat the input prompt until the size of max_token.

Two input prompts among the all five are:

Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?

Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?

......

and
sampling_params = SamplingParams(n=1, top_p=1.0, top_k=50, temperature=1.0, stop_token_ids=[128001, 128009], max_tokens=1024, )

A output log snippet is as follow:

Iteration 1017 (remaining req.s = 5)
Iteration 1018 (remaining req.s = 5)
Iteration 1019 (remaining req.s = 5)
Iteration 1020 (remaining req.s = 5)
Iteration 1021 (remaining req.s = 5)
Iteration 1022 (remaining req.s = 5)
Iteration 1023 (remaining req.s = 5)
Iteration 1024 (remaining req.s = 5)

[Conversation 0 output] <s> Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to
[Conversation 1 output] <s> Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A. $6
B. $10
C. $12
D. $15
E. $18
Answer: $12
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babys

It's a little bit weird, I have checked the sha256 of model files, it's correct. Any other reason for this?
Besides, I also wonder whether the quantized model of LLM, its numerical value of last hidden states, is approximately same with the original unquantized model, or totally different?
In the above case, I print the hidden states, find it's totally different, is this normal?

The text was updated successfully, but these errors were encountered:

kentang-mit · 2024-05-21T20:01:25Z

Hi @MuYu-zhi,

Your observation is probably relate to the fact that Llama-2-7B is not an instruction-tuned model. Besides, we currently apply greedy decoding in our implementation for simplicity. Adding repetition penalty to the sampler will definitely alleviate this problem. Regarding other problems, I cannot guarantee that the last hidden states of the quantized model will look similar to the original model. This is because output error incurred by quantization will propagate across the model, from the first layer to the end, but I will expect that the output distribution should be similar.

Best,
Haotian

MuYu-zhi · 2024-05-22T04:25:06Z

@kentang-mit thanks for your reply.

Hope you can help clarify a few more questions：

The weight of the very first layer embed_tokens is quantized or not? In my observation, it's not, right?
If the embed_tokens layer is not quantized, I understand, it will be computed in FP16 mode. But the cuda kernel performance at::native::<unnamed>::indexSelectLargeIndex<c10::Half, long, unsigned int, (int)2, (int)2, (int)-2, (bool)1>(at::cuda::detail::TensorInfo<T1, T3>, at::cuda::detail::TensorInfo<T1, T3>, at::cuda::detail::TensorInfo<T2, T3>, int, int, T3, T3, long) observed by nsight, Qserve is much faster than that in vllm, with 232 tokens, it's 14.5us vs. 409.6us, why, what causes this acceleration?
The quantized model weight is represented in torch.int8, rather than int32?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Llama-2-7B-QServe model doesn't give the expected output #11

Llama-2-7B-QServe model doesn't give the expected output #11

MuYu-zhi commented May 21, 2024

kentang-mit commented May 21, 2024

MuYu-zhi commented May 22, 2024

Llama-2-7B-QServe model doesn't give the expected output #11

Llama-2-7B-QServe model doesn't give the expected output #11

Comments

MuYu-zhi commented May 21, 2024

kentang-mit commented May 21, 2024

MuYu-zhi commented May 22, 2024