Llama-2-7B-QServe model doesn't give the expected output #11

Open
MuYu-zhi opened this issue May 21, 2024 · 2 comments

Comments

@MuYu-zhi

I ran qserve_e2e_generation.py with my own prompts instead of the original WildChat dataset. The outputs just repeat the input prompt until max_tokens is reached.

Two of the five input prompts are:

Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?

Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?

......

The sampling parameters are:
sampling_params = SamplingParams(n=1, top_p=1.0, top_k=50, temperature=1.0, stop_token_ids=[128001, 128009], max_tokens=1024, )
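
One thing worth checking with these parameters: 128001 and 128009 are special-token IDs from the Llama-3 tokenizer, while the Llama-2 tokenizer has a 32,000-entry vocabulary with EOS id 2. If that holds for the checkpoint used here, neither stop ID can ever be produced, which would be consistent with every request running to the full 1024 iterations in the log. The sketch below is a minimal sanity check under that assumption; `split_stop_token_ids` and the two constants are hypothetical names for illustration and are not part of QServe or vLLM.

```python
# Assumed values for the Llama-2 tokenizer. Verify them against your own
# tokenizer (vocabulary size and EOS id) before relying on this check.
LLAMA2_VOCAB_SIZE = 32000
LLAMA2_EOS_TOKEN_ID = 2


def split_stop_token_ids(stop_token_ids, vocab_size):
    """Separate stop ids the model can emit from ids outside the vocabulary."""
    reachable = [t for t in stop_token_ids if 0 <= t < vocab_size]
    unreachable = [t for t in stop_token_ids if not 0 <= t < vocab_size]
    return reachable, unreachable


# The stop ids used in the issue, checked against the assumed vocabulary size.
reachable, unreachable = split_stop_token_ids([128001, 128009], LLAMA2_VOCAB_SIZE)
print("reachable:", reachable)      # reachable: []
print("unreachable:", unreachable)  # unreachable: [128001, 128009]
```

If the check reports no reachable stop IDs, trying `stop_token_ids=[LLAMA2_EOS_TOKEN_ID]` would be a reasonable next experiment.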

An output log snippet follows:

Iteration 1017 (remaining req.s = 5)
Iteration 1018 (remaining req.s = 5)
Iteration 1019 (remaining req.s = 5)
Iteration 1020 (remaining req.s = 5)
Iteration 1021 (remaining req.s = 5)
Iteration 1022 (remaining req.s = 5)
Iteration 1023 (remaining req.s = 5)
Iteration 1024 (remaining req.s = 5)

[Conversation 0 output] <s> Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 48 + 24 = 72
Natalia sold clips to
[Conversation 1 output] <s> Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A. $6
B. $10
C. $12
D. $15
E. $18
Answer: $12
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Weng earns $12 an hour for babys

  1. This is a bit strange: I checked the SHA-256 of the model files and they are correct. Could there be another cause?
  2. I also wonder whether the last hidden states of a quantized LLM are numerically close to those of the original unquantized model, or completely different.
  3. In the case above, I printed the hidden states and found them completely different. Is this normal?
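
For questions 2 and 3, eyeballing printed hidden states can be misleading, so a numeric comparison may help. The sketch below is only an illustration on made-up toy logits, not values from either model: it computes cosine similarity between two vectors and the KL divergence between the softmax distributions they induce. All function names are hypothetical helpers written for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Toy logits standing in for the reference and quantized models (made up).
ref_logits = [2.0, 1.0, 0.1, -1.0]
quant_logits = [1.9, 1.1, 0.0, -0.8]

p = softmax(ref_logits)
q = softmax(quant_logits)

print("cosine similarity:", cosine_similarity(ref_logits, quant_logits))
print("KL(ref || quant):", kl_divergence(p, q))
```

In practice the same two metrics could be applied per token position to the real hidden states and to the final logits of both models, which separates "the intermediate values drifted" from "the predicted distribution changed".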
@kentang-mit
Contributor

Hi @MuYu-zhi,

Your observation is probably related to the fact that Llama-2-7B is not an instruction-tuned model. In addition, our implementation currently uses greedy decoding for simplicity. Adding a repetition penalty to the sampler will definitely alleviate this problem. As for your other questions, I cannot guarantee that the last hidden states of the quantized model will look similar to those of the original model, because the output error introduced by quantization propagates through the model from the first layer to the last. I would still expect the output distributions to be similar.
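
To illustrate the repetition-penalty suggestion, here is a minimal sketch of one widely used formulation combined with greedy selection: logits of tokens that were already generated are divided by the penalty when positive and multiplied by it when negative. This is not QServe's sampler. The function names, the toy logits, and the default penalty of 1.2 are assumptions made for the example.

```python
def apply_repetition_penalty(logits, generated_token_ids, penalty=1.2):
    """Return a copy of logits with previously generated tokens made less likely."""
    penalized = list(logits)
    for token_id in set(generated_token_ids):
        if penalized[token_id] > 0:
            penalized[token_id] /= penalty  # shrink positive logits
        else:
            penalized[token_id] *= penalty  # push negative logits further down
    return penalized


def greedy_pick(logits):
    """Index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy example: token 0 narrowly wins and has already been generated twice.
logits = [3.0, 2.9, -1.0, 0.5]
history = [0, 0, 2]

print(greedy_pick(logits))  # 0: plain greedy decoding repeats token 0
penalized = apply_repetition_penalty(logits, history, penalty=1.2)
print(greedy_pick(penalized))  # 1: token 0 drops to 2.5, so token 1 wins
```

With pure greedy decoding and no penalty, a base model that starts echoing the prompt has nothing to break the loop, which matches the behaviour in the log above.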

Best,
Haotian

@MuYu-zhi
Author

@kentang-mit thanks for your reply.

I hope you can help clarify a few more questions:

  1. Is the weight of the very first layer, embed_tokens, quantized? From what I observe, it is not. Is that right?
  2. If embed_tokens is not quantized, I understand that it is computed in FP16. However, Nsight shows that the CUDA kernel at::native::<unnamed>::indexSelectLargeIndex<c10::Half, long, unsigned int, (int)2, (int)2, (int)-2, (bool)1>(at::cuda::detail::TensorInfo<T1, T3>, at::cuda::detail::TensorInfo<T1, T3>, at::cuda::detail::TensorInfo<T2, T3>, int, int, T3, T3, long) runs much faster in QServe than in vLLM: with 232 tokens it takes 14.5 µs vs. 409.6 µs. What causes this speedup?
  3. Are the quantized model weights stored as torch.int8 rather than int32?
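
On question 3, the container dtype of a quantized checkpoint does not have to match the logical bit width. One common convention stores two 4-bit values per byte, and some other 4-bit formats pack eight values into a 32-bit word. Whether QServe uses either layout is not confirmed in this thread. The sketch below only illustrates the two-per-byte idea in plain Python, with hypothetical helper names.

```python
def pack_uint4_pairs(values):
    """Pack unsigned 4-bit values two per byte, low nibble first."""
    if len(values) % 2 != 0:
        raise ValueError("need an even number of 4-bit values")
    packed = []
    for lo, hi in zip(values[0::2], values[1::2]):
        if not (0 <= lo <= 15 and 0 <= hi <= 15):
            raise ValueError("values must fit in 4 bits")
        packed.append(lo | (hi << 4))
    return packed


def unpack_uint4_pairs(packed):
    """Inverse of pack_uint4_pairs."""
    values = []
    for byte in packed:
        values.append(byte & 0x0F)         # low nibble
        values.append((byte >> 4) & 0x0F)  # high nibble
    return values


weights = [1, 2, 15, 0]
packed = pack_uint4_pairs(weights)
print(packed)                      # [33, 15]
print(unpack_uint4_pairs(packed))  # [1, 2, 15, 0]
```

Under such a layout, a tensor reported with an 8-bit dtype would hold half as many elements as the logical weight matrix along the packed dimension, which is one way to check the storage format of a given checkpoint.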
