
Phi-3 mini 128k produces gibberish if context >4k tokens #2185

Open · 2 of 4 tasks
jphme opened this issue Jul 4, 2024 · 4 comments

jphme commented Jul 4, 2024

System Info

GPU: RTX 4090

Running TGI 2.1.0 with Docker:

```shell
docker run -it --rm --gpus all --ipc=host -p 8080:80 \
  -v /home/jp/.cache/data:/data \
  ghcr.io/huggingface/text-generation-inference:2.1.0 \
  --model-id microsoft/Phi-3-mini-128k-instruct \
  --max-batch-prefill-tokens=8192 \
  --max-total-tokens=8192 \
  --max-input-tokens=8191 \
  --trust-remote-code \
  --revision bb5bf1e4001277a606e11debca0ef80323e5f824 \
  --sharded false
```

Information

- [x] Docker
- [ ] The CLI directly

Tasks

- [x] An officially supported command
- [ ] My own modifications

Reproduction

Running Phi-3 128k (the old revision, since the new one fails; see #2172), I get good results as long as the total context (input tokens + output tokens) stays below 4096.

As soon as input + output tokens exceed 4096, Phi-3 outputs only gibberish, e.g.
,,..,,,,,,,,,,,,,,,,ß,,.s,ß,gen,gen,,,,s,,,,,,,,,,,,,,,,,,,,,,,,,,,o,,,,,,,,,,,,,,,,,,,,,,-hn,.,,,,,,,,,,und,,,,,,,,,,,,,,,,,,,,,,,s,,gen...,
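For illustration, a minimal reproduction sketch against the container above, assuming the default /generate route on the mapped port 8080 (the repeated-sentence prompt is just a stand-in for any sufficiently long input):

```shell
# Build a prompt of roughly 4000+ tokens by repetition, then request 512 new
# tokens so that input + output crosses the 4096 boundary.
PROMPT=$(printf 'The quick brown fox jumps over the lazy dog. %.0s' {1..400})
curl http://localhost:8080/generate \
  -H 'Content-Type: application/json' \
  -d "{\"inputs\": \"${PROMPT}\", \"parameters\": {\"max_new_tokens\": 512}}"
```

The same request with a prompt short enough to keep input + output under 4096 tokens returns coherent text.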

I think there must be a bug in the rotary embedding implementation; see also #2060 and #2055.
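For context, the 4096 boundary lines up with the model's RoPE configuration: Phi-3-mini-128k scales positions with a short_factor below its original context length and a long_factor above it, so a bug in the long path would produce exactly this failure mode. A quick way to inspect the relevant fields (field names as published in the model's config.json on the Hub):

```shell
# Fetch the model config and print the fields governing the 4k switch:
# extended context length, original context length, and rope scaling type.
curl -s https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/raw/main/config.json |
  python3 -c "import json, sys; c = json.load(sys.stdin); print(c['max_position_embeddings'], c['original_max_position_embeddings'], c['rope_scaling']['type'])"
```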

Expected behavior

Inference works for longer contexts.

jphme (Author) commented Jul 4, 2024

With vLLM I initially got the same issue, but I was able to figure out that it was caused by the FP8 KV cache (see here). Does TGI enable this by default? I didn't knowingly enable it.
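One way to check is to list the launcher's flags; the help text prints each option's default value. Whether a KV cache dtype flag exists at all in 2.1.0 is an open question, so the grep may simply match nothing:

```shell
# List the TGI launcher's options and look for anything KV-cache-related,
# including each flag's default value from the help output.
docker run --rm ghcr.io/huggingface/text-generation-inference:2.1.0 \
  --help | grep -i -A 2 'kv'
```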

annadmitrieva commented

Hi, did you manage to solve this with TGI? I am running into the same issue. I am currently running the latest release, 2.2.0: Phi-3 128k support is back, but this issue persists.

ytjhai commented Aug 12, 2024

I can't get the Phi-3-mini 128k model to deploy at all through Inference Endpoints. Is there a particular tagged version compatible with it?

Edit: adding the environment variable TRUST_REMOTE_CODE and setting it to true fixed the issue.
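For anyone hitting the same thing with the Docker image rather than Inference Endpoints, the equivalent would be passing the variable into the container; whether the launcher in the image honors TRUST_REMOTE_CODE the same way is an assumption based on the comment above (the --trust-remote-code flag is the documented route):

```shell
# Set TRUST_REMOTE_CODE in the container environment instead of (or in
# addition to) the --trust-remote-code launcher flag.
docker run -it --rm --gpus all --ipc=host -p 8080:80 \
  -e TRUST_REMOTE_CODE=true \
  ghcr.io/huggingface/text-generation-inference:2.2.0 \
  --model-id microsoft/Phi-3-mini-128k-instruct
```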
