System Info
GPU: RTX4090

Run 2.1.0 with Docker:
docker run -it --rm --gpus all --ipc=host -p 8080:80 -v /home/jp/.cache/data:/data ghcr.io/huggingface/text-generation-inference:2.1.0 --model-id microsoft/Phi-3-mini-128k-instruct --max-batch-prefill-tokens=8192 --max-total-tokens=8192 --max-input-tokens=8191 --trust-remote-code --revision bb5bf1e4001277a606e11debca0ef80323e5f824 --sharded false
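Once the container is up, the server can be exercised through TGI's /generate endpoint; a request shaped like the following (the prompt placeholder is hypothetical) triggers the failure once prompt length plus max_new_tokens crosses 4096:

```shell
curl http://localhost:8080/generate \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "<a prompt of roughly 4000 tokens>", "parameters": {"max_new_tokens": 512}}'
```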
Information
Docker (not the CLI directly)

Tasks
An officially supported command (not my own modifications)
Reproduction
Running Phi-3 128k (the old revision, since the new one fails; see #2172), I get good results as long as the total context (input tokens + output tokens) stays below 4096.
As soon as input + output tokens exceed 4096, Phi-3 outputs pure gibberish, e.g. ,,..,,,,,,,,,,,,,,,,ß,,.s,ß,gen,gen,,,,s,,,,,,,,,,,,,,,,,,,,,,,,,,,o,,,,,,,,,,,,,,,,,,,,,,-hn,.,,,,,,,,,,und,,,,,,,,,,,,,,,,,,,,,,,s,,gen...,
I suspect a bug in the rotary embedding implementation; see also #2060 and #2055.
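For context on why 4096 is exactly where things break: Phi-3-mini-128k uses LongRoPE-style rotary scaling with two per-dimension factor sets, a "short" set for sequences within the original 4096-token window and a "long" set beyond it (stored in the model's config.json under rope_scaling). If a serving kernel never switches to the long factors, every rotary angle is wrong from the first position past the boundary, which matches the observed degradation. A minimal sketch of the switching logic, with illustrative factor values rather than the model's real ones:

```python
# Illustrative per-dimension scaling factors; the real ones live in the
# model's config.json under rope_scaling.short_factor / long_factor.
SHORT_FACTOR = [1.0] * 4   # used while seq_len <= original window
LONG_FACTOR = [4.0] * 4    # used once seq_len exceeds it
ORIGINAL_MAX_POSITIONS = 4096
BASE = 10000.0
HEAD_DIM = 8               # toy head dimension -> 4 frequency pairs

def inv_frequencies(seq_len):
    """Pick the factor set based on sequence length, then rescale the
    standard RoPE inverse frequencies base**(-2i/dim) by that factor."""
    factors = LONG_FACTOR if seq_len > ORIGINAL_MAX_POSITIONS else SHORT_FACTOR
    return [
        1.0 / (f * BASE ** (2 * i / HEAD_DIM))
        for i, f in enumerate(factors)
    ]

# The bug hypothesis: a kernel that keeps using the short factors past
# 4096 positions computes wrong angles for every later token.
print(inv_frequencies(4096)[0])  # short-factor frequency
print(inv_frequencies(4097)[0])  # long-factor frequency (4x smaller in this toy setup)
```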
Expected behavior
Inference works for longer contexts.
With vLLM I initially hit the same issue, but was able to trace it to the FP8 KV cache (see here). Does TGI enable that by default? I didn't turn it on knowingly.
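For comparison, in vLLM the FP8 KV cache is controlled by the --kv-cache-dtype engine argument (flag name per vLLM's CLI; defaults may differ across versions), so ruling it out there looks roughly like:

```shell
python -m vllm.entrypoints.openai.api_server \
  --model microsoft/Phi-3-mini-128k-instruct \
  --kv-cache-dtype auto  # "auto" keeps the model dtype; "fp8" opts into the quantized cache
```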
Hi, did you manage to solve this with TGI? I am running into the same issue on the latest release, 2.2.0: Phi3-128k support is back, but this problem persists.