[Bug] errors in tgi with multiple shards #639
Labels
bug
Something isn't working
Comments
Based on the tgi source code, tgi uses transformers.AutoTokenizer to load the tokenizer and re-save it into the hardcoded "out" directory. Running the tgi container as a non-root user or with readOnlyRootFilesystem enabled can make that write fail, which may limit which models can be used with tgi.
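The write failure described above can be probed before the server starts. The following is only a minimal sketch: it assumes the tokenizer is re-saved into a relative "out" directory that must be creatable and writable by the container user. `can_write_dir` is a hypothetical helper, not part of tgi or transformers, and how tgi actually resolves the path is not confirmed here.

```python
import os
import tempfile


def can_write_dir(path: str) -> bool:
    """Return True if `path` can be created (if missing) and a file written in it.

    Hypothetical helper for illustration; not part of tgi or transformers.
    Note: like the save it imitates, this creates `path` when it is missing.
    """
    try:
        os.makedirs(path, exist_ok=True)
        # Probe with a real temporary file rather than os.access(), which
        # can be misleading on read-only mounts or when running as root.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        # Covers PermissionError, read-only file system, and
        # "parent is not a directory" cases.
        return False
```

Calling `can_write_dir("out")` from the container's working directory, as the same user and with the same security context, would return `False` in the situations this comment describes. Mounting a writable volume at that location is one possible workaround, though whether that matches the eventual fix is not stated in this issue.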
lianhao added five commits to lianhao/GenAIInfra that referenced this issue (three on Dec 12, 2024, one on Dec 13, 2024, one on Dec 16, 2024), each with the message:
Fix issue opea-project#639 Signed-off-by: Lianhao Lu <[email protected]>
Priority: Undecided
OS type: Ubuntu
Hardware type: Xeon-GNR
Installation method: (not shown)
Deploy method: (not shown)
Running nodes: Single Node
What's the version?: github e9dc58a
Description
tgi and tgi-gaudi both report errors caused by a permission issue when running with multiple shards:
in the tgi (Xeon) case, the following error appears in the log, but functionality seems OK with the default model.
in the tgi-gaudi case, the pod crashes.
Reproduce steps
create a test-values.yaml file with the following content:
cd helm-charts/common/tgi
helm install tgi . -f test-values.yaml (for xeon case)
helm install tgi . -f gaudi-values.yaml -f test-values.yaml (for gaudi case)
Raw log
No response