
[Bug] errors in tgi with multiple shards #639

Open · 2 of 6 tasks
lianhao opened this issue Dec 12, 2024 · 1 comment · May be fixed by #642
Labels: bug Something isn't working

Comments

lianhao (Collaborator) commented Dec 12, 2024

Priority

Undecided

OS type

Ubuntu

Hardware type

Xeon-GNR

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

github e9dc58a

Description

tgi and tgi-gaudi both give errors due to a permissions issue when running with multiple shards:

In the tgi (Xeon) case, the following error appears in the log, but the functionality seems OK with the default model.

2024-12-12T03:11:23.380509Z ERROR text_generation_router::server: router/src/server.rs:1787: Failed to import python tokenizer OSError: [Errno 30] Read-only file system: 'out'

In the tgi-gaudi case, the pod crashes:

df: /.triton/autotune: No such file or directory
Traceback (most recent call last):
  File "/usr/local/bin/deepspeed", line 3, in <module>
    from deepspeed.launcher.runner import main
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 25, in <module>
    from . import ops
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/__init__.py", line 11, in <module>
    from . import transformer
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/__init__.py", line 7, in <module>
    from .inference.config import DeepSpeedInferenceConfig
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/__init__.py", line 7, in <module>
    from ....model_implementations.transformers.ds_transformer import DeepSpeedTransformerInference
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/model_implementations/__init__.py", line 6, in <module>
    from .transformers.ds_transformer import DeepSpeedTransformerInference
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 18, in <module>
    from deepspeed.ops.transformer.inference.triton.mlp import TritonMLP
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/__init__.py", line 10, in <module>
    from .ops import *
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/ops.py", line 6, in <module>
    import deepspeed.ops.transformer.inference.triton.matmul_ext as matmul_ext
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 457, in <module>
    fp16_matmul = Fp16Matmul()
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 206, in __init__
    __class__._read_autotune_table()
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 440, in _read_autotune_table
    TritonMatmul._read_autotune_table(__class__.__name__ + "_2d_kernel", __class__._2d_kernel)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 161, in _read_autotune_table
    cache_manager = AutotuneCacheManager(cache_key)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 84, in __init__
    os.makedirs(self.cache_dir, exist_ok=True)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/.triton' rank=0
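
Both failures boil down to the container trying to write to an unwritable location: tgi writes a relative "out" directory, and DeepSpeed resolves its Triton autotune cache under $HOME/.triton, which becomes /.triton when the non-root user has no home directory. Below is a minimal pod-spec-level sketch of one possible workaround using only standard Kubernetes fields; these are not values the tgi helm chart necessarily exposes, and whether DeepSpeed's autotune cache honors TRITON_CACHE_DIR is an assumption:

spec:
  containers:
    - name: tgi
      env:
        - name: HOME                # give the non-root user a writable home so $HOME/.triton resolves somewhere writable
          value: /tmp
        - name: TRITON_CACHE_DIR    # respected by Triton's compiler cache; assumed to also cover DeepSpeed's autotune table
          value: /tmp/.triton
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}                  # writable scratch space even with readOnlyRootFilesystem: true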

Reproduce steps

Create a test-values.yaml file with the following content:

resources:
  limits:
    habana.ai/gaudi: 2

extraCmdArgs: ["--sharded","true","--num-shard","2"]

cd helm-charts/common/tgi
helm install tgi . -f test-values.yaml                        # Xeon case
helm install tgi . -f gaudi-values.yaml -f test-values.yaml   # Gaudi case
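
After install, the failure can be confirmed from the pod logs; the label selector below assumes the chart applies the standard app.kubernetes.io labels:

kubectl logs -l app.kubernetes.io/name=tgi --all-containers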

Raw log

No response

@lianhao lianhao added the bug Something isn't working label Dec 12, 2024
@lianhao lianhao self-assigned this Dec 12, 2024
lianhao (Collaborator, Author) commented Dec 12, 2024

Based on the tgi source code, tgi uses transformers.AutoTokenizer to load the tokenizer and re-save it into the hardcoded "out" directory. Running the tgi container as a non-root user or with readOnlyRootFilesystem will therefore cause trouble, and can limit which models are usable with tgi.
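
For reference, a minimal sketch of the behavior described above, using only the public transformers API (the model id is illustrative, and this is not the actual tgi code path):

from transformers import AutoTokenizer

# Load a tokenizer and re-save it, as the tgi router's python fallback does.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

# tgi hardcodes the relative "out" directory; with a read-only root
# filesystem (or an unwritable working directory) this raises:
#   OSError: [Errno 30] Read-only file system: 'out'
tokenizer.save_pretrained("out")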

lianhao added a commit to lianhao/GenAIInfra that referenced this issue Dec 12, 2024
@lianhao lianhao linked a pull request Dec 12, 2024 that will close this issue
lianhao added a commit to lianhao/GenAIInfra that referenced this issue Dec 12, 2024
lianhao added a commit to lianhao/GenAIInfra that referenced this issue Dec 12, 2024
lianhao added a commit to lianhao/GenAIInfra that referenced this issue Dec 13, 2024
lianhao added a commit to lianhao/GenAIInfra that referenced this issue Dec 16, 2024