
[Bug] errors in tgi with multiple shards #639

Open · 2 of 6 tasks
lianhao opened this issue Dec 12, 2024 · 1 comment · May be fixed by #642
Labels: bug Something isn't working

Comments

lianhao (Collaborator) commented Dec 12, 2024

Priority

Undecided

OS type

Ubuntu

Hardware type

Xeon-GNR

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

github e9dc58a

Description

tgi and tgi-gaudi both give errors due to a permissions issue when running with multiple shards:

In the tgi (Xeon) case, the following error appears in the log, but the functionality seems OK with the default model.

2024-12-12T03:11:23.380509Z ERROR text_generation_router::server: router/src/server.rs:1787: Failed to import python tokenizer OSError: [Errno 30] Read-only file system: 'out'

In the tgi-gaudi case, the pod crashes:

df: /.triton/autotune: No such file or directory
Traceback (most recent call last):
  File "/usr/local/bin/deepspeed", line 3, in <module>
    from deepspeed.launcher.runner import main
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 25, in <module>
    from . import ops
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/__init__.py", line 11, in <module>
    from . import transformer
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/__init__.py", line 7, in <module>
    from .inference.config import DeepSpeedInferenceConfig
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/__init__.py", line 7, in <module>
    from ....model_implementations.transformers.ds_transformer import DeepSpeedTransformerInference
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/model_implementations/__init__.py", line 6, in <module>
    from .transformers.ds_transformer import DeepSpeedTransformerInference
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 18, in <module>
    from deepspeed.ops.transformer.inference.triton.mlp import TritonMLP
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/__init__.py", line 10, in <module>
    from .ops import *
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/ops.py", line 6, in <module>
    import deepspeed.ops.transformer.inference.triton.matmul_ext as matmul_ext
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 457, in <module>
    fp16_matmul = Fp16Matmul()
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 206, in __init__
    __class__._read_autotune_table()
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 440, in _read_autotune_table
    TritonMatmul._read_autotune_table(__class__.__name__ + "_2d_kernel", __class__._2d_kernel)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 161, in _read_autotune_table
    cache_manager = AutotuneCacheManager(cache_key)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 84, in __init__
    os.makedirs(self.cache_dir, exist_ok=True)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/.triton' rank=0
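
Both failures boil down to the container trying to write to an unwritable location: tgi writes a relative "out" directory, and DeepSpeed resolves its Triton autotune cache under $HOME/.triton, which becomes /.triton when the non-root user has no home directory. Below is a minimal pod-spec-level sketch of one possible workaround using only standard Kubernetes fields; these are not values the tgi helm chart necessarily exposes, and whether DeepSpeed's autotune cache honors TRITON_CACHE_DIR is an assumption:

spec:
  containers:
    - name: tgi
      env:
        - name: HOME                # give the non-root user a writable home so $HOME/.triton resolves somewhere writable
          value: /tmp
        - name: TRITON_CACHE_DIR    # respected by Triton's compiler cache; assumed to also cover DeepSpeed's autotune table
          value: /tmp/.triton
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}                  # writable scratch space even with readOnlyRootFilesystem: true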

Reproduce steps

Create a test-values.yaml file with the following content:

resources:
  limits:
    habana.ai/gaudi: 2

extraCmdArgs: ["--sharded","true","--num-shard","2"]

cd helm-charts/common/tgi
helm install tgi . -f test-values.yaml                        # Xeon case
helm install tgi . -f gaudi-values.yaml -f test-values.yaml   # Gaudi case
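
After install, the failure can be confirmed from the pod logs; the label selector below assumes the chart applies the standard app.kubernetes.io labels:

kubectl logs -l app.kubernetes.io/name=tgi --all-containers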

Raw log

No response

@lianhao lianhao added the bug Something isn't working label Dec 12, 2024
@lianhao lianhao self-assigned this Dec 12, 2024
lianhao (Collaborator, Author) commented Dec 12, 2024

Based on the tgi source code, tgi uses transformers.AutoTokenizer to load the tokenizer and re-save it into the hardcoded "out" directory. Running the tgi container as a non-root user or with readOnlyRootFilesystem will therefore cause trouble, and can limit which models are usable with tgi.
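
For reference, a minimal sketch of the behavior described above, using only the public transformers API (the model id is illustrative, and this is not the actual tgi code path):

from transformers import AutoTokenizer

# Load a tokenizer and re-save it, as the tgi router's python fallback does.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

# tgi hardcodes the relative "out" directory; with a read-only root
# filesystem (or an unwritable working directory) this raises:
#   OSError: [Errno 30] Read-only file system: 'out'
tokenizer.save_pretrained("out")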

lianhao added a commit to lianhao/GenAIInfra that referenced this issue Dec 12, 2024
@lianhao lianhao linked a pull request Dec 12, 2024 that will close this issue
lianhao added a commit to lianhao/GenAIInfra that referenced this issue Dec 12, 2024
lianhao added a commit to lianhao/GenAIInfra that referenced this issue Dec 12, 2024
lianhao added a commit to lianhao/GenAIInfra that referenced this issue Dec 13, 2024
lianhao added a commit to lianhao/GenAIInfra that referenced this issue Dec 16, 2024