Adapt to latest vllm changes #632

Open: wants to merge 1 commit into main from bug631
Conversation

@lianhao (Collaborator) commented Dec 10, 2024

Description

  • Remove --enforce-eager on HPU to improve performance
  • Refactor to the upstream docker entrypoint changes (sketched below)
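
Roughly: --enforce-eager disables HPU graph compilation in vLLM, so dropping it lets graphs be captured and improves throughput, and since the upstream image now ships its own entrypoint, the chart no longer needs to override the container command. A minimal sketch of what the values change could look like, assuming hypothetical key names like command and extraCmdArgs (check the chart's actual schema):

```yaml
# Before (sketch): container command overridden, eager mode forced.
# command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
# extraCmdArgs: ["--enforce-eager"]

# After (sketch): rely on the upstream image's own entrypoint and
# let HPU graph compilation run for better performance.
extraCmdArgs: []
```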

Issues

Fixes #631.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Dependencies

List any newly introduced third-party dependencies, if they exist.

Tests

Describe the tests that you ran to verify your changes.

@lianhao lianhao requested a review from yongfengdu as a code owner December 10, 2024 07:08
@lianhao lianhao force-pushed the bug631 branch 2 times, most recently from 4aad447 to 6bfddb6 on December 10, 2024 07:20
@eero-t (Contributor) left a comment

Approved; this matches the changes for "GenAIExamples" in opea-project/GenAIExamples#1210 (and the corresponding PR for the "GenAIComps" repo).

@poussa ?

- Remove --enforce-eager on HPU to improve performance
- Refactor to the upstream docker entrypoint changes

Fixes issue opea-project#631.

Signed-off-by: Lianhao Lu <[email protected]>
@eero-t (Contributor) commented Dec 11, 2024

Investigating the CI failure for the "agent, gaudi, ci-gaudi-values, common" test, I see 2 bugs:

  • agent values.yaml specifies the (huge) meta-llama/Meta-Llama-3.1-70B-Instruct model for the CPU version of TGI
  • the CI test does not provide an HF token to access that model for the sharded Gaudi TGI pods:
    • [pod/agent20241211092531-tgi-6f6b65dc97-cjhjh/model-downloader] Access to model meta-llama/Llama-3.1-70B-Instruct is restricted. You must have access to it and be authenticated to access it. Please log in.

(Besides the size, I think another model would be a nicer default, due to the license used on Meta's models.)
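
For context, gated models such as the Meta-Llama family require an authenticated HuggingFace token at download time. A minimal sketch of how a token would be supplied through chart values, assuming the global.HUGGINGFACEHUB_API_TOKEN key used elsewhere in OPEA charts (verify against the chart's actual values.yaml):

```yaml
# Sketch only: pass an HF token so gated model downloads succeed.
global:
  HUGGINGFACEHUB_API_TOKEN: "hf_..."   # placeholder, not a real token
```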

@eero-t (Contributor) commented Dec 11, 2024

My vLLM PR includes the same agent and (relevant) vLLM component changes as yours, but strangely that same CI agent test succeeded for it: https://github.com/opea-project/GenAIInfra/actions/runs/12262626198/job/34212355870?pr=610

EDIT: today's push on my PR hit the same issue.

@eero-t (Contributor) left a comment

The --tensor-parallel-size option can be dropped, as 1 is its default value:
https://docs.vllm.ai/en/latest/usage/engine_args.html
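
In other words, in a values sketch like the one below (extraCmdArgs is an assumed key name, per common OPEA chart convention), the explicit flag is redundant because vLLM already defaults tensor_parallel_size to 1:

```yaml
# Redundant (sketch): tensor parallelism of 1 is the vLLM default.
# extraCmdArgs: ["--tensor-parallel-size", "1"]

# Equivalent and simpler:
extraCmdArgs: []
```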

@lianhao (Collaborator, Author) commented Dec 12, 2024

This specific failure is caused by 2 different bugs: #639 and #641.

@lianhao (Collaborator, Author) commented Dec 16, 2024

We need to wait for PR #642 to land first.

Merging this pull request may close issue #631: [ci-auto] remove "--enforce-eager" for better vLLM perf