[DO NOT MERGE] Ibm dev rebased #255

Open
wants to merge 43 commits into
base: main
Changes from 41 commits
42d3a81
[Docs] Add Nebius as sponsors (#10371)
simon-mo Nov 15, 2024
a18c4fb
[Frontend] Add --version flag to CLI (#10369)
russellb Nov 15, 2024
067dd2b
[Doc] Move PR template content to docs (#10159)
russellb Nov 15, 2024
82be00f
[Docs] Misc updates to TPU installation instructions (#10165)
mikegre-google Nov 15, 2024
2ca5912
[Frontend] Automatic detection of chat content format from AST (#9919)
DarkLight1337 Nov 16, 2024
ea480a1
[doc] add doc for the plugin system (#10372)
youkaichao Nov 16, 2024
2e453bc
[misc][plugin] improve log messages (#10386)
youkaichao Nov 16, 2024
51813a4
[BugFix] [Kernel] Fix GPU SEGV occuring in fused_moe kernel (#10385)
rasmith Nov 16, 2024
6c45f56
[Misc] Update benchmark to support image_url file or http (#10287)
kakao-steve-ai Nov 16, 2024
d93bde9
[Misc] Medusa supports custom bias (#10361)
skylee-01 Nov 16, 2024
068451f
[Bugfix] Fix M-RoPE position calculation when chunked prefill is enab…
imkero Nov 16, 2024
d49dacb
[V1] Add code owners for V1 (#10397)
WoosukKwon Nov 16, 2024
11d2bbc
[2/N][torch.compile] make compilation cfg part of vllm cfg (#10383)
youkaichao Nov 17, 2024
2ea854d
[V1] Refactor model executable interface for all text-only language m…
ywang96 Nov 17, 2024
2652ea1
[CI/Build] Fix IDC hpu [Device not found] issue (#10384)
xuechendi Nov 17, 2024
47325d5
[Bugfix][CPU] Fix CPU embedding runner with tensor parallel (#10394)
Isotr0py Nov 17, 2024
e288d72
[platforms] refactor cpu code (#10402)
youkaichao Nov 17, 2024
f6dc8be
[Hardware] [HPU]add `mark_step` for hpu (#10239)
jikunshang Nov 17, 2024
dcccc62
[Bugfix] Fix mrope_position_delta in non-last prefill chunk (#10403)
imkero Nov 17, 2024
d97d269
[Misc] Enhance offline_inference to support user-configurable paramet…
wchen61 Nov 17, 2024
d3fe99b
[Misc] Add uninitialized params tracking for `AutoWeightsLoader` (#10…
Isotr0py Nov 18, 2024
1f165b6
[Bugfix] Ignore ray reinit error when current platform is ROCm or XPU…
HollowMan6 Nov 18, 2024
56b9c49
[4/N][torch.compile] clean up set_torch_compile_backend (#10401)
youkaichao Nov 18, 2024
663fb57
[VLM] Report multi_modal_placeholders in output (#10407)
lk-chen Nov 18, 2024
3c64fbb
[Model] Remove redundant softmax when using PoolingType.STEP (#10415)
Maybewuss Nov 18, 2024
20ea0c3
[Model][LoRA]LoRA support added for glm-4v (#10418)
B-201 Nov 18, 2024
b1d6a6a
[Model] Remove transformers attention porting in VITs (#10414)
Isotr0py Nov 18, 2024
5d6df78
[Doc] Update doc for LoRA support in GLM-4V (#10425)
B-201 Nov 18, 2024
b4c641f
[5/N][torch.compile] torch.jit.script --> torch.compile (#10406)
youkaichao Nov 18, 2024
c973945
[Doc] Add documentation for Structured Outputs (#9943)
ismael-dm Nov 18, 2024
e19e15a
Fix open_collective value in FUNDING.yml (#10426)
andrew Nov 18, 2024
a00d8c1
[Model][Bugfix] Support TP for PixtralHF ViT (#10405)
mgoin Nov 18, 2024
472b9cf
[Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107)
yma11 Nov 18, 2024
9147b76
[Kernel] Explicitly specify other value in tl.load calls (#9014)
angusYuhao Nov 18, 2024
ce83e18
[Kernel] Initial Machete W4A8 support + Refactors (#9855)
LucasWilkinson Nov 18, 2024
bf5bf24
Squash 9522
fialhocoelho Nov 18, 2024
8b27172
Squash 6357
fialhocoelho Nov 18, 2024
e8fcdee
Squash 10235
fialhocoelho Nov 18, 2024
1e7f586
Squash 10400
fialhocoelho Nov 18, 2024
09f5b4a
Squash 10430
fialhocoelho Nov 18, 2024
33c1553
pin mistral and install adapter from branch :rocket:
fialhocoelho Nov 18, 2024
f931d71
Dockerfile.ubi: remove extra line continuation
dtrifiro Dec 4, 2024
87ebca7
Merge branch 'main' into ibm-dev-rebased
prashantgupta24 Dec 4, 2024
2 changes: 1 addition & 1 deletion .buildkite/run-hpu-test.sh
@@ -13,4 +13,4 @@ trap remove_docker_container EXIT
remove_docker_container

# Run the image and launch offline inference
docker run --runtime=habana --name=hpu-test --network=host -e VLLM_SKIP_WARMUP=true --entrypoint="" hpu-test-env python3 examples/offline_inference.py
docker run --runtime=habana --name=hpu-test --network=host -e HABANA_VISIBLE_DEVICES=all -e VLLM_SKIP_WARMUP=true --entrypoint="" hpu-test-env python3 examples/offline_inference.py
17 changes: 10 additions & 7 deletions .github/CODEOWNERS
@@ -3,13 +3,16 @@

# This lists cover the "core" components of vLLM that require careful review
/vllm/attention/backends/abstract.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/core @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/engine/llm_engine.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/executor/executor_base.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/worker/worker_base.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/worker/worker.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/model_executor/layers/sampler.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
CMakeLists.txt @tlrmchlsmth @WoosukKwon
/vllm/core @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/engine/llm_engine.py @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/executor/executor_base.py @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/worker/worker_base.py @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/worker/worker.py @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/model_executor/layers/sampler.py @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
CMakeLists.txt @tlrmchlsmth

# vLLM V1
/vllm/v1 @WoosukKwon @robertgshaw2-neuralmagic @njhill @ywang96 @comaniac @alexm-neuralmagic

# Test ownership
/tests/async_engine @njhill @robertgshaw2-neuralmagic @simon-mo
2 changes: 1 addition & 1 deletion .github/FUNDING.yml
@@ -1,2 +1,2 @@
github: [vllm-project]
open_collective: [vllm]
open_collective: vllm
5 changes: 5 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,5 @@
FILL IN THE PR DESCRIPTION HERE

FIX #xxxx (*link existing issues this PR will resolve*)

**BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing/overview.html **
25 changes: 21 additions & 4 deletions .github/scripts/cleanup_pr_body.sh
@@ -15,19 +15,36 @@ NEW=/tmp/new_pr_body.txt
gh pr view --json body --template "{{.body}}" "${PR_NUMBER}" > "${OLD}"
cp "${OLD}" "${NEW}"

# Remove all lines after and including "**BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE**"
sed -i '/\*\*BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE\*\*/,$d' "${NEW}"

# Remove "FIX #xxxx (*link existing issues this PR will resolve*)"
sed -i '/FIX #xxxx.*$/d' "${NEW}"

# Remove "FILL IN THE PR DESCRIPTION HERE"
sed -i '/FILL IN THE PR DESCRIPTION HERE/d' "${NEW}"

# Remove all lines after and including "**BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE**"
sed -i '/\*\*BEFORE SUBMITTING, PLEASE READ.*\*\*/,$d' "${NEW}"

# Remove HTML <details> section that includes <summary> text of "PR Checklist (Click to Expand)"
python3 - <<EOF
import re

with open("${NEW}", "r") as file:
    content = file.read()

pattern = re.compile(r'(---\n\n)?<details>.*?<summary>.*?PR Checklist \(Click to Expand\).*?</summary>.*?</details>', re.DOTALL)
content = re.sub(pattern, '', content)

with open("${NEW}", "w") as file:
    file.write(content)
EOF

# Run this only if ${NEW} is different than ${OLD}
if ! cmp -s "${OLD}" "${NEW}"; then
    echo "Updating PR body"
    gh pr edit --body-file "${NEW}" "${PR_NUMBER}"
    echo
    echo "Updated PR body:"
    echo
    cat "${NEW}"
else
    echo "No changes needed"
fi
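
The checklist-stripping step in the heredoc above can be tried in isolation. The following is a minimal sketch that reuses the same regular expression on an in-memory string; the function name `strip_checklist` and the sample PR body are assumptions made for illustration and are not part of the PR.

```python
import re

# Same pattern as in the heredoc above: an optional "---" separator,
# then a <details> block whose <summary> mentions the PR checklist.
CHECKLIST_RE = re.compile(
    r'(---\n\n)?<details>.*?<summary>.*?PR Checklist \(Click to Expand\).*?</summary>.*?</details>',
    re.DOTALL,
)


def strip_checklist(body: str) -> str:
    """Return the PR body with the collapsible checklist section removed.

    Hypothetical helper for illustration; the script itself edits a file in place.
    """
    return CHECKLIST_RE.sub('', body)


# Hypothetical PR body used only to exercise the pattern.
sample = (
    "Fix tokenizer crash on empty input.\n\n"
    "---\n\n"
    "<details>\n"
    "<summary><b> PR Checklist (Click to Expand) </b></summary>\n"
    "<p>Please review the checklist.</p>\n"
    "</details>\n"
)

cleaned = strip_checklist(sample)
print(repr(cleaned))
```

One property worth keeping in mind: because the match starts at the leftmost `<details>` and `.` spans newlines under `re.DOTALL`, an unrelated `<details>` block placed before the checklist would be removed along with it.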
8 changes: 6 additions & 2 deletions Dockerfile.ubi
@@ -201,15 +201,19 @@ WORKDIR /home/vllm

ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]


FROM vllm-openai as vllm-grpc-adapter

USER root

RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=cache,target=/root/.cache/uv \
--mount=type=bind,from=build,src=/workspace/dist,target=/workspace/dist \
HOME=/root uv pip install $(echo dist/*.whl)'[tensorizer]' vllm-tgis-adapter==0.5.3
uv pip install $(echo /workspace/dist/*.whl)'[tensorizer]' --verbose && \

There's an issue in this Dockerfile at line 188 (not part of this PR) that prevents the Dockerfile from building. You can either rebase on main or cherry-pick 039e209 to fix it.

uv pip install \
"git+https://github.com/opendatahub-io/vllm-tgis-adapter@ibm-20241106-adapter" --verbose

RUN --mount=type=bind,from=build,src=/workspace/dist,target=/workspace/dist \
echo "Local dir and dist:" && pwd && ls -l /workspace/dist

ENV GRPC_PORT=8033 \
PORT=8000 \
1 change: 1 addition & 0 deletions README.md
@@ -100,6 +100,7 @@ vLLM is a community project. Our compute resources for development and testing a
- Dropbox
- Google Cloud
- Lambda Lab
- Nebius
- NVIDIA
- Replicate
- Roblox
13 changes: 13 additions & 0 deletions benchmarks/benchmark_serving.py
@@ -251,6 +251,19 @@ def sample_hf_requests(
            "url": f"data:image/jpeg;base64,{image_base64}"
        },
    }
elif "image" in data and isinstance(data["image"], str):
    if (data["image"].startswith("http://") or \
            data["image"].startswith("file://")):
        image_url = data["image"]
    else:
        image_url = f"file://{data['image']}"

    mm_content = {
        "type": "image_url",
        "image_url": {
            "url": image_url
        },
    }
else:
    mm_content = None
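
The new string branch can be exercised on its own. Below is a minimal sketch of that branch only, assuming a plain `dict` row; the helper names `to_image_url` and `build_mm_content` are hypothetical and do not appear in `benchmark_serving.py`, and the base64 branch that precedes it in the diff is left out.

```python
from typing import Optional


def to_image_url(image: str) -> str:
    """Keep http:// and file:// strings as-is; treat anything else as a local path.

    Hypothetical helper mirroring the prefix check in the diff above.
    """
    if image.startswith(("http://", "file://")):
        return image
    return f"file://{image}"


def build_mm_content(data: dict) -> Optional[dict]:
    """Build the image_url payload for a row whose "image" field is a string.

    Hypothetical helper; returns None when there is no string image,
    matching the final else branch in the diff.
    """
    image = data.get("image")
    if isinstance(image, str):
        return {"type": "image_url", "image_url": {"url": to_image_url(image)}}
    return None


print(build_mm_content({"image": "/data/cat.jpg"}))
print(build_mm_content({"image": "http://example.com/cat.jpg"}))
print(build_mm_content({}))
```

As written, the prefix check covers only `http://` and `file://`, so a string beginning with `https://` would fall through to the local-path case. The diff does not say whether that is intended.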
