(DO NOT MERGE) 6587 fix #101

Closed

prashantgupta24 wants to merge 50 commits into main from 6587-fix

+5,323 -2,393

Commits on Jul 19, 2024

[Docs] Add Google Cloud to sponsor list (vllm-project#6450)

WoosukKwon
authored and
fialhocoelho
committed
[Misc] Add CustomOp Interface to UnquantizedFusedMoEMethod (vllm-project#6289)

WoosukKwon
authored and
fialhocoelho
committed
[CI/Build][TPU] Add TPU CI test (vllm-project#6277)

authored and
fialhocoelho
committed
Pin sphinx-argparse version (vllm-project#6453)

khluu
authored and
fialhocoelho
committed
[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug (vllm-project#6425)

authored and
fialhocoelho
committed
[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests (vllm-project#6419)

g-eoj
authored and
fialhocoelho
committed
[Docs] Announce 5th meetup (vllm-project#6458)

WoosukKwon
authored and
fialhocoelho
committed
[CI/Build] vLLM cache directory for images (vllm-project#6444)

DarkLight1337
authored and
fialhocoelho
committed
[Frontend] Support for chat completions input in the tokenize endpoint (vllm-project#5923)

sasha0552
authored and
fialhocoelho
committed
[Misc] Fix typos in spec. decode metrics logging. (vllm-project#6470)

tdoublep
authored and
fialhocoelho
committed
[Core] Use numpy to speed up padded token processing (vllm-project#6442)

peng1999
authored and
fialhocoelho
committed
[CI/Build] Remove "boardwalk" image asset (vllm-project#6460)

DarkLight1337
authored and
fialhocoelho
committed
[doc][misc] remind to cancel debugging environment variables (vllm-project#6481)

youkaichao
authored and
fialhocoelho
committed
[Hardware][TPU] Support MoE with Pallas GMM kernel (vllm-project#6457)

WoosukKwon
authored and
fialhocoelho
committed
[Doc] Fix the lora adapter path in server startup script (vllm-project#6230)

Jeffwan
authored and
fialhocoelho
committed
[Misc] Log spec decode metrics (vllm-project#6454)

comaniac
authored and
fialhocoelho
committed
[Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (vllm-project#6081)

mgoin
authored and
fialhocoelho
committed
[ci][distributed] add pipeline parallel correctness test (vllm-project#6410)

youkaichao
authored and
fialhocoelho
committed
[misc][distributed] improve tests (vllm-project#6488)

youkaichao
authored and
fialhocoelho
committed
[misc][distributed] add seed to dummy weights (vllm-project#6491)

youkaichao
authored and
fialhocoelho
committed
[Distributed][PP] only create embedding & lm head when necessary (vllm-project#6455)

wushidonguc
authored and
fialhocoelho
committed
[ROCm] Cleanup Dockerfile and remove outdated patch (vllm-project#6482)

hongxiayang
authored and
fialhocoelho
committed
[Misc][Speculative decoding] Typos and typing fixes (vllm-project#6467)

ShangmingCai
authored and
fialhocoelho
committed
[Doc][CI/Build] Update docs and tests to use vllm serve (vllm-project#6431)

DarkLight1337
authored and
fialhocoelho
committed
[Bugfix] Fix for multinode crash on 4 PP (vllm-project#6495)

andoorve
authored and
fialhocoelho
committed
[TPU] Remove multi-modal args in TPU backend (vllm-project#6504)

WoosukKwon
authored and
fialhocoelho
committed
[Misc] Use torch.Tensor for type annotation (vllm-project#6505)

WoosukKwon
authored and
fialhocoelho
committed
[Core] Refactor _prepare_model_input_tensors - take 2 (vllm-project#6164)

comaniac
authored and
fialhocoelho
committed
[DOC] - Add docker image to Cerebrium Integration (vllm-project#6510)

milo157
authored and
fialhocoelho
committed
[Bugfix] Fix Ray Metrics API usage (vllm-project#6354)

Yard1
authored and
fialhocoelho
committed
[Core] draft_model_runner: Implement prepare_inputs on GPU for advance_step (vllm-project#6338)

alexm-neuralmagic
authored and
fialhocoelho
committed
[ Kernel ] FP8 Dynamic-Per-Token Quant Kernel (vllm-project#6511)

authored and
fialhocoelho
committed
[Model] Pipeline parallel support for Mixtral (vllm-project#6516)

comaniac
authored and
fialhocoelho
committed
[ Kernel ] Fp8 Channelwise Weight Support (vllm-project#6487)

robertgshaw2-neuralmagic
authored and
fialhocoelho
committed
[core][model] yet another cpu offload implementation (vllm-project#6496)

authored and
fialhocoelho
committed
[BugFix] Avoid secondary error in ShmRingBuffer destructor (vllm-project#6530)

njhill
authored and
fialhocoelho
committed
[Core] Introduce SPMD worker execution using Ray accelerated DAG (vllm-project#6032)

authored and
fialhocoelho
committed
[Misc] Minor patch for draft model runner (vllm-project#6523)

comaniac
authored and
fialhocoelho
committed
[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (vllm-project#6227)

authored and
fialhocoelho
committed
[Bugfix] Update flashinfer.py with PagedAttention forwards - Fixes Gemma2 OpenAI Server Crash (vllm-project#6501)

noamgat
authored and
fialhocoelho
committed
[TPU] Refactor TPU worker & model runner (vllm-project#6506)

WoosukKwon
authored and
fialhocoelho
committed
[ Misc ] Improve Min Capability Checking in compressed-tensors (vllm-project#6522)

robertgshaw2-neuralmagic
authored and
fialhocoelho
committed
[ci] Reword Github bot comment (vllm-project#6534)

khluu
authored and
fialhocoelho
committed
Squash 6034
fialhocoelho
committed
Squash 6357
fialhocoelho
committed
Merge squash from allowed_token_ids branch
fialhocoelho
committed
Squash 6587
prashantgupta24
committed

Commits on Jul 20, 2024

Update Dockerfile.ubi to install vllm-tgis-adapter from fix branch
njhill
committed

Commits on Jul 22, 2024

Update Dockerfile.ubu to install vllm-tgis-adapter from main branch
njhill
committed

Commits on Jul 23, 2024

Add --system-site-packages to virtual env in image
njhill
committed