[pull] main from vllm-project:main #15
Closed
pull[bot] wants to merge 84 commits into opendatahub-io:main from vllm-project:main
+8,986 -2,936
Commits
Commits on Apr 30, 2024
Commits on May 1, 2024
- [Bugfix] Fix the fp8 kv_cache check error that occurs when failing to obtain the CUDA version. (#4173)
Commits on May 2, 2024
- Commit authored by Danny Guinther
- [Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451)
Commits on May 3, 2024
Commits on May 4, 2024
- [Kernel] Support MoE Fp8 Checkpoints for Mixtral (Static Weights with Dynamic/Static Activations) (#4527)
Commits on May 5, 2024
Commits on May 7, 2024
Commits on May 8, 2024
Commits on May 9, 2024