[pull] main from vllm-project:main #11
Merged
dtrifiro merged 65 commits into opendatahub-io:main from vllm-project:main on May 7, 2024
+7,912 -2,484
Commits
Commits on Apr 30, 2024
Commits on May 1, 2024
- [Bugfix] Fix the fp8 kv_cache check error that occurs when failing to obtain the CUDA version. (#4173)
Commits on May 2, 2024
- authoredDanny Guinther
- [Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451)
Commits on May 3, 2024
Commits on May 4, 2024
- [Kernel] Support MoE Fp8 Checkpoints for Mixtral (Static Weights with Dynamic/Static Activations) (#4527)