Sync with [email protected]#103
Merged
dtrifiro merged 99 commits into opendatahub-io:main from dtrifiro:sync-with-upstream on Jul 23, 2024
+11,844 -3,885
Commits
Commits on Jul 16, 2024
Commits on Jul 17, 2024
Commits on Jul 18, 2024
- [Bugfix] Update flashinfer.py with PagedAttention forwards - Fixes Gemma2 OpenAI Server Crash (vllm-project#6501)
Commits on Jul 19, 2024
Commits on Jul 20, 2024
- [Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (vllm-project#6543)
Commits on Jul 21, 2024
- [Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (vllm-project#6485)
Commits on Jul 22, 2024
Commits on Jul 23, 2024