NKI-based flash-attention kernel with paged KV cache

Co-authored-by: Jiangfei Duan <[email protected]>
Signed-off-by: Liangfu Chen <[email protected]>
vllm-project · Jan 9, 2025 · c2af356
1 parent a732900

Showing 3 changed files with 1,126 additions and 1 deletion.
diff --git a/.buildkite/run-neuron-test.sh b/.buildkite/run-neuron-test.sh
@@ -51,4 +51,4 @@ docker run --rm -it --device=/dev/neuron0 --device=/dev/neuron1 --network host \
        -e "NEURON_COMPILE_CACHE_URL=${NEURON_COMPILE_CACHE_MOUNT}" \
        --name "${container_name}" \
        ${image_name} \
-       /bin/bash -c "python3 /workspace/vllm/examples/offline_inference/offline_inference_neuron.py"
+       /bin/bash -c "python3 /workspace/vllm/examples/offline_inference/offline_inference_neuron.py && python3 -m pytest /workspace/vllm/tests/neuron/ -v --capture=tee-sys"