NKI-based flash-attention kernel with paged KV cache
Co-authored-by: Jiangfei Duan <[email protected]>
Signed-off-by: Liangfu Chen <[email protected]>
liangfu and JF-D committed Jan 9, 2025
1 parent a732900 commit c2af356
Showing 3 changed files with 1,126 additions and 1 deletion.
2 changes: 1 addition & 1 deletion .buildkite/run-neuron-test.sh
@@ -51,4 +51,4 @@ docker run --rm -it --device=/dev/neuron0 --device=/dev/neuron1 --network host \
  -e "NEURON_COMPILE_CACHE_URL=${NEURON_COMPILE_CACHE_MOUNT}" \
  --name "${container_name}" \
  ${image_name} \
- /bin/bash -c "python3 /workspace/vllm/examples/offline_inference/offline_inference_neuron.py"
+ /bin/bash -c "python3 /workspace/vllm/examples/offline_inference/offline_inference_neuron.py && python3 -m pytest /workspace/vllm/tests/neuron/ -v --capture=tee-sys"
