Popular repositories
- flash-attention (C++; fork of vllm-project/flash-attention): Fast and memory-efficient exact attention.
- onnxruntime (C++; fork of microsoft/onnxruntime): ONNX Runtime, a cross-platform, high-performance ML inferencing and training accelerator.
- TensorRT-LLM (C++; fork of NVIDIA/TensorRT-LLM): Provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
- vllm (Python; fork of vllm-project/vllm): A high-throughput and memory-efficient inference and serving engine for LLMs.
- DeepSpeed (Python; fork of microsoft/DeepSpeed): A deep learning optimization library that makes distributed training and inference easy, efficient, and effective.