- Bug fix for GPU Burn test: remove cp ptx file command (#567)
- Support INT8 in cublaslt function (#574)
- Support cpu-gpu and gpu-cpu in ib-validation (#581)
- Support graph mode in NCCL/RCCL benchmarks for latency metrics (#583)
- Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)
- Add distributed inference benchmark cpp implementation (#586)
- Add MSCCL support for Nvidia GPU (#584)
- Support in-place for NCCL/RCCL benchmark (#591)
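The new one-to-all, all-to-one, and all-to-all modes in gpu_copy_bw_performance can be pictured as pair enumerations over GPUs. The helper below is a hypothetical illustration (not SuperBench code) of which (src, dst) copies each mode implies, assuming device 0 as the root for the one-to-all and all-to-one cases.

```python
# Hypothetical sketch (not SuperBench code): enumerate the (src, dst)
# copy pairs implied by each gpu_copy_bw_performance mode.
def copy_pairs(mode, num_gpus, root=0):
    """Return the list of (src, dst) GPU copies a given mode performs."""
    if mode == "one-to-all":   # root pushes to every other GPU
        return [(root, d) for d in range(num_gpus) if d != root]
    if mode == "all-to-one":   # every other GPU pushes to root
        return [(s, root) for s in range(num_gpus) if s != root]
    if mode == "all-to-all":   # every ordered pair of distinct GPUs
        return [(s, d) for s in range(num_gpus)
                for d in range(num_gpus) if s != d]
    raise ValueError(f"unknown mode: {mode}")

print(copy_pairs("one-to-all", 4))  # [(0, 1), (0, 2), (0, 3)]
```

On a 4-GPU node, all-to-all thus measures 12 directed copies, versus 3 for each of the rooted modes.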
- Change torch.distributed.launch to torchrun (#556)
- Support Megatron-LM/Megatron-DeepSpeed GPT pretrain benchmark (#582)
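The launcher change replaces the deprecated `torch.distributed.launch` module with PyTorch's `torchrun` entry point. The sketch below shows the general shape of that migration for an illustrative script name (`train.py` and the flag values are placeholders, not taken from this release):

```shell
# Before: deprecated launcher module
python -m torch.distributed.launch --nproc_per_node=8 train.py

# After: torchrun entry point (distributed arguments are otherwise equivalent)
torchrun --nproc_per_node=8 train.py
```

One practical difference is that `torchrun` passes the local rank to the script via the `LOCAL_RANK` environment variable rather than a `--local_rank` argument.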
- Update Docker image for H100 support: upgrade to CUDA 12.2 (#577)
- Add HPL random generator to gemm-flops with ROCm (#578)
- Update MLC version to 3.10 for CUDA/ROCm dockerfiles (#562)
- Add hipBLASLt function benchmark (#576)
- Support cpu-gpu and gpu-cpu in ib-validation (#581)
- Support graph mode in NCCL/RCCL benchmarks for latency metrics (#583)
- Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)
- Add distributed inference benchmark cpp implementation (#586)
- Support in-place for NCCL/RCCL benchmark (#591)
- Support monitoring for AMD GPUs (#580)
- Support baseline generation from multiple nodes (#575)
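Baseline generation from multiple nodes amounts to reducing per-node metric values into a single reference value per metric. The snippet below is a simplified sketch of that idea; the mean-based reduction and the `{metric: value}` result layout are assumptions for illustration, not the Analyzer's actual algorithm.

```python
from statistics import mean

# Simplified sketch (assumed layout, not the real Analyzer code):
# each node reports {metric_name: value}; the baseline takes the mean
# across nodes for every metric reported by all nodes.
def generate_baseline(node_results):
    common_metrics = set.intersection(*(set(r) for r in node_results))
    return {m: mean(r[m] for r in node_results)
            for m in sorted(common_metrics)}

nodes = [
    {"gemm-flops/fp32": 19.1, "nccl/allreduce_bw": 230.0},
    {"gemm-flops/fp32": 18.9, "nccl/allreduce_bw": 228.0},
]
print(generate_baseline(nodes))
```

A real baseline rule would likely be more conservative than a plain mean (e.g. applying a tolerance band), but the multi-node reduction step has this shape.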
Test Cases
- Single-node test
- A100 and H100 related
- MI200 and MI300x
- Result analysis