JacoCheung/Awesome-GPU

Awesome-GPU

Resource Management

Papers

  1. ASPLOS'17-Locality-Aware CTA Clustering for Modern GPUs
  2. ASPLOS'17-Dynamic Resource Management for Efficient Utilization of Multitasking GPUs
  3. HPCA'17-Dynamic GPGPU Power Management Using Adaptive Model Predictive Control
  4. ISCA'16-Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems

Parallelism

Papers

  1. HPCA'17-Controlled Kernel Launch for Dynamic Parallelism in GPUs
  2. ISCA'16-LaPerm: Locality Aware Scheduler for Dynamic Parallelism on GPUs
  3. ISCA'16-Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit
  4. Berkeley TechRpts'16-Understanding Latency Hiding on GPUs

Slides

  1. GTC'17-COOPERATIVE GROUPS

Cache

Papers

  1. ISCA'16-APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs
  2. SC'15-Adaptive and Transparent Cache Bypassing for GPUs

Algorithm

Papers

  1. HPCA'17-Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures
  2. ASPLOS'14-Paraprox: Pattern-Based Approximation for Data Parallel Applications

Slides

  1. GTC'18-CUTLASS: CUDA TEMPLATE LIBRARY FOR DENSE LINEAR ALGEBRA AT ALL LEVELS AND SCALES

Software

  1. CUTLASS

Performance Analysis

Papers

  1. GTC'18-Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking
  2. PLDI'18-GPU Code Optimization using Abstract Kernel Emulation and Sensitivity Analysis
  3. CGO'18-CUDAAdvisor: LLVM-based runtime profiling for modern GPUs
  4. CCGRID'18-Exposing Hidden Performance Opportunities in High Performance GPU Applications
  5. Euro-Par'15-Identifying Optimization Opportunities Within Kernel Execution in GPU Codes
  6. SC'13-Effective sampling-driven performance tools for GPU-accelerated supercomputers
  7. ISPASS'12-Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures
  8. ICPP'11-Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs
  9. ISPASS'10-Demystifying GPU Microarchitecture through Microbenchmarking
  10. ISPASS'10-Visualizing Complex Dynamics in Many-Core Accelerator Architectures
  11. ISPASS'09-Analyzing CUDA Workloads Using a Detailed GPU Simulator

Books

  1. Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
  2. Monitoring Heterogeneous Applications with the OpenMP Tools Interface

Slides

  1. ECP'19-Performance Tuning of Scientific Codes with the Roofline Model
  2. GTC'18-VOLTA Architecture and performance optimization
  3. SC'10-Fundamental Optimizations

Software

  1. Vampir|Score-P
  2. TAU
  3. PAPI
  4. Allinea MAP
  5. Open|SpeedShop
  6. HPCToolkit
  7. NVIDIA Nsight Systems
  8. NVIDIA Nsight Compute

Compiler

  1. LLVM'17-Implementing implicit OpenMP data sharing on GPUs
  2. CGO'16-gpucc: An Open-Source GPGPU Compiler
  3. LLVM'16-Offloading Support for OpenMP in Clang and LLVM
  4. PMBS'15-Performance Analysis of OpenMP on a GPU using a CORAL Proxy Application
  5. LLVM'15-Integrating GPU Support for OpenMP Offloading Directives into Clang
  6. LLVM'14-Coordinating GPU Threads for OpenMP 4.0 in LLVM

GPU Binaries

Papers

  1. CGO'19-Decoding CUDA binary
  2. ISCA'15-Flexible software profiling of GPU architectures

Slides

  1. SASSI

Documentation

White Papers

  1. Ampere-NVIDIA A100 Tensor Core GPU Architecture
  2. Turing-NVIDIA TURING GPU ARCHITECTURE
  3. Volta-NVIDIA TESLA V100
  4. Pascal-NVIDIA TESLA P100
  5. Kepler-NVIDIA’s Next Generation CUDA Compute Architecture: Kepler
  6. Fermi-NVIDIA’s Next Generation CUDA Compute Architecture: Fermi

APIs

  1. CUDA Toolkit Documentation

GTC

  1. GTC-GPU Technology Conference

About

Awesome resources for GPUs
