Date | Paper Title | Presenter | Notes |
---|---|---|---|
07.14 | AKG: automatic kernel generation for neural processing units using polyhedral transformations (PLDI 2021) | Yuxian Qiu | Slides |
07.21 | Floating-Point Format and Quantization for Deep Learning Computation | Cong Guo | |
07.28 | P-OPT: Practical Optimal Cache Replacement for Graph Analytics | Yangjie Zhou | Slides |
08.04 | Rubik: A Hierarchical Architecture for Efficient Graph Neural Network Training | Zhihui Zhang | |
08.11 | A Useful Tool, CKA: Similarity of Neural Network Representations Revisited, and Its Application: Uncovering How Neural Network Representations Vary with Width and Depth | Zhengyi Li | Slides |
08.18 | Ansor: Generating High-Performance Tensor Programs for Deep Learning | Zihan Liu | Slides |

Date | Paper Title | Presenter | Notes |
---|---|---|---|
10.11 | Adaptive numeric type for DNN quantization | Cong Guo | |
10.18 | Compiling Graph Applications for GPUs with GraphIt | Yangjie Zhou | Slides |
11.01 | TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation | Zihan Liu | Slides |
11.08 | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Zhengyi Li | Slides (code: zdea) |
11.22 | Dynamic Tensor Rematerialization / Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization | Yue Guan | Slides, Slides |
11.29 | GraphPulse: An Event-Driven Hardware Accelerator for Asynchronous Graph Processing | Zhihui Zhang | Presentation |
12.06 | CheckFreq: Frequent, Fine-Grained DNN Checkpointing | Guandong Lu | Slides |
12.13 | PipeDream: generalized pipeline parallelism for DNN training | Runzhe Chen | Slides |
12.20 | Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters | Yakai Wang | Slides |

Date | Paper Title | Presenter | Notes |
---|---|---|---|
3.10 | Speculative Execution Attacks: Meltdown, Spectre, Pinned-Loads | Zihan Liu | Slides |
3.24 | SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute | Yue Guan | |
3.31 | ROLLER: Fast and Efficient Tensor Compilation for Deep Learning | Yijia Diao | Link |
4.07 | Adaptable Register File Organization for Vector Processors | Zhihui Zhang | |
4.14 | Cortex: A Compiler for Recursive Deep Learning Models | Yangjie Zhou | Slides |
4.21 | Zero-Knowledge Succinct Non-Interactive Argument of Knowledge | Shuwen Lu | Slides |
5.05 | Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | Runzhe Chen | Slides |

Date | Paper Title | Presenter | Notes |
---|---|---|---|
9.20 | ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization | Cong Guo | Slides |
9.27 | X-cache: a modular architecture for domain-specific caches | Zihan Liu | Slides |
10.18 | Automatically Discovering ML Optimizations | Yangjie Zhou | Slides |
11.8 | Privacy-Preserving Machine Learning: Inference | Zhengyi Li | Slides |
11.15 | Dynamic Tensor Compilers | Yijia Diao | Slides |

Date | Paper Title | Presenter | Notes |
---|---|---|---|
03.14 | LLM Attack and Defense | Zhengyi Li | Slides |
03.21 | Transparent GPU Sharing in Container Clouds for Deep Learning Workloads | Yijia Diao | Link |
03.28 | DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving | Shuwen Lu | Slides |
05.09 | 8-bit Transformer Inference and Fine-tuning for Edge Accelerators | Weiming Hu | Slides |

Date | Paper Title | Presenter | Notes |
---|---|---|---|
07.26 | TEE-SGX Introduction | Zhengyi Li | Slides |
08.15 | Accelerating Mixture-of-Experts Model Inference | Shuwen Lu | Slides |
09.05 | TCP: A Tensor Contraction Processor for AI Workloads | Weiming Hu | Slides |
11.15 | Open-Sora Architecture and Its Computational Reuse | Haosong Liu | Slides |
11.22 | LLM Quantization | Wenxuan Miao | Slides |

List contributed by Jingwen Leng.