# MoE Related

| Title | Key Words |
| --- | --- |
| Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer [ICLR'17] | RNN-based 137B model |
| GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding [ICLR'21] | First Transformer-MoE model |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | |
| Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity [JMLR'22] | Top-1 gating (see the sketch below) |
| DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale [ICML'22] | |
| FastMoE: A Fast Mixture-of-Expert Training System | |
| FasterMoE: Modeling and Optimizing Training of Large-Scale Dynamic Pre-trained Models [PPoPP'22] | |
| SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization [ATC'23] | |
| Accelerating Distributed MoE Training and Inference with Lina [ATC'23] | |
| Optimizing Dynamic Neural Networks with Brainstorm [OSDI'23] | |
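
Several of the papers above center on sparse gating, most directly the top-1 routing popularized by Switch Transformers. Below is a minimal sketch of a top-1 gated MoE layer, assuming PyTorch; the class name `Top1MoE`, the dense-expert structure, and all dimensions are illustrative choices, not code from any of the listed papers, and real systems add capacity limits, load-balancing losses, and expert parallelism.

```python
# Minimal top-1 (switch-style) gating sketch. Assumes PyTorch is installed;
# names and shapes here are illustrative, not from any paper above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top1MoE(nn.Module):
    """Route each token to the single expert with the highest gate score."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_ff),
                    nn.ReLU(),
                    nn.Linear(d_ff, d_model),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)   # (num_tokens, num_experts)
        top_prob, top_idx = probs.max(dim=-1)     # top-1 routing decision per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                   # tokens routed to expert e
            if mask.any():
                # Scale the expert output by the gate probability so the
                # routing decision stays differentiable.
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    moe = Top1MoE(d_model=16, d_ff=32, num_experts=4)
    tokens = torch.randn(8, 16)
    print(moe(tokens).shape)  # torch.Size([8, 16])
```

The loop over experts keeps the sketch readable; production systems such as those in the DeepSpeed-MoE, FastMoE, and FasterMoE papers instead batch tokens per expert and dispatch them with all-to-all communication across devices.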