Shwai He, Tao Ge, Guoheng Sun, Bowei Tian, Xiaoyang Wang, Ang Li, Dong Yu
This repository provides open-source Mixture of Depths code and the official implementation of the paper "Router-Tuning: A Simple and Effective Approach for Enabling Dynamic Depth in Transformers."
Traditional transformer models allocate a fixed amount of computation to every input token, wasting effort on tokens that need less. To address this inefficiency, Mixture of Depths (MoD) was introduced, which dynamically adjusts computational depth by skipping less important layers. While promising, current MoD approaches face two significant challenges:
- High Training Costs: Existing methods require training the entire model alongside routers, which determine which layers to skip, resulting in substantial computational overhead.
- Risk of Performance Degradation: Bypassing important layers can lead to a drop in model performance.
To overcome these challenges, we introduce Router-Tuning, a method that fine-tunes only the router on a small dataset, drastically reducing training cost. Additionally, we propose MindSkip (Attention with Dynamic Depths), which preserves model performance while significantly improving computational and memory efficiency.
Our approach delivers competitive results, achieving up to a 21% speedup with only a 0.2% performance drop, demonstrating its effectiveness in balancing efficiency and performance.
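To make the idea concrete, here is a minimal sketch of a router-gated attention block in the spirit of MindSkip. This is an illustrative reconstruction, not the repository's actual code: the module names, the sigmoid gate, and the skip threshold are all assumptions. A lightweight router scores each token, and tokens the router scores below the threshold effectively skip the attention computation at inference.

```python
import torch
import torch.nn as nn

class RouterGatedAttention(nn.Module):
    """Illustrative MindSkip-style block (hypothetical names/shapes):
    a per-token router gates the attention output, so low-scoring
    tokens effectively skip the layer."""

    def __init__(self, dim: int, num_heads: int = 4, threshold: float = 0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # The router is the only module Router-Tuning would fine-tune.
        self.router = nn.Linear(dim, 1)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.router(x))  # (B, T, 1), soft gate keeps the router differentiable
        attn_out, _ = self.attn(x, x, x)
        if not self.training:
            # At inference, hard-skip tokens the router deems unimportant.
            gate = (gate > self.threshold).to(x.dtype)
        return x + gate * attn_out  # residual path is always kept
```

Under this sketch, Router-Tuning would freeze everything except `self.router` and fine-tune it on a small dataset, so the base model's weights, and hence its performance, are largely preserved.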
- Oct 2024: Published preprint on arXiv along with the related codebase.
```shell
conda create -n router-tuning python=3.10
conda activate router-tuning
git clone https://github.com/CASE-Lab-UMD/Router-Tuning
cd ./Router-Tuning
pip install -r requirements.txt
```
```shell
sh ./scripts/finetune_mindskip.sh
```
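The key cost saving of Router-Tuning is that only the router parameters receive gradients. A hedged sketch of what such a freeze might look like (the `"router"` naming convention and helper name are assumptions for illustration, not the repository's actual code):

```python
import torch.nn as nn

def freeze_all_but_router(model: nn.Module) -> int:
    """Freeze every parameter except those whose name contains 'router'
    (an assumed naming convention); return the trainable-parameter count."""
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = "router" in name
        if param.requires_grad:
            trainable += param.numel()
    return trainable
```

Because the routers are tiny (e.g. one linear projection per layer), the optimizer state and gradient memory shrink accordingly, which is what makes fine-tuning on a small dataset cheap.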
The evaluation code is based on EleutherAI/lm-evaluation-harness. To reproduce our results exactly, please use this version: it samples few-shot examples based on sample indices, so results do not vary with the number of processes during data-parallel inference.
```bibtex
@misc{he2024routertuningsimpleeffectiveapproach,
      title={Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers},
      author={Shwai He and Tao Ge and Guoheng Sun and Bowei Tian and Xiaoyang Wang and Ang Li and Dong Yu},
      year={2024},
      eprint={2410.13184},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.13184},
}
```
If you have any questions, please contact:
- Shwai He: [email protected]