Add Optimizer Accumulation #3

Merged
warner-benjamin merged 5 commits into main from opt_accum on Mar 11, 2024

Conversation

warner-benjamin
Owner

Optimizer accumulation allows gradient release to approximate gradient accumulation by accumulating gradients into the optimizer states.
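For intuition, here is a minimal PyTorch-style sketch of the idea, not optimi's actual implementation: each micro-batch's gradient is folded directly into Adam-style moment buffers and the gradient buffer is freed, so no separate gradient accumulation buffer is kept.

```python
import torch

def accumulate_into_optimizer_state(param, exp_avg, exp_avg_sq,
                                    beta1=0.9, beta2=0.999):
    """Conceptual illustration: fold one micro-batch gradient into
    Adam-style moment buffers instead of a gradient accumulation buffer."""
    grad = param.grad
    # Accumulate this micro-batch's gradient into the first and second moments.
    # Applying the EMA per micro-batch is why this approximates, rather than
    # exactly matches, standard gradient accumulation.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # The gradient buffer can be released immediately after accumulation.
    param.grad = None
```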

Optimizer accumulation was proposed by Zhang et al. in AdamAccumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training.

optimi’s implementation enables AdamAccumulation for all optimi optimizers.
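A hedged sketch of what a training loop might look like with this feature; the `gradient_release` flag, `prepare_for_gradient_release`, and the `optimizer_accumulation` attribute are assumptions about optimi's interface based on its gradient release feature, so check the docs for the exact API:

```python
import torch
import torch.nn.functional as F
from optimi import AdamW, prepare_for_gradient_release  # assumed import path

model = torch.nn.Linear(128, 128)
# gradient_release=True is assumed to enable per-parameter optimizer steps
# during the backward pass.
optimizer = AdamW(model.parameters(), lr=1e-3, gradient_release=True)
prepare_for_gradient_release(model, optimizer)

# Toy data standing in for a real dataloader.
dataloader = [(torch.randn(8, 128), torch.randn(8, 128)) for _ in range(8)]
accumulation_steps = 4

for step, (x, y) in enumerate(dataloader):
    # Accumulate into optimizer states on all but the last micro-batch of
    # each accumulation window (assumed attribute on optimi optimizers).
    optimizer.optimizer_accumulation = (step + 1) % accumulation_steps != 0
    loss = F.mse_loss(model(x), y)
    # With gradient release, gradients are either accumulated into optimizer
    # state or applied as parameter updates layer by layer during backward.
    loss.backward()
```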

@warner-benjamin warner-benjamin merged commit 0bec5ca into main Mar 11, 2024
4 checks passed
@warner-benjamin warner-benjamin deleted the opt_accum branch March 11, 2024 04:09