This repository has been archived by the owner on Aug 7, 2024. It is now read-only.

Checkpoint to reduce fp8_weight tensor saved for backwards #193

Open
wants to merge 1 commit into
base: main

Conversation

drisspg
Contributor

@drisspg drisspg commented Jan 23, 2024

Summary

Take two of doing this: #186
Issue: #185

I have verified that the transpose_cast in backwards is fusable with inductor; now we just have to coax it into doing what we want.
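For readers unfamiliar with the approach, here is a minimal sketch of the general idea: wrap the weight cast and the matmul in `torch.utils.checkpoint` so the low-precision weight is recomputed in backward instead of being saved. This is not the code in this PR. `cast_weight`, `cast_and_matmul`, and `checkpointed_linear` are hypothetical names, and bfloat16 stands in for fp8 only so the sketch runs on CPU without the fp8 matmul kernels.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Assumption: bfloat16 is a stand-in for the fp8 dtype used by the real code.
LOW_PRECISION = torch.bfloat16

# Counts how often the cast runs, to show it is recomputed in backward.
cast_calls = 0


def cast_weight(weight: torch.Tensor) -> torch.Tensor:
    """Hypothetical cast step: round the weight through low precision."""
    global cast_calls
    cast_calls += 1
    return weight.to(LOW_PRECISION).to(weight.dtype)


def cast_and_matmul(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Cast the weight, then apply it as a bias-free linear layer."""
    return x @ cast_weight(weight).t()


def checkpointed_linear(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # Non-reentrant checkpoint: tensors saved inside cast_and_matmul are
    # dropped after forward and recomputed when backward needs them.
    return checkpoint(cast_and_matmul, x, weight, use_reentrant=False)


torch.manual_seed(0)
x = torch.randn(4, 8, requires_grad=True)
weight = torch.randn(16, 8, requires_grad=True)

out = checkpointed_linear(x, weight)
out.sum().backward()
```

In this sketch the cast runs once in forward and once more during backward, which is the memory-for-compute trade the PR title describes. Whether inductor fuses the recomputed cast with the transpose is a separate question that this eager example does not address.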

cc @bdhirsh

Eager Logging mode for this PR, using this gist:

https://gist.github.com/drisspg/1b334d851ae0caea22eebb07d3455239

@facebook-github-bot added the CLA Signed label Jan 23, 2024
@drisspg
Contributor Author

drisspg commented Feb 22, 2024

I figure this is worth landing over the other since we anticipate that this will be supported by the min-cut partitioner @bdhirsh

I still need to add the same for delayed scaling and ensure that it does indeed checkpoint correctly.

2 participants