Add Param Cache For Recompilation #2000
Merged
The parameter cache is needed to handle recompilation, where we must make sure that the parameters created in the first run are reused. The current use case does not fail even without a parameter cache, because during recompilation we directly replace layers from the layer cache, and parameters are replaced automatically along with their layers. Some parameters, however, are not traced within a standalone module (for example, the layernorm weight). These still work for now because, if the original parameter is already on the current rank's device, we use it directly for initialization and weight loading instead of creating a new one. But to support third-party backends such as nanotron, which has its own `NanotronParameter` implementation, we do need to track all newly created parameters so that no new parameter is created during recompilation.
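
The idea can be illustrated with a minimal sketch. It is not the code from this PR: `ParamCache`, `get_or_create`, `FakeBackendParameter`, and `compile_model` are hypothetical names chosen for illustration, and `FakeBackendParameter` merely stands in for a backend-specific parameter type such as `NanotronParameter`. The sketch assumes parameters are keyed by their fully qualified name.

```python
from typing import Callable, Dict, Generic, Tuple, TypeVar

P = TypeVar("P")


class ParamCache(Generic[P]):
    """Hypothetical cache: parameter name -> object created in the first run."""

    def __init__(self) -> None:
        self._params: Dict[str, P] = {}

    def get_or_create(self, name: str, factory: Callable[[], P]) -> P:
        # Only call the factory the first time a name is seen; later
        # (re)compilations get the exact same object back.
        if name not in self._params:
            self._params[name] = factory()
        return self._params[name]

    def __len__(self) -> int:
        return len(self._params)


class FakeBackendParameter:
    """Stand-in for a backend parameter type; NOT the real NanotronParameter."""

    created = 0  # counts how many parameter objects were ever constructed

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self.shape = shape
        FakeBackendParameter.created += 1


def compile_model(cache: ParamCache) -> Dict[str, FakeBackendParameter]:
    # Parameters that live outside a standalone cached layer,
    # e.g. layernorm weights (names and shapes are made up).
    specs = {"layernorm.weight": (4,), "layernorm.bias": (4,)}
    return {
        name: cache.get_or_create(name, lambda s=shape: FakeBackendParameter(s))
        for name, shape in specs.items()
    }


cache: ParamCache = ParamCache()
first = compile_model(cache)   # first compilation creates the parameters
second = compile_model(cache)  # recompilation reuses them
```

With this arrangement, the second call to `compile_model` returns the same objects as the first, so any weights loaded after the first compilation remain attached to the parameters that the recompiled model uses.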