forked from instructlab/training
Implementing HF Padding-Free and GraniteLM Support (instructlab#257)
Updating the data collator for models with HF padding-free support, adding support for the upcoming Granite HF model class, and updating flags/interface accordingly.

---

* Only compute lengths in the token dataset when they are not already present (see the length sketch after this list).
* Refactor the padding function to support `position_ids` for FlashAttention (see the capability-check and collator sketches after this list):
  - Added a `supports_flash_attention` function to check GPU compatibility for FlashAttention.
  - Updated `make_collate_fn` to return `position_ids` instead of `attention_mask` when FlashAttention is supported.
  - Integrated the new padding logic into `setup_dataloader` to ensure compatibility with both Granite and non-Granite configurations.
  - Ensured backward compatibility by keeping the original padding logic for GPUs that do not support FlashAttention.
  - Updated `main_ds.py` to use the new `supports_flash_attention` check when determining the padding strategy.
* Log the global grad norm.
* Fix DeepSpeed, which was not working with the scheduler we want.
* Fix the Accelerate lr_scheduler.
* Fix samples-seen accounting, which broke once samples became a single packed line.
* Fix multipack bucketing: when FlashAttention is supported, padding must not be counted when building the buckets.
* Fix a padding bug when creating the multipack sampler.
* Granite 8B models should no longer fail.
* Change the old padding-free and granite flags to `use_dolomite`, with `use_dolomite` defaulting to false.
* Re-add `is_padding_free` with a deprecation warning (see the deprecation sketch after this list).
* Add safeguards and checks for flash attention when enabled/disabled; rework the checks for better modularity and include AMD in them.
* Add state guards for dolomite and granite, plus a model path check.
* Update transformers to a version that includes the Granite model class.
* Clean up the early validation checks and move them to utils.
* Filter out samples containing the `<MASK>` tag: it is too common, and some datasets would otherwise fail.
* Warn when the special tokens used for data processing are already present in the dataset, and update the valid-data filter (see the filter sketch after this list).
* Housekeeping: black/ruff formatting, linting, a spelling fix, an arg-name fix, a missing follow-up update, review feedback, and added comments.

Signed-off-by: aldo pareja-cardona <[email protected]>
Signed-off-by: Mustafa Eyceoz <[email protected]>
Co-authored-by: aldo pareja-cardona <[email protected]>
Co-authored-by: Mustafa Eyceoz <[email protected]>
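As a rough illustration of the length bullet above, the length column can be computed only when the dataset does not already carry one; the `ensure_lengths` helper and the `len`/`input_ids` column names are assumptions for the sketch, not the repo's actual code:

```python
from datasets import Dataset

def ensure_lengths(ds: Dataset) -> Dataset:
    # Skip the extra pass over the data when a length column already exists.
    if "len" not in ds.column_names:
        ds = ds.map(lambda sample: {"len": len(sample["input_ids"])})
    return ds
```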
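A minimal sketch of a capability check along the lines of the `supports_flash_attention` helper, assuming FlashAttention needs Ampere (compute capability 8.0) or newer on NVIDIA and treating any ROCm build as a candidate AMD part; the repo's real check may differ:

```python
import torch

def supports_flash_attention(device_id: int = 0) -> bool:
    """Best-effort check that the active accelerator can run FlashAttention."""
    if not torch.cuda.is_available():
        return False
    # ROCm builds of PyTorch expose a HIP version; whether a given AMD GPU
    # is actually supported still depends on the FlashAttention build.
    if torch.version.hip is not None:
        return True  # assumption: a supported AMD part (e.g. MI200-class)
    # On NVIDIA, FlashAttention requires Ampere (SM 8.0) or newer.
    major, _minor = torch.cuda.get_device_capability(device_id)
    return major >= 8
```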
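The `position_ids` switch is the heart of the padding-free path: rather than padding every sequence to a common length and emitting an `attention_mask`, the batch is packed into a single row and the position ids restart at 0 at each sequence boundary, which FlashAttention-aware HF models use to recover the boundaries. A hypothetical collate function in that spirit (field names illustrative, not the repo's `make_collate_fn`):

```python
import torch

def collate_padding_free(batch):
    input_ids, labels, position_ids = [], [], []
    for sample in batch:
        input_ids.extend(sample["input_ids"])
        labels.extend(sample["labels"])
        # Positions restart at 0 for every packed sequence; the model
        # infers the sequence boundaries from these resets.
        position_ids.extend(range(len(sample["input_ids"])))
    return {
        "input_ids": torch.tensor([input_ids], dtype=torch.long),
        "labels": torch.tensor([labels], dtype=torch.long),
        "position_ids": torch.tensor([position_ids], dtype=torch.long),
    }
```

On GPUs where `supports_flash_attention` returns false, the original pad-to-longest collator with an `attention_mask` remains the fallback, per the backward-compatibility bullet.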
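The `is_padding_free` deprecation could look roughly as follows, assuming the old flag simply aliases the new `use_dolomite` behavior (that mapping is an assumption; only the deprecation itself is stated in the commit message):

```python
import warnings

def resolve_padding_flags(args):
    # Assumption for the sketch: the deprecated flag maps onto `use_dolomite`.
    if getattr(args, "is_padding_free", False):
        warnings.warn(
            "`is_padding_free` is deprecated; use `use_dolomite` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        args.use_dolomite = True
    return args
```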
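Finally, a sketch of the special-token warning and valid-data filter; only `<MASK>` is named in the commit message, and the `messages` field and helper name are hypothetical:

```python
import logging
from datasets import Dataset

logger = logging.getLogger(__name__)

# Only <MASK> is named in the commit message; any other internal
# processing tags would be appended here.
PROCESSING_TOKENS = ("<MASK>",)

def filter_processing_tokens(ds: Dataset, field: str = "messages") -> Dataset:
    """Warn about, and drop, samples that contain reserved processing tags."""
    def is_clean(sample) -> bool:
        return not any(tok in str(sample[field]) for tok in PROCESSING_TOKENS)

    clean = ds.filter(is_clean)
    dropped = len(ds) - len(clean)
    if dropped:
        logger.warning(
            "%d samples contain special tokens reserved for data processing "
            "(%s) and were filtered out.",
            dropped, ", ".join(PROCESSING_TOKENS),
        )
    return clean
```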
1 parent ed8d6e2 · commit 03d1b62
Showing 6 changed files with 237 additions and 107 deletions.