Ap/fix mistral template #183
Closed
Conversation
…of parallel processes
- Introduce TokenInfo class to manage special tokens with add_to_tokenizer flag
- Update SpecialTokens dataclass to use TokenInfo for each token
- Modify chat templates (ibm_generic_tmpl.py and mistral_tmpl.py) to use new TokenInfo structure
- Refactor unmask_message_content function to handle token sequences instead of single tokens
- Update get_sp_token to return token lists instead of single integers
- Adjust data processing and tokenizer setup to work with new token structure
- Improve error handling and add assertions for eos and pad tokens
- Enhance chat template for Mistral to support system messages and strict role alternation

These changes provide more flexibility in defining and using special tokens, especially for models like Mistral that use multi-token sequences for roles. The new structure allows for better control over which tokens are added to the tokenizer while still being recognized in the input.
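For orientation, here is a minimal sketch of the structure this commit message describes. The names TokenInfo, SpecialTokens, add_to_tokenizer, and get_sp_token come from the bullets above; the field layout and the get_sp_token signature are assumptions, not the actual diff.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TokenInfo:
    # literal strings for a role marker; may be empty or span multiple tokens
    tokens: List[str] = field(default_factory=list)
    # whether these strings should be registered as new special tokens
    add_to_tokenizer: bool = False


@dataclass
class SpecialTokens:
    system: TokenInfo = field(default_factory=TokenInfo)
    user: TokenInfo = field(default_factory=TokenInfo)
    assistant: TokenInfo = field(default_factory=TokenInfo)
    eos: TokenInfo = field(default_factory=TokenInfo)
    pad: TokenInfo = field(default_factory=TokenInfo)


def get_sp_token(tokenizer, token_info: TokenInfo) -> List[int]:
    """Encode a role marker into a list of token ids (possibly empty)."""
    ids: List[int] = []
    for tok in token_info.tokens:
        ids.extend(tokenizer.encode(tok, add_special_tokens=False))
    return ids
```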
@Maxusmusti @JamesKunstle @RobotSail this fixes #182
- Update unmask_message_content function for better token handling
- Modify print_masked_samples to use a dedicated mask token
- Adjust tokenizer setup to use AutoTokenizer and chat template
- Update special token handling in setup_tokenizer
- Minor code cleanup and optimization
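A hedged sketch of the tokenizer setup these bullets describe; setup_tokenizer and the special_tokens argument follow the names above, while the body is an illustration under assumptions rather than the PR's actual code.

```python
from transformers import AutoTokenizer


def setup_tokenizer(model_path, chat_template, special_tokens):
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # register only the role markers flagged with add_to_tokenizer; the rest
    # are assumed to already exist in the base vocabulary
    new_tokens = [
        tok
        for info in (special_tokens.system, special_tokens.user, special_tokens.assistant)
        if info.add_to_tokenizer
        for tok in info.tokens
    ]
    if new_tokens:
        tokenizer.add_special_tokens({"additional_special_tokens": new_tokens})
    tokenizer.chat_template = chat_template
    # assertions mirroring the stricter eos/pad handling mentioned above
    assert tokenizer.eos_token is not None, "tokenizer must define an eos token"
    assert tokenizer.pad_token is not None, "tokenizer must define a pad token"
    return tokenizer
```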
fixes #184
@aldo-pareja can you rebase this on current main? (which now includes #169)
Enhance training pipeline and add license headers

- Add SPDX-License-Identifier headers to source files
- Improve error handling and validation in data processing
- Implement checkpoint saving at end of each epoch
- Optimize multipack sampler for better GPU utilization
- Refactor DeepSpeed configuration for ZeRO Stage 3
- Add support for Mixtral sparse MoE models
- Improve logging and error messages throughout
- Fix various minor bugs and typos
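As one concrete illustration of the end-of-epoch checkpointing item above, a minimal DeepSpeed-style training loop; the function name and arguments here are hypothetical, only the save-per-epoch pattern is the point.

```python
def train(model_engine, train_loader, num_epochs, output_dir):
    for epoch in range(num_epochs):
        for batch in train_loader:
            loss = model_engine(**batch).loss
            model_engine.backward(loss)  # DeepSpeed engine handles scaling
            model_engine.step()
        # save a full checkpoint at the end of every epoch
        model_engine.save_checkpoint(output_dir, tag=f"epoch_{epoch}")
```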
Closing as these changes are introduced in #213
This builds on top of #169. It changes how we treat the system, assistant, and user tokens, assuming they are always lists (which may be empty or contain a single element). We lose the testing of alternation between user and assistant messages, but Mistral's Jinja template enforces this; perhaps we could do the same for Granite.
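To make the list-based treatment concrete, here is a small hypothetical helper that locates a multi-token (or empty) role marker inside a tokenized sample; it illustrates the idea rather than quoting code from the PR.

```python
from typing import List


def find_token_sequence(input_ids: List[int], marker: List[int], start: int = 0) -> int:
    """Return the first index at which `marker` occurs in `input_ids`, or -1.

    An empty marker (a role with no dedicated tokens) matches immediately at
    `start`, so the same code path serves models whose role markers are a
    single token, a multi-token sequence, or absent entirely.
    """
    if not marker:
        return start
    for i in range(start, len(input_ids) - len(marker) + 1):
        if input_ids[i : i + len(marker)] == marker:
            return i
    return -1
```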