Ap/fix mistral template #183

Closed
wants to merge 10 commits into from

Conversation

aldopareja
Member

This builds on top of #169. It changes the way we treat system, assistant, and user tokens, assuming they are always lists (which could be empty or single-element). We lose testing of the alternation between user and assistant, but Mistral's Jinja template enforces this; perhaps we could do the same with Granite.
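For reference, a minimal sketch of the kind of alternation check a Mistral-style Jinja chat template performs (the exact template text used in this PR may differ; this is only illustrative):

```python
# Illustrative only: the core of a Mistral-style Jinja chat template that
# rejects conversations whose user/assistant turns do not alternate.
# `raise_exception` is the helper Hugging Face exposes inside chat templates.
ALTERNATION_CHECK = (
    "{% for message in messages %}"
    "{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}"
    "{{ raise_exception('Conversation roles must alternate user/assistant/...') }}"
    "{% endif %}"
    "{% endfor %}"
)
```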

- Introduce TokenInfo class to manage special tokens with add_to_tokenizer flag
- Update SpecialTokens dataclass to use TokenInfo for each token
- Modify chat templates (ibm_generic_tmpl.py and mistral_tmpl.py) to use new TokenInfo structure
- Refactor unmask_message_content function to handle token sequences instead of single tokens
- Update get_sp_token to return token lists instead of single integers
- Adjust data processing and tokenizer setup to work with new token structure
- Improve error handling and add assertions for eos and pad tokens
- Enhance chat template for Mistral to support system messages and strict role alternation

These changes provide more flexibility in defining and using special tokens,
especially for models like Mistral that use multi-token sequences for roles.
The new structure allows for better control over which tokens are added to the
tokenizer while still being recognized in the input.
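As a rough illustration of the structure described above (only `TokenInfo`, `SpecialTokens`, `add_to_tokenizer`, and the list-returning `get_sp_token` come from this PR's description; the field names, defaults, and the encode call are assumptions):

```python
from dataclasses import dataclass, field


@dataclass
class TokenInfo:
    token: str
    add_to_tokenizer: bool = False  # register with the tokenizer, or only recognize it in input


@dataclass
class SpecialTokens:
    system: TokenInfo = field(default_factory=lambda: TokenInfo(""))
    user: TokenInfo = field(default_factory=lambda: TokenInfo(""))
    assistant: TokenInfo = field(default_factory=lambda: TokenInfo(""))
    eos: TokenInfo = field(default_factory=lambda: TokenInfo("", add_to_tokenizer=True))
    pad: TokenInfo = field(default_factory=lambda: TokenInfo("", add_to_tokenizer=True))


def get_sp_token(tokenizer, sp_string: str) -> list[int]:
    # A role marker like "[INST]" may encode to several ids for Mistral,
    # so the result is always a list rather than a single integer.
    return tokenizer.encode(sp_string, add_special_tokens=False) if sp_string else []
```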
@aldopareja aldopareja requested a review from Maxusmusti August 27, 2024 07:03
@aldopareja
Member Author

@Maxusmusti @JamesKunstle @RobotSail this fixes #182

- Update unmask_message_content function for better token handling
- Modify print_masked_samples to use a dedicated mask token (see the sketch after this list)
- Adjust tokenizer setup to use AutoTokenizer and chat template
- Update special token handling in setup_tokenizer
- Minor code cleanup and optimization
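A minimal sketch of how masked positions could be rendered when printing samples (the function name comes from the bullets above; the mask string and the rest are assumptions, not the code in this PR):

```python
MASK_TOKEN = "<MASK>"  # hypothetical placeholder string for masked positions


def print_masked_samples(samples, tokenizer, num_samples: int = 2) -> None:
    """Print a few samples, replacing label-masked positions (-100) with MASK_TOKEN."""
    for sample in samples[:num_samples]:
        pieces = []
        for tok_id, label in zip(sample["input_ids"], sample["labels"]):
            pieces.append(MASK_TOKEN if label == -100 else tokenizer.decode([tok_id]))
        print("".join(pieces))
```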
@aldopareja
Member Author

fixes #184

@Maxusmusti
Contributor

@aldo-pareja can you rebase this on current main? (which now includes #169)

Enhance training pipeline and add license headers

- Add SPDX-License-Identifier headers to source files
- Improve error handling and validation in data processing
- Implement checkpoint saving at end of each epoch
- Optimize multipack sampler for better GPU utilization
- Refactor DeepSpeed configuration for ZeRO Stage 3 (see the sketch after this list)
- Add support for Mixtral sparse MoE models
- Improve logging and error messages throughout
- Fix various minor bugs and typos
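For context, a minimal sketch of a DeepSpeed ZeRO Stage 3 configuration of the kind the list above refers to (values are placeholders, not the settings used in this PR):

```python
# Illustrative ZeRO Stage 3 settings; the actual values belong to the PR's
# DeepSpeed refactor and are not reproduced here.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
```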
@mergify mergify bot added the ci-failure label Aug 29, 2024
@Maxusmusti
Contributor

Closing as these changes are introduced in #213

@Maxusmusti Maxusmusti closed this Sep 24, 2024