Return assistant generated tokens mask in apply_chat_template #30650

Merged

yonigottesman
Contributor

What does this PR do?

This PR addresses issue #28950 and enhances the functionality of the tokenizer.apply_chat_template method when finetuning on chat datasets.

The method tokenizer.apply_chat_template is recommended for maintaining consistency with the model's original template during both training and inference phases. This practice ensures that conversations are processed in a uniform manner.

Moreover, when finetuning on chat datasets, it is crucial to exclude tokens from the "user" and "system" segments of the conversation from the loss. Including these tokens would train the model to predict not only the "assistant" responses but also potential user queries, which is undesirable (and strange).

Currently, the tokenizer.apply_chat_template method does not provide a way to identify which tokens belong to the "assistant" response. To address this, the PR introduces a new parameter called return_assistant_mask. This parameter returns a mask that identifies tokens generated by the assistant, which makes it straightforward to build a labels array with ignore (-100) values during training.

Additionally, this PR proposes the introduction of a new keyword generation (name open for discussion) in the jinja2 chat template. This keyword is used to encapsulate the assistant’s response within your chat template.

Here is an example of the new API:

dummy_template = (
    "{% for message in messages %}"
    "{% if (message['role'] != 'assistant') %}"
    "{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}"
    "{% elif (message['role'] == 'assistant')%}"
    "{{'<|im_start|>' + message['role'] + '\n'}}"
    # Everything rendered between these tags is marked as assistant output.
    "{% generation %}"
    "{{message['content'] + '<|im_end|>'}}"
    "{% endgeneration %}"
    "{{'\n'}}"
    "{% endif %}"
    "{% endfor %}"
)
dummy_conversations = [
    {"role": "system", "content": "system message"},
    {"role": "user", "content": "user message"},
    {"role": "assistant", "content": "assistant\nmessage"},
    {"role": "user", "content": "user message 2"},
    {"role": "assistant", "content": "assistant message 2"},
]

# tokenizer_r is assumed to be an already-loaded fast tokenizer.
output = tokenizer_r.apply_chat_template(
    dummy_conversations,
    chat_template=dummy_template,
    tokenize=True,
    return_assistant_mask=True,
    return_dict=True,
)

# Keep assistant tokens as labels; replace every other position with -100.
labels = [token if mask == 1 else -100 for token, mask in zip(output["input_ids"], output["assistant_mask"])]

There are some issues I would like to discuss during this PR:

  • Is this API fine? Maybe we should return a labels key in the dict already and not bother with the intermediate mask.
  • Name of the new tag? Currently generation, but maybe it should be assistant_response? Or anything you like.
  • I think maybe I should add a warning if a user runs with return_assistant_mask but the tokenizer chat template hasn't been changed yet to support this new tag. That way users will know they are probably training on the wrong tokens.
  • In 99% of finetuning examples I see people using the trl trainer with packing=True. My new changes won't be easily usable if people use that parameter, and maybe we should think about my API while taking into consideration a refactor of the packing effect.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yonigottesman yonigottesman marked this pull request as draft May 4, 2024 09:53
@yonigottesman yonigottesman marked this pull request as ready for review May 4, 2024 09:53
@yonigottesman yonigottesman force-pushed the apply-chat-template-assistant-mask branch 3 times, most recently from c14a7f6 to 4f77aca Compare May 5, 2024 08:44
@LysandreJik LysandreJik requested a review from Rocketknight1 May 6, 2024 09:40
@Rocketknight1
Member

cc @lewtun and @xenova to this as well!

@Rocketknight1
Member

My thoughts on your questions:

Is this API fine? Maybe we should return a labels key in the dict already and not bother with the intermediate mask.

I prefer just returning the labels with masking applied, rather than returning the mask for the user to apply.

Name of the new tag? Currently generation, but maybe it should be assistant_response? Or anything you like.

I think generation is fine - assistant_response is very long!

I think maybe I should add a warning if a user runs with return_assistant_mask but the tokenizer chat template hasn't been changed yet to support this new tag. That way users will know they are probably training on the wrong tokens.

Agreed! I guess the easiest way to check this is to just do a string search for {% generation %} tags? Be careful, because you'll also need to check for variants like {-
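A rough sketch of such a check, assuming the variants in question are Jinja's whitespace-control forms (a `-` or `+` next to the delimiters) and optional spacing around the tag name. The helper name `has_generation_tag` is hypothetical and not part of this PR:

```python
import re

# Matches an opening generation tag, allowing Jinja whitespace-control
# markers ("-" or "+") and arbitrary spacing inside the delimiters.
# Assumption: these are the only variants that need to be recognised.
GENERATION_TAG = re.compile(r"\{%[-+]?\s*generation\s*[-+]?%\}")


def has_generation_tag(chat_template: str) -> bool:
    """Return True if the template contains an opening generation tag."""
    return GENERATION_TAG.search(chat_template) is not None
```

A caller could emit the warning when the mask is requested and `has_generation_tag(template)` is false. A regex search is only a heuristic: it would also match a tag that appears inside a Jinja comment.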

In 99% of finetuning examples I see people using the trl trainer with packing=True. My new changes won't be easily usable if people use that parameter, and maybe we should think about my API while taking into consideration a refactor of the packing effect.

Yes, there's already a DataCollatorForCompletionOnlyLM which also requires packing=False. I feel like we can slot in with that easily enough!

I want to hear from @xenova and ideally someone using minijinja as well, though - how easily can we support this extension? Since it's only useful in training, maybe it's less critical to have it in huggingface/jinja or TGI, but at the very least we should be able to gracefully ignore the generation tags.

@yonigottesman
Contributor Author

I prefer just returning the labels with masking applied, rather than returning the mask for the user to apply.

I agree, but then what should the ignore label be? -100 (PyTorch)? I'm not sure it's a good idea to add another parameter, ignore_label.

@Rocketknight1
Member

I think -100 is correct, yes! This is the standard value for Torch and Transformers, so we don't need an extra arg to change it.
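For readers unfamiliar with the convention: -100 is the default ignore_index of torch.nn.CrossEntropyLoss, so positions carrying that label contribute nothing to the loss. The following dependency-free sketch illustrates the effect; the function name `masked_cross_entropy` is made up for this example, and returning 0.0 when every position is ignored is a choice of the sketch, not a claim about PyTorch's behaviour:

```python
import math

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label is not ignore_index.

    logits: list of per-position score lists; labels: list of class ids.
    """
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position: skipped entirely
        # log-sum-exp with max subtraction for numerical stability
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    # Sketch-only choice: report 0.0 if nothing was left to average.
    return total / count if count else 0.0
```

With labels such as `[-100, 0]`, only the second position influences the result, which is exactly what masking user and system tokens is meant to achieve.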

@yonigottesman
Contributor Author

Yeah, I was just thinking of non-PyTorch users, for whom -100 is not the default.
Anyway, I updated the code to return labels.

@psinger

psinger commented Jun 3, 2024

@yonigottesman Thanks for working on this, this is a feature I am very much looking forward to. Hope this can be merged soon.

@Rocketknight1
Member

Yes, sorry for not checking in @yonigottesman! Do you have other features you want to add, or should we treat this as review-ready?

@yonigottesman
Contributor Author

@Rocketknight1 Yes, this is ready to be reviewed :)

@Rocketknight1
Member

On it!

@Rocketknight1
Member

Rocketknight1 commented Jun 7, 2024

@yonigottesman while I'm reviewing, can you rebase/resolve the merge conflict? It's nothing major, but it'll block us from merging the PR until it's resolved. (Edit: Probably better to rebase because your branch is a little out of date by now; a rebase will catch any other issues before merging.)

Member

@Rocketknight1 Rocketknight1 left a comment

Just did a review! Here are my comments:

  • Overall looks good, and it's a really clever solution
  • Performance is very good, the test runs in milliseconds on my machine
  • Should we add support for this to one or more of the DataCollator classes?
  • Some changes should be reverted, see the specific code comments

Finally, I'm not sure if we should be returning labels or an assistant_mask from apply_chat_template(). I think it makes sense to return masked labels from a data collator that supports this, but not from apply_chat_template() itself, because it's kind of weird for apply_chat_template() to be handling labels at all! I think it might be better if apply_chat_template() just returns a simpler mask, and then the data collators use that to do masking.
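To make the proposed split concrete, here is a minimal sketch of a collator that consumes the mask and produces masked labels. It is not an existing DataCollator class: the function name `collate_with_assistant_mask` is hypothetical, and the `assistant_mask` key follows the name used in the PR description.

```python
IGNORE_INDEX = -100  # label value ignored by the loss


def collate_with_assistant_mask(features, pad_token_id):
    """Right-pad a batch and build labels that keep only assistant tokens.

    Each feature is a dict with "input_ids" and "assistant_mask" lists of
    equal length (the key names are assumptions of this sketch).
    """
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        ids, mask = f["input_ids"], f["assistant_mask"]
        pad = max_len - len(ids)
        # Non-assistant tokens are masked out of the loss.
        labels = [tok if m == 1 else IGNORE_INDEX for tok, m in zip(ids, mask)]
        batch["input_ids"].append(ids + [pad_token_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        # Padding positions are masked out as well.
        batch["labels"].append(labels + [IGNORE_INDEX] * pad)
    return batch
```

Keeping this logic in the collator leaves apply_chat_template() free of any training-specific notion such as an ignore index.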

@yonigottesman yonigottesman force-pushed the apply-chat-template-assistant-mask branch from 494c95e to d698ec7 Compare June 10, 2024 09:06
@yonigottesman
Contributor Author

I agree it should be assistant_mask and not labels. I feel like the collator should be added here and not in trl. What do you think?

@Rocketknight1
Member

Yes, agree! It's also fine to leave that for a separate PR, and just add the mask functionality in this PR.

@yonigottesman
Contributor Author

OK, fixed: it now returns the mask.

@yonigottesman yonigottesman force-pushed the apply-chat-template-assistant-mask branch 2 times, most recently from a454b39 to 7bd0140 Compare June 14, 2024 06:04
@Rocketknight1
Member

Got it! Ping me whenever you're ready for re-review.

@yonigottesman
Contributor Author

ready 😀

Member

@Rocketknight1 Rocketknight1 left a comment

This looks really good now! I made a couple of small suggestions, but I think we're ready for core maintainer review now, cc @amyeroberts. There are also some failing tests, but these are unrelated, and should be fixed if you rebase.

Also, to make Amy's job easier, a quick explanation: This PR allows chat templates to mark assistant generations in the template. Very often, training pipelines only want to train on those tokens, and not compute loss on other tokens (e.g. control tokens, user messages, system messages).

The way it works is by adding a small Jinja extension to support {% generation %} blocks, and then combining the string offsets from these blocks with the string offsets from tokenization to create a mask array, which is included as one of the tokenization outputs alongside input_ids and attention_mask.
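The offset-combining step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: it assumes the tokenizer supplies one (start, end) character offset per token and that the generation blocks are available as character spans of the rendered string. The function name `build_assistant_mask` is hypothetical.

```python
def build_assistant_mask(token_offsets, generation_spans):
    """Mark tokens that lie entirely inside a generation span.

    token_offsets: list of (start, end) character offsets, one per token.
    generation_spans: list of (start, end) character spans of the rendered
    chat that came from generation blocks.
    """
    mask = []
    for tok_start, tok_end in token_offsets:
        inside = any(
            start <= tok_start and tok_end <= end
            for start, end in generation_spans
        )
        # Zero-width tokens (e.g. added special tokens) are never marked.
        mask.append(1 if inside and tok_end > tok_start else 0)
    return mask


rendered = "user: hi\nassistant: hello"
span_start = rendered.index("hello")
spans = [(span_start, span_start + len("hello"))]
offsets = [(0, 4), (4, 5), (6, 8), (9, 18), (18, 19), (20, 25)]
mask = build_assistant_mask(offsets, spans)  # [0, 0, 0, 0, 0, 1]
```

Requiring a token to lie fully inside a span is a conservative choice of this sketch; a token that straddles a span boundary is left unmarked.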

@amyeroberts
Collaborator

@yonigottesman There's been an update on main which should fix the hub tests. Could you try rebasing? This should hopefully resolve them.

@yonigottesman yonigottesman force-pushed the apply-chat-template-assistant-mask branch from 37d8630 to f17588b Compare July 22, 2024 17:10
@amyeroberts amyeroberts merged commit 74d0eb3 into huggingface:main Jul 22, 2024
20 checks passed
@yonigottesman yonigottesman deleted the apply-chat-template-assistant-mask branch July 22, 2024 17:25
return self._rendered_blocks or self._generation_indices

@contextmanager
def activate_tracker(self, rendered_blocks: list[int], generation_indices: list[int]):
Contributor

@harupy harupy Jul 23, 2024

@amyeroberts @yonigottesman

In mlflow/mlflow#12757, we found that this line throws in Python 3.8.

https://github.com/mlflow/mlflow/actions/runs/10056412801/job/27795200814?pr=12757#step:12:1016

    class AssistantTracker(Extension):
        # This extension is used to track the indices of assistant-generated tokens in the rendered chat
        tags = {"generation"}
    
        def __init__(self, environment: ImmutableSandboxedEnvironment):
            # The class is only initiated by jinja.
            super().__init__(environment)
            environment.extend(activate_tracker=self.activate_tracker)
            self._rendered_blocks = None
            self._generation_indices = None
    
        def parse(self, parser: jinja2.parser.Parser) -> jinja2.nodes.CallBlock:
            lineno = next(parser.stream).lineno
            body = parser.parse_statements(["name:endgeneration"], drop_needle=True)
            return nodes.CallBlock(self.call_method("_generation_support"), [], [], body).set_lineno(lineno)
    
        @jinja2.pass_eval_context
        def _generation_support(self, context: jinja2.nodes.EvalContext, caller: jinja2.runtime.Macro) -> str:
            rv = caller()
            if self.is_active():
                # Only track generation indices if the tracker is active
                start_index = len("".join(self._rendered_blocks))
                end_index = start_index + len(rv)
                self._generation_indices.append((start_index, end_index))
            return rv
    
        def is_active(self) -> bool:
            return self._rendered_blocks or self._generation_indices
    
        @contextmanager
>       def activate_tracker(self, rendered_blocks: list[int], generation_indices: list[int]):
E       TypeError: 'type' object is not subscriptable

__init__   = <function PreTrainedTokenizerBase._compile_jinja_template.<locals>.AssistantTracker.__init__ at 0x7f013dc78940>
__module__ = 'transformers.tokenization_utils_base'
__qualname__ = 'PreTrainedTokenizerBase._compile_jinja_template.<locals>.AssistantTracker'
_generation_support = <function PreTrainedTokenizerBase._compile_jinja_template.<locals>.AssistantTracker._generation_support at 0x7f013dc78790>
is_active  = <function PreTrainedTokenizerBase._compile_jinja_template.<locals>.AssistantTracker.is_active at 0x7f013dc785e0>
parse      = <function PreTrainedTokenizerBase._compile_jinja_template.<locals>.AssistantTracker.parse at 0x7f013dc78820>
tags       = {'generation'}

Contributor

Suggested change
- def activate_tracker(self, rendered_blocks: list[int], generation_indices: list[int]):
+ def activate_tracker(self, rendered_blocks: List[int], generation_indices: List[int]):

or from __future__ import annotations needs to be added.
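For background: on Python 3.8 the built-in list type cannot be subscripted, and annotations are evaluated when the function is defined, so list[int] raises the TypeError shown above, whereas typing.List[int] works on every supported version. A minimal sketch of the first option follows. The `Tracker` class is a simplified, hypothetical stand-in for the extension in the traceback, and its method bodies are illustrative only:

```python
from contextlib import contextmanager
from typing import List


class Tracker:
    """Hypothetical stand-in for the tracker class shown in the traceback."""

    def __init__(self):
        self._rendered_blocks = None
        self._generation_indices = None

    def is_active(self) -> bool:
        return self._rendered_blocks is not None

    @contextmanager
    def activate_tracker(self, rendered_blocks: List[int], generation_indices: List[int]):
        # typing.List is subscriptable on Python 3.8, unlike builtin list.
        self._rendered_blocks = rendered_blocks
        self._generation_indices = generation_indices
        try:
            yield
        finally:
            # Deactivate again once the with-block exits.
            self._rendered_blocks = None
            self._generation_indices = None
```

The second option, from __future__ import annotations, avoids the error because annotations are then stored as strings and not evaluated at definition time; it must be the first statement in the module.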

Collaborator

@amyeroberts amyeroberts Jul 23, 2024

Thanks for flagging! Opening a PR to fix this:

https://github.com/huggingface/transformers/pull/32155/files

MHRDYN7 pushed a commit to MHRDYN7/transformers that referenced this pull request Jul 23, 2024
…gface#30650)

return assistant generated tokens mask in apply_chat_template
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jul 24, 2024
…gface#30650)

return assistant generated tokens mask in apply_chat_template
itazap pushed a commit that referenced this pull request Jul 25, 2024
return assistant generated tokens mask in apply_chat_template
qubvel added a commit to qubvel/transformers that referenced this pull request Aug 6, 2024
commit 37c5ca5eb9012a1009cf23b892828902f6a8799a
Author: Raushan Turganbay <[email protected]>
Date:   Tue Aug 6 10:24:19 2024 +0500

    Cache: create docs (#32150)

    * draft

    * updates

    * works?

    * try adding python example in hidden section

    * another try

    * hwo do i render python

    * format as html code?

    * Update docs/source/en/kv_cache.md

    Co-authored-by: Joao Gante <[email protected]>

    * Update docs/source/en/kv_cache.md

    Co-authored-by: Joao Gante <[email protected]>

    * Update docs/source/en/kv_cache.md

    Co-authored-by: Joao Gante <[email protected]>

    * Update docs/source/en/kv_cache.md

    Co-authored-by: Joao Gante <[email protected]>

    * Update docs/source/en/kv_cache.md

    Co-authored-by: Joao Gante <[email protected]>

    * one more small update

    * should render hidden secrtion now

    * add outputs

    * fix links

    * check links

    * update all links

    * update with offloaded cache

    * all cache is importable, so they appear in docs

    * fix copies

    * docstring...

    ---------

    Co-authored-by: Joao Gante <[email protected]>

commit 13dc6b0853c3cb54e79b18105c0528bc9e84881c
Author: Francisco Kurucz <[email protected]>
Date:   Mon Aug 5 19:14:50 2024 -0300

    Fix documentation links and code reference to model llava-next (#32434)

commit 7e5d46ded433605a906fcab6be43ac85307cca9b
Author: amyeroberts <[email protected]>
Date:   Mon Aug 5 16:33:19 2024 +0100

    Respect the config's attn_implementation if set (#32383)

    * Respect the config's attn if set

    * Update test - can override in from_config

    * Fix

commit 458b0cd2c544cdd6c700f9b0c21077c889bcee6c
Author: Sai-Suraj-27 <[email protected]>
Date:   Mon Aug 5 19:49:42 2024 +0530

    fix: Updated `test_embeded_special_tokens` for luke and mluke models (#32413)

    Fixed tokenizertests for luke, mluke models.

commit baf7e5c927744122c89ab1270c6c312541c7eb41
Author: Abdi <[email protected]>
Date:   Mon Aug 5 21:15:36 2024 +0800

    Persist embedding type of BART and mBART models after resize (#32242)

    * fix: persist embedding type of MBartConditonalGeneration after resize

    * fix: persist embedding type of BartConditonalGeneration after resize

commit f5f1e52f6cf13cdf63ff25c311d33e2f2a842911
Author: Francisco Kurucz <[email protected]>
Date:   Mon Aug 5 05:18:28 2024 -0300

    Fix documentation references to google/bit-50 model (#32407)

commit ea5da52ebc062ff56f0e3aa05b0e3cc981731e14
Author: Nicholas Broad <[email protected]>
Date:   Mon Aug 5 00:51:58 2024 -0700

    add values for neftune (#32399)

    I always forget what typical values are, and I have to look at the paper everytime. This will be a helpful reminder.

commit 3d7c2f9dea45338b7ebcd459b452e2fad7abfa1f
Author: Ita Zaporozhets <[email protected]>
Date:   Mon Aug 5 09:22:48 2024 +0200

    * save total_vocab_size = vocab_size + user added tokens to speed up operation

    * updating length when added_tokens_decoder is set

    * add test len(tokenizer)

commit 3bb646a54f42030e9bafa47cd3f64367691a3bc5
Author: Raushan Turganbay <[email protected]>
Date:   Mon Aug 5 11:58:42 2024 +0500

    Phi3 tests: fix typing for Python 3.8 (#32388)

    fix phi

commit 05ae3a300d6f3534eeb99a08828a5bae6dd973db
Author: TechInterMezzo <[email protected]>
Date:   Mon Aug 5 08:40:58 2024 +0200

    fix: SeamlessM4TFeatureExtractor stride remainder (#32088)

    * fix: SeamlessM4TFeatureExtractor stride remainder

    * Added attention mask size test

    * Reran ruff for style correction

commit 847bb856d55e3664150e408448fa59d0705b4d60
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Aug 5 08:38:34 2024 +0200

    Bump keras from 2.8.0 to 2.13.1 in /examples/research_projects/decision_transformer (#32393)

    Bump keras in /examples/research_projects/decision_transformer

    Bumps [keras](https://github.com/keras-team/keras) from 2.8.0 to 2.13.1.
    - [Release notes](https://github.com/keras-team/keras/releases)
    - [Commits](https://github.com/keras-team/keras/compare/v2.8.0...v2.13.1)

    ---
    updated-dependencies:
    - dependency-name: keras
      dependency-type: direct:production
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 621fb3c0edddf98f3272f3b197e772af4fa30b6c
Author: Xueshen Liu <[email protected]>
Date:   Sat Aug 3 14:07:55 2024 -0400

    MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. (#31500)

    * Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (no auto position_ids generation could be unsafe)

    * fix typo [:-1] to [:, -1]

    * to meet formatting requirement

    * to meet formatting requirement

    * remove white space

    * MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue.

    * propagate to startcoder2, phi3, mixtral and qwen2

    * update qwen2_moe

commit 7c31d05b59a9dce24b8ddc4b2bb8c8cf6bb5fd77
Author: Shaopeng Fu <[email protected]>
Date:   Sat Aug 3 19:24:11 2024 +0300

    fix: (issue #32124) Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`. (#32157)

    fix: Exception raised when running .

commit c1aa0edb48217f416f4bbe6e3a9db1500284513b
Author: Sanchit Gandhi <[email protected]>
Date:   Fri Aug 2 17:32:50 2024 +0800

    [generate] only require an attention mask for mps with torch<2.4 (#32367)

    * up

    * style

    * stopping

commit 083e13b7c47f674b11c74d1b7c7ee7cd1241b406
Author: Joao Gante <[email protected]>
Date:   Fri Aug 2 09:39:45 2024 +0100

    RoPE: Add numerical tests ✨  (#32380)

    tests! :D

commit 2af199c42b545f6248475ce456dd6c2a351b8522
Author: Raushan Turganbay <[email protected]>
Date:   Fri Aug 2 09:54:16 2024 +0500

    Update docs (#32368)

    nits

commit 82efc53513a51660e629c7eca8210af1d67df00b
Author: Zach Mueller <[email protected]>
Date:   Thu Aug 1 15:18:43 2024 -0400

    Yell at the user if zero-3 init wasn't performed, but expected to have been done (#32299)

    * Test this zach

    * Test for improper init w/o zero3

    * Move back

    * Apply suggestions from code review

    Co-authored-by: amyeroberts <[email protected]>

    * Get rid of stars in warning

    * Make private

    * Make clear

    ---------

    Co-authored-by: amyeroberts <[email protected]>

commit 51ab25e2932da15511ced35bcbdfa92d25c4794c
Author: OsamaS99 <[email protected]>
Date:   Thu Aug 1 14:57:42 2024 +0200

    Fixed Hybrid Cache Shape Initialization. (#32163)

    * fixed hybrid cache init, added test

    * Fix Test Typo

    ---------

    Co-authored-by: Aaron Haag <[email protected]>

commit e3d8285a84f803e962050e2c2283f3362e36bfbc
Author: Joao Gante <[email protected]>
Date:   Thu Aug 1 13:46:11 2024 +0100

    Docker: add `speech` dep to the consistency docker image (#32374)

commit ca59d6f77c9fda197222f9aa9205d8c7b5dff34e
Author: Nikos Karampatziakis <[email protected]>
Date:   Thu Aug 1 05:42:07 2024 -0700

    Offloaded KV Cache (#31325)

    * Initial implementation of OffloadedCache

    * enable usage via cache_implementation

    * Address feedback, add tests, remove legacy methods.

    * Remove flash-attn, discover synchronization bugs, fix bugs

    * Prevent usage in CPU only mode

    * Add a section about offloaded KV cache to the docs

    * Fix typos in docs

    * Clarifications and better explanation of streams

commit b4727a1216bb21df2795e973063ed07202235d7e
Author: Omar Salman <[email protected]>
Date:   Thu Aug 1 17:32:13 2024 +0500

    Fix conflicting key in init kwargs in PreTrainedTokenizerBase (#31233)

    * Fix conflicting key in init kwargs in PreTrainedTokenizerBase

    * Update code to check for callable key in save_pretrained

    * Apply PR suggestions

    * Invoke CI

    * Updates based on PR suggestion

commit db8c7caeb6b3969a2153b36ba3e5fdef6534c1d6
Author: Viktor Scherbakov <[email protected]>
Date:   Thu Aug 1 14:30:10 2024 +0200

    Empty list in defaults for LLaMA special tokens during weights conversion (#32342)

    empty list in defaults

commit 2229ebe7220fb54bc5f91f575c2d7a988e7122cb
Author: Ita Zaporozhets <[email protected]>
Date:   Thu Aug 1 13:57:41 2024 +0200

    update clean_up_tokenization_spaces warning (#32371)

commit 05c1f9af9a5ebd213dd923e97f6fbed4c115f3c6
Author: Hanna Yukhymenko <[email protected]>
Date:   Thu Aug 1 13:52:05 2024 +0200

    Check device map for saving tokenizer config on TPU (fix for issue #31971) (#32043)

    * Remove TPU device map for saving tokenizer config

    * Update tokenization_utils_base.py

    * Fix error msg when passing non-string device into tokenizer

    * Fix error message for non-string tokenizer device

    * Print out tokenizer device type in error msg

    * Update tokenization_utils_base.py

commit 9e2828403218da16d9759c9be020b70f51df373d
Author: nv-guomingz <[email protected]>
Date:   Thu Aug 1 19:51:20 2024 +0800

    add missing attribute _supports_param_buffer_assignment for gpt-j. (#32359)

    Co-authored-by: Guoming Zhang <[email protected]>

commit 48ed24c50ab29bf690f2ab030721e6a8b0aa5205
Author: Lunwen He <[email protected]>
Date:   Thu Aug 1 04:49:00 2024 -0700

    Remove size check between attn_weights and kv_seq_len for phi3 (#32339)

    * Remove size check between attn_weights and kv_seq_len

    * add unit tests

commit e234061cddd28bb8b82144833241883816289e40
Author: Sanchit Gandhi <[email protected]>
Date:   Thu Aug 1 18:10:56 2024 +0800

    [whisper] compile compatibility with long-form decoding (#31772)

    * [whisper] compile compatibility with long-form decoding

    * clarify comment

    * fix after rebase

    * finalise

    * fix bsz

    * fix cache split

    * remove contiguous

    * style

    * finish

    * update doc

    * prevent cuda graph trace

commit 9451a385261b30e7319a2c93285ab76161e8c003
Author: Sanchit Gandhi <[email protected]>
Date:   Thu Aug 1 16:05:27 2024 +0800

    [enc-dec cache] fix bug in indexing (#32370)

commit 453e74884fb7e2613e7b45033fbb3c1cadb638b4
Author: Raushan Turganbay <[email protected]>
Date:   Thu Aug 1 09:48:03 2024 +0500

    LLaVa: add cache class attribute (#32278)

    cache class flag

commit 14ee2326e51cb210cec72f31b248cb722e9d5d1f
Author: Ricardo <[email protected]>
Date:   Thu Aug 1 06:34:22 2024 +0800

    fix: warmup_steps check for training_args (#32236)

commit 53f0c9c2906e0b0f1623bfdfb420fca1e655098d
Author: Sai-Suraj-27 <[email protected]>
Date:   Thu Aug 1 01:26:50 2024 +0530

    fix: Removed unnecessary `@staticmethod` decorator (#32361)

    * Fixed staticmethods with self as first argument.

    * Fixed staticmethods with self as first argument.

    * Fixed staticmethods with self as first argument.

    * Fixed staticmethods with self as first argument.

commit 92abe6033491dcaa958235e551f40f6b417d3771
Author: fxmarty <[email protected]>
Date:   Wed Jul 31 20:03:07 2024 +0200

    >3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227)

    * draft

    * apply changes to all relevant archs

    * rerun ci - check_docstrings.py failing?

    * fix docstring

    * move 2D->4D mask creation to modeling file

    * repo consistency

    * fix the batch size = 1 case - calling contiguous is not enough

    * nit

    * style

    * propagate to gemma/gemma-2

    * prepare inputs for gemma generation

    * implement test and tiny fix in gemma2

    * Update src/transformers/models/bloom/modeling_bloom.py

    Co-authored-by: Arthur <[email protected]>

    * fix copies

    * ci pass

    * fix gemma's test_compile_static_cache tests

    * flacky

    * retrigger ci

    ---------

    Co-authored-by: sanchit-gandhi <[email protected]>
    Co-authored-by: Arthur <[email protected]>

commit b46bd8b9d2ac991c0c04674957ebc0a65fb3f42b
Author: Aymeric Roucher <[email protected]>
Date:   Wed Jul 31 18:44:53 2024 +0200

    Fix error when streaming to gradio with non-string tool arguments (#32360)

    Fix error when streaming agent run to gradio with non-string tool arguments

commit ef177a5e1cdf0ca53e24e6d76e813198f7300dc4
Author: Joao Gante <[email protected]>
Date:   Wed Jul 31 16:04:48 2024 +0100

    Gemma 2: support assisted generation (#32357)

commit 5f1fcc299cb00c1edce5eb1efb8bacdde2365690
Author: amyeroberts <[email protected]>
Date:   Wed Jul 31 14:51:04 2024 +0100

    [Idefics2] - Fix FA2 call for Perceiver layer (#32275)

    * Fix FA2 call for Perciever layer

    * [run_slow] idefics2

    * [run_slow] idefics2

    * [run_slow] idefics2

    * Fix up

    * [run_slow] idefics2

    * [run_slow] idefics2

    * [run_slow] idefics2

commit b75ad56620431984a44a962c98136c8571b4fca9
Author: Joao Gante <[email protected]>
Date:   Wed Jul 31 11:12:46 2024 +0100

    Llama 3.1: Fix incorrect `inv_freq` assignment (#32330)

    fix 💩

commit 7f552e28e0aca00ce60868c7620f7463eab60e14
Author: Raushan Turganbay <[email protected]>
Date:   Wed Jul 31 10:33:38 2024 +0500

    Gemma2 and flash-attention (#32188)

    * enable flash-attn & static cache

    * this works, not the prev

    * fix for sliding window layers

    * not needed anymore

commit a3264332cfb5ab8675ddb42740a75aeee1782a74
Author: Raushan Turganbay <[email protected]>
Date:   Wed Jul 31 10:01:12 2024 +0500

    LLaVA-NeXT: fix anyres shapes (#32314)

    fix

commit 6e2d04e429dc4ce240c99bd14b7b84550b79fd73
Author: Joshua Lochner <[email protected]>
Date:   Tue Jul 30 23:36:38 2024 +0200

    Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191)

    * Remove user-defined tokens which can be obtained through merges

    * Remove debug line

    * formatting

    * Refactor spm slow -> fast converter

    * revert unnecessary refactor

    * set comprehension

    * remove test files

    * Use `vocab_scores`

    * Always replace spiece underline with space in decode

    * we no longer need token filtering

    * Add save fast load slow unit test

    * Remove tokenizers version check

    * Remove duplicate code

    * Make `<start_of_turn>` and `<end_of_turn>` special tokens

    * Bias merge priority with length if score is the same

    * Add unit test for merge priority

    * CI

commit 026a173a64372e9602a16523b8fae9de4b0ff428
Author: Joao Gante <[email protected]>
Date:   Tue Jul 30 18:56:10 2024 +0100

    Repo checks: skip docstring checks if not in the diff (#32328)

    * tmp

    * skip files not in the diff

    * use git.Repo instead of an external subprocess

    * add tiny change to confirm that the diff is working on pushed changes

    * add make quality task

    * more profesh main commit reference

commit 516af4bb63538edc448f814e3690dd5171c4f311
Author: fkrasnov2 <[email protected]>
Date:   Tue Jul 30 20:21:45 2024 +0300

    fixes #32329 : The Torch code is correct - to get an average of 10% o… (#32335)

    fixes #32329 : The Torch code is correct - to get an average of 10% of the total, we want to take 50% of the remainder after we've already masked 80% with [MASK] in the previous step.

commit 62c60a30181a65e1a3a7f19c3055a240a6a21335
Author: Wing Lian <[email protected]>
Date:   Tue Jul 30 12:55:59 2024 -0400

    fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit (#32276)

commit 16271080333ad52be5349fb31d789fb232b68760
Author: Sai-Suraj-27 <[email protected]>
Date:   Tue Jul 30 22:23:03 2024 +0530

    fix: Added missing raise keyword for few exceptions (#32333)

    Fixed raising of few exceptions.

commit bd54ed2ed7f578e4122f3e6d536fbe3c9bc76de1
Author: plaggy <[email protected]>
Date:   Tue Jul 30 18:48:18 2024 +0200

    Alternative agent plan (#32295)

    * new agent plan

    * plan type assertion

    * style corrections

    * better prompt naming

    * make fixup

commit e68ec18ce224af879f22d904c7505a765fb77de3
Author: Joao Gante <[email protected]>
Date:   Tue Jul 30 15:49:14 2024 +0100

    Docs: formatting nits (#32247)

    * doc formatting nits

    * ignore non-autodocs

    * Apply suggestions from code review

    Co-authored-by: amyeroberts <[email protected]>

    * Update src/transformers/models/esm/modeling_esm.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update src/transformers/models/esm/modeling_esm.py

    Co-authored-by: amyeroberts <[email protected]>

    * make fixup

    ---------

    Co-authored-by: amyeroberts <[email protected]>

commit 2fbbcf5007509c66b02924ce6dcff66f58e7f58c
Author: Yoach Lacombe <[email protected]>
Date:   Tue Jul 30 16:00:13 2024 +0200

    Fix M4T for ASR pipeline (#32296)

    * tentative fix

    * do the same for M4T

commit 084b5094eb490319719cc11cb05b751e0b419d49
Author: Luc Georges <[email protected]>
Date:   Tue Jul 30 14:49:26 2024 +0200

    feat(ci): set `fetch-depth: 0` in trufflehog checkout step (#31663)

commit 20528f067cf9204cea5178ce0f837245e146e159
Author: Teddy Ferdinan <[email protected]>
Date:   Tue Jul 30 11:25:54 2024 +0200

    Cast epochs_trained to int when resuming training (#32286)

    * fix epochs_trained as int when resuming training

    * refactor

    ---------

    Co-authored-by: teddyferdinan <[email protected]>

commit 934fe1504e6d5e87e01d96305f4d97faa63cf4c1
Author: Isotr0py <[email protected]>
Date:   Tue Jul 30 17:01:00 2024 +0800

    Fix GGUF dequantize for `gguf==0.9.1` (#32298)

    * fix gguf dequantize for gguf==0.9.1

    * fix old version

    * make style

commit 3e8106d2533cbd890ddd1e919bd62132cd4718c3
Author: Gilad Turok <[email protected]>
Date:   Tue Jul 30 03:19:24 2024 -0400

    Docs: fix GaLore optimizer code example (#32249)

    Docs: fix GaLore optimizer example

    Fix incorrect usage of GaLore optimizer in Transformers trainer code example.

    The GaLore optimizer uses low-rank gradient updates to reduce memory usage. GaLore is quite popular and is implemented by the authors in [https://github.com/jiaweizzhao/GaLore](https://github.com/jiaweizzhao/GaLore). A few months ago GaLore was added to the HuggingFace Transformers library in https://github.com/huggingface/transformers/pull/29588.

    Documentation of the Trainer module includes a few code examples of how to use GaLore. However, the `optim_target_modules` argument to `TrainingArguments` is used incorrectly there, as discussed in https://github.com/huggingface/transformers/pull/29588#issuecomment-2006289512. This pull request fixes this issue.

commit f0bc49e7f61f74f055c47ad40e6010f57eed0b0b
Author: Yih-Dar <[email protected]>
Date:   Mon Jul 29 22:12:21 2024 +0200

    use torch 2.4 in 2 CI jobs (#32302)

    Co-authored-by: ydshieh <[email protected]>

commit a24a9a66f446dcb9277e31d16255536c5ce27aa6
Author: Aymeric Roucher <[email protected]>
Date:   Mon Jul 29 20:12:44 2024 +0200

    Add stream messages from agent run for gradio chatbot (#32142)

    * Add stream_to_gradio method for running agent in gradio demo

commit 811a9caa2141bc98f96b36c69abcf1f934bd1fd2
Author: Guang Yang <[email protected]>
Date:   Mon Jul 29 10:19:15 2024 -0700

    Make static cache compatible with torch.export (#32168)

commit 7f5d644e69068825bb5b6e84cdc56b3d3a9bd04f
Author: Sanchit Gandhi <[email protected]>
Date:   Mon Jul 29 21:24:42 2024 +0800

    [pipeline] fix padding for 1-d tensors (#31776)

    * [pipeline] fix padding for 1-d tensors

    * add test

    * make style

    * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

    Co-authored-by: Kamil Akesbi <[email protected]>

    * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

    ---------

    Co-authored-by: Kamil Akesbi <[email protected]>

commit 3fbaaaa64d1ef3d8327adb577994d3d11277c77a
Author: Kamil Akesbi <[email protected]>
Date:   Mon Jul 29 11:19:52 2024 +0100

    Whisper tokenizer word level timestamps (#32197)

    * fix _fix_key in PreTrainedModel

    * fix _find_longest_common_sequence

    * add test

    * remove result.json

    * nit

    * update test

commit 7ffe25f2b935dcaf65079b04c5f91c8a42a99e28
Author: Joao Gante <[email protected]>
Date:   Mon Jul 29 10:52:13 2024 +0100

    Generate: end-to-end compilation (#30788)

    * mvp

    * added test (a few models need fixes)

    * fix a few test cases

    * test nits

    * harder test 😈

    * revert changes in stablelm

    * test with improved condition

    * add todo

    * tmp commit

    * merged with main

    * nits

    * add todo

    * final corrections

    * add docs for generation compilation

    * docs nits

    * add  tip

    * PR suggestions

    * add more details to the compilation docs

    * fix cache positions

    * cache is now init in generate; update docs

    * tag test as flaky

    * docs

    * post rebase make fixup and other nits

    * remove unintended changes

    * whisper (encoder-decoder) not supported

    * move token default updates to ; add tests for token defaults

    * push changes

    * manual rebase

    * chameleon doesn't support this

    * fix test_static_cache_mha_mqa_gqa (broken in another PR)

    * docs: dynamic is better with end-to-end compilation

commit 49928892d6491ff5a49c12cbc34695f6fa7ac0ed
Author: Sai-Suraj-27 <[email protected]>
Date:   Mon Jul 29 15:20:43 2024 +0530

    fix(docs): Fixed a link in docs (#32274)

    Fixed a link in docs.

commit 6494479f1de9fe16e9c6f89e52eb0cf81f864a7c
Author: Fanli Lin <[email protected]>
Date:   Mon Jul 29 17:29:11 2024 +0800

    make `p_mask` a numpy array before passing to `select_starts_ends` (#32076)

    * fix

    * bug fix

    * refine

    * fix

commit 535fe78b9f1d148684723e51f00645351880c47a
Author: Joao Gante <[email protected]>
Date:   Mon Jul 29 10:06:05 2024 +0100

    Repo: remove exceptions in `check_docstrings` (#32259)

    remove exceptions

commit a2ad9d5ad53f68c1ad268f7f46538eac6f5b631b
Author: Sai-Suraj-27 <[email protected]>
Date:   Mon Jul 29 14:13:09 2024 +0530

    fix: Fixed wrong argument passed to `convert_blip_checkpoint` function call (#32262)

    Removed one wrong argument passed to convert_blip_checkpoint function call.

commit 5019aabfacf7599b9a6b4e7a1adc1fb5c9017727
Author: leejet <[email protected]>
Date:   Mon Jul 29 15:51:43 2024 +0800

    Optimize t5 tokenize logic to avoid redundant calls (#32270)

    * Optimize t5 tokenize logic to avoid redundant calls

    * fix and overwrite copies

commit f2122cc6eb8e50e4d1b45da54b43bba59a458b30
Author: Yih-Dar <[email protected]>
Date:   Mon Jul 29 09:42:54 2024 +0200

    Upload new model failure report to Hub (#32264)

    upload

    Co-authored-by: ydshieh <[email protected]>

commit f7396876849926afa87c9412d67c43618dad403d
Author: Raushan Turganbay <[email protected]>
Date:   Mon Jul 29 10:58:59 2024 +0500

    🚨 Bloom support for cache class (#31445)

    * bloom dynamic cache

    * bloom follows standard cache format

    * no skips for bloom anymore

    * use cache position when possible

    * clean up

    * codestyle

    * Update src/transformers/models/bloom/modeling_bloom.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update src/transformers/models/bloom/modeling_bloom.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update src/transformers/models/bloom/modeling_bloom.py

    Co-authored-by: amyeroberts <[email protected]>

    * pr comments

    * isinstance fix

    * address comments

    * make musicgen test happy

    * [run-slow] bloom

    ---------

    Co-authored-by: amyeroberts <[email protected]>

commit 44f6fdd74f84744b159fa919474fd3108311a906
Author: Joao Gante <[email protected]>
Date:   Sat Jul 27 10:19:46 2024 +0100

    Llama 3.1: replace for loop by tensor ops at inv_freq initialization (#32244)

    * replace for loop by tensor ops

    * rm assert; readability

commit 8da90687308a10b33c5553b8a506cc04aab31702
Author: Yih-Dar <[email protected]>
Date:   Fri Jul 26 20:52:45 2024 +0200

    More flexible trigger condition (#32251)

    update

    Co-authored-by: ydshieh <[email protected]>

commit 81233c069c166af033794134bd8888783ac49ebe
Author: Raushan Turganbay <[email protected]>
Date:   Fri Jul 26 14:45:55 2024 +0500

    Flash-Attn: fix generation when no attention mask or no padding (#32241)

    * fix

    * fix prev test (half of failures)

    * [run-slow] llama, gemma2

    * [run-slow] llama, gemma2

commit 27c7f971c0dcd3bb423ea221fe2bce751d313119
Author: Fanli Lin <[email protected]>
Date:   Fri Jul 26 17:41:27 2024 +0800

    [tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` (#32039)

    * add flash attention check

    * fix

    * fix

commit 5f841c74b62754f186a8c06a684d491524b7bc03
Author: Connor Anderson <[email protected]>
Date:   Fri Jul 26 05:05:46 2024 -0400

    Add check for `target_sizes is None` in `post_process_image_guided_detection` for owlv2 (#31934)

    * Add check for target_sizes is None in post_process_image_guided_detection

    * Make sure Owlvit and Owlv2 in sync

    * Fix incorrect indentation; add check for correct size of target_sizes

commit f9756d9edb23354e3df50f7eb3f6b3129a25e453
Author: Rohit Dwivedula <[email protected]>
Date:   Fri Jul 26 04:05:38 2024 -0500

    Adds: extra_repr for RMSNorm layers in most models (#32204)

    * adds: extra_repr() to RMSNorm layers in multiple models

    * adds: extra_repr for deprecated models as well

    * formatting as per style guide

commit b8e5cd5396f7c0cc2d5e10be6696ea38742abf51
Author: Sai-Suraj-27 <[email protected]>
Date:   Fri Jul 26 14:03:02 2024 +0530

    Refactor: Removed un-necessary `object` base class (#32230)

    * Refactored to remove un-necessary object base class.

    * small fix.

commit 1c7ebf1d6eaf0ed0fd4101fd6eb7e64601429cfe
Author: João Nadkarni <[email protected]>
Date:   Fri Jul 26 09:38:59 2024 +0200

    don't log base model architecture in wandb if log model is false (#32143)

    * don't log base model architecture in wandb if log model is false

    * Update src/transformers/integrations/integration_utils.py

    Co-authored-by: amyeroberts <[email protected]>

    * convert log model setting into an enum

    * fix formatting

    ---------

    Co-authored-by: amyeroberts <[email protected]>

commit c46edfb8230bcc3152e8338742dc4822289acb3d
Author: Raushan Turganbay <[email protected]>
Date:   Fri Jul 26 10:52:06 2024 +0500

    Resize embeds with DeepSpeed  (#32214)

    * fix resize when deepspeed

    * deepspeed uses new embeds

    * we needed this

commit fad15fba78e4603cd20695757ad899a6687485f9
Author: Raushan Turganbay <[email protected]>
Date:   Fri Jul 26 10:17:27 2024 +0500

    Llava: generate without images (#32183)

    * llava w/o images

    * tests

commit 4ab33c2d81866d4dd2f29df07f1a35491acbb39b
Author: Raushan Turganbay <[email protected]>
Date:   Fri Jul 26 10:16:06 2024 +0500

    Generation: stop at `eos` for assisted decoding (#31301)

    * fix

    * move changes to prompt lookup

    * add test

    * set eos in assistant model

    * style

    * fix flakiness

    * changes for new `main`

    * Update tests/generation/test_utils.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update tests/generation/test_utils.py

    Co-authored-by: amyeroberts <[email protected]>

    * add comment to explain

    ---------

    Co-authored-by: amyeroberts <[email protected]>

commit 9d6c0641c4a3c2c5ecf4d49d7609edd5b745d9bc
Author: Pavel Iakubovskii <[email protected]>
Date:   Thu Jul 25 19:20:47 2024 +0100

    Fix code snippet for Grounding DINO (#32229)

    Fix code snippet for grounding-dino

commit 3a83ec48a63a8298c8193be48cf00785674bfb70
Author: jrhe <[email protected]>
Date:   Thu Jul 25 17:16:13 2024 +0100

    Allow a specific microphone to be used by the ffmpeg audio pipeline utility functions. Default to using the currently active microphone on Mac (#31846)

    * use currently active microphone on mac for ffmpeg_microphone

    * Allow ffmpeg_microphone device to be specified

    Co-authored-by: amyeroberts <[email protected]>

    ---------

    Co-authored-by: amyeroberts <[email protected]>

commit 6ed0bf1e8543a7d8e6640bbf9a655c5e1401f7de
Author: Huazhong Ji <[email protected]>
Date:   Fri Jul 26 00:01:06 2024 +0800

    translate philosophy.md to chinese (#32177)

    * translate philosophy.md to chinese

    * add the missing link

commit df6eee9201e4ba2b80cea021a18e95ada26ca2cc
Author: Yih-Dar <[email protected]>
Date:   Thu Jul 25 16:12:23 2024 +0200

    Follow up for #31973 (#32025)

    * fix

    * [test_all] trigger full CI

    ---------

    Co-authored-by: ydshieh <[email protected]>

commit de2318894e4f971ea2273c653a702dc93db2bd6a
Author: Kashif Rasul <[email protected]>
Date:   Thu Jul 25 15:12:23 2024 +0200

    [warnings] fix E721 warnings (#32223)

    fix E721 warnings

commit 9b9a54e61bf8749588178b37c23d77b90679fd10
Author: Kashif Rasul <[email protected]>
Date:   Thu Jul 25 15:11:43 2024 +0200

    [BigBird Pegasus] set _supports_param_buffer_assignment to False (#32222)

    set _supports_param_buffer_assignment to False

commit 1ecedf1d9ee927bac5b5bae8cb1892d936a5b622
Author: Austin <[email protected]>
Date:   Thu Jul 25 07:20:27 2024 -0500

    Update question_answering.py (#32208)

commit f53a5dec7b03eb195dc89c82ae761b033db1ceb6
Author: Huazhong Ji <[email protected]>
Date:   Thu Jul 25 17:04:04 2024 +0800

    remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0 (#32210)

    remove unnecessary guard code related with pytorch versions 1.4.2 ~
    1.7.0

commit 5658e749adbaaf883caec003cecae8ce0a4261a6
Author: Sanchit Gandhi <[email protected]>
Date:   Thu Jul 25 16:58:02 2024 +0800

    [whisper] fix short-form output type (#32178)

    * [whisper] fix short-form output type

    * add test

    * make style

    * update long-form tests

    * fixes

    * last fix

    * finalise test

commit 85a1269e19af022e04bc2aad82572cd5a9e8cdd9
Author: Sai-Suraj-27 <[email protected]>
Date:   Wed Jul 24 22:30:21 2024 +0530

    fix: Replaced deprecated `unittest method` with the correct one (#32198)

    Replaced deprecated unittest method with the correct one.

commit edd68f4ed8db241bd3e9dc6c4ed96d471f243c9a
Author: Matt <[email protected]>
Date:   Wed Jul 24 17:36:32 2024 +0100

    :rotating_light: No more default chat templates (#31733)

    * No more default chat templates

    * Add the template to the GPT-SW3 tests since it's not available by default now

    * Fix GPT2 test

    * Fix Bloom test

    * Fix Bloom test

    * Remove default templates again

commit 1c122a46dc3c4448901f8d2f3018d9d58b846ba5
Author: Penut Chen <[email protected]>
Date:   Wed Jul 24 23:59:59 2024 +0800

    Support dequantizing GGUF FP16 format (#31783)

    * support gguf fp16

    * support gguf bf16 with pytorch

    * add gguf f16 test

    * remove bf16

commit af0e4b7b37b2d7eefe7531cf5201a5d6bae85525
Author: Marc Sun <[email protected]>
Date:   Wed Jul 24 17:14:05 2024 +0200

    Fix float8_e4m3fn in modeling_utils (#32193)

    * Fix float8_e4m3fn in modeling_utils

    * style

    * fix

    * comment

commit 1392a6867f40a55dfabaf306745c67627598b1af
Author: Raushan Turganbay <[email protected]>
Date:   Wed Jul 24 19:26:20 2024 +0500

    Fix resize embedding with Deepspeed (#32192)

    fix resize when deepspeed

commit 8d2534c4d0ab94a97a72d2ce6bb9ccd201abadb3
Author: Arthur <[email protected]>
Date:   Wed Jul 24 16:06:39 2024 +0200

    let's not warn when someone is running a forward  (#32176)

    * let's not warn when someone is running a forward without cache + self.training

    * more models

    * fixup

commit e0182f3bd7f4753c1e378e052ceea67898d97359
Author: Joao Gante <[email protected]>
Date:   Wed Jul 24 15:00:48 2024 +0100

    RoPE: relaxed rope validation (#32182)

    * relaxed rope check

    * lets also accept rope_type=None, defaulting to the original implementation

    * type and rope_type can coexist

commit 165116bc145dcc186fa287e624b28a9ab3a79955
Author: amyeroberts <[email protected]>
Date:   Wed Jul 24 14:03:40 2024 +0100

    Remove conversational pipeline tests (#32099)

    Remove conversation pipeline tests

commit 5f4ee98a7ade33e1c54fdd6181d04ee7b426b392
Author: Dr. Artificial曾小健 <[email protected]>
Date:   Wed Jul 24 18:54:41 2024 +0800

    Update qwen2.md (#32108)

    * Update qwen2.md

    outdated description

    * Update qwen2.md

    amended

    * Update qwen2.md

    Update

    * Update qwen2.md

    fix wrong version code, now good to go

commit 8678879f1dc2578cec18232146bf19de97aecaa1
Author: 조준래 <[email protected]>
Date:   Wed Jul 24 19:38:49 2024 +0900

    fix: default value reflects the runtime environment variables rather than the ones present at import time. (#32153)

    * fix: default value reflects the runtime environment variables rather than the ones present at import time.

    * Fix: Change `deterministic` to None by default; use env var if None
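
The difference between an import-time default and a runtime default can be sketched as follows. This is a generic illustration of the pattern the message describes, not the library's code; the variable name `EXAMPLE_DETERMINISTIC` and both function names are hypothetical.

```python
import os

ENV_VAR = "EXAMPLE_DETERMINISTIC"  # hypothetical variable name, for illustration only
os.environ.pop(ENV_VAR, None)      # start from a clean environment


def frozen_default(deterministic=os.environ.get(ENV_VAR, "0") == "1"):
    # The default expression ran once, when the function was defined.
    return deterministic


def runtime_default(deterministic=None):
    # None means "not specified": consult the environment at call time.
    if deterministic is None:
        deterministic = os.environ.get(ENV_VAR, "0") == "1"
    return deterministic


os.environ[ENV_VAR] = "1"  # set after the functions were defined
print(frozen_default())    # False: still reflects the environment at definition time
print(runtime_default())   # True: reflects the current environment
```

Using `None` as the sentinel keeps an explicit `False` from the caller distinguishable from "not specified".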

commit 01be5b48790f113b7d71943b580c842e3e097988
Author: Rohit Dwivedula <[email protected]>
Date:   Wed Jul 24 02:09:59 2024 -0500

    adds: extra_repr() to MambaRMSNorm to include hidden size / size of weights in the layer (#32171)

    * adds: extra_repr() to MambaRMSNorm to include the hidden size of the layer

    * style fix with ruff:

commit c85510f958e6955d88ea1bafb4f320074bfbd0c1
Author: Fanli Lin <[email protected]>
Date:   Wed Jul 24 00:47:51 2024 +0800

    [docs] change temperature to a positive value (#32077)

    fix

commit bc2adb0112b6677b0dfb4105c74570a0f92183eb
Author: Sai-Suraj-27 <[email protected]>
Date:   Tue Jul 23 21:22:41 2024 +0530

    fix: Fixed an if condition that is always evaluating to true (#32160)

    Fixed an if condition always evaluating to true.

commit 23f6a43f82fb2980f4b30cf3f95eb3a940384895
Author: Joao Gante <[email protected]>
Date:   Tue Jul 23 16:48:16 2024 +0100

    fix (#32162)

commit d5a99dfcee6e94065cb7c83cc8ab6fc5daa0cc4e
Author: Lysandre <[email protected]>
Date:   Tue Jul 23 16:58:17 2024 +0200

    Llama 3.1 conversion

    Co-authored-by: Arthur Zucker <[email protected]>

commit ff0d708fe627d6715f9a3e97d0a7947f70437447
Author: Lysandre <[email protected]>
Date:   Tue Jul 23 17:12:47 2024 +0200

    Dev version: v4.44.0.dev0

commit d2c687b3f1859b5c61258af14abba5312c0e6201
Author: Sai-Suraj-27 <[email protected]>
Date:   Tue Jul 23 20:37:31 2024 +0530

    Updated `ruff` to the latest version (#31926)

    * Updated ruff version and fixed the required code according to the latest version.

    * Updated ruff version and fixed the required code according to the latest version.

    * Added noqa directive to ignore 1 error shown by ruff

commit 9cf4f2aa9a9cecbb22e813931ef3bb72fc773540
Author: RhuiDih <[email protected]>
Date:   Tue Jul 23 21:56:41 2024 +0800

    Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs (#31629)

    * add DataCollatorBatchFlattening

    * Update data_collator.py

    * change name

    * new FA2 flow if position_ids is provided

    * add comments

    * minor fix

    * minor fix data collator

    * add test cases for models

    * add test case for data collator

    * remove extra code

    * formatting for ruff check and check_repo.py

    * ruff format

    ruff format tests src utils

    * custom_init_isort.py
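
The idea behind packing with position IDs can be sketched in plain Python. This is a minimal illustration, not the collator added by the PR: it assumes that examples are concatenated into one sequence without padding and that position IDs restart at 0 for each example so sequence boundaries stay recoverable. The function name is hypothetical.

```python
def flatten_with_position_ids(sequences):
    """Concatenate token-id lists and restart position ids for each one."""
    input_ids, position_ids = [], []
    for seq in sequences:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))  # 0, 1, ... per example
    return {"input_ids": input_ids, "position_ids": position_ids}


# Two made-up examples of different lengths, packed without padding tokens.
batch = flatten_with_position_ids([[11, 12, 13], [21, 22]])
print(batch["input_ids"])     # [11, 12, 13, 21, 22]
print(batch["position_ids"])  # [0, 1, 2, 0, 1]
```

Each 0 in `position_ids` marks where a new example starts, which is the information an attention kernel would need to keep examples from attending to each other.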

commit 7d92009af647167bae338e9d4af8bc0452c62fbf
Author: Deep Gandhi <[email protected]>
Date:   Tue Jul 23 19:11:52 2024 +0530

    Added additional kwarg for successful running of optuna hyperparameter search (#31924)

    Update integration_utils.py

    Added additional kwarg

commit 63700628adb91600c84fe3bbbc4c667cd3e3aa71
Author: Alvaro Moran <[email protected]>
Date:   Tue Jul 23 14:18:19 2024 +0200

    feat(cache): StaticCache uses index_copy_ to avoid useless copy (#31857)

    * feat(cache): StaticCache uses index_copy_ to avoid useless copy

    Using index_copy_ allows for explicit in-place change of the tensor.
    Some backends (XLA) will otherwise copy the tensor, making the code
    slower and using more memory.

    Proposed implementation will end up using less memory and on XLA will
    result in less compilation, but the change is also quite generic, making
    no change whatsoever on CUDA or CPU backend.

    * feat(cache): SlidingWindowCache uses index_copy_ to avoid useless copy

    Applying the same change done in StaticCache.

    * fix(cache): fallback of index_copy_ when not implemented

    * fix(cache): in index_copy_ ensure tensors are on same device

    * [run slow] llama

    * fix(cache): add move of cache_position to same device in SlidingWindowCache

    * Revert "[run slow] llama"

    This reverts commit 02608dd14253ccd464e31c108e0cd94364f0e8b9.

commit a009fbdab32a4b068c24052a4dfe7a7bc0fc89f9
Author: amyeroberts <[email protected]>
Date:   Tue Jul 23 12:23:34 2024 +0100

    Fix typing to be compatible with later py versions (#32155)

commit 3263b3435473cbb5dc66925bc29c1d32b5b8d431
Author: Sanchit Gandhi <[email protected]>
Date:   Tue Jul 23 18:34:30 2024 +0800

    Revert "Incorrect Whisper long-form decoding timestamps " (#32148)

    Revert "Incorrect Whisper long-form decoding timestamps  (#32003)"

    This reverts commit cd48553fc8375e1a28d4d82cfe231dedf6a23af8.

commit 034b47784765e37ecc20f7ad43640f1a2c0094fd
Author: Amit Garg <[email protected]>
Date:   Tue Jul 23 03:33:22 2024 -0700

    Rename Phi-3 rope scaling type (#31436)

    * renamed phi3 rope_scaling type

    * fixed trailing whitespaces

    * fixed test

    * added warning

    * fixed format

commit bab32d6fe932a3372fbd6d5a84e3cacb12a61ae0
Author: Alexandre TL <[email protected]>
Date:   Tue Jul 23 12:32:19 2024 +0200

    Added mamba.py backend (#30139)

    * Update README.md

    * tests: forward ok

    * backward test done

    * done testing

    * removed check. scripts

    * Update README.md

    * added use_mambapy arg

    * fixed typo in warning

    * protected imports w/ mambapy package

    * delete pscan.py + raise rather than assert

    * Update import_utils.py

    * fix whitespaces and unused import

    * trailing whitespace + import block unformatted

    * Update modeling_mamba.py

    * transpose before pscan

    * shape comment

    * ran make style

    * use_mambapy=False by default

    Co-authored-by: Arthur <[email protected]>

    * ran make fix-copies

    ---------

    Co-authored-by: Arthur <[email protected]>

commit 9ced33ca7f909d9ace743dac083daba99c904d46
Author: Merve Noyan <[email protected]>
Date:   Tue Jul 23 13:23:23 2024 +0300

    Fix video batching to videollava (#32139)

    ---------

    Co-authored-by: Merve Noyan <[email protected]>

commit a5b226ce9811aa6b31af0bc9c09c54493a4e67c1
Author: Cyril Vallez <[email protected]>
Date:   Tue Jul 23 12:21:23 2024 +0200

    Fix flash attention speed issue (#32028)

    Add the lru_cache for speed

commit a1844a3209eb7e75582684809203bc189931a90c
Author: Ita Zaporozhets <[email protected]>
Date:   Tue Jul 23 11:45:54 2024 +0200

    gguf conversion add_prefix_space=None for llama3 (#31937)

    * gguf conversion forced add_prefix_space=False for llama3; this is not required and forces from_slow, which fails. Change to None and add a test

    * typo

    * clean test

commit 2e113422b3504fe6de821bb9911b24273b11aa9c
Author: Joao Gante <[email protected]>
Date:   Tue Jul 23 10:42:55 2024 +0100

    Llama: RoPE refactor (#32135)

    Co-authored-by: amyeroberts <[email protected]>
    Co-authored-by: Arthur <[email protected]>

commit 5a4a76edb7ac6bbc764392e89adc11adda91f3e5
Author: bayllama <[email protected]>
Date:   Tue Jul 23 02:28:44 2024 -0700

    Modify resize_token_embeddings to ensure output type is same as input (#31979)

    * Change resize_token_embeddings to make it return same Class that is passed to it

    * Add explanatory comment as requested in review

    * Add explanatory comments for add resizing function in lxmert

    * Add comment for padding_idx and moving _resize_bias in lxmert to LxmertForPreTraining

    ---------

    Co-authored-by: Prashanth Sateesh <[email protected]>
    Co-authored-by: Prashanth Sateesh <[email protected]>

commit 1535a2c93d325e529dc9a1907f99247fdf8a58e7
Author: Daniel Lok <[email protected]>
Date:   Tue Jul 23 17:26:00 2024 +0800

    Disable quick init for TapasPreTrainedModel (#32149)

    add attribute to model

    Signed-off-by: Daniel Lok <[email protected]>

commit 34b43211d782c00da6fef778dbfaff69bbf3f115
Author: mig-mfreitas <[email protected]>
Date:   Tue Jul 23 10:07:58 2024 +0100

    Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910)

    * Add YaRN and Dynamic-YaRN RoPE Scaling Methods

    YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
    Interpolation and Attention Scaling methods, improving upon existing
    RoPE interpolation methods for longer context window sizes.

    Fine-tuned models maintain their original performance across benchmarks
    while enabling efficient extrapolation and transfer learning for
    quicker convergence, especially in compute-limited environments.

    We implement YaRN and Dynamic-YaRN for the following list of models:

     - LLaMA
     - Falcon
     - GPT-NeoX
     - Olmo
     - Persimmon
     - Phi
     - StableLM
     - OpenLLaMA

    New unit tests are added to assert YaRN's correct behavior on both
    short and long sequence inputs.

    For more details, please refer to https://arxiv.org/abs/2309.00071.

    Co-authored-by: Miguel Almeida <[email protected]>

    * Refactor YaRN implementation for LLaMA

    Iterate on YaRN implementation for LLaMA and remove diff from remaining
    models for increased PR modularity.

    This commit includes the following changes:
    - Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
    - Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
      from YaRN classes
    - Inherit 'forward' method in YaRN classes from superclass
    - Rename 'yarn' method to 'compute_yarn_scaling'
    - Extend YaRN tests with further assertions
    - Fix style inconsistencies

    Co-authored-by: Miguel Monte e Freitas <[email protected]>

    * Refactor Tensor Building Logic for YaRN

    - Comply with the tensor building logic introduced in #30743
    - Add referencing to the optimized Attention Factor equation
    - Remove Dynamic YaRN for a more agile deployment

    Co-authored-by: mig-mfreitas <[email protected]>

    * remove unwanted file

    ---------

    Co-authored-by: Miguel Almeida <[email protected]>
    Co-authored-by: mig-mfreitas <[email protected]>
    Co-authored-by: Joao Gante <[email protected]>
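
For the attention factor mentioned in that message, the YaRN paper linked there recommends, for LLaMA-family models, scaling attention by sqrt(1/t) = 0.1 ln(s) + 1, where s is the context-extension factor. A minimal sketch of that formula follows; the function name is illustrative, and this is not claimed to be the exact code merged in the PR.

```python
import math


def yarn_attention_factor(scale: float) -> float:
    """Attention scaling from the YaRN paper: 0.1 * ln(s) + 1 for s > 1."""
    if scale <= 1.0:
        return 1.0  # no context extension, so no extra scaling
    return 0.1 * math.log(scale) + 1.0


for s in (1.0, 2.0, 4.0, 8.0):
    print(s, round(yarn_attention_factor(s), 4))
```

The factor grows only logarithmically with the extension factor, so even large context extensions change the attention scaling modestly.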

commit 7405c1c77e4637768ea0ad5d27d8a4d8d67bfb19
Author: KonradSzafer <[email protected]>
Date:   Tue Jul 23 10:56:21 2024 +0200

    Add method to retrieve used chat template (#32032)

    encapsulate chat template logic

commit 605f3245dcca34381c35520c35ba0b701ed80d58
Author: Anton Vlasjuk <[email protected]>
Date:   Tue Jul 23 10:11:12 2024 +0200

    Fix mask creations of `GPTNeoX` and `GPT2` (#31944)

    * fix mask creation of gpt2 and gpt_neox caused by me

    * forgot the reshape of masks when shape > 2

    * add tests for gpt neox and gpt2

    * nit on a comment

commit 2782aadae2b0b0c313eac3ee70f84f0335577635
Author: Sanchit Gandhi <[email protected]>
Date:   Tue Jul 23 14:55:16 2024 +0800

    [modelling] remove un-necessary transpose for fa2 attention (#31749)

    * [whisper] remove un-necessary transpose for fa2 attention

    * propagate

commit f83c6f1d02fba5e5ced9357b9c9196c76d937af3
Author: Sanchit Gandhi <[email protected]>
Date:   Tue Jul 23 14:54:38 2024 +0800

    Remove `trust_remote_code` when loading Libri Dummy (#31748)

    * [whisper integration] use parquet dataset for testing

    * propagate to others

    * more propagation

    * last one

commit 3aefb4ec7f957f9561a410eabc6f9d57b2f0384f
Author: Raushan Turganbay <[email protected]>
Date:   Tue Jul 23 10:23:55 2024 +0500

    LLaVaNeXT: pad on right if training (#32134)

    * pad on right if training

    * docs

    * add tests

commit 251a2409c694c29ee28e66c954670c483cf54961
Author: James Thewlis <[email protected]>
Date:   Tue Jul 23 01:12:16 2024 -0400

    Add llama3-llava-next-8b to llava_next conversion script (#31395)

    * Add llama3-llava-next-8b to llava_next conversion script

    Adds support for the lmms-lab/llama3-llava-next-8b model to the
    convert_llava_next_weights_to_hf.py script, along with an example
    prompt generated from the llava_llama_3 conv_template in the LLaVA-NeXT
    repo.

    * Exclude <|begin_of_text|> from prompt example

    This token gets added automatically, so it should not be included in the
    prompt example.

    * Add llava-next-72b and llava-next-110b

    Adds the Qwen-based LLaVA-Next models to the conversion script, along
    with changes to load the models on multiple GPUs for inference.

    * Add llama3 and qwen prompt formats to docs

    * Chat prompt and padding side left for llama3 batched

    * update

    * Update src/transformers/models/llava_next/convert_llava_next_weights_to_hf.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update src/transformers/models/llava_next/convert_llava_next_weights_to_hf.py

    Co-authored-by: amyeroberts <[email protected]>

    * remove code

    * better naming

    ---------

    Co-authored-by: raushan <[email protected]>
    Co-authored-by: Raushan Turganbay <[email protected]>
    Co-authored-by: amyeroberts <[email protected]>

commit 96a074fa7e2c04b904f72d9e827398d4c5f90f25
Author: Marc Sun <[email protected]>
Date:   Mon Jul 22 20:21:59 2024 +0200

    Add new quant method (#32047)

    * Add new quant method

    * update

    * fix multi-device

    * add test

    * add offload

    * style

    * style

    * add simple example

    * initial doc

    * docstring

    * style again

    * works ?

    * better docs

    * switch to non-persistent

    * remove print

    * fix init

    * code review

commit bd9dca3b855b5a20ea11097b89c40f34d775f1c7
Author: Arthur <[email protected]>
Date:   Mon Jul 22 19:42:47 2024 +0200

    set warning level to info for special tokens have been added (#32138)

    fixes #7002

commit 817a676bd711f9626e13578068b36ef09cf572dc
Author: amyeroberts <[email protected]>
Date:   Mon Jul 22 18:29:50 2024 +0100

    Don't default to other weights file when use_safetensors=True (#31874)

    * Don't default to other weights file when use_safetensors=True

    * Add tests

    * Update tests/utils/test_modeling_utils.py

    * Add clarifying comments to tests

    * Update tests/utils/test_modeling_utils.py

    * Update tests/utils/test_modeling_utils.py

commit 74d0eb3fedf353bd670aa85ae8fcf4c85f287b5b
Author: Yoni Gottesman <[email protected]>
Date:   Mon Jul 22 20:24:43 2024 +0300

    Return assistant generated tokens mask in apply_chat_template  (#30650)

    return assistant generated tokens mask in apply_chat_template
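
As the PR description notes, the assistant mask is meant for building a labels array with ignore (-100) values. A minimal sketch of that step, assuming the token ids and the 0/1 mask have already been obtained; the ids below are made up and the helper name is hypothetical.

```python
IGNORE_INDEX = -100  # ignore value named in the PR description


def labels_from_assistant_mask(input_ids, assistant_mask):
    """Keep assistant tokens as labels; replace everything else with -100."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        token if is_assistant else IGNORE_INDEX
        for token, is_assistant in zip(input_ids, assistant_mask)
    ]


# Made-up ids: four user/system tokens followed by three assistant tokens.
input_ids = [1, 42, 43, 2, 77, 78, 3]
assistant_mask = [0, 0, 0, 0, 1, 1, 1]
labels = labels_from_assistant_mask(input_ids, assistant_mask)
print(labels)  # [-100, -100, -100, -100, 77, 78, 3]
```

With labels built this way, the loss is computed only on the assistant's tokens, so the model is not trained to predict user or system text.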

commit 7987710696803c74ce1b5e7f9dfa055096a6c00e
Author: Bertrand Thia <[email protected]>
Date:   Mon Jul 22 13:08:27 2024 -0400

    [RoBERTa] Minor clarifications to model doc (#31949)

    * minor edits and clarifications

    * address comment

    Co-authored-by: Steven Liu <[email protected]>

    ---------

    Co-authored-by: Steven Liu <[email protected]>

commit 12b6880c81db7742a29ea425dcb9e63b7dbdc449
Author: Sai-Suraj-27 <[email protected]>
Date:   Mon Jul 22 22:16:17 2024 +0530

    fix: Fixed raising `TypeError` instead of `ValueError` for invalid type (#32111)

    * Raised TypeError instead of ValueError for invalid types.

    * Updated formatting using ruff.

    * Retrieved few changes.

    * Retrieved few changes.

    * Updated tests accordingly.

commit d1ec36b94f5ba45fb2423e74074cfedab48cfe73
Author: Woojun Jung <[email protected]>
Date:   Tue Jul 23 00:27:13 2024 +0900

    Update `ko/_toctree.yml` and remove `custom_tools.md` to reflect latest changes (#31969)

    update `ko/_toctree.yml` and remove `custom_tools.md`

commit 7ba028fccb82cbee792b67d596120da8ae9397c9
Author: Matt <[email protected]>
Date:   Mon Jul 22 16:07:29 2024 +0100

    Fix failing test with race condition (#32140)

    * Fix failing test with race condition

    * make fixup

    * monotonic_ns instead of randint

    * uuid4 instead of monotonic_ns

    * Add a finally cleanup step

commit 5a649ff3ecd70599dd0fea7ee430ba47b51a4556
Author: Sanchit Gandhi <[email protected]>
Date:   Mon Jul 22 21:18:48 2024 +0800

    [generate] fix eos/pad id check on mps devices (#31695)

    Co-authored-by: Joao Gante <[email protected]>

commit f2a1e3ca684df624016285266a0ae519e4483be7
Author: Lucain <[email protected]>
Date:   Mon Jul 22 15:14:47 2024 +0200

    Mention model_info.id instead of model_info.modelId (#32106)

commit 0fcfc5ccc968ff5a1a439db04a94f566a0bd1d89
Author: Sai-Suraj-27 <[email protected]>
Date:   Mon Jul 22 18:43:39 2024 +0530

    fix: Replaced deprecated `mktemp()` function (#32123)

    Replaced deprecated mktemp function.

commit c38c55f4fbc0163cc02ef4588fe2ec391171a2f0
Author: Joao Gante <[email protected]>
Date:   Mon Jul 22 14:06:49 2024 +0100

    Generate: store special token tensors under a unique variable name (#31980)

    * rename stuff

    * english; this one shouldn't be changed

    * add a _ to the new var names

    * musicgen

    * derp

commit aa8f86a421e23fe41b6333efc11ea4248e098d83
Author: Brian <[email protected]>
Date:   Mon Jul 22 08:06:22 2024 -0400

    Fix shard order (#32023)

commit b3818805978b411713725a1b7470dc1bda073c29
Author: Aymeric Roucher <[email protected]>
Date:   Mon Jul 22 10:49:57 2024 +0200

    Agents planning (#31702)

    * Allow planning for agents

commit 0fdea8607d7e01eb0e38a1ebeb7feee30a22f0cf
Author: Lucain <[email protected]>
Date:   Fri Jul 19 20:32:39 2024 +0200

    Fix tests after `huggingface_hub` 0.24 (#32054)

    * adapt tests

    * style

    * comment

commit fe008d6ebea1f5770b740991daeefd9322fa434a
Author: Raushan Turganbay <[email protected]>
Date:   Fri Jul 19 19:21:45 2024 +0500

    Chameleon: not supported with fast load (#32091)

    fixes

commit 62aa270f2ab3acca2a58cde8f08400ec49330b03
Author: Zach Mueller <[email protected]>
Date:   Fri Jul 19 08:58:53 2024 -0400

    Disable quick init for deepspeed (#32066)

    Disable via deepspeed

commit 89575b567e061fd87bdd655ba188b6c7a922d54a
Author: Kamil Akesbi <[email protected]>
Date:   Fri Jul 19 13:42:22 2024 +0100

    Support generating with fallback for short form audio in Whisper (#30984)

    * remove is_shortform

    * adapt _retrieve_max_frames_and_seek for short_form

    * return bos token in short and long form

    * add decoder_input_ids to short form audios

    * add eos token for  short form

    * handle short form token_timestamps

    * no need to return scores

    * add is_shortform conditions

    * handle when max_new_tokens is None - short form

    * handle assistant decoding

    * fix

    * handle return_dict_in_generate

    * handle split_by_batch for encoder_attentions attribute

    * handle num_beams>1

    * handle num_return_sequences>1 in generate_with_fallback

    * handle num_return_sequences>1 with return_dict_in_generate=True

    * raise error if max_new_tokens + decoder_inputs_ids > max_target_pos

    * fix

    * apply review suggestions

    * fix

    * Update src/transformers/models/whisper/generation_whisper.py

    Co-authored-by: Sanchit Gandhi <[email protected]>

    * Update src/transformers/models/whisper/generation_whisper.py

    Co-authored-by: Sanchit Gandhi <[email protected]>

    * Update src/transformers/models/whisper/generation_whisper.py

    Co-authored-by: Sanchit Gandhi <[email protected]>

    * fix

    * logits for both short form and long form

    * handle if logits_processor is None

    * test

    * apply review changes to num_return_sequences

    * add _expand_variables_for_generation

    * remove short form commented section

    * update comments

    * uncomment num_beams line in generate_with_fallback

    * update assistant decoding

    * handle return_segment with short form generation

    * up

    * fix output format is_shortform

    * overwrite beam_sample test

    * update _set_return_timestamps

    * apply review suggestions

    * apply review suggestions

    * remove seek_outputs_short_form

    * fix _stack_split_outputs

    * fix stack dim in _stack_split_outputs

    * update tests

    * fix past_key_values + beam tests

    * fix

    * clean _expand_variables_for_generation

    * make style

    * fix slow tests

    * make style

    * max_length condition

    * make style

    * add slow tests for shortform fallback

    * Update src/transformers/models/whisper/generation_whisper.py

    Co-authored-by: Sanchit Gandhi <[email protected]>

    * Update src/transformers/models/whisper/generation_whisper.py

    Co-authored-by: Sanchit Gandhi <[email protected]>

    * apply review changes

    * Update src/transformers/models/whisper/generation_whisper.py

    Co-authored-by: Sanchit Gandhi <[email protected]>

    * up

    * fix slow tests

    * apply review suggestions

    * update test

    * make style

    * small fix

    * fix

    * fix test_new_cache_format

    * fix past_key_values

    * fix

    * make style

    * fix slow tests

    * fix

    ---------

    Co-authored-by: Sanchit Gandhi <[email protected]>

commit 46835ec6aed62e9a73784f1b6a43030afd601e5e
Author: Merve Noyan <[email protected]>
Date:   Fri Jul 19 15:40:40 2024 +0300

    Add image-text-to-text task guide (#31777)

    * Add image-text-to-text task page

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: Steven Liu <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: Steven Liu <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: Steven Liu <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: Steven Liu <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: Steven Liu <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: Steven Liu <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: Steven Liu <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: Steven Liu <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: Steven Liu <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: Steven Liu <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: Steven Liu <[email protected]>

    * Address comments

    * Fix heading

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: amyeroberts <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: amyeroberts <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: amyeroberts <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: amyeroberts <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: amyeroberts <[email protected]>

    * Update docs/source/en/tasks/image_text_to_text.md

    Co-authored-by: amyeroberts <[email protected]>

    * Address comments

    * Update image_text_to_text.md

    ---------

    Co-authored-by: Steven Liu <[email protected]>
    Co-authored-by: amyeroberts <[email protected]>

commit 4bd8f12972c6ad06e264baa39f17ec9dfa9a5cb2
Author: Merve Noyan <[email protected]>
Date:   Fri Jul 19 14:50:34 2024 +0300

    Fixes to chameleon docs (#32078)

    * Fixes

    * Let's not use auto

commit 566b0f1fbf5feb53a18591ca215a8d1245a790ef
Author: Keith Stevens <[email protected]>
Date:   Fri Jul 19 03:56:45 2024 -0700

    Fix progress callback deepcopy (#32070)

    * Replacing ProgressCallbacks deepcopy with a shallowcopy

    * Using items instead of entries

    * code cleanup for copy in trainer callback

    * Style fix for ProgressCallback

commit e316c5214fe51de0bf8e824245bfd6225c9925aa
Author: Raushan Turganbay <[email protected]>
Date:   Fri Jul 19 15:38:01 2024 +0500

    VideoLLaVa: fix chat format in docs (#32083)

    fix chat format

commit 22f888b3fab3d914882b8f44896a5658712f535c
Author: Joshua Lochner <[email protected]>
Date:   Fri Jul 19 11:19:35 2024 +0200

    [mistral] Fix FA2 attention reshape for Mistral Nemo (#32065)

    * [mistral] Fix FA2 attention reshape

    * [run-slow] mistral

commit cd48553fc8375e1a28d4d82cfe231dedf6a23af8
Author: Kamil Akesbi <[email protected]>
Date:   Fri Jul 19 09:26:38 2024 +0100

    Incorrect Whisper long-form decoding timestamps  (#32003)

    * fix lo form timestamps in decode_batch

    * Update src/transformers/models/whisper/tokenization_whisper.py

    Co-authored-by: Yoach Lacombe <[email protected]>

    * Update src/transformers/models/whisper/tokenization_whisper.py

    Co-authored-by: Yoach Lacombe <[email protected]>

    * add test

    * make style

    * fix copies

    * Update src/transformers/models/whisper/tokenization_whisper_fast.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update src/transformers/models/whisper/tokenization_whisper.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update src/transformers/models/whisper/processing_whisper.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update src/transformers/models/whisper/tokenization_whisper.py

    Co-authored-by: amyeroberts <[email protected]>

    * apply review suggestions

    * fix

    * fix copies

    * fix

    * Update src/transformers/models/whisper/tokenization_whisper_fast.py

    Co-authored-by: amyeroberts <[email protected]>

    * fix-copies

    ---------

    Co-authored-by: Yoach Lacombe <[email protected]>
    Co-authored-by: amyeroberts <[email protected]>

commit 56a7745704261919dd8117e3a8aa4fb43fade30e
Author: NielsRogge <[email protected]>
Date:   Fri Jul 19 10:20:03 2024 +0200

    [Chameleon, Hiera] Improve docs (#32038)

    * Improve docs

    * Fix docs

    * Fix code snippet

commit b873234cb649a24865021f0d598627ce2b24d34a
Author: Raushan Turganbay <[email protected]>
Date:   Fri Jul 19 10:08:56 2024 +0500

    Llava: add default chat templates (#31691)

    * add default chat templates

    * Update src/transformers/models/llava/processing_llava.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update src/transformers/models/llava_next/processing_llava_next.py

    Co-authored-by: amyeroberts <[email protected]>

    * more clear docstring and docs

    * Update docs/source/en/model_doc/llava.md

    Co-authored-by: NielsRogge <[email protected]>

    * Update docs/source/en/model_doc/llava_next.md

    Co-authored-by: NielsRogge <[email protected]>

    * Update docs/source/en/model_doc/vipllava.md

    Co-authored-by: NielsRogge <[email protected]>

    * add tests

    * remove default templates (see #31733)

    * load chat template from another file

    * Update docs/source/en/model_doc/llava_next.md

    Co-authored-by: amyeroberts <[email protected]>

    * revert some changes in docs

    * forgot vipllava

    * chat template file is not temporary hack

    * warn if loading from processor

    * not that file

    * similarly modify `save_pretrained`

    * Update tests/models/llava_next/test_processor_llava_next.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update tests/models/vipllava/test_processor_vipllava.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update docs/source/en/model_doc/vipllava.md

    Co-authored-by: amyeroberts <[email protected]>

    * Update src/transformers/processing_utils.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update src/transformers/processing_utils.py

    Co-authored-by: amyeroberts <[email protected]>

    * Update docs/source/en/model_doc/vipllava.md

    Co-authored-by: amyeroberts <[email protected]>

    * Update docs/source/en/model_doc/llava.md

    Co-authored-by: amyeroberts <[email protected]>

    * Update docs/source/en/model_doc/llava.md

    Co-authored-by: amyeroberts <22614925+amyeroberts@use…
@stceum

stceum commented Aug 7, 2024

Amazing contribution! 🎉 🎉 🎉 It helps me a lot!

Here is an example of the new api:

template = (
    "{% for message in messages %}"
    "{% if (message['role'] != 'assistant') %}"
    "{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}"
    "{% elif (message['role'] == 'assistant')%}"
    "{{'<|im_start|>' + message['role'] + '\n'}}"
    "{% generation %}"
    "{{message['content'] + '<|im_end|>'}}"
    "{% endgeneration %}"
    "{{'\n'}}"
    "{% endif %}"
    "{% endfor %}"
)
dummy_conversation = [
      {"role": "system", "content": "system message"},
      {"role": "user", "content": "user message"},
      {"role": "assistant", "content": "assistant\nmessage"},
      {"role": "user", "content": "user message 2"},
      {"role": "assistant", "content": "assistant message 2"},
]

output = tokenizer_r.apply_chat_template(
    dummy_conversations,
    chat_template=dummy_template,
    tokenize=True,
    return_assistant_mask=True,
    return_dict=True,
  )

labels = [output["input_ids"][index] if mask == 1 else -100 for index, mask in enumerate(output["assistant_mask"])]

Some spelling mistakes in this example:

output = tokenizer_r.apply_chat_template(
    dummy_conversations,
    chat_template=dummy_template,
    tokenize=True,
    return_assistant_mask=True,
    return_dict=True,
)

This should use dummy_conversation instead of dummy_conversations, and template instead of dummy_template.

labels = [output["input_ids"][index] if mask == 1 else -100 for index, mask in enumerate(output["assistant_mask"])]

This should use assistant_masks instead of assistant_mask.
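
The label construction in the last line can be tried without loading a tokenizer. The helper below is a minimal sketch: `build_labels` and the toy ids are hypothetical, and it assumes `output["input_ids"]` and `output["assistant_masks"]` are equal-length lists, as in the example above.

```python
IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss skips by default


def build_labels(input_ids, assistant_masks, ignore_index=IGNORE_INDEX):
    """Keep token ids where the mask is 1; replace everything else with ignore_index."""
    if len(input_ids) != len(assistant_masks):
        raise ValueError("input_ids and assistant_masks must have the same length")
    return [tok if mask == 1 else ignore_index for tok, mask in zip(input_ids, assistant_masks)]


# Toy values standing in for output["input_ids"] and output["assistant_masks"].
toy_input_ids = [11, 12, 13, 14, 15, 16]
toy_assistant_masks = [0, 0, 1, 1, 0, 1]
labels = build_labels(toy_input_ids, toy_assistant_masks)
print(labels)
```

Using `zip` instead of indexing through `enumerate` avoids the extra lookup and fails loudly if the two lists ever differ in length.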

@avicooper1

Thank you for your work on this!

I'm having some issues though. When I run the example script from the tests, I don't seem to get any assistant tokens:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

dummy_template = (
    "{% for message in messages %}"
    "{% if (message['role'] != 'assistant') %}"
    "{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}"
    "{% elif (message['role'] == 'assistant')%}"
    "{{'<|im_start|>' + message['role'] + '\n'}}"
    "{% generation %}"
    "{{message['content'] + '<|im_end|>'}}"
    "{% endgeneration %}"
    "{{'\n'}}"
    "{% endif %}"
    "{% endfor %}"
)
conversations = [
    [
        {"role": "system", "content": "system message"},
        {"role": "user", "content": "user message"},
        {"role": "assistant", "content": "start turn 1 assistant message. end turn 1"},
        {"role": "user", "content": "user message 2"},
        {"role": "assistant", "content": "start turn 2 assistant message. end turn 2"},
    ],
    [
        {"role": "system", "content": "system message 3"},
        {"role": "user", "content": "user message 3"},
        {"role": "assistant", "content": "start turn 3 assistant message. end turn 3"},
        {"role": "user", "content": "user message 4"},
        {"role": "assistant", "content": "start turn 4 assistant message. end turn 4"},
    ],
]

output = tokenizer.apply_chat_template(
    conversations[0],
    chat_template=dummy_template,
    tokenize=True,
    return_assistant_tokens_mask=True,
    return_dict=True,
)
print("".join(map(str, output["assistant_masks"])))

For me, this prints out 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

I think this bug is being caused by other tokens being printed out before the {% generation %}, within the same turn. For example, if I change the chat template to:

dummy_template = (
    "{% for message in messages %}"
    "{% if (message['role'] != 'assistant') %}"
    "{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}"
    "{% elif (message['role'] == 'assistant')%}"
    "{% generation %}"
    "{{'<|im_start|>' + message['role'] + '\n'}}"
    "{{message['content'] + '<|im_end|>'}}"
    "{% endgeneration %}"
    "{{'\n'}}"
    "{% endif %}"
    "{% endfor %}"
)

it works correctly, printing out 0000000000000000000000000000000011111111111111111111111110000000000000000001111111111111111111111111

I am running the latest version of transformers, 4.44.0.
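
A long string of 0s and 1s is hard to read by eye, so a small helper that lists the runs of 1s can make this kind of check easier. This is only a sketch: `mask_spans` is a hypothetical name, and the toy mask stands in for `output["assistant_masks"]`.

```python
from itertools import groupby


def mask_spans(mask):
    """Return (start, end) index pairs, end exclusive, for each run of 1s in the mask."""
    spans = []
    position = 0
    for value, run in groupby(mask):
        length = sum(1 for _ in run)
        if value == 1:
            spans.append((position, position + length))
        position += length
    return spans


# Toy mask standing in for output["assistant_masks"].
toy_mask = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
spans = mask_spans(toy_mask)
print(spans)
```

With a real tokenizer, calling `tokenizer.decode(output["input_ids"][start:end])` for each span should show exactly the text that was marked as assistant output.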

@yonigottesman
Contributor Author

@avicooper1 there is a bug, but it's not about tokens before "generation" in the same turn. If you try a different tokenizer it will work.
There is something strange about the llama3 tokenizer (PreTrainedTokenizerFast): for some reason the char_to_token function isn't working as expected, and my implementation is based on its result.
I opened an issue: huggingface/tokenizers#1620.
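
To illustrate why the mask depends on the character-to-token lookup: the character ranges of the rendered chat that fall inside a {% generation %} block have to be translated into token positions. The sketch below shows that translation with plain offset pairs rather than `char_to_token`. It is not the code from this PR; `spans_to_token_mask`, the sample text, and the whitespace-based toy offsets are all hypothetical.

```python
def spans_to_token_mask(token_offsets, generation_spans):
    """Mark a token 1 if its character range lies inside any generation span.

    token_offsets: list of (char_start, char_end) per token, end exclusive.
    generation_spans: list of (char_start, char_end) ranges of assistant text.
    """
    mask = []
    for tok_start, tok_end in token_offsets:
        inside = any(start <= tok_start and tok_end <= end for start, end in generation_spans)
        mask.append(1 if inside else 0)
    return mask


rendered = "user hi assistant hello there"

# Toy "tokenizer": one token per space-separated word, with character offsets.
offsets = []
cursor = 0
for word in rendered.split(" "):
    offsets.append((cursor, cursor + len(word)))
    cursor += len(word) + 1

# Pretend the generation block covers everything from "hello" to the end.
generation_start = rendered.index("hello")
generation_spans = [(generation_start, len(rendered))]
mask = spans_to_token_mask(offsets, generation_spans)
print(mask)
```

If the lookup from characters to tokens returns wrong positions, a mask derived this way cannot be right either, whatever the template looks like.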

@Boltzmachine

Given the issue, is there a workaround to get the assistant mask?

@yonigottesman
Contributor Author

Sadly, for llama3 I don't think so :(
Other models that use the same tokenizer class, PreTrainedTokenizerFast (but with a different config), do work, for example tiiuae/falcon-mamba-7b-instruct. So I guess it's something specific to the llama3 configuration.

@psinger

psinger commented Aug 12, 2024

Unfortunately, the fact that the template needs to contain the {% generation %} part makes it very inflexible to use. Would it somehow be possible to just generate the mask based on the provided user and assistant inputs?

@yonigottesman
Contributor Author

@psinger there were several issues where some solutions were discussed, but nothing was flexible enough, as every model can have its own chat template and special tokens. See #28950 and #27609.
Having a {% generation %} keyword was the only thing I could come up with. If you have a better idea, that would be great; I'll be happy to try and implement it.

@zjysteven

I can also confirm that there's something strange about llama3's tokenizer such that it just can't work regardless of the template.

@Boltzmachine

So in that case I should manually add {% generation %} for each tokenizer's template?

@thepowerfuldeez

Hi! I can confirm this works with the Mistral7B tokenizer, but it doesn't work with any of the Llama tokenizers (tried 3 and 3.1 LlamaTokenizer).

@kwanUm

kwanUm commented Oct 30, 2024

Same here, tried 3.2 model as well.

@yonigottesman
Contributor Author

@thepowerfuldeez @kwanUm can you update the tokenizers package to the latest version and check? There was a fix, huggingface/tokenizers#1640, that should have fixed this issue.

@kwanUm

kwanUm commented Oct 31, 2024

Still doesn't work @yonigottesman. Apparently I need to add the {% generation %} tag manually? Is there existing functionality for it in hf or somewhere else?

Here's a reproduction:

from transformers import AutoTokenizer

# Load the tokenizer for LLaMA 3.1
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Define the conversation
conversation = [
    {"role": "system", "content": "blablabla"},
    {"role": "user", "content": "blablabla"},
    {"role": "assistant", "content": "blablabla"},
]



# Apply the chat template with tokenization and assistant mask
output = tokenizer.apply_chat_template(
    conversation,
    tokenize=True,
    return_assistant_tokens_mask=True,
    return_dict=True,
)

# Print the assistant_mask
print("With default chat template:")
print(output['assistant_masks'])

# Define the chat template with the {% generation %} tag
chat_template = (
    "{% for message in messages %}"
    "{% if message['role'] != 'assistant' %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ '<|im_start|>' + message['role'] + '\n' }}"
    "{% generation %}"
    "{{ message['content'] + '<|im_end|>\n' }}"
    "{% endgeneration %}"
    "{% endif %}"
    "{% endfor %}"
)

# Apply the chat template with tokenization and assistant mask
output = tokenizer.apply_chat_template(
    conversation,
    chat_template=chat_template,
    tokenize=True,
    return_assistant_tokens_mask=True,
    return_dict=True,
)


# Print the assistant_mask
print("With custom chat template containing the {% generation %} tag:")
print(output['assistant_masks'])

Output:

With default chat template:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
With custom chat template containing the {% generation %} tag:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]



pip list | grep -E "token|transfor|accele|sentence"

accelerate 1.0.1
asttokens 2.4.1
tiktoken 0.8.0
tokenizers 0.20.1
transformers 4.46.0

@yonigottesman
Contributor Author

Until tokenizers start shipping the {% generation %} keyword in the chat template stored in tokenizer_config.json, you will have to add it manually.
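
Since a template without the keyword silently yields an all-zero mask (as in the reproduction above), it can help to check the template string before asking for the mask. The sketch below is one way to do that; `has_generation_block` and the two sample templates are hypothetical, and the regex only covers the plain and dash-trimmed forms of the tag.

```python
import re

# Matches "{% generation %}" with optional whitespace-control dashes, e.g. "{%- generation -%}".
# It does not match "{% endgeneration %}", so a stray closing tag alone is not enough.
GENERATION_TAG = re.compile(r"\{%-?\s*generation\s*-?%\}")


def has_generation_block(chat_template):
    """Return True if the template string contains a {% generation %} tag."""
    return bool(chat_template) and GENERATION_TAG.search(chat_template) is not None


plain_template = "{% for message in messages %}{{ message['content'] }}{% endfor %}"
tagged_template = (
    "{% for message in messages %}"
    "{% generation %}{{ message['content'] }}{% endgeneration %}"
    "{% endfor %}"
)
print(has_generation_block(plain_template), has_generation_block(tagged_template))
```

With a loaded tokenizer, the string to check would be `tokenizer.chat_template`, or whatever is passed as `chat_template=` to `apply_chat_template`.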

@ospanbatyr

The following chat template is what I am using for the Llama 3.2 1B and 3B models. While there are a few differences between the Llama 3.1 and 3.2 chat templates, one can use this template as a reference point.

LLAMA32_CHAT_TEMPLATE = """{{- bos_token }}
{%- if custom_tools is defined %}
    {%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
    {%- set tools_in_user_message = true %}
{%- endif %}
{%- if not date_string is defined %}
    {%- if strftime_now is defined %}
        {%- set date_string = strftime_now("%d %b %Y") %}
    {%- else %}
        {%- set date_string = "26 Jul 2024" %}
    {%- endif %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = none %}
{%- endif %}

{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content']|trim %}
    {%- set messages = messages[1:] %}
{%- else %}
    {%- set system_message = "" %}
{%- endif %}

{#- System message #}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if tools is not none %}
    {{- "Environment: ipython\n" }}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023\n" }}
{{- "Today Date: " + date_string + "\n\n" }}
{%- if tools is not none and not tools_in_user_message %}
    {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}

{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
    {#- Extract the first user message so we can plug it in here #}
    {%- if messages | length != 0 %}
        {%- set first_user_message = messages[0]['content']|trim %}
        {%- set messages = messages[1:] %}
    {%- else %}
        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
    {%- endif %}
    {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
    {{- "Given the following functions, please respond with a JSON for a function call " }}
    {{- "with its proper arguments that best answers the given prompt.\n\n" }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
    {{- first_user_message + "<|eot_id|>"}}
{%- endif %}

{%- for message in messages %}
    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
            {%- if message.role  != 'assistant' %}
                  {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
            {%- elif message.role  == 'assistant' %}
                  {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'}}
                 {% generation %}
                 {{- message['content'] | trim + '<|eot_id|>' }}
                 {% endgeneration %}
           {%- endif %}
    {%- elif 'tool_calls' in message %}
        {%- if not message.tool_calls|length == 1 %}
            {{- raise_exception("This model only supports single tool-calls at once!") }}
        {%- endif %}
        {%- set tool_call = message.tool_calls[0].function %}
        {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
        {{- '{"name": "' + tool_call.name + '", ' }}
        {{- '"parameters": ' }}
        {{- tool_call.arguments | tojson }}
        {{- "}" }}
        {{- "<|eot_id|>" }}
    {%- elif message.role == "tool" or message.role == "ipython" %}
        {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
        {%- if message.content is mapping or message.content is iterable %}
            {{- message.content | tojson }}
        {%- else %}
            {{- message.content }}
        {%- endif %}
        {{- "<|eot_id|>" }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}"""
