Update transformers requirement from <4.47,>=4.43.2 to >=4.43.2,<4.48 #1699

dependabot · 2024-12-16T01:06:56Z

Updates the requirements on transformers to permit the latest version.
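To make the effect of the new bound concrete, here is a minimal sketch that checks a few version strings against the old range (`>=4.43.2,<4.47`) and the new one (`>=4.43.2,<4.48`). The helper names are hypothetical, and the comparison is a simplification that only handles plain `X.Y.Z` release strings. It does not implement full PEP 440 semantics such as pre-releases.

```python
def parse_version(text):
    """Turn a plain 'X.Y' or 'X.Y.Z' release string into a 3-tuple of ints."""
    parts = tuple(int(part) for part in text.split("."))
    return parts + (0,) * (3 - len(parts))  # pad '4.47' to (4, 47, 0)


def satisfies(version, lower, upper):
    """True when lower <= version < upper (simplified; no pre-release handling)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


OLD_RANGE = ("4.43.2", "4.47")  # <4.47,>=4.43.2
NEW_RANGE = ("4.43.2", "4.48")  # >=4.43.2,<4.48

for candidate in ("4.43.1", "4.43.2", "4.47.0", "4.48.0"):
    print(candidate,
          "old:", satisfies(candidate, *OLD_RANGE),
          "new:", satisfies(candidate, *NEW_RANGE))
```

Under these assumptions, 4.47.0 is rejected by the old range and accepted by the new one, while 4.48.0 stays excluded.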

Release notes

v4.47.0: PaliGemma-2, I-JEPA, OLMo-2, LayerSkip, Tensor Parallel

New models

PaliGemma-2

PaliGemma 2 and PaliGemma are lightweight open vision-language models (VLM) inspired by PaLI-3, and based on open components like the SigLIP vision model and the Gemma language model. PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images.

PaliGemma 2 is available in 3B, 10B, and 28B parameter sizes, which are based on Gemma 2 2B, 9B, and 27B models, respectively. The original PaliGemma models are available in the 3B size. For more information on Gemma model variants, see the Gemma models list. PaliGemma model variants support different pixel resolutions for image inputs, including 224 x 224, 448 x 448, and 896 x 896 pixels.

I-JEPA

The I-JEPA model was proposed in Image-based Joint-Embedding Predictive Architecture by Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas. I-JEPA is a self-supervised learning method that predicts the representations of one part of an image based on other parts of the same image. This approach focuses on learning semantic features without relying on pre-defined invariances from hand-crafted data transformations, which can bias specific tasks, or on filling in pixel-level details, which often leads to less meaningful representations.

Add I-JEPA by @jmtzt in #33125

OLMo 2

The OLMo 2 model is the successor to the OLMo model, which was proposed in OLMo: Accelerating the Science of Language Models.

The architectural changes from the original OLMo model to this model are:

RMSNorm is used instead of standard layer norm.

Norm is applied to attention queries and keys.

Norm is applied after attention/feedforward layers rather than before.
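The first and third changes listed above can be illustrated with a small pure-Python sketch. It is not the transformers implementation. The function names are hypothetical, vectors are plain lists, and "norm after" is read here as normalizing the sublayer output before adding the residual.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale by the root mean square; no mean subtraction, no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def pre_norm_block(x, sublayer, weight):
    """Norm before the sublayer: x + sublayer(norm(x))."""
    return [a + b for a, b in zip(x, sublayer(rms_norm(x, weight)))]


def post_norm_block(x, sublayer, weight):
    """Norm after the sublayer (assumed reading): x + norm(sublayer(x))."""
    return [a + b for a, b in zip(x, rms_norm(sublayer(x), weight))]


# Toy stand-in for an attention or feedforward sublayer.
double = lambda h: [2.0 * v for v in h]

x = [3.0, 4.0]
ones = [1.0, 1.0]
print(rms_norm(x, ones))
print(pre_norm_block(x, double, ones))
print(post_norm_block(x, double, ones))
```

In the post-norm variant, the contribution added to the residual stream always has roughly unit root mean square, whatever scale the sublayer produces.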

Commits:

Add OLMo November 2024 by @2015aroras in #34551

Rename OLMo November to OLMo2 by @2015aroras in #34864

Layer-Skip Llama

We add support for Meta's Layer-Skip Llama 3.2 1B model.

The Llama 3.2 1B model was continually pretrained with the LayerSkip recipe (early-exit loss and layer dropout), as presented in Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding, and is capable of self-speculative decoding: it drafts tokens with the earlier layers and verifies them with the remaining layers.

Self-speculation (Layer-Skip Llama) by @ArthurZucker in #34240
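The draft-then-verify loop behind self-speculative decoding can be sketched with toy stand-ins. This is an illustration of the general idea under greedy decoding, not the transformers API: `draft_next` and `full_next` are hypothetical callables standing in for the early-exit layers and the full model, and verification is done one position at a time here, whereas a real implementation would verify the whole draft in a single forward pass.

```python
def self_speculative_decode(prompt, draft_next, full_next,
                            num_draft=4, max_new_tokens=8):
    """Greedy draft-then-verify loop; the output matches full-model greedy decoding."""
    tokens = list(prompt)
    target = len(prompt) + max_new_tokens
    while len(tokens) < target:
        # 1. Draft: propose a few tokens cheaply (early-exit layers).
        draft = []
        for _ in range(min(num_draft, target - len(tokens))):
            draft.append(draft_next(tokens + draft))
        # 2. Verify: keep the prefix the full model agrees with and
        #    replace the first mismatch with the full model's token.
        accepted = []
        for i, proposed in enumerate(draft):
            verified = full_next(tokens + draft[:i])
            accepted.append(verified)
            if verified != proposed:
                break
        tokens.extend(accepted)
    return tokens


# Toy "models": the full model counts upward modulo 10;
# the draft model agrees except right after a 4.
def full_next(tokens):
    return (tokens[-1] + 1) % 10


def draft_next(tokens):
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


result = self_speculative_decode([0], draft_next, full_next)
print(result)
```

Each round adds at least one verified token, so the loop always makes progress even when the draft is wrong at the first position.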

Tensor Parallel implementation

This PR uses the torch.distributed.tensor.parallel subpackage to implement Tensor Parallel for Llama (as an example).
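As a rough illustration of what tensor parallelism means for a single linear layer, the sketch below shards a weight matrix column-wise, computes each shard's partial output separately, and concatenates the results. It deliberately avoids `torch.distributed` and uses plain lists. The function names are hypothetical, and column-wise sharding is only one common style. The sharding plan actually used by the PR is not shown in these notes.

```python
def matmul(x, w):
    """x: [batch][in_features], w: [in_features][out_features]."""
    return [[sum(xi * w[i][j] for i, xi in enumerate(row))
             for j in range(len(w[0]))]
            for row in x]


def split_columns(w, num_shards):
    """Split the output dimension of w into equal column blocks."""
    out_features = len(w[0])
    assert out_features % num_shards == 0, "columns must divide evenly"
    step = out_features // num_shards
    return [[row[s * step:(s + 1) * step] for row in w]
            for s in range(num_shards)]


def column_parallel_linear(x, w, num_shards):
    """Each shard would live on its own device; here they run in sequence."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Stand-in for an all-gather: concatenate along the output dimension.
    return [sum((p[b] for p in partials), []) for b in range(len(x))]


x = [[1, 2]]
w = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
print(matmul(x, w))
print(column_parallel_linear(x, w, num_shards=2))
```

Both calls print the same matrix, which is the property a sharded layer has to preserve.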

... (truncated)

Commits

5d7739f Release: v4.47.0
a5bb528 Fix signatures for processing kwargs (#35105)
e27465c Adaptive dynamic number of speculative tokens (#34156)
b0a51e5 Fix flaky Hub CI (test_trainer.py) (#35062)
a928d9c [trainer] fix the GA model_accepts_loss_kwargs (#34915)
e682c17 BLIP: this is correct now (#35081)
50189e3 Add I-JEPA (#33125)
95a855e Deprecate quanto and switch to optimum-quanto (#35001)
482cb28 Fix tie_word_embeddings handling for GGUF models (#35085)
3544705 Update Mistral conversion script (#34829)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [transformers](https://github.com/huggingface/transformers) to permit the latest version.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](huggingface/transformers@v4.43.2...v4.47.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

dependabot bot requested a review from a team as a code owner December 16, 2024 01:06

dependabot bot added the dependencies Pull requests that update a dependency file label Dec 16, 2024
