Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(deps): update dependency lm-eval to v0.4.5 #241

Closed
wants to merge 1 commit into from

Conversation

red-hat-konflux[bot]
Copy link

@red-hat-konflux red-hat-konflux bot commented Nov 23, 2024

This PR contains the following updates:

Package Update Change
lm-eval patch ==0.4.4 -> ==0.4.5

Release Notes

EleutherAI/lm-evaluation-harness (lm-eval)

v0.4.5

Compare Source

lm-eval v0.4.5 Release Notes

New Additions

Prototype Support for Vision Language Models (VLMs)

We're excited to introduce prototype support for Vision Language Models (VLMs) in this release, using model types hf-multimodal and vllm-vlm. This allows for evaluation of models that can process text and image inputs and produce text outputs. Currently we have added support for the MMMU (mmmu_val) task and we welcome contributions and feedback from the community!

New VLM-Specific Arguments

VLM models can be configured with several new arguments within --model_args to support their specific requirements:

  • max_images (int): Set the maximum number of images for each prompt.
  • interleave (bool): Determines the positioning of image inputs. When True (default) images are interleaved with the text. When False all images are placed at the front of the text. This is model dependent.

hf-multimodal specific args:

  • image_token_id (int) or image_string (str): Specifies a custom token or string for image placeholders. For example, Llava models expect an "<image>" string to indicate the location of images in the input, while Qwen2-VL models expect an "<|image_pad|>" sentinel string instead. This will be inferred based on model configuration files whenever possible, but we recommend confirming that an override is needed when testing a new model family
  • convert_img_format (bool): Whether to convert the images to RGB format.
Example usage:
  • lm_eval --model hf-multimodal --model_args pretrained=llava-hf/llava-1.5-7b-hf,attn_implementation=flash_attention_2,max_images=1,interleave=True,image_string=<image> --tasks mmmu_val --apply_chat_template

  • lm_eval --model vllm-vlm --model_args pretrained=llava-hf/llava-1.5-7b-hf,max_images=1,interleave=True --tasks mmmu_val --apply_chat_template

Important considerations
  1. Chat Template: Most VLMs require the --apply_chat_template flag to ensure proper input formatting according to the model's expected chat template.
  2. Some VLM models are limited to processing a single image per prompt. For these models, always set max_images=1. Additionally, certain models expect image placeholders to be non-interleaved with the text, requiring interleave=False.
  3. Performance and Compatibility: When working with VLMs, be mindful of potential memory constraints and processing times, especially when handling multiple images or complex tasks.
Tested VLM Models

We have currently most notably tested the implementation with the following models:

  • llava-hf/llava-1.5-7b-hf
  • llava-hf/llava-v1.6-mistral-7b-hf
  • Qwen/Qwen2-VL-2B-Instruct
  • HuggingFaceM4/idefics2 (requires the latest transformers from source)

New Tasks

Several new tasks have been contributed to the library for this version!

New tasks as of v0.4.5 include:

As well as several slight fixes or changes to existing tasks (as noted via the incrementing of versions).

Backwards Incompatibilities

Finalizing group versus tag split

We've now fully deprecated the use of group keys directly within a task's configuration file. The appropriate key to use is now solely tag for many cases. See the v0.4.4 patchnotes for more info on migration, if you have a set of task YAMLs maintained outside the Eval Harness repository.

Handling of Causal vs. Seq2seq backend in HFLM

In HFLM, logic specific to handling inputs for Seq2seq (encoder-decoder models like T5) versus Causal (decoder-only autoregressive models, and the vast majority of current LMs) models previously hinged on a check for self.AUTO_MODEL_CLASS == transformers.AutoModelForCausalLM. Some users may want to use causal model behavior, but set self.AUTO_MODEL_CLASS to a different factory class, such as transformers.AutoModelForVision2Seq.

As a result, those users who subclass HFLM but do not call HFLM.__init__() may now also need to set the self.backend attribute to either "causal" or "seq2seq" during initialization themselves.

While this should not affect a large majority of users, for those who subclass HFLM in potentially advanced ways, see https://github.com/EleutherAI/lm-evaluation-harness/pull/2353 for the full set of changes.

Future Plans

We intend to further expand our multimodal support to a wider set of vision-language tasks, as well as a broader set of model types, and are actively seeking user feedback!

Thanks, the LM Eval Harness team (@​baberabb @​haileyschoelkopf @​lintangsutawika)

What's Changed

New Contributors

Full Changelog: EleutherAI/lm-evaluation-harness@v0.4.4...v0.4.5


Configuration

📅 Schedule: Branch creation - "after 5am on saturday" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot.

Copy link

openshift-ci bot commented Nov 23, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: red-hat-konflux[bot]
Once this PR has been reviewed and has the lgtm label, please assign dtrifiro for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

openshift-ci bot commented Nov 23, 2024

Hi @red-hat-konflux[bot]. Thanks for your PR.

I'm waiting for a opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
@red-hat-konflux red-hat-konflux bot force-pushed the konflux/mintmaker/main/lm-eval-0.x branch from 24ef480 to 08fbc46 Compare November 30, 2024 08:13
@dtrifiro dtrifiro closed this Dec 10, 2024
Copy link
Author

Renovate Ignore Notification

Because you closed this PR without merging, Renovate will ignore this update (==0.4.5). You will get a PR once a newer version is released. To ignore this dependency forever, add it to the ignoreDeps array of your Renovate config.

If you accidentally closed this PR, or if you changed your mind: rename this PR to get a fresh replacement PR.

@red-hat-konflux red-hat-konflux bot deleted the konflux/mintmaker/main/lm-eval-0.x branch December 10, 2024 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant