fix pixtral processor #34486

molbap · 2024-10-29T10:50:00Z

What does this PR do?

Should fix #34204.

HuggingFaceDocBuilderDev · 2024-10-29T11:16:17Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

molbap · 2024-10-29T16:29:32Z

Non-slow tests are green (I just launched the slow ones). There was a mismatch in the tests between images per batch and images per sample, which I think I fixed, and I added a test for the previous issue as well. cc @ArthurZucker for review
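The distinction between images per batch and images per sample can be sketched in plain Python. Given a batch of prompts and a flat list of images, each prompt consumes as many images as it has `[IMG]` placeholders. The helper `group_images_per_sample` below is hypothetical and is not the actual `PixtralProcessor` code. It assumes images are listed in the same order as the placeholders across the batch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Placeholder token used in the Pixtral prompts in this thread.
IMAGE_TOKEN = "[IMG]"


def group_images_per_sample(prompts: Sequence[str], images: Sequence[T]) -> List[List[T]]:
    """Hypothetical helper: split a flat list of images into one list per prompt.

    Each prompt consumes as many images as it contains IMAGE_TOKEN placeholders,
    taken in order from the flat list.
    """
    counts = [prompt.count(IMAGE_TOKEN) for prompt in prompts]
    if sum(counts) != len(images):
        raise ValueError(
            f"Prompts contain {sum(counts)} image tokens but {len(images)} images were given."
        )
    grouped: List[List[T]] = []
    start = 0
    for n in counts:
        grouped.append(list(images[start : start + n]))
        start += n
    return grouped


prompts = [
    "<s>[INST]Describe the images.\n[IMG][IMG][/INST]",
    "<s>[INST]What is in this image?\n[IMG][/INST]",
]
print(group_images_per_sample(prompts, ["img_a", "img_b", "img_c"]))
# [['img_a', 'img_b'], ['img_c']]
```

Validating the total count up front makes a per-batch vs per-sample mix-up fail loudly instead of silently pairing images with the wrong prompt.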

molbap · 2024-10-29T19:29:15Z

All green now, including slow tests. I also fixed slow tests that were broken before: one fix for a wrong config key at init, one for a test that actually can't work, and a skip for torchscript. Lmk!
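A common way to implement a TorchScript skip in shared test mixins is a class-level flag that individual model test classes switch off. The sketch below is generic `unittest` code with hypothetical class names. It shows the flag-gated skip pattern, not this PR's actual diff. We believe transformers' `ModelTesterMixin` uses a similar `test_torchscript` attribute, but check the test suite before relying on that.

```python
import unittest


class SharedModelTestsMixin:
    """Hypothetical mixin holding tests shared by many model test classes."""

    # Subclasses set this to False to opt out of the TorchScript tests.
    test_torchscript = True

    def test_torchscript_export(self):
        if not self.test_torchscript:
            self.skipTest("TorchScript is not supported for this model")
        # Placeholder: a real test would trace and reload the model here.
        self.assertTrue(self.test_torchscript)


class ExampleVisionModelTest(SharedModelTestsMixin, unittest.TestCase):
    test_torchscript = False  # opt out of the shared TorchScript test


# The non-callable `test_torchscript` flag is not collected as a test;
# only `test_torchscript_export` runs, and it is reported as skipped.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleVisionModelTest)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), result.wasSuccessful())  # 1 True
```

A flag keeps the opt-out visible in one place per model, whereas decorating every inherited method would mean overriding each one.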

ArthurZucker

Thanks, LGTM! Let's not revert the integration test, though.

ArthurZucker · 2024-10-30T07:07:50Z

tests/models/pixtral/test_modeling_pixtral.py

```diff
-
-
-@require_torch
-class PixtralVisionModelIntegrationTest(unittest.TestCase):
```


Why did this one have to go away? 😓

Many reasons: it was looking up an image URL that did not exist (pixtral-vl instead of llava-vl) to predict something the Pixtral model does not predict, and it was testing the VisionModel, which does not support `generate`.

Yep it needs to be in the Llava tests! Can you move it around 👀

transformers/tests/models/llava/test_modeling_llava.py

Lines 620 to 663 in 913330c

```python
def test_pixtral(self):
    model_id = "hf-internal-testing/pixtral-12b"
    # inputs are moved to CUDA below, so the model has to live there too
    model = LlavaForConditionalGeneration.from_pretrained(model_id).to("cuda")
    processor = AutoProcessor.from_pretrained(model_id)

    IMG_URLS = [
        Image.open(requests.get("https://picsum.photos/id/237/400/300", stream=True).raw),
        Image.open(requests.get("https://picsum.photos/id/231/200/300", stream=True).raw),
        Image.open(requests.get("https://picsum.photos/id/27/500/500", stream=True).raw),
        Image.open(requests.get("https://picsum.photos/id/17/150/600", stream=True).raw),
    ]
    PROMPT = "<s>[INST]Describe the images.\n[IMG][IMG][IMG][IMG][/INST]"

    # image = Image.open(requests.get(url, stream=True).raw)
    inputs = processor(text=PROMPT, images=IMG_URLS, return_tensors="pt").to("cuda")
    generate_ids = model.generate(**inputs, max_new_tokens=500)
    output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

    # fmt: off
    EXPECTED_GENERATION = """
Describe the images.
Sure, let's break down each image description:

1. **Image 1:**
- **Description:** A black dog with a glossy coat is sitting on a wooden floor. The dog has a focused expression and is looking directly at the camera.
- **Details:** The wooden floor has a rustic appearance with visible wood grain patterns. The dog's eyes are a striking color, possibly brown or amber, which contrasts with its black fur.

2. **Image 2:**
- **Description:** A scenic view of a mountainous landscape with a winding road cutting through it. The road is surrounded by lush green vegetation and leads to a distant valley.
- **Details:** The mountains are rugged with steep slopes, and the sky is clear, indicating good weather. The winding road adds a sense of depth and perspective to the image.

3. **Image 3:**
- **Description:** A beach scene with waves crashing against the shore. There are several people in the water and on the beach, enjoying the waves and the sunset.
- **Details:** The waves are powerful, creating a dynamic and lively atmosphere. The sky is painted with hues of orange and pink from the setting sun, adding a warm glow to the scene.

4. **Image 4:**
- **Description:** A garden path leading to a large tree with a bench underneath it. The path is bordered by well-maintained grass and flowers.
- **Details:** The path is made of small stones or gravel, and the tree provides a shaded area with the bench invitingly placed beneath it. The surrounding area is lush and green, suggesting a well-kept garden.

Each image captures a different scene, from a close-up of a dog to expansive natural landscapes, showcasing various elements of nature and human interaction with it.
"""
    # fmt: on

    # check that the decoded generation matches the expected text
    # (assertEqual, not assertListEqual: both values are strings)
    self.assertEqual(output, EXPECTED_GENERATION)
```
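The test prompt uses one `[IMG]` placeholder per image, and the processor has to expand each placeholder into the image tokens the model actually consumes. The sketch below is a simplified, hypothetical re-implementation based on our understanding of the Pixtral format, not the `PixtralProcessor` source. It assumes a 16-pixel patch size, the token names `[IMG_BREAK]` and `[IMG_END]`, and images already resized so both sides are multiples of the patch size (the real processor also handles resizing).

```python
from typing import List, Sequence, Tuple

# Token names as we understand the Pixtral vocabulary (assumption).
IMAGE_TOKEN = "[IMG]"
IMAGE_BREAK_TOKEN = "[IMG_BREAK]"
IMAGE_END_TOKEN = "[IMG_END]"


def expand_image_placeholder(height: int, width: int, patch_size: int = 16) -> List[str]:
    """Tokens replacing one [IMG] placeholder for an already-resized image.

    One [IMG] per patch, an [IMG_BREAK] after each row of patches, and the
    final break swapped for [IMG_END].
    """
    rows, cols = height // patch_size, width // patch_size
    if rows == 0 or cols == 0:
        raise ValueError("image is smaller than one patch")
    tokens: List[str] = []
    for _ in range(rows):
        tokens.extend([IMAGE_TOKEN] * cols)
        tokens.append(IMAGE_BREAK_TOKEN)
    tokens[-1] = IMAGE_END_TOKEN
    return tokens


def expand_prompt(prompt: str, image_sizes: Sequence[Tuple[int, int]], patch_size: int = 16) -> str:
    """Replace each [IMG] placeholder, in order, with its expanded token grid."""
    pieces = prompt.split(IMAGE_TOKEN)
    if len(pieces) - 1 != len(image_sizes):
        raise ValueError("number of [IMG] placeholders and images differ")
    out = [pieces[0]]
    for (height, width), rest in zip(image_sizes, pieces[1:]):
        out.append("".join(expand_image_placeholder(height, width, patch_size)))
        out.append(rest)
    return "".join(out)


print(expand_prompt("<s>[INST]Describe the image.\n[IMG][/INST]", [(32, 48)]))
```

Since the expansion depends on each image's own size, two prompts with the same number of placeholders can still end up with very different token lengths.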

Looks like Pixtral is already properly tested under the Llava tests. Since it's a Llava model, testing it there should be enough, no?

Ah, then good job, and sorry!

No worries, it's hard to see when a model is tested through another one, but it's convenient on our side. I can move all Pixtral-related tests to the Pixtral test files so we don't forget them, wdyt? It's a bit more LOC, but easier to track down.

ArthurZucker

Good to go ! Thanks 🤗

* fix pixtral processor
* test out full length batches + remove undue ValueError
* fix up processing
* fix tests
* fix
* last fixup
* style
* [run-slow] pixtral
* [run-slow] pixtral
* fix config key
* skip torchscript tests
* [run-slow] pixtral
* add missing key
* [run-slow] pixtral
* fix docs
* [run-slow] pixtral
* fix wrong url for integration test
* [run-slow] pixtral
* pixtralVisionModel does not have a lm head
* [run-slow] pixtral

molbap added 2 commits October 29, 2024 11:48

fix pixtral processor

c3407ad

test out full length batches + remove undue ValueError

c2f4d58

ArthurZucker mentioned this pull request Oct 29, 2024

Update processing_pixtral.py #34451

Open

fix up processing

bd6688b

molbap requested a review from ArthurZucker October 29, 2024 14:32

molbap added 5 commits October 29, 2024 15:33

fix tests

5d945f0

fix

9a79e48

last fixup

b464296

style

8144685

[run-slow] pixtral

2d0f6c8

molbap added the run-slow label Oct 29, 2024

Merge branch 'main' into fix-pixtral-processor-2

08290d5

molbap added 12 commits October 29, 2024 17:35

[run-slow] pixtral

d03fac0

fix config key

a890ab7

skip torchscript tests

57b7c06

[run-slow] pixtral

bbfab74

add missing key

3ec54d5

[run-slow] pixtral

2cb22dc

fix docs

032e4b4

[run-slow] pixtral

784a36e

fix wrong url for integration test

78e5a17

[run-slow] pixtral

01c1ca1

pixtralVisionModel does not have a lm head

a72d1cb

[run-slow] pixtral

cd71c40

ArthurZucker reviewed Oct 30, 2024


molbap requested a review from ArthurZucker October 30, 2024 12:09

molbap mentioned this pull request Oct 30, 2024

Fix PixtralProcessor to return input IDs for all prompts and images in batch #34491

Open

ArthurZucker approved these changes Oct 30, 2024


molbap merged commit 241d790 into main Oct 30, 2024
20 checks passed

molbap deleted the fix-pixtral-processor-2 branch October 30, 2024 13:17
