Fuyu Multi-image interleaved processor #27587

cliangyu · 2023-11-19T09:30:23Z

What does this PR do?

Adds a multi-image interleaved processor for Fuyu. Test example:

from transformers import FuyuProcessor, FuyuForCausalLM
from PIL import Image
import requests
import torch

# load model and processor
model_id = "adept/fuyu-8b"
processor = FuyuProcessor.from_pretrained(model_id)
model = FuyuForCausalLM.from_pretrained(model_id, device_map="cuda:0", torch_dtype=torch.bfloat16)

def convert(list_of_dicts):  # Convert a list of dicts to a dictionary of lists
    dict_of_lists = {}
    for d in list_of_dicts:
        for key, value in d.items():
            if key not in dict_of_lists:
                dict_of_lists[key] = []
            dict_of_lists[key].append(value)
    return dict_of_lists

text_prompt1 = "|IMAGESTART| Generate a coco-style caption. |IMAGESTART| Be reminded that the caption should be longer than 2000 words but shorter than 1 million words. \n"
url1 = "https://huggingface.co/adept/fuyu-8b/resolve/main/bus.png"
image1 = Image.open(requests.get(url1, stream=True).raw)

text_prompt2 = "What does this chart describe?\n"
url2 = "https://huggingface.co/adept/fuyu-8b/resolve/main/chart.png"
image2 = Image.open(requests.get(url2, stream=True).raw)

test_examples = [
    # {"text": "|IMAGESTART| Generate a coco-style caption. |IMAGESTART| Be reminded that the caption should be longer than 2000 words but shorter than 1 million words. \n", "images": image1}, # should assert error
    {"text": text_prompt1, "images": [image1, image2]}, # normal
    {"text": text_prompt2, "images": [image2 for i in range(40)]}, # should add indicator
    {"text": "|IMAGESTART||IMAGESTART| Generate a coco-style caption. Be reminded that the caption should be longer than 2000 words but shorter than 1 million words. \n", "images": [image1, image2]}, # normal
    {"text": " Generate a coco-style caption. Be reminded that the caption should be longer than 2000 words but shorter than 1 million words. \n|IMAGESTART||IMAGESTART|", "images": [image1, image2]}, # normal
    # {"text": " Generate a coco-style caption. Be reminded that the caption should be longer than 2000 words but shorter than 1 million words.", "images": None}, # no image, we had error with this case
    {"text": None, "images": [image1]}, # no text
    
]
inputs_to_model = processor(**convert(test_examples), return_tensors="pt", truncation=True).to("cuda:0")
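The comments in the test cases above ("should assert error", "should add indicator", "no text") suggest a rule for matching `|IMAGESTART|` placeholders to images. A minimal sketch of one possible reading follows, using strings as stand-ins for images so it runs without the model. `check_placeholders` is a hypothetical helper, not part of `transformers` and not necessarily how this PR implements it; the rule it encodes (prepend one placeholder per image when none are present, raise on a count mismatch) is an assumption inferred from those comments.

```python
from collections import defaultdict

IMAGE_PLACEHOLDER = "|IMAGESTART|"  # placeholder string used in the test prompts above


def convert(list_of_dicts):
    """Turn a list of per-example dicts into a dict of per-key lists."""
    dict_of_lists = defaultdict(list)
    for d in list_of_dicts:
        for key, value in d.items():
            dict_of_lists[key].append(value)
    return dict(dict_of_lists)


def check_placeholders(text, images):
    """Hypothetical helper: reconcile placeholder count with image count.

    Assumed rule (not confirmed by the PR):
      - no placeholders in the text: prepend one per image;
      - otherwise the counts must match, else raise ValueError.
    """
    text = text or ""  # treat a missing prompt as an empty string
    n_placeholders = text.count(IMAGE_PLACEHOLDER)
    n_images = len(images)
    if n_placeholders == 0:
        return IMAGE_PLACEHOLDER * n_images + text
    if n_placeholders != n_images:
        raise ValueError(
            f"found {n_placeholders} placeholders but {n_images} images"
        )
    return text


# Strings stand in for PIL images so the sketch needs no downloads.
examples = [
    {"text": "|IMAGESTART| Caption this. |IMAGESTART| And this.\n", "images": ["img_a", "img_b"]},
    {"text": "What does this chart describe?\n", "images": ["img_b"] * 3},
    {"text": None, "images": ["img_a"]},
]
batch = convert(examples)
checked = [check_placeholders(t, imgs) for t, imgs in zip(batch["text"], batch["images"])]
```

Here `batch` has the same shape that `processor(**convert(test_examples), ...)` receives: one list of prompts under `"text"` and one list of image lists under `"images"`.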

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@amyeroberts, @ArthurZucker and @younesbelkada

ArthurZucker · 2023-11-20T07:24:44Z

Hey! Feel free to ping @molbap once this is ready for review and the CIs are green, or ping me if you need help with the CIs!

cliangyu · 2023-12-03T12:08:48Z

Hi Seungyoun,

Thanks for working on this. I suggest integrating the changes with mine first. Thank you!

Best regards,
Liangyu Chen
MMLab, School of Computer Science and Engineering, Nanyang Technological University
https://cliangyu.com/

On Sat, 2 Dec 2023 at 19:41, Seungyoun, Shin wrote:

Hi @cliangyu, I've developed enhancements for multi-device support in PR #27587, building upon your work. Before proceeding with a new PR, I'd like to discuss integrating these changes with yours. I can submit a PR to your fork or detail the changes here for your review. Please let me know your preferred approach.

ArthurZucker

By the way, let's not push the .gitignore. It might also be good to think about doing the processing in the modeling code, like what we are doing in #27662! 🤗 cc @molbap: we can benchmark anyway before anything else.

github-actions · 2024-03-20T08:06:15Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

cliangyu and others added 6 commits November 17, 2023 21:29

d386ab1  dev_interleaved
4369a5e  added more comment to debug
6f1f7ae  fuyu interleaved dev finished
3c2f2c6  fix model inference
56f9bb5  truncation
6870cf8  test and debug for image-text and pure image input
c6b992a  complete fuyu interleaved input

ArthurZucker reviewed Dec 5, 2023

huggingface deleted a comment from github-actions bot Jan 3, 2024

huggingface deleted a comment from github-actions bot Jan 28, 2024

huggingface deleted a comment from github-actions bot Feb 23, 2024

github-actions bot closed this Mar 29, 2024
