Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Kosmos-2.5 #31711

Open
wants to merge 283 commits into
base: main
Choose a base branch
from
Open

Support Kosmos-2.5 #31711

wants to merge 283 commits into from

Conversation

tic-top
Copy link

@tic-top tic-top commented Jun 29, 2024

What does this PR do?

#30877 Implementation of Kosmos-2.5 in transformers.
https://huggingface.co/kirp/kosmos2_5/blob/main/README.md

Usage

from PIL import Image
import requests
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq, AutoConfig
import re

repo = "kirp/kosmos2_5"
device = "cuda:0"
config = AutoConfig.from_pretrained(repo)

NAME = {
    "f" : "flash_attention_2",
    "s" : "sdpa",
    "e" : "eager",
}

# all sdpa fp16
dtype = torch.float16
config._attn_implementation = NAME["s"]
config.vision_config._attn_implementation = NAME["s"]
config.text_config._attn_implementation = NAME["s"]

# # all sdpa fp16
# dtype = torch.float16
# config._attn_implementation = NAME["s"]
# config.text_config._attn_implementation = NAME["s"]
# config.vision_config._attn_implementation = NAME["s"]

# # all eager bf16
# dtype = torch.bfloat16
# config._attn_implementation = NAME["e"]
# config.text_config._attn_implementation = NAME["e"]
# config.vision_config._attn_implementation = NAME["e"]


model = AutoModelForVision2Seq.from_pretrained(repo, device_map = device, torch_dtype=dtype, config=config)
processor = AutoProcessor.from_pretrained(repo)

url = "https://huggingface.co/kirp/kosmos2_5/resolve/main/receipt_00008.png"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<ocr>" # <md>

inputs = processor(text=prompt, images=image, return_tensors="pt")
height, width = inputs.pop("height"), inputs.pop("width")
raw_width, raw_height = image.size
scale_height = raw_height / height
scale_width = raw_width / width

inputs = {k: v.to(device) if v is not None else None for k, v in inputs.items()}
inputs["flattened_patches"] = inputs["flattened_patches"].to(dtype)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=1024,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)

def postprocess(y, scale_height, scale_width):
    y = y.replace(prompt, "")
    if "<md>" in prompt:
        return y
    pattern = r"<bbox><x_\d+><y_\d+><x_\d+><y_\d+></bbox>"
    bboxs_raw = re.findall(pattern, y)
    lines = re.split(pattern, y)[1:]
    bboxs = [re.findall(r"\d+", i) for i in bboxs_raw]
    bboxs = [[int(j) for j in i] for i in bboxs]
    info = ""
    for i in range(len(lines)):
        box = bboxs[i]
        x0, y0, x1, y1 = box
        if not (x0 >= x1 or y0 >= y1):
            x0 = int(x0 * scale_width)
            y0 = int(y0 * scale_height)
            x1 = int(x1 * scale_width)
            y1 = int(y1 * scale_height)
            info += f"{x0},{y0},{x1},{y0},{x1},{y1},{x0},{y1},{lines[i]}"
    return info

output_text = postprocess(generated_text[0], scale_height, scale_width)
print(output_text)

@amyeroberts
Copy link
Collaborator

cc @ydshieh

@ydshieh ydshieh self-assigned this Jul 1, 2024
Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! ❤️ Just a few more things that can be removed now but looks nice overall to me. Looking forward for changes to accommodate generate() from mixin and should be okay

Comment on lines 1505 to 1514
return_legacy_cache = False
if (
use_cache and not isinstance(past_key_values, Cache) and not self.training
): # kept for BC (non `Cache` `past_key_values` inputs)
return_legacy_cache = True
past_key_values = DynamicCache.from_legacy_cache(past_key_values)
logger.warning_once(
"We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. "
"Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm i've been removing deprecation for new models. Oke, if core maintainer agrees we can leave but maybe change the deprecation version to around v4.50 or smth?

Comment on lines 79 to 84
self.boi = tokenizer.convert_tokens_to_ids("<image>")
self.eoi = tokenizer.convert_tokens_to_ids("</image>")
self.pad = tokenizer.convert_tokens_to_ids("<pad>")
self.bos = tokenizer.convert_tokens_to_ids("<s>")
self.eos = tokenizer.convert_tokens_to_ids("</s>")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these can be saved as special tokens in tokenizer so you can do tokenizer.boi_token_id instead of hardcoding the string of each token. Not super necessary, simply FYI :)

for that when converting you need to add

# bos/eos/pad tokens are already in tokenizer so no need to add them
extra_special_tokens = {"boi_token": "<image>", "eoi_token": </image>}
tokenizer = AutoTokenizer.from_pretrained(model_id, extra_special_tokens=extra_special_tokens)

@ydshieh
Copy link
Collaborator

ydshieh commented Dec 17, 2024

Hi @zucchini-nlp Great reviews! Could you check the changes shown below? I think all comments addressed.

(and also the removed generate)

https://github.com/tic-top/transformers/compare/0ec499a841b35ce9c77e072362a09a20a287a2f5..8fc9699655c5b80b0c5cbb1fadfc4837daf1ad90

Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for iterating and removing the overwritten generate()!

@ydshieh
Copy link
Collaborator

ydshieh commented Dec 18, 2024

run slow

Copy link

This comment contains run-slow, running the specified jobs: ['models/kosmos2_5'] ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants