
Add CogVLM #27718

Closed
wants to merge 30 commits into from

Conversation

@NielsRogge (Contributor) commented Nov 27, 2023

What does this PR do?

This PR adds CogVLM natively to the Transformers library (it's already usable with trust_remote_code=True, but this PR makes it runnable without the xformers, einops and triton dependencies).
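For context, a rough before/after sketch of the usage this enables. The class names and the checkpoint id below are assumptions for illustration, not necessarily the final API introduced by this PR:

```python
import torch
from transformers import AutoModelForCausalLM

# Today: CogVLM is only usable via remote code, which pulls in the
# xformers, einops and triton dependencies.
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",  # illustrative checkpoint id
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# After this PR: the modeling code ships with the library, so no remote
# code or extra dependencies should be needed (hypothetical class names):
# from transformers import CogvlmForCausalLM, CogvlmProcessor
# processor = CogvlmProcessor.from_pretrained("THUDM/cogvlm-chat-hf")
# model = CogvlmForCausalLM.from_pretrained("THUDM/cogvlm-chat-hf")
```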

To do:

Comment on lines +50 to +57
def __init__(self, image_processor, tokenizer, image_size, patch_size):
    super().__init__(image_processor, tokenizer)
    # the two extra attributes discussed in the comment below
    self.image_size = image_size
    self.patch_size = patch_size
NielsRogge (Contributor, Author)


cc @ydshieh for this model I need to store 2 attributes on the processor; however, we currently don't have a processor_config.json file. Can we add support for this in from_pretrained and save_pretrained?
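A minimal sketch of the kind of round-tripping being asked for here, assuming a dedicated processor_config.json file; nothing below is an existing API at the time of this comment, and the attribute values are illustrative:

```python
import json
import os

def save_processor_config(save_directory, **attributes):
    # e.g. attributes = {"image_size": 490, "patch_size": 14} (illustrative values)
    with open(os.path.join(save_directory, "processor_config.json"), "w") as f:
        json.dump(attributes, f, indent=2)

def load_processor_config(save_directory):
    # read the attributes back so from_pretrained can pass them to __init__
    with open(os.path.join(save_directory, "processor_config.json")) as f:
        return json.load(f)
```

The idea being that save_pretrained would write such a file alongside the existing preprocessor/tokenizer configs, and from_pretrained would feed the stored attributes back into the processor's __init__.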


@NielsRogge (Contributor, Author)

A cleaner implementation I'm working on is here: https://github.com/NielsRogge/transformers/tree/add_cogvlm_cleaner. It implements the model like llava, adding the image tokens inside the model rather than creating them in the processor class.
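For illustration, a minimal sketch of that llava-style merge, assuming a placeholder image token id and vision features already projected into the language model's embedding space (names and shapes are assumptions, not the code on that branch):

```python
import torch

def merge_image_features(inputs_embeds, image_features, input_ids, image_token_id):
    # inputs_embeds:  (batch, seq_len, hidden) text embeddings, with placeholder
    #                 positions wherever input_ids equals image_token_id
    # image_features: (num_placeholder_positions, hidden) projected vision features
    mask = input_ids == image_token_id            # (batch, seq_len) boolean mask
    inputs_embeds = inputs_embeds.clone()
    inputs_embeds[mask] = image_features.to(inputs_embeds.dtype)
    return inputs_embeds
```

With this split, the processor only needs to know how many placeholder tokens to insert per image (which is where image_size and patch_size come in), while all feature merging happens inside the model.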

ydshieh mentioned this pull request Dec 19, 2023
NielsRogge mentioned this pull request Dec 22, 2023
@NielsRogge (Contributor, Author)

Closing this one in favor of the PR above.

@NielsRogge closed this Dec 22, 2023