
Add CogVLM (cleaner) #28196

Closed
wants to merge 129 commits
Changes from 125 commits

Commits (129 total)
1e16091
First draft
NielsRogge Nov 23, 2023
c75720b
Improve conversion script
NielsRogge Nov 23, 2023
ed527a0
More improvements
NielsRogge Nov 23, 2023
e633dca
More improvements
NielsRogge Nov 23, 2023
988d430
Add config attributes, improve conversion script
NielsRogge Nov 24, 2023
79cd06c
Make conversion work
NielsRogge Nov 24, 2023
8be1ded
Rename images to pixel_values
NielsRogge Nov 24, 2023
202fcc2
Add processor
NielsRogge Nov 25, 2023
b76b1b9
Remove einops dependency
NielsRogge Nov 25, 2023
185151d
Remove xformers dependency
NielsRogge Nov 25, 2023
1b4de2a
Improve vision config
NielsRogge Nov 25, 2023
98d47a2
Update test
NielsRogge Nov 25, 2023
4f1aa8b
Fix more tests, update conversion script
NielsRogge Nov 26, 2023
17581cc
Fix more tests
NielsRogge Nov 26, 2023
d10cbca
Fix more tests, add docstrings
NielsRogge Nov 26, 2023
5efde22
Improve variable names, docstrings
NielsRogge Nov 26, 2023
7ddd120
Improve more variable names
NielsRogge Nov 26, 2023
4071e89
Leverage _prepare_4d_causal_attention_mask
NielsRogge Nov 26, 2023
e6bd4ed
Rename classes
NielsRogge Nov 26, 2023
a80529f
Remove script
NielsRogge Nov 26, 2023
38ed9bf
Update README and docs
NielsRogge Nov 27, 2023
79f981d
Use native torch rotary embeddings
NielsRogge Nov 28, 2023
2ea6b18
Remove triton dependency
NielsRogge Dec 6, 2023
7f1e274
Remove file
NielsRogge Dec 6, 2023
d3c5fc3
Make fixup
NielsRogge Dec 6, 2023
456a439
Make fixup
NielsRogge Dec 6, 2023
3410c80
Merge branch 'main' into add_cogvlm
younesbelkada Dec 7, 2023
c52848d
Add cleaner implementation
NielsRogge Dec 11, 2023
660cc0f
More improvements
NielsRogge Dec 12, 2023
57e433d
Add position_ids
NielsRogge Dec 14, 2023
0f27526
Add print statements
NielsRogge Dec 16, 2023
dda271e
Add improvement
NielsRogge Dec 16, 2023
7383c72
Update conversion script
NielsRogge Dec 16, 2023
8209714
Fix generation
NielsRogge Dec 16, 2023
4f81841
Set use_cache to False for now
NielsRogge Dec 16, 2023
8191980
Test
NielsRogge Dec 16, 2023
8edee20
Add print statements
NielsRogge Dec 18, 2023
5bdce06
Fix use_cache
NielsRogge Dec 21, 2023
3e81dbe
Fix more tests
NielsRogge Dec 21, 2023
92657d1
Fix more tests
NielsRogge Dec 21, 2023
281a592
Make sure model works with pipeline
NielsRogge Dec 22, 2023
c26ec6c
Update auto mappings
NielsRogge Dec 22, 2023
64e0cd6
Remove print statements
NielsRogge Dec 22, 2023
de77924
Fix all tests
NielsRogge Dec 27, 2023
e58d765
Convert more checkpoints
NielsRogge Dec 28, 2023
81c0e46
Convert more checkpoints
NielsRogge Dec 28, 2023
efe5a9b
Update year
NielsRogge Jan 13, 2024
e7cd72d
Merge remote-tracking branch 'upstream/main' into add_cogvlm_cleaner
NielsRogge Jan 13, 2024
5430d7f
Improve conversion script
NielsRogge Jan 14, 2024
067ce32
Address comments
NielsRogge Feb 5, 2024
5e82fda
More improvements
NielsRogge Feb 5, 2024
330673d
Update device
NielsRogge Feb 12, 2024
2f7c4a8
Fix merge
NielsRogge Feb 12, 2024
9163e9e
Add copied from
NielsRogge Feb 12, 2024
b285247
Replace assert
NielsRogge Feb 12, 2024
69e45ca
Use torch.full
NielsRogge Feb 12, 2024
c43c3c6
Remove todo, apply black
NielsRogge Feb 12, 2024
d4394c0
Add docstrings, remove one-letter variable names
NielsRogge Feb 12, 2024
f190a38
Use meta device
NielsRogge Feb 12, 2024
9e6fe19
Improve conversion script
NielsRogge Feb 12, 2024
f25640c
Remove attention_fn
NielsRogge Feb 12, 2024
b660626
Remove attention function part 2
NielsRogge Feb 12, 2024
df4accc
Fix test
NielsRogge Feb 12, 2024
8802afb
Add copied from
NielsRogge Feb 12, 2024
58d6bb2
Remove unused variables
NielsRogge Feb 12, 2024
202f425
Merge remote-tracking branch 'upstream/main' into add_cogvlm_cleaner
NielsRogge Feb 12, 2024
dd4dc8f
Add rotary embedding class
NielsRogge Feb 12, 2024
f1ef596
Remove class
NielsRogge Feb 12, 2024
20da998
Leverage accelerate, add copied from
NielsRogge Feb 12, 2024
f7aa502
Fix style
NielsRogge Feb 12, 2024
ff4e851
Fix docs
NielsRogge Feb 12, 2024
de3548c
Add lowercase
NielsRogge Feb 15, 2024
b5deca5
Address comment
NielsRogge Feb 20, 2024
1fafa2c
Add improvement
NielsRogge Feb 21, 2024
b04d1b2
Use buffers
NielsRogge Feb 21, 2024
f8b4093
Merge remote-tracking branch 'upstream/main' into add_cogvlm_cleaner
NielsRogge Feb 21, 2024
fd568f0
Fix merge
NielsRogge Feb 21, 2024
5e2f09d
Remove llm_forward
NielsRogge Feb 21, 2024
8ce297f
Make fixup
NielsRogge Feb 21, 2024
549011e
Merge remote-tracking branch 'upstream/main' into add_cogvlm_cleaner
NielsRogge Feb 26, 2024
23dcdfa
Debug
NielsRogge Mar 12, 2024
1ba87ce
More debugging
NielsRogge Mar 17, 2024
5b1bd42
Fix merge
NielsRogge Mar 25, 2024
9513c05
Match logits
NielsRogge Mar 25, 2024
cf3bca3
Improve rotary
NielsRogge Mar 25, 2024
b80db66
Comment out hf_hub_download
NielsRogge Mar 25, 2024
b581776
Make fixup
NielsRogge Mar 25, 2024
5513c78
Add sdpa attention class
NielsRogge Mar 25, 2024
6a9adcb
Fix more tests
NielsRogge Mar 25, 2024
95ba976
Remove print statements
NielsRogge Mar 25, 2024
7be620e
More improvements
NielsRogge Mar 25, 2024
98098db
Merge remote-tracking branch 'upstream/main' into add_cogvlm_cleaner
NielsRogge Mar 29, 2024
61bcc7a
Fix caching
NielsRogge Mar 30, 2024
c7ef905
Support new cache format
NielsRogge Mar 30, 2024
9e85143
Make fixup
NielsRogge Mar 30, 2024
3a99604
Remove COGVLM_PRETRAINED_MODEL_ARCHIVE_LIST
NielsRogge Mar 30, 2024
6fba288
remove xformers dependency, and make eager/sdpa mathematically equiva…
fxmarty Apr 11, 2024
d884f47
Revert, something is still wrong
fxmarty Apr 11, 2024
9ba436e
implement eager attention
fxmarty Apr 17, 2024
3e27232
cleanup
fxmarty Apr 17, 2024
8879f1b
cleanup, matching at 1e-4
fxmarty Apr 17, 2024
5c08552
Fix merge
NielsRogge Apr 24, 2024
fbf9a73
Fix copies
NielsRogge Apr 24, 2024
c8baca6
Fix all tests
NielsRogge Apr 24, 2024
9c88570
Merge remote-tracking branch 'upstream/main' into add_cogvlm_cleaner
NielsRogge Apr 24, 2024
4296841
Remove cogvlm from doctests, don't use xformers
NielsRogge Apr 25, 2024
d54fcb6
Update cogvlm.md
NielsRogge Apr 25, 2024
16ae014
Remove script
NielsRogge Apr 25, 2024
1eed441
Address comments
NielsRogge May 1, 2024
c7ab4a9
Add copied from
NielsRogge May 1, 2024
ae612b1
Remove unused argument
NielsRogge May 1, 2024
4424455
Prepare everything in the processor
NielsRogge May 27, 2024
f3540b6
Remove script
NielsRogge May 27, 2024
e85cd55
Fix merge
NielsRogge May 27, 2024
3a2555e
Fix ruff
NielsRogge May 27, 2024
a01bb5a
Apply ruff
NielsRogge May 27, 2024
3aba8f3
Fix more processor tests
NielsRogge May 27, 2024
8e56e85
Fix more model tests
NielsRogge May 27, 2024
3c26099
Remove copied from
NielsRogge May 27, 2024
87bc77a
Fix processor tests
NielsRogge May 27, 2024
fccc433
Fix typo
NielsRogge May 27, 2024
78001e1
Merge remote-tracking branch 'upstream/main' into add_cogvlm_cleaner
NielsRogge Jun 3, 2024
3f1a821
Merge remote-tracking branch 'upstream/main' into add_cogvlm_cleaner
NielsRogge Jun 3, 2024
d6a9fa5
Undo gemma updates
NielsRogge Jun 3, 2024
016d793
Fix test
NielsRogge Jun 3, 2024
93a9426
Remove archive map
NielsRogge Jun 5, 2024
44d7038
Address comment
NielsRogge Jun 5, 2024
93a5d5f
Fix merge
NielsRogge Jul 1, 2024
a593d3f
Fix image processor
NielsRogge Jul 1, 2024
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -754,6 +754,8 @@
title: CLIPSeg
- local: model_doc/clvp
title: CLVP
- local: model_doc/cogvlm
title: CogVLM
- local: model_doc/data2vec
title: Data2Vec
- local: model_doc/deplot
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -95,6 +95,7 @@ Flax), PyTorch, and/or TensorFlow.
| [CLVP](model_doc/clvp) | ✅ | ❌ | ❌ |
| [CodeGen](model_doc/codegen) | ✅ | ❌ | ❌ |
| [CodeLlama](model_doc/code_llama) | ✅ | ❌ | ✅ |
| [CogVLM](model_doc/cogvlm) | ✅ | ❌ | ❌ |
| [Cohere](model_doc/cohere) | ✅ | ❌ | ❌ |
| [Conditional DETR](model_doc/conditional_detr) | ✅ | ❌ | ❌ |
| [ConvBERT](model_doc/convbert) | ✅ | ✅ | ❌ |
56 changes: 56 additions & 0 deletions docs/source/en/model_doc/cogvlm.md
@@ -0,0 +1,56 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# CogVLM

## Overview

The CogVLM model was proposed in [CogVLM: Visual Expert for Pretrained Language Models](https://arxiv.org/abs/2311.03079) by Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang. CogVLM adds separate QKV and MLP weights to a frozen large language model, enabling a strong multimodal foundation model that performs well on various multimodal benchmarks.

The abstract from the paper is the following:

*We introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular shallow alignment method which maps image features into the input space of language model, CogVLM bridges the gap between the frozen pretrained language model and image encoder by a trainable visual expert module in the attention and FFN layers. As a result, CogVLM enables deep fusion of vision language features without sacrificing any performance on NLP tasks. CogVLM-17B achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flicker30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA and TDIUC, and ranks the 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, etc., surpassing or matching PaLI-X 55B.*
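The core idea above is that image and text tokens are routed through separate projection weights inside each transformer layer, while the language weights stay frozen. A minimal sketch of that routing (illustrative only, not the actual CogVLM implementation; the token-type convention used here is an assumption):

```python
import torch
import torch.nn as nn


def visual_expert_projection(hidden_states, token_type_ids, language_proj, vision_proj):
    """Route each position through its own projection: image tokens use the
    added visual-expert weights, text tokens use the (frozen) language weights."""
    text_out = language_proj(hidden_states)
    vision_out = vision_proj(hidden_states)
    is_vision = (token_type_ids == 1).unsqueeze(-1)  # assumed convention: 1 = image token
    return torch.where(is_vision, vision_out, text_out)


torch.manual_seed(0)
hidden = torch.randn(1, 6, 8)                     # (batch, seq_len, hidden_size)
token_types = torch.tensor([[0, 1, 1, 1, 0, 0]])  # text = 0, image = 1
language_proj = nn.Linear(8, 24)                  # e.g. a fused QKV projection
vision_proj = nn.Linear(8, 24)                    # the added visual-expert weights

qkv = visual_expert_projection(hidden, token_types, language_proj, vision_proj)
print(qkv.shape)  # torch.Size([1, 6, 24])
```

Because the routing happens per position, training the vision-expert weights leaves the language model's behavior on text-only inputs untouched, which is the "without sacrificing any performance on NLP tasks" claim from the abstract.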

Tips:

- One can use [`CogvlmProcessor`] to prepare images and text for the model.

This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/THUDM/CogVLM).


## CogvlmConfig

[[autodoc]] CogvlmConfig

## CogvlmVisionConfig

[[autodoc]] CogvlmVisionConfig

## CogvlmProcessor

[[autodoc]] CogvlmProcessor

## CogvlmModel

[[autodoc]] CogvlmModel
- forward

## CogvlmForCausalLM

[[autodoc]] CogvlmForCausalLM
- forward
- generate
1 change: 1 addition & 0 deletions docs/source/en/perf_infer_gpu_one.md
@@ -196,6 +196,7 @@ For now, Transformers supports SDPA inference and training for the following arc
* [Bart](https://huggingface.co/docs/transformers/model_doc/bart#transformers.BartModel)
* [Bert](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertModel)
* [Cohere](https://huggingface.co/docs/transformers/model_doc/cohere#transformers.CohereModel)
* [CogVLM](https://huggingface.co/docs/transformers/model_doc/cogvlm#transformers.CogvlmModel)
* [Dbrx](https://huggingface.co/docs/transformers/model_doc/dbrx#transformers.DbrxModel)
* [DeiT](https://huggingface.co/docs/transformers/model_doc/deit#transformers.DeiTModel)
* [Dpr](https://huggingface.co/docs/transformers/model_doc/dpr#transformers.DprReader)
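The SDPA path dispatches to `torch.nn.functional.scaled_dot_product_attention` (PyTorch >= 2.0). A sketch of the eager/SDPA numerical equivalence that this PR's commits verify ("matching at 1e-4"), with an explicit causal-masked softmax attention on one side and the fused kernel on the other:

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, heads, seq_len, head_dim = 1, 2, 5, 8
query = torch.randn(batch, heads, seq_len, head_dim)
key = torch.randn(batch, heads, seq_len, head_dim)
value = torch.randn(batch, heads, seq_len, head_dim)

# Eager attention: explicit softmax(QK^T / sqrt(d)) V with an additive causal mask
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
scores = query @ key.transpose(-2, -1) / math.sqrt(head_dim) + causal_mask
eager_out = scores.softmax(dim=-1) @ value

# SDPA: the same computation behind a dispatched (possibly fused) kernel
sdpa_out = F.scaled_dot_product_attention(query, key, value, is_causal=True)

print(torch.allclose(eager_out, sdpa_out, atol=1e-4))  # True
```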
22 changes: 22 additions & 0 deletions src/transformers/__init__.py
@@ -287,6 +287,11 @@
"CodeGenConfig",
"CodeGenTokenizer",
],
"models.cogvlm": [
"CogvlmConfig",
"CogvlmProcessor",
"CogvlmVisionConfig",
],
"models.cohere": ["CohereConfig"],
"models.conditional_detr": ["ConditionalDetrConfig"],
"models.convbert": [
@@ -1633,6 +1638,13 @@
"CodeGenPreTrainedModel",
]
)
_import_structure["models.cogvlm"].extend(
[
"CogvlmForCausalLM",
"CogvlmModel",
"CogvlmPreTrainedModel",
]
)
_import_structure["models.cohere"].extend(["CohereForCausalLM", "CohereModel", "CoherePreTrainedModel"])
_import_structure["models.conditional_detr"].extend(
[
@@ -4849,6 +4861,11 @@
CodeGenConfig,
CodeGenTokenizer,
)
from .models.cogvlm import (
CogvlmConfig,
CogvlmProcessor,
CogvlmVisionConfig,
)
from .models.cohere import CohereConfig
from .models.conditional_detr import (
ConditionalDetrConfig,
@@ -6190,6 +6207,11 @@
CodeGenModel,
CodeGenPreTrainedModel,
)
from .models.cogvlm import (
CogvlmForCausalLM,
CogvlmModel,
CogvlmPreTrainedModel,
)
from .models.cohere import (
CohereForCausalLM,
CohereModel,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -49,6 +49,7 @@
clvp,
code_llama,
codegen,
cogvlm,
cohere,
conditional_detr,
convbert,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -64,6 +64,7 @@
("clvp", "ClvpConfig"),
("code_llama", "LlamaConfig"),
("codegen", "CodeGenConfig"),
("cogvlm", "CogvlmConfig"),
("cohere", "CohereConfig"),
("conditional_detr", "ConditionalDetrConfig"),
("convbert", "ConvBertConfig"),
@@ -331,6 +332,7 @@
("clvp", "CLVP"),
("code_llama", "CodeLlama"),
("codegen", "CodeGen"),
("cogvlm", "CogVLM"),
("cohere", "Cohere"),
("conditional_detr", "Conditional DETR"),
("convbert", "ConvBERT"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/image_processing_auto.py
@@ -48,6 +48,7 @@
("chinese_clip", "ChineseCLIPImageProcessor"),
("clip", "CLIPImageProcessor"),
("clipseg", "ViTImageProcessor"),
("cogvlm", "CLIPImageProcessor"),
("conditional_detr", "ConditionalDetrImageProcessor"),
("convnext", "ConvNextImageProcessor"),
("convnextv2", "ConvNextImageProcessor"),
2 changes: 2 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -64,6 +64,7 @@
("clvp", "ClvpModelForConditionalGeneration"),
("code_llama", "LlamaModel"),
("codegen", "CodeGenModel"),
("cogvlm", "CogvlmModel"),
("cohere", "CohereModel"),
("conditional_detr", "ConditionalDetrModel"),
("convbert", "ConvBertModel"),
@@ -693,6 +694,7 @@
[
("blip", "BlipForConditionalGeneration"),
("blip-2", "Blip2ForConditionalGeneration"),
("cogvlm", "CogvlmForCausalLM"),
("git", "GitForCausalLM"),
("idefics2", "Idefics2ForConditionalGeneration"),
("instructblip", "InstructBlipForConditionalGeneration"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/processing_auto.py
@@ -56,6 +56,7 @@
("clip", "CLIPProcessor"),
("clipseg", "CLIPSegProcessor"),
("clvp", "ClvpProcessor"),
("cogvlm", "CogvlmProcessor"),
("flava", "FlavaProcessor"),
("fuyu", "FuyuProcessor"),
("git", "GitProcessor"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/tokenization_auto.py
@@ -138,6 +138,7 @@
),
),
("codegen", ("CodeGenTokenizer", "CodeGenTokenizerFast" if is_tokenizers_available() else None)),
("cogvlm", ("LlamaTokenizer", "LlamaTokenizerFast" if is_tokenizers_available() else None)),
("cohere", (None, "CohereTokenizerFast" if is_tokenizers_available() else None)),
("convbert", ("ConvBertTokenizer", "ConvBertTokenizerFast" if is_tokenizers_available() else None)),
(
63 changes: 63 additions & 0 deletions src/transformers/models/cogvlm/__init__.py
@@ -0,0 +1,63 @@
# Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available


_import_structure = {
"configuration_cogvlm": [
"COGVLM_PRETRAINED_CONFIG_ARCHIVE_MAP",
"CogvlmConfig",
"CogvlmVisionConfig",
],
"processing_cogvlm": ["CogvlmProcessor"],
}

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_cogvlm"] = [
"CogvlmModel",
"CogvlmForCausalLM",
"CogvlmPreTrainedModel",
]

if TYPE_CHECKING:
from .configuration_cogvlm import (
COGVLM_PRETRAINED_CONFIG_ARCHIVE_MAP,
CogvlmConfig,
CogvlmVisionConfig,
)
from .processing_cogvlm import CogvlmProcessor

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_cogvlm import (
CogvlmForCausalLM,
CogvlmModel,
CogvlmPreTrainedModel,
)

else:
import sys

sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
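The file above follows transformers' `_LazyModule` pattern: public names are declared up front in `_import_structure`, and the defining submodule is only imported when a name is first accessed, so `import transformers` stays fast even when torch-backed modules are heavy. A minimal stand-alone sketch of the same idea (not the actual `_LazyModule` implementation; the `json` stand-in here is just for demonstration):

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Toy lazy-import module: map each attribute name to the submodule that
    defines it, and import that submodule only on first attribute access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        self._name_to_module = {
            attr: mod for mod, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, attr):  # only called when attr is not found normally
        if attr not in self._name_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        submodule = importlib.import_module(self._name_to_module[attr])
        value = getattr(submodule, attr)
        setattr(self, attr, value)  # cache: later lookups bypass __getattr__
        return value


# Demonstration with a stdlib module standing in for e.g. "configuration_cogvlm"
lazy = LazyModule("demo", {"json": ["dumps", "loads"]})
print(lazy.dumps({"a": 1}))  # '{"a": 1}'
```

Replacing `sys.modules[__name__]` with such an object, as the last line of the file does, is what makes `from transformers.models.cogvlm import CogvlmModel` defer the torch-dependent import until it is actually needed.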