Merge pull request #175 from zchoi/main

update MMEvol codebase

tnlin authored Nov 26, 2024
2 parents 216a79f + 37ef5e8 commit 9406bd7
Showing 31 changed files with 139 additions and 153 deletions.
Binary file modified .DS_Store
Binary file not shown.
31 changes: 17 additions & 14 deletions mmevol/README.md
@@ -1,6 +1,9 @@
<!-- # MMEvol -->

# MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
<p align="center">
<img src="dataengine/assets/mmevol_logo.png" width="50%" height="50%">
</p>

<div align="center">
<br>
@@ -23,8 +26,6 @@
<a>Jingkuan Song<sup><span>4🌟</span></sup>,
<br>



\* Equal contribution 🌟 Corresponding author

<sup>1</sup> Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences<br>
@@ -38,12 +39,9 @@

</div>

<p align="center">
<img src="mmevol_sft_data/assets/mmevol.jpg" width="100%" height="100%">
</p>

<font size=5><div align='center' > [[📖 arXiv Paper](https://arxiv.org/pdf/2409.05840)] [[📊 Dataset](https://huggingface.co/datasets/Tongyi-ConvAI/MMEvol)] [[🏆 Models](https://huggingface.co/models/Tongyi-ConvAI/MMEvol)] </div></font>
MMEvol is the first method that successfully introduces Evol-Instruct into multimodal domain to improve the diversity and complexity of multimodal instruction data. Compared with previous methods like vila2, MIMIC-IT, and MMInstruct, it can perform iterative evolution in a very elegant and simple way in a fully automatic way, breaking through human imagination of data complexity and diversity. It has no restrictions on the form of data, the type of task, or complex processing. It can quickly perform self-iterative evolution on limited image instruction data to obtain ultra-high-quality multimodal data, thereby giving multimodal models more powerful capabilities. At the same time, it can be orthogonally combined with other data flow-driven methods such as vila2, MIMIC-IT, and MMInstruct to obtain more powerful data construction effects. Everyone is welcome to experience it now!
MMEvol is the first method to successfully bring Evol-Instruct into the multimodal domain to improve the diversity and complexity of multimodal instruction data. Compared with previous methods such as VILA2, MIMIC-IT, and MMInstruct, it performs iterative evolution in an elegant, simple, and fully automatic way, pushing data complexity and diversity beyond what manual design typically achieves. It places no restrictions on the form of the data or the type of task and requires no complex processing: starting from limited image instruction data, it quickly self-evolves ultra-high-quality multimodal data, giving multimodal models more powerful capabilities. It can also be combined orthogonally with other data-driven methods such as VILA2, MIMIC-IT, and MMInstruct for even stronger data construction. Everyone is welcome to try it now!

## 🔥 Update

@@ -103,8 +101,8 @@ Here are the pretrained weights and instruction tuning weights

| Model | Pretrained Projector | Base LLM | PT Data | IT Data | Download |
| ---------------- | -------------------- | --------- | ------------------------------------------------------------ | ------- | -------- |
| MMEvol-Qwen2-7B | [mm_projector]() | Qwen2-7B | [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) | MMEvol | [ckpt]() |
| MMEvol-LLaMA3-8B | [mm_projector]() | LLaMA3-8B | [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) | MMEvol | [ckpt]() |
| MMEvol-Qwen2-7B | [mm_projector](https://huggingface.co/models/Tongyi-ConvAI/MMEvol) | Qwen2-7B | [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) | MMEvol | [ckpt](https://huggingface.co/models/Tongyi-ConvAI/MMEvol) |
| MMEvol-LLaMA3-8B | [mm_projector](https://huggingface.co/models/Tongyi-ConvAI/MMEvol) | LLaMA3-8B | [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) | MMEvol | [ckpt](https://huggingface.co/models/Tongyi-ConvAI/MMEvol) |
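
To fetch the released projector and checkpoint weights programmatically, something along these lines should work; the repo id is inferred from the links above (adjust it if the Hub path differs) and the local directory is arbitrary:

```python
# Hedged sketch — downloads the released weights with huggingface_hub.
# Repo id inferred from the model links above; local_dir is a placeholder.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="Tongyi-ConvAI/MMEvol", local_dir="checkpoints/MMEvol")
```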

### Performance

@@ -255,9 +253,10 @@ bash scripts/v1_6/train/llama3/finetune.sh
bash scripts/v1_6/train/qwen2/finetune.sh
```


## 📈 Evaluation

#### Ensure that your `api_base` and `key` are correctly configured before evaluation.

## OpenCompass

First, enter the `vlmevalkit` directory and install all dependencies:
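
The install commands themselves are collapsed in this view; a typical sequence, assuming `vlmevalkit` ships a standard `setup.py`/`requirements.txt` layout, is:

```bash
# Hedged sketch — assumes a standard Python package layout inside vlmevalkit.
cd vlmevalkit
pip install -e .          # or: pip install -r requirements.txt
```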
@@ -313,6 +312,8 @@ While scoring on each benchmark directly, set `MODE=all`. If only inference resu
./script/run_inference.sh MMEvol-Llama3-V-1_6 MathVista_MINI all
.....

# NOTE: run llava/eval/blink_eval.py separately for BLINK evaluation.
python llava/eval/blink_eval.py
```

<br />
@@ -335,22 +336,24 @@ python llava/eval/mminst_eval.py

<br />



## 👀 Visualization

Tongyi-ConvAI generates this dataset for multimodal supervised fine-tuning. It was used to train **Evol-Llama3-8B-Instruct** and **Evol-Qwen2-7B**, reported in [our paper](https://arxiv.org/pdf/2409.05840). To create it, we first select a 163K seed instruction-tuning dataset (SEED-163K) for Evol-Instruct, and then enhance data quality through an iterative process that combines fine-grained perception, cognitive reasoning, and interaction evolution. This yields a more complex and diverse image-text instruction dataset, which in turn equips MLLMs with stronger capabilities. Below we show the detailed data distribution of SEED-163K, which is prepared for the multi-round evolution described above. More details can be found in our paper.

<div align=center>
<img width="90%" src="mmevol_sft_data/assets/mmevol.jpg"/>
<img width="90%" src="dataengine/assets/mmevol_seed_dis.jpg"/>
</div>

<div align='center' >
<details>
<summary> Click to expand more examples</summary>
<p align="center">
<img src="mmevol_sft_data/assets/mmevol.jpg" width="60%" height="60%">
<img src="mmevol_sft_data/assets/mmevol.jpg" width="60%" height="60%">
<img src="mmevol_sft_data/assets/mmevol.jpg" width="60%" height="60%">
<img src="mmevol_sft_data/assets/mmevol.jpg" width="60%" height="60%">
<img src="dataengine/assets/mmevol_pai.png" width="90%" height="90%">
<img src="dataengine/assets/mmevol_dis_cam.png" width="90%" height="90%">
<img src="dataengine/assets/mmevol_long_tail.png" width="90%" height="90%">
<img src="dataengine/assets/mmevol_performance.png" width="90%" height="90%">
</details>
</div>

79 changes: 79 additions & 0 deletions mmevol/dataengine/README.md
@@ -0,0 +1,79 @@
# Data construction pipeline for MMEvol-480k

<p align="center">
<img src="assets/mmevol_logo.png" width="50%" height="50%">
</p>

<div align="center">
<br>
<a href="https://scholar.google.com/citations?user=phg8yxoAAAAJ&hl=zh-CN&oi=ao">Run Luo</a><sup><span>1,2*</span></sup>,
<a>Haonan Zhang</a><sup><span>3*</span></sup>,
<a>Longze Chen</a><sup><span>1,2*</span></sup>,
<a>Ting-En Lin</a><sup><span>3*</span></sup>,
<a>Xiong Liu</a><sup><span>3</span></sup>,
<a>Yuchuan Wu</a><sup><span>3</span></sup>,
<a>Min Yang</a><sup><span>1,2🌟</span></sup>,
<a>Yongbin Li</a><sup><span>3🌟</span></sup>,
<br>
<a>Minzheng Wang<sup><span>2</span></sup>,
<a>Pengpeng Zeng<sup><span>4</span></sup>,
<a>Lianli Gao<sup><span>5</span></sup>,
<a>Heng Tao Shen<sup><span>4</span></sup>,
<a>Yunshui Li<sup><span>1,2</span></sup>,
<a>Xiaobo Xia<sup><span>6</span></sup>,
<a>Fei Huang<sup><span>3</span></sup>,
<a>Jingkuan Song<sup><span>4🌟</span></sup>,
<br>

\* Equal contribution 🌟 Corresponding author

<sup>1</sup> Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences<br>
<sup>2</sup> University of Chinese Academy of Sciences<br>
<sup>3</sup> Alibaba Group<br>
<sup>4</sup> Tongji University<br>
<sup>5</sup> Independent Researcher<br>
<sup>6</sup> The University of Sydney<br>

![Multi-Modal](https://img.shields.io/badge/Task-Multi--Modal-red) <a href='https://arxiv.org/pdf/2409.05840'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/models/Tongyi-ConvAI/MMEvol'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue'></a> <a href='https://huggingface.co/datasets/Tongyi-ConvAI/MMEvol'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Data-green'> <a href='https://mmevol.github.io/'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Project-Page-green'></a></a>

</div>


<font size=5><div align='center' > [[📖 arXiv Paper](https://arxiv.org/pdf/2409.05840)] [[📊 Dataset](https://huggingface.co/datasets/Tongyi-ConvAI/MMEvol)] [[🏆 Models](https://huggingface.co/models/Tongyi-ConvAI/MMEvol)] </div></font>

Follow the instructions below to generate MMEvol-480k.

1. Download the SEED-163k JSON file (`mm_seed_no_evo_163k.json`) from [🤗 huggingface](https://huggingface.co/datasets/Tongyi-ConvAI/MMEvol/tree/main/jsons) and place it under the `./dataengine/datasets` path.
2. Execute the preprocessing code under the `dataengine/datasets` path to split each sample into the `meta_data` folder:
```bash
python dataengine/datasets/process.py
```
3. Prepare the data storage folder by following the format of `./dataengine/evolution/folder_template`: simply copy `folder_template` and rename it to match your data name, _e.g._, `mmevol_1k_evo.json`.
4. Ensure that your `api_base` and `key` are correctly configured before starting generation. Set your `key` and `api_base` in both of the following places (a minimal sketch follows these steps):

- lines 129-130 in `dataengine/multi_round.py`
- lines 126-127 in `dataengine/score_process/difficulty_scoring_v123.py`
5. Run the following command to start the three-round data evolution:
```bash
python dataengine/multi_round.py
```
Three rounds of evolution are performed on SEED-163k, with data filtering at the end of each round. The final evolved data is stored under the `./datasets` path.
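
For reference, a minimal sketch of the fields to fill in. The attribute and model names follow the `BaseAPI`-style snippets later in this diff; the class name, endpoint, and key below are placeholders, not the project's actual values:

```python
# Hedged sketch of the API configuration set inside the data-engine wrappers
# (lines 129-130 of dataengine/multi_round.py and lines 126-127 of
#  dataengine/score_process/difficulty_scoring_v123.py in the committed code).
class APIConfigSketch:  # hypothetical name, for illustration only
    def __init__(self):
        self.api_base = "https://your-endpoint.example.com/api/ask"  # placeholder endpoint
        self.key = "sk-your-api-key"                                  # placeholder key
        self.model = "gpt-4o-mini"                                    # model used in the committed code
```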

**License**: Please follow [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) and [Gemma License](https://www.kaggle.com/models/google/gemma/license/).

## 📚 Citation

```bibtex
@article{luo2024mmevol,
title={Mmevol: Empowering multimodal large language models with evol-instruct},
author={Luo, Run and Zhang, Haonan and Chen, Longze and Lin, Ting-En and Liu, Xiong and Wu, Yuchuan and Yang, Min and Wang, Minzheng and Zeng, Pengpeng and Gao, Lianli and others},
journal={arXiv preprint arXiv:2409.05840},
year={2024}
}
```

**Contact**:

- Run Luo — [email protected]

- Haonan Zhang — [email protected]
Binary file added mmevol/dataengine/assets/mmevol_dis_cam.png
Binary file not shown.
Binary file added mmevol/dataengine/assets/mmevol_logo.png
Binary file not shown.
Binary file added mmevol/dataengine/assets/mmevol_long_tail.png
Binary file not shown.
Binary file added mmevol/dataengine/assets/mmevol_pai.png
Binary file not shown.
Binary file added mmevol/dataengine/assets/mmevol_performance.png
Binary file not shown.
File renamed without changes
File renamed without changes.
24 changes: 24 additions & 0 deletions mmevol/dataengine/datasets/process.py
@@ -0,0 +1,24 @@
import json
import os
import os.path as osp
from tqdm import tqdm
import shutil

# Construct a hash_id as a unique index, because both the id and image keys contain duplicate values
datasets_path = "/mnt/data/haonan/code/dataengine/datasets"

a = json.load(open(osp.join(datasets_path, "seed_data_1k_demo.json"), "r"))
for index, i in enumerate(a):
i["hash_id"] = str(index) + "_" + i["image"].replace("/", "_")

json.dump(a, open("/mnt/data/haonan/code/dataengine/datasets/seed_data_1k_demo.json", "w"), indent=4)

# Assuming the data format is already well organized, rebuild the meta_data folder and store each sample as a separate JSON file (named by hash_id)
if os.path.exists(osp.join(datasets_path, "meta_data")):
shutil.rmtree(osp.join(datasets_path, "meta_data"))
os.mkdir(osp.join(datasets_path, "meta_data"))

data = json.load(open(osp.join(datasets_path, "seed_data_1k_demo.json"), "r"))

for index, d in enumerate(tqdm(data)):
json.dump(d, open(osp.join(datasets_path, "meta_data", "{}.json".format(d["hash_id"])), "w"), indent=4)
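
As committed, the script hardcodes an absolute `datasets_path` and the 1k demo JSON; to reproduce the full pipeline you would presumably point these at your local checkout and at the SEED-163k file from step 1 of the README. A hedged sketch, with placeholder paths and a hypothetical `seed_json` name:

```python
import os.path as osp

# Placeholder paths — adjust to your own checkout and downloaded data.
datasets_path = "./dataengine/datasets"                           # instead of the hardcoded /mnt/... path
seed_json = osp.join(datasets_path, "mm_seed_no_evo_163k.json")   # file downloaded in step 1
```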
@@ -1,6 +1,6 @@
import os
import sys
sys.path.append("/mnt/data/haonan/code/mmevol_sft_data")
sys.path.append("/mnt/data/haonan/code/dataengine")
from base import BaseAPI
import numpy as np
from tqdm import tqdm
@@ -466,13 +466,13 @@ def filter_round3(meta_data, conversation_v3_path):

if __name__=='__main__':

final_save_path = "/mnt/data/haonan/code/mmevol_sft_data/datasets/seed_data_1k_demo_evo.json"
root_path = '/mnt/data/haonan/code/mmevol_sft_data/evolution/multi_round_single_imgs_1k_mini'
final_save_path = "/mnt/data/haonan/code/dataengine/datasets/seed_data_1k_demo_evo.json"
root_path = '/mnt/data/haonan/code/dataengine/evolution/multi_round_single_imgs_1k_mini'
img_path = '/mnt/workspace/lr/datasets'

for round_n in [1,2,3]:
if round_n == 1:
seed_data_path = "/mnt/data/haonan/code/mmevol_sft_data/datasets/meta_data"
seed_data_path = "/mnt/data/haonan/code/dataengine/datasets/meta_data"
else:
seed_data_path = osp.join(root_path, "round{}".format(round_n-1), "filtered_qa")

@@ -534,4 +534,4 @@ def filter_round3(meta_data, conversation_v3_path):
merged_data.append(data)

json.dump(merged_data, open(final_save_path, "w"), indent=4)
print("Saveing file to {}".format(final_save_path))
print("Saveing file to {}".format(final_save_path))
File renamed without changes.
File renamed without changes.
@@ -124,12 +124,9 @@ def __init__(self,
print('Unknown API Base. ')
sys.exit(-1)

self.api_base="http://47.88.8.18:8088/api/ask"
# self.api_base = "http://47.88.8.18:8088/api/ask?tenant=gpt-4o-mini"
# self.key = "eyJ0eXAiOiJqd3QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VybmFtZSI6IjI1ODczMCIsInBhc3N3b3JkIjoiMjU4NzMwMTIzIiwiZXhwIjoyMDE5NTUwNzAxfQ.JuqnTa7yauGkSzWkBiEig1K_rxvfAYTXS9F9_m-h4q8"
# self.key = "eyJ0eXAiOiJqd3QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VybmFtZSI6IjI3NDM2OCIsInBhc3N3b3JkIjoiMjc0MzY4MTIzIiwiZXhwIjoyMDEyNjEzNjA4fQ.7OUpHs-AFPaFHuUy_p7XxXyNYhca2_-7F5GBtaahfe4"
self.key = "eyJhbGciOiJIUzI1NiIsInR5cCI6Imp3dCJ9.eyJ1c2VybmFtZSI6IjQ0MzQ1NSIsInBhc3N3b3JkIjoiNDQzNDU1MTIzIiwiZXhwIjoyMDMxNzA1NTA3fQ.7g4a6t9dKcRXVRa7MwQb5m2oirFu1OxjXhWbNM0w50s"
# self.key = "eyJhbGciOiJIUzI1NiIsInR5cCI6Imp3dCJ9.eyJ1c2VybmFtZSI6IjQzOTg2OSIsInBhc3N3b3JkIjoiNDM5ODY5MTIzIiwiZXhwIjoyMDMxNzA3NjkzfQ.ly9XNzVW7pEeW_bTZxzaqB3jt2kRr14XQIpT0DbCTto"
self.api_base = ""
self.key = ""

# self.model = "gpt-4o-2024-08-06"
self.model = "gpt-4o-mini"

@@ -123,10 +123,9 @@ def __init__(self,
print('Unknown API Base. ')
sys.exit(-1)

self.api_base="http://47.88.8.18:8088/api/ask"
# self.api_base = "http://47.88.8.18:8088/api/ask?tenant=gpt-4o-mini"
# self.key = "eyJ0eXAiOiJqd3QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VybmFtZSI6IjI1ODczMCIsInBhc3N3b3JkIjoiMjU4NzMwMTIzIiwiZXhwIjoyMDE5NTUwNzAxfQ.JuqnTa7yauGkSzWkBiEig1K_rxvfAYTXS9F9_m-h4q8"
self.key = "eyJhbGciOiJIUzI1NiIsInR5cCI6Imp3dCJ9.eyJ1c2VybmFtZSI6IjQ0MzQ1NSIsInBhc3N3b3JkIjoiNDQzNDU1MTIzIiwiZXhwIjoyMDMxNzA1NTA3fQ.7g4a6t9dKcRXVRa7MwQb5m2oirFu1OxjXhWbNM0w50s"
self.api_base = ""
self.key = ""

# self.model="gpt-4o-2024-05-13"
self.model = "gpt-4o-mini"

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
9 changes: 5 additions & 4 deletions mmevol/llava/eval/mmvp_eval.py
@@ -109,11 +109,12 @@ def make_request(meta):
with Pool(processes=50) as pool:
output = list(tqdm(pool.imap(make_request, data), total=len(data)))

print(output)
for i in set(all_types):
# print(output)
# for i in set(all_types):

for j in data:
if j['type']==i
# for j in data:
# if j['type']==i

num_correct, num_total = 0, 0
# Continue with the processing of the JSONL file
index=0
51 changes: 0 additions & 51 deletions mmevol/mmevol_sft_data/README.md

This file was deleted.

Binary file removed mmevol/mmevol_sft_data/assets/mmevol.jpg
Binary file not shown.
65 changes: 0 additions & 65 deletions mmevol/mmevol_sft_data/datasets/process.ipynb

This file was deleted.

Binary file removed mmevol/vlmevalkit/.DS_Store
Binary file not shown.
Binary file removed mmevol/vlmevalkit/vlmeval/.DS_Store
Binary file not shown.