
Merge Models with Non-Standard Architectures (e.g., Multimodal Models) #450

Closed
wants to merge 24 commits from the architecture-agnostic branch

Conversation

@ElliotStein ElliotStein commented Oct 31, 2024

This update extends merge capabilities to models without predefined architecture JSON files, enabling support for models with non-standard architectures such as multimodal models.

Key Changes:

  • Automatic Architecture Info Creation: For models without an architecture JSON specification in /_data/architectures, an ArchitectureInfo class is automatically generated by reading parameter names from the saved model.
    - Efficiency Note: Reading parameter names is significantly faster for models saved in .safetensors format compared to .bin.
  • Parameter Organization: While pre/post weights aren’t explicitly segregated, parameters are grouped into layers based on the .{integer}. pattern in parameter names (see the sketch below).
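
For illustration, here is a minimal sketch of those two steps, assuming a local .safetensors checkpoint; the helper names are hypothetical and not mergekit's actual API:

import re
from collections import defaultdict
from safetensors import safe_open

def read_param_names(path: str) -> list[str]:
    # Reading keys from a .safetensors file only parses the header,
    # which is why it is much faster than unpickling a .bin checkpoint.
    with safe_open(path, framework="pt") as f:
        return list(f.keys())

def group_by_layer(param_names: list[str]) -> dict:
    # Group parameters into layers by the ".<integer>." pattern in their names;
    # anything without a layer index (embeddings, final norms, lm_head) lands in "other".
    layers = defaultdict(list)
    for name in param_names:
        m = re.search(r"\.(\d+)\.", name)
        layers[int(m.group(1)) if m else "other"].append(name)
    return dict(layers)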

Modifications to Core Functionality:

  • get_architecture_info: Now returns None and logs a warning if it fails, rather than raising an error outright (sketched below).
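
A hedged sketch of that fallback behaviour (the real function signature and helper names in mergekit may differ):

import logging

def get_architecture_info(config):
    # Sketch only: return None with a warning instead of raising,
    # so callers can fall back to automatic architecture inference.
    try:
        return _load_architecture_json(config)  # hypothetical helper
    except Exception as err:
        logging.warning(
            "No predefined architecture found (%s); falling back to automatic inference.",
            err,
        )
        return None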

Additional Updates:

  • Minor bug fixes to testing and GPT-2 configuration.

This change does not impact functionality for models with predefined architecture JSON specifications (i.e., all currently supported merges).

@ElliotStein ElliotStein changed the title missing param in gpt2 config Enable Support for Merging Models with Non-Standard Architectures (e.g., Multimodal Models) Oct 31, 2024
@ElliotStein ElliotStein changed the title Enable Support for Merging Models with Non-Standard Architectures (e.g., Multimodal Models) Merge Models with Non-Standard Architectures (e.g., Multimodal Models) Oct 31, 2024
ElliotStein and others added 4 commits November 4, 2024 17:20
…atch, or match when a prefix is removed (e.g. vision_block.layer.0 and layer.0), their overlapping layers can now be merged.
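
For reference, a rough sketch of the prefix-stripped name matching described in this commit; illustrative only, not the actual implementation:

def match_params(names_a: list[str], names_b: list[str]) -> dict[str, str]:
    # Map each parameter name in model A to its counterpart in model B,
    # either exactly or after dropping leading dotted prefixes
    # (e.g. vision_block.layer.0.weight -> layer.0.weight).
    set_b = set(names_b)
    matches = {}
    for a in names_a:
        if a in set_b:
            matches[a] = a
            continue
        parts = a.split(".")
        for i in range(1, len(parts)):
            stripped = ".".join(parts[i:])
            if stripped in set_b:
                matches[a] = stripped
                break
    return matches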
@ElliotStein ElliotStein force-pushed the architecture-agnostic branch from ee730f2 to 63afb68 Compare November 4, 2024 18:04
Ph0rk0z commented Nov 29, 2024

I tried to merge qwen2-vl and dolphin-7b, both based on qwen inside of a HF space. The layers outside of the vision tower should match and be mergeable in theory. Instead I got an error that no common parameters were found and it couldn't resolve the arch automatically.

@ElliotStein
Author

I tried to merge qwen2-vl and dolphin-7b, both based on qwen inside of a HF space. The layers outside of the vision tower should match and be mergeable in theory. Instead I got an error that no common parameters were found and it couldn't resolve the arch automatically.

Thanks for bringing this up! I just tested a merge using the following configuration, and it successfully resolved the architecture and performed the merge:

models:
  - model: cognitivecomputations/dolphin-2.6-mistral-7b
    parameters:
      weight: 1.0
  - model: mistralai/Mistral-7B-Instruct-v0.3
    parameters:
      weight: 1.0
merge_method: linear
dtype: float16

I'd recommend pulling the latest version of the repository and trying the same configuration to see if it resolves the issue. Let me know either way :)

Ph0rk0z commented Nov 30, 2024

One of those isn't a vision model. I'm aware that two normal LLMs will merge. The point is to merge a fine-tuned model into a VL model created from the same base. The vision model has an extra set of layers related to the vision tower, which just have to be copied over.

Try merging https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct with https://huggingface.co/cognitivecomputations/dolphin-2.9.2-qwen2-7b

Updated the infer_architecture logic to handle cases where architectures appear mismatched
@ElliotStein
Author

Yep you're right @Ph0rk0z - this commit should address it! Let me know how it goes :)

Ph0rk0z commented Dec 3, 2024

Thanks. It makes an attempt now. Will have to download locally and see what comes out because on Hugging Face I get OOM.

Ph0rk0z commented Dec 5, 2024

Can confirm that it merges for vision models now. Doesn't take kindly to partially merging the weights but other than that works.

  • Enhanced robustness of automatic architecture resolution functions.
  • Improved code readability and maintainability.
  • Updated logging to provide more detailed and actionable information.
@ElliotStein ElliotStein deleted the architecture-agnostic branch December 10, 2024 16:08
Ph0rk0z commented Dec 13, 2024

But why? This is good.

WYHZQ commented Jan 10, 2025

But why? This is good.

Hello! I am also trying to merge qwen2-vl and have some questions.
When I merged two Qwen2-VL-7B-Instruct models, the merged output did not include chat_template.json or preprocessor_config.json, which caused an error.
I added these two files manually and inference then worked normally, but I don't feel this is a good approach. May I ask how you solved it?

My configuration is as follows:
models:
  - model: path/Qwen2-VL-7B-Instruct
    parameters:
      weight: 1.0
  - model: path/Qwen2-VL-7B-Captioner
    parameters:
      weight: 1.0
merge_method: linear
parameters:
  normalize: false
dtype: bfloat16

Ph0rk0z commented Jan 10, 2025

I used the tokenizer and files from the base model. No need for the inflated tokenizer it tries to merge.
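
For anyone hitting the same issue, a minimal sketch of that workaround (paths are illustrative; adjust to your own directories):

import shutil
from pathlib import Path

base = Path("Qwen2-VL-7B-Instruct")   # original base model directory
merged = Path("merged-output")        # mergekit output directory

# Copy the processor/tokenizer files straight from the base model
# instead of relying on the tokenizer the merge produces.
for fname in ("chat_template.json", "preprocessor_config.json",
              "tokenizer_config.json", "tokenizer.json"):
    src = base / fname
    if src.exists():
        shutil.copy(src, merged / fname)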
