
Merge Models with Non-Standard Architectures (e.g., Multimodal Models) #450

Closed
wants to merge 24 commits from the architecture-agnostic branch

Conversation

@ElliotStein ElliotStein commented Oct 31, 2024

This update extends merge capabilities to models without predefined architecture JSON files, enabling support for models with non-standard architectures such as multimodal models.

Key Changes:

  • Automatic Architecture Info Creation: For models without an architecture JSON specification in /_data/architectures, an ArchitectureInfo class is automatically generated by reading parameter names from the saved model.
    - Efficiency Note: Reading parameter names is significantly faster for models saved in .safetensors format compared to .bin.
  • Parameter Organization: While pre/post weights aren’t explicitly segregated, parameters are grouped into layers based on the .{integer}. pattern in parameter names (see the sketch below).
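
For illustration, here is a minimal sketch of those two steps, assuming a local .safetensors checkpoint; the helper names are hypothetical and not mergekit's actual API:

import re
from collections import defaultdict
from safetensors import safe_open

def read_param_names(path: str) -> list[str]:
    # Reading keys from a .safetensors file only parses the header,
    # which is why it is much faster than unpickling a .bin checkpoint.
    with safe_open(path, framework="pt") as f:
        return list(f.keys())

def group_by_layer(param_names: list[str]) -> dict:
    # Group parameters into layers by the ".<integer>." pattern in their names;
    # anything without a layer index (embeddings, final norms, lm_head) lands in "other".
    layers = defaultdict(list)
    for name in param_names:
        m = re.search(r"\.(\d+)\.", name)
        layers[int(m.group(1)) if m else "other"].append(name)
    return dict(layers)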

Modifications to Core Functionality:

  • get_architecture_info: Now returns None and logs a warning if it fails, rather than raising an error outright (sketched below).
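
A hedged sketch of that fallback behaviour (the real function signature and helper names in mergekit may differ):

import logging

def get_architecture_info(config):
    # Sketch only: return None with a warning instead of raising,
    # so callers can fall back to automatic architecture inference.
    try:
        return _load_architecture_json(config)  # hypothetical helper
    except Exception as err:
        logging.warning(
            "No predefined architecture found (%s); falling back to automatic inference.",
            err,
        )
        return None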

Additional Updates:

  • Minor bug fixes to testing and GPT-2 configuration.

This change does not impact functionality for models with predefined architecture JSON specifications (i.e., all currently supported merges).

@ElliotStein ElliotStein changed the title missing param in gpt2 config Enable Support for Merging Models with Non-Standard Architectures (e.g., Multimodal Models) Oct 31, 2024
@ElliotStein ElliotStein changed the title Enable Support for Merging Models with Non-Standard Architectures (e.g., Multimodal Models) Merge Models with Non-Standard Architectures (e.g., Multimodal Models) Oct 31, 2024
ElliotStein and others added 4 commits November 4, 2024 17:20
…atch, or match when a prefix is removed (e.g. vision_block.layer.0 and layer.0), their overlapping layers can now be merged.
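
For reference, a rough sketch of the prefix-stripped name matching described in this commit; illustrative only, not the actual implementation:

def match_params(names_a: list[str], names_b: list[str]) -> dict[str, str]:
    # Map each parameter name in model A to its counterpart in model B,
    # either exactly or after dropping leading dotted prefixes
    # (e.g. vision_block.layer.0.weight -> layer.0.weight).
    set_b = set(names_b)
    matches = {}
    for a in names_a:
        if a in set_b:
            matches[a] = a
            continue
        parts = a.split(".")
        for i in range(1, len(parts)):
            stripped = ".".join(parts[i:])
            if stripped in set_b:
                matches[a] = stripped
                break
    return matches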
@ElliotStein ElliotStein force-pushed the architecture-agnostic branch from ee730f2 to 63afb68 Compare November 4, 2024 18:04
Ph0rk0z commented Nov 29, 2024

I tried to merge qwen2-vl and dolphin-7b, both based on qwen inside of a HF space. The layers outside of the vision tower should match and be mergeable in theory. Instead I got an error that no common parameters were found and it couldn't resolve the arch automatically.

@ElliotStein
Author

I tried to merge qwen2-vl and dolphin-7b, both based on qwen inside of a HF space. The layers outside of the vision tower should match and be mergeable in theory. Instead I got an error that no common parameters were found and it couldn't resolve the arch automatically.

Thanks for bringing this up! I just tested a merge using the following configuration, and it successfully resolved the architecture and performed the merge:

models:
  - model: cognitivecomputations/dolphin-2.6-mistral-7b
    parameters:
      weight: 1.0
  - model: mistralai/Mistral-7B-Instruct-v0.3
    parameters:
      weight: 1.0
merge_method: linear
dtype: float16

I'd recommend pulling the latest version of the repository and trying the same configuration to see if it resolves the issue. Let me know either way :)

Ph0rk0z commented Nov 30, 2024

One of those isn't a vision model. I'm aware that two normal LLMs will merge. The point is to merge a fine-tuned model into a VL model created from the same base. The vision model has an extra set of layers related to the vision tower, which just have to be copied over.

Try merging https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct with https://huggingface.co/cognitivecomputations/dolphin-2.9.2-qwen2-7b

Updated the infer_architecture logic to handle cases where architectures appear mismatched
@ElliotStein
Author

Yep you're right @Ph0rk0z - this commit should address it! Let me know how it goes :)

Ph0rk0z commented Dec 3, 2024

Thanks. It makes an attempt now. Will have to download locally and see what comes out because on Hugging Face I get OOM.

Ph0rk0z commented Dec 5, 2024

Can confirm that it merges for vision models now. Doesn't take kindly to partially merging the weights but other than that works.

  • Enhanced robustness of automatic architecture resolution functions.
  • Improved code readability and maintainability.
  • Updated logging to provide more detailed and actionable information.
@ElliotStein ElliotStein deleted the architecture-agnostic branch December 10, 2024 16:08
Ph0rk0z commented Dec 13, 2024

But why? This is good.

WYHZQ commented Jan 10, 2025

But why? This is good.

Hello! I am also trying to merge qwen2-vl and have some questions.
When I merged two Qwen2-VL-7B-Instruct models, the merged output did not include chat_template.json or preprocessor_config.json, which caused an error.
I added these two files manually and inference then worked normally, but I don't feel this is a good approach. May I ask how you solved it?

My configuration is as follows:
models:
  - model: path/Qwen2-VL-7B-Instruct
    parameters:
      weight: 1.0
  - model: path/Qwen2-VL-7B-Captioner
    parameters:
      weight: 1.0
merge_method: linear
parameters:
  normalize: false
dtype: bfloat16

Ph0rk0z commented Jan 10, 2025

I used the tokenizer and files from the base model. No need for the inflated tokenizer it tries to merge.
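
For anyone hitting the same issue, a minimal sketch of that workaround (paths are illustrative; adjust to your own directories):

import shutil
from pathlib import Path

base = Path("Qwen2-VL-7B-Instruct")   # original base model directory
merged = Path("merged-output")        # mergekit output directory

# Copy the processor/tokenizer files straight from the base model
# instead of relying on the tokenizer the merge produces.
for fname in ("chat_template.json", "preprocessor_config.json",
              "tokenizer_config.json", "tokenizer.json"):
    src = base / fname
    if src.exists():
        shutil.copy(src, merged / fname)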
