Merge Models with Non-Standard Architectures (e.g., Multimodal Models) #450
Conversation
…icArchitectureInfo and get_model_parameter_names
…atch, or match when a prefix is removed (e.g. vision_block.layer.0 and layer.0), their overlapping layers can now be merged.
formatting
formatting
Force-pushed from ee730f2 to 63afb68
…ergekit into architecture-agnostic
…e. And minor fixes.
I tried to merge qwen2-vl and dolphin-7b, both based on Qwen, inside a HF Space. The layers outside the vision tower should match and be mergeable in theory. Instead I got an error that no common parameters were found and that the architecture couldn't be resolved automatically.
Thanks for bringing this up! I just tested a merge using the following configuration, and it successfully resolved the architecture and performed the merge: models:
I'd recommend pulling the latest version of the repository and trying the same configuration to see if it resolves the issue. Let me know either way :)
One of those isn't a vision model. I'm aware that two normal LLMs will merge. The point is to merge a fine-tuned model into a VL model created from the same base; the vision model has an extra set of layers, related to the vision tower, that just have to be copied over. Try merging https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct with https://huggingface.co/cognitivecomputations/dolphin-2.9.2-qwen2-7b
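For reference, a minimal mergekit configuration for this model pairing might look like the sketch below. The model names come from the links above; the merge method, weights, and dtype are illustrative assumptions, not the exact configuration used in this thread:

```yaml
# Illustrative sketch only -- method and weights are assumptions.
models:
  - model: Qwen/Qwen2-VL-7B-Instruct
    parameters:
      weight: 0.5
  - model: cognitivecomputations/dolphin-2.9.2-qwen2-7b
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
```

With this PR's changes, layers unique to one model (e.g. the vision tower) would be carried over rather than causing architecture resolution to fail.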
Updated the infer_architecture logic to handle cases where architectures appear mismatched
Yep you're right @Ph0rk0z - this commit should address it! Let me know how it goes :)
Thanks. It makes an attempt now. I'll have to download it locally and see what comes out, because on Hugging Face I get OOM.
Can confirm that it merges vision models now. It doesn't take kindly to partially merging the weights, but other than that it works.
- Enhanced robustness of automatic architecture resolution functions.
- Improved code readability and maintainability.
- Updated logging to provide more detailed and actionable information.
…cture_info. Simplify implementations of get_architecture_info and infer_architecture_info
But why? This is good.
Hello! I am also trying to merge qwen2-vl and have some questions. My configuration is as follows:
I used the tokenizer and files from the base model. No need for the inflated tokenizer it tries to merge.
This update extends merge capabilities to models without predefined architecture JSON files, enabling support for models with non-standard architectures such as multimodal models.
Key Changes:
- Efficiency Note: Reading parameter names is significantly faster for models saved in .safetensors format than for those saved as .bin.
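The efficiency note above follows from the .safetensors file layout: parameter names live in a small JSON header at the start of the file, so they can be read without deserializing any tensor data (unlike pickle-based .bin checkpoints). A stdlib-only sketch, based on the published safetensors format (8-byte little-endian header length, then the JSON header); the function name is illustrative, not mergekit's actual API:

```python
import json
import struct


def safetensors_param_names(path):
    """Return parameter names from a .safetensors file by parsing only
    its JSON header -- no tensor data is read or deserialized.

    Illustrative helper, not mergekit's real get_model_parameter_names.
    """
    with open(path, "rb") as f:
        # The file starts with the header size as a little-endian uint64.
        (header_size,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_size))
    # Every top-level key except "__metadata__" is a tensor name.
    return [k for k in header if k != "__metadata__"]
```

In contrast, listing names in a .bin checkpoint requires unpickling the whole state dict, which is why the .safetensors path is so much faster here.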
Modifications to Core Functionality:
Additional Updates:
This change does not impact functionality for models with predefined architecture JSON specifications (i.e., all currently supported merges).
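The name-matching behavior the PR describes (parameters pair up when their names match exactly, or when they match after a module prefix such as `vision_block.` is stripped) can be sketched as follows. This is a simplified illustration with a hypothetical function name, not mergekit's actual implementation:

```python
def common_parameters(names_a, names_b):
    """Pair parameter names from two models: exact matches first, then
    matches found by stripping one leading dotted prefix from a name in
    names_a (e.g. 'vision_block.layer.0' -> 'layer.0').

    Sketch only; mergekit's real logic is in infer_architecture_info.
    """
    b_set = set(names_b)
    pairs = []
    for a in names_a:
        if a in b_set:
            pairs.append((a, a))
        else:
            # Drop the first dotted component and retry.
            stripped = a.split(".", 1)[1] if "." in a else a
            if stripped in b_set:
                pairs.append((a, stripped))
    return pairs
```

Parameters with no counterpart under either rule (such as the rest of a vision tower) are left out of the overlap and can simply be copied from the model that has them.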