Added support for DeepseekV2 model #382
base: main
Conversation
Apologies for the wait.

I think this is a good first stab, but it will need a bit more work to push it over the finish line.

Please see the comments, and once you have tested it on your end it would be beneficial for both parties if you could mention how you tested it. It could be as simple as a YAML file for a linear merge that uses DeepseekV2 (see the sketch below).

You've probably noticed the failing formatting check; to be explicit, please do address those failures as well.
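For reference, a minimal merge config along those lines might look like the following. This is only a sketch: both model names are placeholders rather than models referenced in this PR, and the 0.5/0.5 weights are arbitrary.

```yaml
# Hypothetical linear merge exercising the DeepseekV2 architecture support.
# Replace the model names with two checkpoints sharing the DeepSeek-V2
# architecture.
merge_method: linear
models:
  - model: deepseek-ai/DeepSeek-V2        # placeholder base checkpoint
    parameters:
      weight: 0.5
  - model: your-org/deepseek-v2-finetune  # placeholder fine-tune
    parameters:
      weight: 0.5
dtype: bfloat16
```

Running a config like that through `mergekit-yaml` and loading the output would be enough to confirm the new architecture definition round-trips.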
"layer_templates": { | ||
"weights": [ | ||
{ | ||
"name" : "model.layers.${layer_index}.self_attn.q_proj.weight" |
Unless I'm mistaken, I don't think this weight exists in the model. Please see here: https://huggingface.co/deepseek-ai/DeepSeek-V2/raw/main/model.safetensors.index.json — the index lists a split query projection instead (see the sketch after the excerpt):
"model.layers.0.self_attn.q_a_proj.weight": "model-00001-of-000055.safetensors",
"model.layers.0.self_attn.q_a_layernorm.weight": "model-00001-of-000055.safetensors",
"model.layers.0.self_attn.q_b_proj.weight": "model-00001-of-000055.safetensors",
def layer_weights(
    self, index: int, config: PretrainedConfig
) -> Optional[List[WeightInfo]]:
    num_experts = self.num_local_experts
Nit: any reason for the local aliasing? I see it is also present in the Mixtral code, but is there a reason you kept it here?
@@ -353,6 +404,7 @@ def _load_all_architectures() -> (
JSON_ARCHITECTURES, NAME_TO_ARCH = _load_all_architectures()
MISTRAL_INFO = _load_json_arch("mistral.json")
QWEN2_INFO = _load_json_arch("qwen2.json")
+DEEPSEEKV2_INFO = _load_json_arch("deepseekv2.json")


def get_architecture_info(config: PretrainedConfig) -> ArchitectureInfo:
Unless I am mistaken (which I could be), there needs to be a clause within the get_architecture_info function similar to Mixtral's; otherwise the code at https://github.com/arcee-ai/mergekit/blob/main/mergekit/merge.py#L51 will just pull in the info associated with the JSON template. See the sketch below.
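Something along these lines is what I have in mind. This is purely a sketch under assumptions: it presumes the new class is named `DeepseekV2TensorNames` and exposes `ARCHITECTURE_NAME` and `from_config` the way the Mixtral class does; adapt the names to whatever the PR actually defines.

```python
# Sketch only: DeepseekV2TensorNames and its interface are assumed here.
def get_architecture_info(config: PretrainedConfig) -> ArchitectureInfo:
    if len(config.architectures) != 1:
        raise RuntimeError("More than one architecture in config?")

    arch_name = config.architectures[0]

    # Mixtral-style special case: return the dynamic class so the expert
    # count comes from the config rather than the static JSON template.
    if arch_name == MixtralTensorNames.ARCHITECTURE_NAME:
        return MixtralTensorNames.from_config(config)

    # Analogous clause for DeepSeek-V2 (class name assumed).
    if arch_name == DeepseekV2TensorNames.ARCHITECTURE_NAME:
        return DeepseekV2TensorNames.from_config(config)

    if arch_name not in NAME_TO_ARCH:
        raise RuntimeError(f"Unsupported architecture {arch_name}")
    return NAME_TO_ARCH[arch_name]
```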
@aditya-29 can you please respond :) Sorry for the late reply.
Thanks @metric-space and @shamanez. I didn't get a chance to go over the comments earlier. I will work on the suggested changes and reach out to you for any clarification.
Thanks a lot mate.
This pull request introduces the capability to merge DeepSeekV2 Mixture-of-Experts (MoE) models using MergeKit. To facilitate this, a `deepseekv2.json` configuration file has been added to the architecture directory. Additionally, a custom class analogous to Mixtral has been implemented to enable model merging based on the JSON configuration.