The table below introduces all models supported by SWIFT:
- Model Type: The model_type registered in SWIFT.
- Default Lora Target Modules: The default lora_target_modules used by the model.
- Default Template: The default template used by the model.
- Support Flash Attn: Whether the model supports flash attention to accelerate SFT and inference.
- Support VLLM: Whether the model supports vLLM to accelerate inference and deployment.
- Requires: Extra dependencies required by the model.
- Tags: Tags associated with the model.
- HF Model ID: The model id on the Hugging Face Hub.
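As a sketch of how a row of this table might be consumed, the helper below assembles a `swift sft` command line from a model_type and dataset name. The `--model_type`, `--dataset`, and `--use_flash_attn` flags are assumptions about the SWIFT CLI made for illustration; check `swift sft --help` before relying on them.

```python
# Hypothetical helper: build a `swift sft` invocation from a table row.
# The flag names below are assumptions, not a verified SWIFT interface.
def build_sft_command(model_type: str, dataset: str, use_flash_attn: bool = False) -> list[str]:
    cmd = ["swift", "sft", "--model_type", model_type, "--dataset", dataset]
    if use_flash_attn:
        # Only pass this for rows whose "Support Flash Attn" column is ✔.
        cmd += ["--use_flash_attn", "true"]
    return cmd

print(" ".join(build_sft_command("qwen-7b-chat", "ms-bench", use_flash_attn=True)))
# → swift sft --model_type qwen-7b-chat --dataset ms-bench --use_flash_attn true
```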
Model Type | Model ID | Default Lora Target Modules | Default Template | Support Flash Attn | Support VLLM | Requires | Tags | HF Model ID |
---|---|---|---|---|---|---|---|---|
qwen-1_8b | qwen/Qwen-1_8B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-1_8B |
qwen-1_8b-chat | qwen/Qwen-1_8B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-1_8B-Chat |
qwen-1_8b-chat-int4 | qwen/Qwen-1_8B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-1_8B-Chat-Int4 |
qwen-1_8b-chat-int8 | qwen/Qwen-1_8B-Chat-Int8 | c_attn | qwen | ✔ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-1_8B-Chat-Int8 |
qwen-7b | qwen/Qwen-7B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-7B |
qwen-7b-chat | qwen/Qwen-7B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-7B-Chat |
qwen-7b-chat-int4 | qwen/Qwen-7B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-7B-Chat-Int4 |
qwen-7b-chat-int8 | qwen/Qwen-7B-Chat-Int8 | c_attn | qwen | ✔ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-7B-Chat-Int8 |
qwen-14b | qwen/Qwen-14B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-14B |
qwen-14b-chat | qwen/Qwen-14B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-14B-Chat |
qwen-14b-chat-int4 | qwen/Qwen-14B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-14B-Chat-Int4 |
qwen-14b-chat-int8 | qwen/Qwen-14B-Chat-Int8 | c_attn | qwen | ✔ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-14B-Chat-Int8 |
qwen-72b | qwen/Qwen-72B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-72B |
qwen-72b-chat | qwen/Qwen-72B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-72B-Chat |
qwen-72b-chat-int4 | qwen/Qwen-72B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-72B-Chat-Int4 |
qwen-72b-chat-int8 | qwen/Qwen-72B-Chat-Int8 | c_attn | qwen | ✔ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-72B-Chat-Int8 |
qwen1half-0_5b | qwen/Qwen1.5-0.5B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B |
qwen1half-1_8b | qwen/Qwen1.5-1.8B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B |
qwen1half-4b | qwen/Qwen1.5-4B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-4B |
qwen1half-7b | qwen/Qwen1.5-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-7B |
qwen1half-14b | qwen/Qwen1.5-14B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-14B |
qwen1half-32b | qwen/Qwen1.5-32B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-32B |
qwen1half-72b | qwen/Qwen1.5-72B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-72B |
codeqwen1half-7b | qwen/CodeQwen1.5-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/CodeQwen1.5-7B |
qwen1half-moe-a2_7b | qwen/Qwen1.5-MoE-A2.7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-MoE-A2.7B |
qwen1half-0_5b-chat | qwen/Qwen1.5-0.5B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat |
qwen1half-1_8b-chat | qwen/Qwen1.5-1.8B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat |
qwen1half-4b-chat | qwen/Qwen1.5-4B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat |
qwen1half-7b-chat | qwen/Qwen1.5-7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat |
qwen1half-14b-chat | qwen/Qwen1.5-14B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat |
qwen1half-32b-chat | qwen/Qwen1.5-32B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-32B-Chat |
qwen1half-72b-chat | qwen/Qwen1.5-72B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat |
qwen1half-moe-a2_7b-chat | qwen/Qwen1.5-MoE-A2.7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-MoE-A2.7B-Chat |
codeqwen1half-7b-chat | qwen/CodeQwen1.5-7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/CodeQwen1.5-7B-Chat |
qwen1half-0_5b-chat-int4 | qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 |
qwen1half-1_8b-chat-int4 | qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 |
qwen1half-4b-chat-int4 | qwen/Qwen1.5-4B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int4 |
qwen1half-7b-chat-int4 | qwen/Qwen1.5-7B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int4 |
qwen1half-14b-chat-int4 | qwen/Qwen1.5-14B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int4 |
qwen1half-32b-chat-int4 | qwen/Qwen1.5-32B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-32B-Chat-GPTQ-Int4 |
qwen1half-72b-chat-int4 | qwen/Qwen1.5-72B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 |
qwen1half-0_5b-chat-int8 | qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 |
qwen1half-1_8b-chat-int8 | qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 |
qwen1half-4b-chat-int8 | qwen/Qwen1.5-4B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int8 |
qwen1half-7b-chat-int8 | qwen/Qwen1.5-7B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int8 |
qwen1half-14b-chat-int8 | qwen/Qwen1.5-14B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int8 |
qwen1half-72b-chat-int8 | qwen/Qwen1.5-72B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int8 |
qwen1half-moe-a2_7b-chat-int4 | qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 |
qwen1half-0_5b-chat-awq | qwen/Qwen1.5-0.5B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-0.5B-Chat-AWQ |
qwen1half-1_8b-chat-awq | qwen/Qwen1.5-1.8B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-1.8B-Chat-AWQ |
qwen1half-4b-chat-awq | qwen/Qwen1.5-4B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-4B-Chat-AWQ |
qwen1half-7b-chat-awq | qwen/Qwen1.5-7B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-7B-Chat-AWQ |
qwen1half-14b-chat-awq | qwen/Qwen1.5-14B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-14B-Chat-AWQ |
qwen1half-32b-chat-awq | qwen/Qwen1.5-32B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-32B-Chat-AWQ |
qwen1half-72b-chat-awq | qwen/Qwen1.5-72B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-72B-Chat-AWQ |
codeqwen1half-7b-chat-awq | qwen/CodeQwen1.5-7B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/CodeQwen1.5-7B-Chat-AWQ |
qwen-vl | qwen/Qwen-VL | c_attn | default-generation | ✔ | ✘ | - | multi-modal, vision | Qwen/Qwen-VL |
qwen-vl-chat | qwen/Qwen-VL-Chat | c_attn | qwen | ✔ | ✘ | - | multi-modal, vision | Qwen/Qwen-VL-Chat |
qwen-vl-chat-int4 | qwen/Qwen-VL-Chat-Int4 | c_attn | qwen | ✔ | ✘ | auto_gptq>=0.5 | multi-modal, vision | Qwen/Qwen-VL-Chat-Int4 |
qwen-audio | qwen/Qwen-Audio | c_attn | qwen-audio-generation | ✔ | ✘ | - | multi-modal, audio | Qwen/Qwen-Audio |
qwen-audio-chat | qwen/Qwen-Audio-Chat | c_attn | qwen-audio | ✔ | ✘ | - | multi-modal, audio | Qwen/Qwen-Audio-Chat |
chatglm2-6b | ZhipuAI/chatglm2-6b | query_key_value | chatglm2 | ✘ | ✔ | - | - | THUDM/chatglm2-6b |
chatglm2-6b-32k | ZhipuAI/chatglm2-6b-32k | query_key_value | chatglm2 | ✘ | ✔ | - | - | THUDM/chatglm2-6b-32k |
chatglm3-6b-base | ZhipuAI/chatglm3-6b-base | query_key_value | chatglm-generation | ✘ | ✔ | - | - | THUDM/chatglm3-6b-base |
chatglm3-6b | ZhipuAI/chatglm3-6b | query_key_value | chatglm3 | ✘ | ✔ | - | - | THUDM/chatglm3-6b |
chatglm3-6b-32k | ZhipuAI/chatglm3-6b-32k | query_key_value | chatglm3 | ✘ | ✔ | - | - | THUDM/chatglm3-6b-32k |
codegeex2-6b | ZhipuAI/codegeex2-6b | query_key_value | chatglm-generation | ✘ | ✔ | transformers<4.34 | coding | THUDM/codegeex2-6b |
llama2-7b | modelscope/Llama-2-7b-ms | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | meta-llama/Llama-2-7b-hf |
llama2-7b-chat | modelscope/Llama-2-7b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | meta-llama/Llama-2-7b-chat-hf |
llama2-13b | modelscope/Llama-2-13b-ms | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | meta-llama/Llama-2-13b-hf |
llama2-13b-chat | modelscope/Llama-2-13b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | meta-llama/Llama-2-13b-chat-hf |
llama2-70b | modelscope/Llama-2-70b-ms | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | meta-llama/Llama-2-70b-hf |
llama2-70b-chat | modelscope/Llama-2-70b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | meta-llama/Llama-2-70b-chat-hf |
llama2-7b-aqlm-2bit-1x16 | AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✘ | transformers>=4.38, aqlm, torch>=2.2.0 | - | ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf |
llama3-8b | LLM-Research/Meta-Llama-3-8B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-8B |
llama3-8b-instruct | LLM-Research/Meta-Llama-3-8B-Instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-8B-Instruct |
llama3-70b | LLM-Research/Meta-Llama-3-70B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-70B |
llama3-70b-instruct | LLM-Research/Meta-Llama-3-70B-Instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-70B-Instruct |
llava1d6-mistral-7b-instruct | AI-ModelScope/llava-v1.6-mistral-7b | q_proj, k_proj, v_proj | llava-mistral-instruct | ✔ | ✘ | transformers>=4.34 | multi-modal, vision | liuhaotian/llava-v1.6-mistral-7b |
llava1d6-yi-34b-instruct | AI-ModelScope/llava-v1.6-34b | q_proj, k_proj, v_proj | llava-yi-instruct | ✔ | ✘ | - | multi-modal, vision | liuhaotian/llava-v1.6-34b |
yi-6b | 01ai/Yi-6B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-6B |
yi-6b-200k | 01ai/Yi-6B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-6B-200K |
yi-6b-chat | 01ai/Yi-6B-Chat | q_proj, k_proj, v_proj | yi | ✔ | ✔ | - | - | 01-ai/Yi-6B-Chat |
yi-6b-chat-awq | 01ai/Yi-6B-Chat-4bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | autoawq | - | 01-ai/Yi-6B-Chat-4bits |
yi-6b-chat-int8 | 01ai/Yi-6B-Chat-8bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | auto_gptq | - | 01-ai/Yi-6B-Chat-8bits |
yi-9b | 01ai/Yi-9B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-9B |
yi-9b-200k | 01ai/Yi-9B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-9B-200K |
yi-34b | 01ai/Yi-34B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-34B |
yi-34b-200k | 01ai/Yi-34B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-34B-200K |
yi-34b-chat | 01ai/Yi-34B-Chat | q_proj, k_proj, v_proj | yi | ✔ | ✔ | - | - | 01-ai/Yi-34B-Chat |
yi-34b-chat-awq | 01ai/Yi-34B-Chat-4bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | autoawq | - | 01-ai/Yi-34B-Chat-4bits |
yi-34b-chat-int8 | 01ai/Yi-34B-Chat-8bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | auto_gptq | - | 01-ai/Yi-34B-Chat-8bits |
yi-vl-6b-chat | 01ai/Yi-VL-6B | q_proj, k_proj, v_proj | yi-vl | ✔ | ✘ | transformers>=4.34 | multi-modal, vision | 01-ai/Yi-VL-6B |
yi-vl-34b-chat | 01ai/Yi-VL-34B | q_proj, k_proj, v_proj | yi-vl | ✔ | ✘ | transformers>=4.34 | multi-modal, vision | 01-ai/Yi-VL-34B |
internlm-7b | Shanghai_AI_Laboratory/internlm-7b | q_proj, k_proj, v_proj | default-generation-bos | ✘ | ✔ | - | - | internlm/internlm-7b |
internlm-7b-chat | Shanghai_AI_Laboratory/internlm-chat-7b | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | - | - | internlm/internlm-chat-7b |
internlm-7b-chat-8k | Shanghai_AI_Laboratory/internlm-chat-7b-8k | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | - | - | - |
internlm-20b | Shanghai_AI_Laboratory/internlm-20b | q_proj, k_proj, v_proj | default-generation-bos | ✘ | ✔ | - | - | internlm/internlm-20b |
internlm-20b-chat | Shanghai_AI_Laboratory/internlm-chat-20b | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | - | - | internlm/internlm-chat-20b |
internlm2-1_8b | Shanghai_AI_Laboratory/internlm2-1_8b | wqkv | default-generation-bos | ✔ | ✔ | - | - | internlm/internlm2-1_8b |
internlm2-1_8b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-1_8b-sft |
internlm2-1_8b-chat | Shanghai_AI_Laboratory/internlm2-chat-1_8b | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-1_8b |
internlm2-7b-base | Shanghai_AI_Laboratory/internlm2-base-7b | wqkv | default-generation-bos | ✔ | ✔ | - | - | internlm/internlm2-base-7b |
internlm2-7b | Shanghai_AI_Laboratory/internlm2-7b | wqkv | default-generation-bos | ✔ | ✔ | - | - | internlm/internlm2-7b |
internlm2-7b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-7b-sft | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-7b-sft |
internlm2-7b-chat | Shanghai_AI_Laboratory/internlm2-chat-7b | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-7b |
internlm2-20b-base | Shanghai_AI_Laboratory/internlm2-base-20b | wqkv | default-generation-bos | ✔ | ✔ | - | - | internlm/internlm2-base-20b |
internlm2-20b | Shanghai_AI_Laboratory/internlm2-20b | wqkv | default-generation-bos | ✔ | ✔ | - | - | internlm/internlm2-20b |
internlm2-20b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-20b-sft | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-20b-sft |
internlm2-20b-chat | Shanghai_AI_Laboratory/internlm2-chat-20b | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-20b |
internlm2-math-7b | Shanghai_AI_Laboratory/internlm2-math-base-7b | wqkv | default-generation-bos | ✔ | ✔ | - | math | internlm/internlm2-math-base-7b |
internlm2-math-7b-chat | Shanghai_AI_Laboratory/internlm2-math-7b | wqkv | internlm2 | ✔ | ✔ | - | math | internlm/internlm2-math-7b |
internlm2-math-20b | Shanghai_AI_Laboratory/internlm2-math-base-20b | wqkv | default-generation-bos | ✔ | ✔ | - | math | internlm/internlm2-math-base-20b |
internlm2-math-20b-chat | Shanghai_AI_Laboratory/internlm2-math-20b | wqkv | internlm2 | ✔ | ✔ | - | math | internlm/internlm2-math-20b |
internlm-xcomposer2-7b-chat | Shanghai_AI_Laboratory/internlm-xcomposer2-7b | wqkv | internlm-xcomposer2 | ✔ | ✘ | - | multi-modal, vision | internlm/internlm-xcomposer2-7b |
deepseek-7b | deepseek-ai/deepseek-llm-7b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-7b-base |
deepseek-7b-chat | deepseek-ai/deepseek-llm-7b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-7b-chat |
deepseek-moe-16b | deepseek-ai/deepseek-moe-16b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | deepseek-ai/deepseek-moe-16b-base |
deepseek-moe-16b-chat | deepseek-ai/deepseek-moe-16b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | - | deepseek-ai/deepseek-moe-16b-chat |
deepseek-67b | deepseek-ai/deepseek-llm-67b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-67b-base |
deepseek-67b-chat | deepseek-ai/deepseek-llm-67b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-67b-chat |
deepseek-coder-1_3b | deepseek-ai/deepseek-coder-1.3b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-1.3b-base |
deepseek-coder-1_3b-instruct | deepseek-ai/deepseek-coder-1.3b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-1.3b-instruct |
deepseek-coder-6_7b | deepseek-ai/deepseek-coder-6.7b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-6.7b-base |
deepseek-coder-6_7b-instruct | deepseek-ai/deepseek-coder-6.7b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-6.7b-instruct |
deepseek-coder-33b | deepseek-ai/deepseek-coder-33b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-33b-base |
deepseek-coder-33b-instruct | deepseek-ai/deepseek-coder-33b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-33b-instruct |
deepseek-math-7b | deepseek-ai/deepseek-math-7b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | math | deepseek-ai/deepseek-math-7b-base |
deepseek-math-7b-instruct | deepseek-ai/deepseek-math-7b-instruct | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | math | deepseek-ai/deepseek-math-7b-instruct |
deepseek-math-7b-chat | deepseek-ai/deepseek-math-7b-rl | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | math | deepseek-ai/deepseek-math-7b-rl |
deepseek-vl-1_3b-chat | deepseek-ai/deepseek-vl-1.3b-chat | q_proj, k_proj, v_proj | deepseek-vl | ✔ | ✘ | - | multi-modal, vision | deepseek-ai/deepseek-vl-1.3b-chat |
deepseek-vl-7b-chat | deepseek-ai/deepseek-vl-7b-chat | q_proj, k_proj, v_proj | deepseek-vl | ✔ | ✘ | - | multi-modal, vision | deepseek-ai/deepseek-vl-7b-chat |
gemma-2b | AI-ModelScope/gemma-2b | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.38 | - | google/gemma-2b |
gemma-7b | AI-ModelScope/gemma-7b | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.38 | - | google/gemma-7b |
gemma-2b-instruct | AI-ModelScope/gemma-2b-it | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | transformers>=4.38 | - | google/gemma-2b-it |
gemma-7b-instruct | AI-ModelScope/gemma-7b-it | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | transformers>=4.38 | - | google/gemma-7b-it |
minicpm-1b-sft-chat | OpenBMB/MiniCPM-1B-sft-bf16 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | transformers>=4.36.0 | - | openbmb/MiniCPM-1B-sft-bf16 |
minicpm-2b-sft-chat | OpenBMB/MiniCPM-2B-sft-fp32 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | - | - | openbmb/MiniCPM-2B-sft-fp32 |
minicpm-2b-chat | OpenBMB/MiniCPM-2B-dpo-fp32 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | - | - | openbmb/MiniCPM-2B-dpo-fp32 |
minicpm-2b-128k | OpenBMB/MiniCPM-2B-128k | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | transformers>=4.36.0 | - | openbmb/MiniCPM-2B-128k |
minicpm-moe-8x2b | OpenBMB/MiniCPM-MoE-8x2B | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | transformers>=4.36.0 | - | openbmb/MiniCPM-MoE-8x2B |
minicpm-v-3b-chat | OpenBMB/MiniCPM-V | q_proj, k_proj, v_proj | minicpm-v | ✔ | ✘ | - | multi-modal, vision | openbmb/MiniCPM-V |
minicpm-v-v2 | OpenBMB/MiniCPM-V-2 | q_proj, k_proj, v_proj | minicpm-v | ✔ | ✘ | - | multi-modal, vision | openbmb/MiniCPM-V-2 |
openbuddy-llama2-13b-chat | OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 |
openbuddy-llama-65b-chat | OpenBuddy/openbuddy-llama-65b-v8-bf16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-llama-65b-v8-bf16 |
openbuddy-llama2-70b-chat | OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 |
openbuddy-mistral-7b-chat | OpenBuddy/openbuddy-mistral-7b-v17.1-32k | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | transformers>=4.34 | - | OpenBuddy/openbuddy-mistral-7b-v17.1-32k |
openbuddy-zephyr-7b-chat | OpenBuddy/openbuddy-zephyr-7b-v14.1 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | transformers>=4.34 | - | OpenBuddy/openbuddy-zephyr-7b-v14.1 |
openbuddy-deepseek-67b-chat | OpenBuddy/openbuddy-deepseek-67b-v15.2 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-deepseek-67b-v15.2 |
openbuddy-mixtral-moe-7b-chat | OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | transformers>=4.36 | - | OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k |
mistral-7b | AI-ModelScope/Mistral-7B-v0.1 | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.34 | - | mistralai/Mistral-7B-v0.1 |
mistral-7b-v2 | AI-ModelScope/Mistral-7B-v0.2-hf | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.34 | - | alpindale/Mistral-7B-v0.2-hf |
mistral-7b-instruct | AI-ModelScope/Mistral-7B-Instruct-v0.1 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.1 |
mistral-7b-instruct-v2 | AI-ModelScope/Mistral-7B-Instruct-v0.2 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.2 |
mixtral-moe-7b | AI-ModelScope/Mixtral-8x7B-v0.1 | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.36 | - | mistralai/Mixtral-8x7B-v0.1 |
mixtral-moe-7b-instruct | AI-ModelScope/Mixtral-8x7B-Instruct-v0.1 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | transformers>=4.36 | - | mistralai/Mixtral-8x7B-Instruct-v0.1 |
mixtral-moe-7b-aqlm-2bit-1x16 | AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✘ | transformers>=4.38, aqlm, torch>=2.2.0 | - | ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf |
mixtral-moe-8x22b-v1 | AI-ModelScope/Mixtral-8x22B-v0.1 | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.36 | - | mistral-community/Mixtral-8x22B-v0.1 |
wizardlm2-7b-awq | AI-ModelScope/WizardLM-2-7B-AWQ | q_proj, k_proj, v_proj | wizardlm2-awq | ✔ | ✔ | transformers>=4.34 | - | MaziyarPanahi/WizardLM-2-7B-AWQ |
wizardlm2-8x22b | AI-ModelScope/WizardLM-2-8x22B | q_proj, k_proj, v_proj | wizardlm2 | ✔ | ✔ | transformers>=4.36 | - | alpindale/WizardLM-2-8x22B |
baichuan-7b | baichuan-inc/baichuan-7B | W_pack | default-generation | ✘ | ✔ | transformers<4.34 | - | baichuan-inc/Baichuan-7B |
baichuan-13b | baichuan-inc/Baichuan-13B-Base | W_pack | default-generation | ✘ | ✔ | transformers<4.34 | - | baichuan-inc/Baichuan-13B-Base |
baichuan-13b-chat | baichuan-inc/Baichuan-13B-Chat | W_pack | baichuan | ✘ | ✔ | transformers<4.34 | - | baichuan-inc/Baichuan-13B-Chat |
baichuan2-7b | baichuan-inc/Baichuan2-7B-Base | W_pack | default-generation | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-7B-Base |
baichuan2-7b-chat | baichuan-inc/Baichuan2-7B-Chat | W_pack | baichuan | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-7B-Chat |
baichuan2-7b-chat-int4 | baichuan-inc/Baichuan2-7B-Chat-4bits | W_pack | baichuan | ✘ | ✘ | bitsandbytes<0.41.2, accelerate<0.26 | - | baichuan-inc/Baichuan2-7B-Chat-4bits |
baichuan2-13b | baichuan-inc/Baichuan2-13B-Base | W_pack | default-generation | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-13B-Base |
baichuan2-13b-chat | baichuan-inc/Baichuan2-13B-Chat | W_pack | baichuan | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-13B-Chat |
baichuan2-13b-chat-int4 | baichuan-inc/Baichuan2-13B-Chat-4bits | W_pack | baichuan | ✘ | ✘ | bitsandbytes<0.41.2, accelerate<0.26 | - | baichuan-inc/Baichuan2-13B-Chat-4bits |
mplug-owl2-chat | iic/mPLUG-Owl2 | q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1 | mplug-owl2 | ✔ | ✘ | transformers<4.35, icecream | - | MAGAer13/mplug-owl2-llama2-7b |
mplug-owl2d1-chat | iic/mPLUG-Owl2.1 | c_attn.multiway.0, c_attn.multiway.1 | mplug-owl2 | ✔ | ✘ | transformers<4.35, icecream | - | Mizukiluke/mplug_owl_2_1 |
yuan2-2b-instruct | YuanLLM/Yuan2.0-2B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-2B-hf |
yuan2-2b-janus-instruct | YuanLLM/Yuan2-2B-Janus-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-2B-Janus-hf |
yuan2-51b-instruct | YuanLLM/Yuan2.0-51B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-51B-hf |
yuan2-102b-instruct | YuanLLM/Yuan2.0-102B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-102B-hf |
xverse-7b | xverse/XVERSE-7B | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-7B |
xverse-7b-chat | xverse/XVERSE-7B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✘ | - | - | xverse/XVERSE-7B-Chat |
xverse-13b | xverse/XVERSE-13B | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-13B |
xverse-13b-chat | xverse/XVERSE-13B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✘ | - | - | xverse/XVERSE-13B-Chat |
xverse-65b | xverse/XVERSE-65B | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-65B |
xverse-65b-v2 | xverse/XVERSE-65B-2 | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-65B-2 |
xverse-65b-chat | xverse/XVERSE-65B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✘ | - | - | xverse/XVERSE-65B-Chat |
xverse-13b-256k | xverse/XVERSE-13B-256K | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-13B-256K |
xverse-moe-a4_2b | xverse/XVERSE-MoE-A4.2B | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-MoE-A4.2B |
orion-14b | OrionStarAI/Orion-14B-Base | q_proj, k_proj, v_proj | default-generation | ✔ | ✘ | - | - | OrionStarAI/Orion-14B-Base |
orion-14b-chat | OrionStarAI/Orion-14B-Chat | q_proj, k_proj, v_proj | orion | ✔ | ✘ | - | - | OrionStarAI/Orion-14B-Chat |
bluelm-7b | vivo-ai/BlueLM-7B-Base | q_proj, k_proj, v_proj | default-generation-bos | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Base |
bluelm-7b-32k | vivo-ai/BlueLM-7B-Base-32K | q_proj, k_proj, v_proj | default-generation-bos | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Base-32K |
bluelm-7b-chat | vivo-ai/BlueLM-7B-Chat | q_proj, k_proj, v_proj | bluelm | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Chat |
bluelm-7b-chat-32k | vivo-ai/BlueLM-7B-Chat-32K | q_proj, k_proj, v_proj | bluelm | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Chat-32K |
ziya2-13b | Fengshenbang/Ziya2-13B-Base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | IDEA-CCNL/Ziya2-13B-Base |
ziya2-13b-chat | Fengshenbang/Ziya2-13B-Chat | q_proj, k_proj, v_proj | ziya | ✔ | ✔ | - | - | IDEA-CCNL/Ziya2-13B-Chat |
skywork-13b | skywork/Skywork-13B-base | q_proj, k_proj, v_proj | default-generation-bos | ✘ | ✘ | - | - | Skywork/Skywork-13B-base |
skywork-13b-chat | skywork/Skywork-13B-chat | q_proj, k_proj, v_proj | skywork | ✘ | ✘ | - | - | - |
zephyr-7b-beta-chat | modelscope/zephyr-7b-beta | q_proj, k_proj, v_proj | zephyr | ✔ | ✔ | transformers>=4.34 | - | HuggingFaceH4/zephyr-7b-beta |
polylm-13b | damo/nlp_polylm_13b_text_generation | c_attn | default-generation | ✘ | ✘ | - | - | DAMO-NLP-MT/polylm-13b |
seqgpt-560m | damo/nlp_seqgpt-560m | query_key_value | default-generation | ✘ | ✔ | - | - | DAMO-NLP/SeqGPT-560M |
sus-34b-chat | SUSTC/SUS-Chat-34B | q_proj, k_proj, v_proj | sus | ✔ | ✔ | - | - | SUSTech/SUS-Chat-34B |
tongyi-finance-14b | TongyiFinance/Tongyi-Finance-14B | c_attn | default-generation | ✔ | ✔ | - | financial | - |
tongyi-finance-14b-chat | TongyiFinance/Tongyi-Finance-14B-Chat | c_attn | qwen | ✔ | ✔ | - | financial | jxy/Tongyi-Finance-14B-Chat |
tongyi-finance-14b-chat-int4 | TongyiFinance/Tongyi-Finance-14B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | financial | jxy/Tongyi-Finance-14B-Chat-Int4 |
codefuse-codellama-34b-chat | codefuse-ai/CodeFuse-CodeLlama-34B | q_proj, k_proj, v_proj | codefuse-codellama | ✔ | ✔ | - | coding | codefuse-ai/CodeFuse-CodeLlama-34B |
codefuse-codegeex2-6b-chat | codefuse-ai/CodeFuse-CodeGeeX2-6B | query_key_value | codefuse | ✘ | ✔ | transformers<4.34 | coding | codefuse-ai/CodeFuse-CodeGeeX2-6B |
codefuse-qwen-14b-chat | codefuse-ai/CodeFuse-QWen-14B | c_attn | codefuse | ✔ | ✔ | - | coding | codefuse-ai/CodeFuse-QWen-14B |
phi2-3b | AI-ModelScope/phi-2 | Wqkv | default-generation | ✔ | ✔ | - | coding | microsoft/phi-2 |
cogvlm-17b-instruct | ZhipuAI/cogvlm-chat | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense | cogvlm-instruct | ✘ | ✘ | - | multi-modal, vision | THUDM/cogvlm-chat-hf |
cogagent-18b-chat | ZhipuAI/cogagent-chat | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense | cogagent-chat | ✘ | ✘ | - | multi-modal, vision | THUDM/cogagent-chat-hf |
cogagent-18b-instruct | ZhipuAI/cogagent-vqa | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense | cogagent-instruct | ✘ | ✘ | - | multi-modal, vision | THUDM/cogagent-vqa-hf |
mamba-130m | AI-ModelScope/mamba-130m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-130m-hf |
mamba-370m | AI-ModelScope/mamba-370m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-370m-hf |
mamba-390m | AI-ModelScope/mamba-390m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-390m-hf |
mamba-790m | AI-ModelScope/mamba-790m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-790m-hf |
mamba-1.4b | AI-ModelScope/mamba-1.4b-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-1.4b-hf |
mamba-2.8b | AI-ModelScope/mamba-2.8b-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-2.8b-hf |
telechat-7b | TeleAI/TeleChat-7B | key_value, query | telechat | ✔ | ✘ | - | - | Tele-AI/telechat-7B |
telechat-12b | TeleAI/TeleChat-12B | key_value, query | telechat | ✔ | ✘ | - | - | Tele-AI/TeleChat-12B |
grok-1 | colossalai/grok-1-pytorch | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | hpcai-tech/grok-1 |
dbrx-instruct | AI-ModelScope/dbrx-instruct | attn.Wqkv | dbrx | ✔ | ✔ | transformers>=4.36 | - | databricks/dbrx-instruct |
dbrx-base | AI-ModelScope/dbrx-base | attn.Wqkv | dbrx | ✔ | ✔ | transformers>=4.36 | - | databricks/dbrx-base |
mengzi3-13b-base | langboat/Mengzi3-13B-Base | q_proj, k_proj, v_proj | mengzi | ✔ | ✔ | - | - | Langboat/Mengzi3-13B-Base |
c4ai-command-r-v01 | AI-ModelScope/c4ai-command-r-v01 | q_proj, k_proj, v_proj | c4ai | ✔ | ✘ | transformers>=4.39.1 | - | CohereForAI/c4ai-command-r-v01 |
c4ai-command-r-plus | AI-ModelScope/c4ai-command-r-plus | q_proj, k_proj, v_proj | c4ai | ✔ | ✘ | transformers>4.39 | - | CohereForAI/c4ai-command-r-plus |
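The "Support VLLM" column can also be checked programmatically. The sketch below (an illustration, not part of SWIFT) parses rows of the markdown table above and returns the model types whose vLLM column is ✔:

```python
# Sketch: given raw markdown rows from the table above, return the
# model_type values whose "Support VLLM" column (the 6th cell) is ✔.
def vllm_supported(table_rows: list[str]) -> list[str]:
    supported = []
    for row in table_rows:
        # Trim surrounding whitespace and the leading/trailing pipes,
        # then split into the 9 cells of the model table.
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) >= 6 and cells[5] == "✔":
            supported.append(cells[0])
    return supported
```

For example, feeding it the `qwen-7b` and `qwen-vl` rows returns only `qwen-7b`, since qwen-vl's vLLM column is ✘.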
The table below introduces the datasets supported by SWIFT:
- Dataset Name: The dataset name registered in SWIFT.
- Dataset ID: The dataset ID in ModelScope.
- Train Size / Val Size: The row counts of the dataset's training and validation splits.
- Statistic (token): Token-count statistics, which are useful for tuning the max_length hyperparameter. We concatenate the training and validation sets and compute the statistics using qwen's tokenizer; different tokenizers produce different statistics. If you want token statistics for another model's tokenizer, you can run the script yourself.
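The statistics above can be reproduced with a few lines of Python. The sketch below is illustrative, not the actual SWIFT script: `token_stats` and the whitespace "tokenizer" in the example are hypothetical stand-ins; in practice you would pass the `encode` method of a real tokenizer (e.g. one loaded via `AutoTokenizer.from_pretrained`).

```python
import statistics

def token_stats(texts, tokenize):
    """Per-sample token-count statistics in the table's format:
    mean +/- population std, plus min and max over all samples."""
    lengths = [len(tokenize(t)) for t in texts]
    mean = statistics.mean(lengths)
    std = statistics.pstdev(lengths)  # population std, as for a full dataset
    return mean, std, min(lengths), max(lengths)

# Illustrative only: str.split stands in for a real tokenizer's encode().
mean, std, lo, hi = token_stats(["a b c", "a b", "a b c d e"], str.split)
print(f"{mean:.1f}\u00b1{std:.1f}, min={lo}, max={hi}")  # 3.3±1.2, min=2, max=5
```

To match the table, apply this to the concatenation of a dataset's train and validation splits.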
Dataset Name | Dataset ID | Train Size | Val Size | Statistic (token) | Tags | HF Dataset ID |
---|---|---|---|---|---|---|
🔥ms-bench | iic/ms_bench | 316228 | 0 | 345.0±441.3, min=22, max=30960 | chat, general, multi-round | - |
🔥ms-bench-mini | iic/ms_bench | 19492 | 0 | 353.9±439.4, min=29, max=12078 | chat, general, multi-round | - |
🔥alpaca-en | AI-ModelScope/alpaca-gpt4-data-en | 52002 | 0 | 176.2±125.8, min=26, max=740 | chat, general | vicgalle/alpaca-gpt4 |
🔥alpaca-zh | AI-ModelScope/alpaca-gpt4-data-zh | 48818 | 0 | 162.1±93.9, min=26, max=856 | chat, general | c-s-ale/alpaca-gpt4-data-zh |
multi-alpaca-all | damo/nlp_polylm_multialpaca_sft | 131867 | 0 | 112.9±50.6, min=26, max=1226 | chat, general, multilingual | - |
instinwild-en | wyj123456/instinwild | 52191 | 0 | 160.2±69.7, min=33, max=763 | chat, general | - |
instinwild-zh | wyj123456/instinwild | 51504 | 0 | 130.3±45.1, min=28, max=1434 | chat, general | - |
cot-en | YorickHe/CoT | 74771 | 0 | 122.7±64.8, min=51, max=8320 | chat, general | - |
cot-zh | YorickHe/CoT_zh | 74771 | 0 | 117.5±70.8, min=43, max=9636 | chat, general | - |
firefly-all-zh | wyj123456/firefly | 1649399 | 0 | 178.1±260.4, min=26, max=12516 | chat, general | - |
instruct-en | wyj123456/instruct | 888970 | 0 | 268.9±331.2, min=26, max=7252 | chat, general | - |
gpt4all-en | wyj123456/GPT4all | 806199 | 0 | 302.5±384.1, min=27, max=7391 | chat, general | - |
sharegpt-en | huangjintao/sharegpt | 99799 | 0 | 1045.7±431.9, min=22, max=7907 | chat, general, multi-round | - |
sharegpt-zh | huangjintao/sharegpt | 135399 | 0 | 806.3±771.7, min=21, max=65318 | chat, general, multi-round | - |
tulu-v2-sft-mixture | AI-ModelScope/tulu-v2-sft-mixture | 326154 | 0 | 867.8±996.4, min=22, max=12111 | chat, multilingual, general, multi-round | allenai/tulu-v2-sft-mixture |
wikipedia-zh | AI-ModelScope/wikipedia-cn-20230720-filtered | 254547 | 0 | 568.4±713.2, min=37, max=78678 | text-generation, general, pretrained | pleisto/wikipedia-cn-20230720-filtered |
open-orca | AI-ModelScope/OpenOrca | 3239027 | 0 | 360.4±402.9, min=27, max=8672 | chat, multilingual, general | - |
open-orca-gpt4 | AI-ModelScope/OpenOrca | 994896 | 0 | 382.3±417.4, min=31, max=8740 | chat, multilingual, general | - |
sharegpt-gpt4 | AI-ModelScope/sharegpt_gpt4 | 103063 | 0 | 1286.2±2089.4, min=22, max=221080 | chat, multilingual, general, multi-round | - |
🔥sharegpt-gpt4-mini | AI-ModelScope/sharegpt_gpt4 | 6205 | 0 | 3511.6±6068.5, min=33, max=116018 | chat, multilingual, general, multi-round, gpt4 | - |
🔥ms-agent | iic/ms_agent | 30000 | 0 | 647.7±217.1, min=199, max=2722 | chat, agent, multi-round | - |
ms-agent-for-agentfabric-default | AI-ModelScope/ms_agent_for_agentfabric | 30000 | 0 | 617.8±199.1, min=251, max=2657 | chat, agent, multi-round | - |
ms-agent-for-agentfabric-addition | AI-ModelScope/ms_agent_for_agentfabric | 488 | 0 | 2084.9±1514.8, min=489, max=7354 | chat, agent, multi-round | - |
ms-agent-multirole | iic/MSAgent-MultiRole | 8425 | 0 | 443.2±84.7, min=201, max=1101 | chat, agent, multi-round, role-play, multi-agent | - |
damo-agent-zh | damo/MSAgent-Bench | 422115 | 161 | 965.7±440.9, min=321, max=31535 | chat, agent, multi-round | - |
damo-agent-mini-zh | damo/MSAgent-Bench | 39964 | 152 | 1230.9±350.1, min=558, max=4982 | chat, agent, multi-round | - |
agent-instruct-all-en | huangjintao/AgentInstruct_copy | 1866 | 0 | 1144.3±635.5, min=206, max=6412 | chat, agent, multi-round | - |
code-alpaca-en | wyj123456/code_alpaca_en | 20016 | 0 | 100.1±60.1, min=29, max=1776 | chat, coding | sahil2801/CodeAlpaca-20k |
🔥leetcode-python-en | AI-ModelScope/leetcode-solutions-python | 2359 | 0 | 723.8±233.5, min=259, max=2117 | chat, coding | - |
🔥codefuse-python-en | codefuse-ai/CodeExercise-Python-27k | 27224 | 0 | 483.6±193.9, min=45, max=3082 | chat, coding | - |
🔥codefuse-evol-instruction-zh | codefuse-ai/Evol-instruction-66k | 66862 | 0 | 439.6±206.3, min=37, max=2983 | chat, coding | - |
medical-en | huangjintao/medical_zh | 117117 | 500 | 257.4±89.1, min=36, max=2564 | chat, medical | - |
medical-zh | huangjintao/medical_zh | 1950472 | 500 | 167.2±219.7, min=26, max=27351 | chat, medical | - |
medical-mini-zh | huangjintao/medical_zh | 50000 | 500 | 168.1±220.8, min=26, max=12320 | chat, medical | - |
🔥disc-med-sft-zh | AI-ModelScope/DISC-Med-SFT | 441767 | 0 | 354.1±193.1, min=25, max=2231 | chat, medical | Flmc/DISC-Med-SFT |
lawyer-llama-zh | AI-ModelScope/lawyer_llama_data | 21476 | 0 | 194.4±91.7, min=27, max=924 | chat, law | Skepsun/lawyer_llama_data |
tigerbot-law-zh | AI-ModelScope/tigerbot-law-plugin | 55895 | 0 | 109.9±126.4, min=37, max=18878 | text-generation, law, pretrained | TigerResearch/tigerbot-law-plugin |
🔥disc-law-sft-zh | AI-ModelScope/DISC-Law-SFT | 166758 | 0 | 533.7±495.4, min=30, max=15169 | chat, law | - |
🔥blossom-math-zh | AI-ModelScope/blossom-math-v2 | 10000 | 0 | 169.3±58.7, min=35, max=563 | chat, math | Azure99/blossom-math-v2 |
school-math-zh | AI-ModelScope/school_math_0.25M | 248480 | 0 | 157.6±72.1, min=33, max=3450 | chat, math | BelleGroup/school_math_0.25M |
open-platypus-en | AI-ModelScope/Open-Platypus | 24926 | 0 | 367.9±254.8, min=30, max=3951 | chat, math | garage-bAInd/Open-Platypus |
text2sql-en | AI-ModelScope/texttosqlv2_25000_v2 | 25000 | 0 | 274.6±326.4, min=38, max=1975 | chat, sql | Clinton/texttosqlv2_25000_v2 |
🔥sql-create-context-en | AI-ModelScope/sql-create-context | 78577 | 0 | 80.2±17.8, min=36, max=456 | chat, sql | b-mc2/sql-create-context |
🔥advertise-gen-zh | lvjianjin/AdvertiseGen | 97484 | 915 | 131.6±21.7, min=52, max=242 | text-generation | shibing624/AdvertiseGen |
🔥dureader-robust-zh | modelscope/DuReader_robust-QG | 15937 | 1962 | 242.1±137.4, min=61, max=1417 | text-generation | - |
cmnli-zh | clue | 391783 | 12241 | 83.6±16.6, min=52, max=200 | text-generation, classification | clue |
🔥cmnli-mini-zh | clue | 20000 | 200 | 82.9±16.3, min=52, max=188 | text-generation, classification | clue |
🔥jd-sentiment-zh | DAMO_NLP/jd | 45012 | 4988 | 67.0±83.2, min=40, max=4040 | text-generation, classification | - |
🔥hc3-zh | simpleai/HC3-Chinese | 39781 | 0 | 177.8±81.5, min=58, max=3052 | text-generation, classification | Hello-SimpleAI/HC3-Chinese |
🔥hc3-en | simpleai/HC3 | 11021 | 0 | 299.3±138.7, min=66, max=2268 | text-generation, classification | Hello-SimpleAI/HC3 |
finance-en | wyj123456/finance_en | 68911 | 0 | 135.6±134.3, min=26, max=3525 | chat, financial | ssbuild/alpaca_finance_en |
poetry-zh | modelscope/chinese-poetry-collection | 388599 | 1710 | 55.2±9.4, min=23, max=83 | text-generation, poetry | - |
webnovel-zh | AI-ModelScope/webnovel_cn | 50000 | 0 | 1478.9±11526.1, min=100, max=490484 | chat, novel | zxbsmk/webnovel_cn |
generated-chat-zh | AI-ModelScope/generated_chat_0.4M | 396004 | 0 | 273.3±52.0, min=32, max=873 | chat, character-dialogue | BelleGroup/generated_chat_0.4M |
cls-fudan-news-zh | damo/zh_cls_fudan-news | 4959 | 0 | 3234.4±2547.5, min=91, max=19548 | chat, classification | - |
ner-jave-zh | damo/zh_ner-JAVE | 1266 | 0 | 118.3±45.5, min=44, max=223 | chat, ner | - |
long-alpaca-12k | AI-ModelScope/LongAlpaca-12k | 11998 | 0 | 9619.0±8295.8, min=36, max=78925 | longlora, QA | Yukang/LongAlpaca-12k |
coco-en | modelscope/coco_2014_caption | 414113 | 40504 | 298.8±2.8, min=294, max=351 | chat, multi-modal, vision | - |
🔥coco-mini-en | modelscope/coco_2014_caption | 20000 | 200 | 298.8±2.8, min=294, max=339 | chat, multi-modal, vision | - |
🔥coco-mini-en-2 | modelscope/coco_2014_caption | 20000 | 200 | 36.8±2.8, min=32, max=77 | chat, multi-modal, vision | - |
capcha-images | AI-ModelScope/captcha-images | 6000 | 2000 | 29.0±0.0, min=29, max=29 | chat, multi-modal, vision | - |
aishell1-zh | speech_asr/speech_asr_aishell1_trainsets | 134424 | 7176 | 152.2±36.8, min=63, max=419 | chat, multi-modal, audio | - |
🔥aishell1-mini-zh | speech_asr/speech_asr_aishell1_trainsets | 14326 | 200 | 152.0±35.5, min=74, max=359 | chat, multi-modal, audio | - |
hh-rlhf-harmless-base | AI-ModelScope/hh-rlhf | 42462 | 2308 | 167.2±123.1, min=22, max=986 | rlhf, dpo, pairwise | - |
hh-rlhf-helpful-base | AI-ModelScope/hh-rlhf | 43777 | 2348 | 201.9±135.2, min=25, max=1070 | rlhf, dpo, pairwise | - |
hh-rlhf-helpful-online | AI-ModelScope/hh-rlhf | 10150 | 1137 | 401.5±278.7, min=32, max=1987 | rlhf, dpo, pairwise | - |
hh-rlhf-helpful-rejection-sampled | AI-ModelScope/hh-rlhf | 52413 | 2749 | 247.0±152.6, min=26, max=1300 | rlhf, dpo, pairwise | - |
hh-rlhf-red-team-attempts | AI-ModelScope/hh-rlhf | 52413 | 2749 | 247.0±152.6, min=26, max=1300 | rlhf, dpo, pairwise | - |
🔥hh-rlhf-cn | AI-ModelScope/hh_rlhf_cn | 172085 | 9292 | 172.8±124.0, min=22, max=1638 | rlhf, dpo, pairwise | - |
hh-rlhf-cn-harmless-base-cn | AI-ModelScope/hh_rlhf_cn | 42394 | 2304 | 143.9±109.4, min=24, max=3078 | rlhf, dpo, pairwise | - |
hh-rlhf-cn-helpful-base-cn | AI-ModelScope/hh_rlhf_cn | 43722 | 2346 | 176.8±120.0, min=26, max=1420 | rlhf, dpo, pairwise | - |
hh-rlhf-cn-harmless-base-en | AI-ModelScope/hh_rlhf_cn | 42394 | 2304 | 167.5±123.2, min=22, max=986 | rlhf, dpo, pairwise | - |
hh-rlhf-cn-helpful-base-en | AI-ModelScope/hh_rlhf_cn | 43722 | 2346 | 202.2±135.3, min=25, max=1070 | rlhf, dpo, pairwise | - |
stack-exchange-paired | AI-ModelScope/stack-exchange-paired | 4483004 | 0 | 534.5±594.6, min=31, max=56588 | rlhf, dpo, pairwise | - |
pileval | huangjintao/pile-val-backup | 214670 | 0 | 1612.3±8856.2, min=11, max=1208955 | text-generation, awq | mit-han-lab/pile-val-backup |
🔥coig-cqia-chinese-traditional | AI-ModelScope/COIG-CQIA | 1111 | 0 | 172.6±59.9, min=55, max=856 | general | - |
🔥coig-cqia-coig-pc | AI-ModelScope/COIG-CQIA | 3000 | 0 | 353.5±859.6, min=34, max=19288 | general | - |
🔥coig-cqia-exam | AI-ModelScope/COIG-CQIA | 4856 | 0 | 275.0±240.0, min=45, max=4932 | general | - |
🔥coig-cqia-finance | AI-ModelScope/COIG-CQIA | 11288 | 0 | 1266.4±561.1, min=60, max=10582 | general | - |
🔥coig-cqia-douban | AI-ModelScope/COIG-CQIA | 3086 | 0 | 402.9±544.7, min=88, max=10870 | general | - |
🔥coig-cqia-human-value | AI-ModelScope/COIG-CQIA | 1007 | 0 | 151.2±77.3, min=39, max=656 | general | - |
🔥coig-cqia-logi-qa | AI-ModelScope/COIG-CQIA | 421 | 0 | 309.8±188.8, min=43, max=1306 | general | - |
🔥coig-cqia-ruozhiba | AI-ModelScope/COIG-CQIA | 240 | 0 | 189.8±62.2, min=33, max=505 | general | - |
🔥coig-cqia-segmentfault | AI-ModelScope/COIG-CQIA | 458 | 0 | 449.0±495.8, min=87, max=6342 | general | - |
🔥coig-cqia-wiki | AI-ModelScope/COIG-CQIA | 10603 | 0 | 619.2±515.8, min=73, max=10140 | general | - |
🔥coig-cqia-wikihow | AI-ModelScope/COIG-CQIA | 1485 | 0 | 1700.0±790.9, min=260, max=6371 | general | - |
🔥coig-cqia-xhs | AI-ModelScope/COIG-CQIA | 1508 | 0 | 438.0±179.6, min=129, max=2191 | general | - |
🔥coig-cqia-zhihu | AI-ModelScope/COIG-CQIA | 5631 | 0 | 540.7±306.7, min=161, max=3036 | general | - |
🔥ruozhiba-post-annual | AI-ModelScope/ruozhiba | 1361 | 0 | 36.6±15.3, min=24, max=559 | pretrain | - |
🔥ruozhiba-title-good | AI-ModelScope/ruozhiba | 2597 | 0 | 41.9±19.3, min=22, max=246 | pretrain | - |
🔥ruozhiba-title-norm | AI-ModelScope/ruozhiba | 81700 | 0 | 39.9±12.8, min=21, max=386 | pretrain | - |