The table below introduces all models supported by SWIFT:
- Model Type: The model_type registered in SWIFT.
- Default Lora Target Modules: The default lora_target_modules used by the model.
- Default Template: The default template used by the model.
- Support Flash Attn: Whether the model supports flash attention to accelerate SFT and inference.
- Support VLLM: Whether the model supports vLLM to accelerate inference and deployment.
- Requires: Extra dependencies required by the model.
- Tags: Tags associated with the model.
- HF Model ID: The model id on the Hugging Face Hub.
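As a sketch of how a row of this table might be consumed, the helper below assembles a `swift sft` command line from a model_type and dataset name. The `--model_type`, `--dataset`, and `--use_flash_attn` flags are assumptions about the SWIFT CLI made for illustration; check `swift sft --help` before relying on them.

```python
# Hypothetical helper: build a `swift sft` invocation from a table row.
# The flag names below are assumptions, not a verified SWIFT interface.
def build_sft_command(model_type: str, dataset: str, use_flash_attn: bool = False) -> list[str]:
    cmd = ["swift", "sft", "--model_type", model_type, "--dataset", dataset]
    if use_flash_attn:
        # Only pass this for rows whose "Support Flash Attn" column is ✔.
        cmd += ["--use_flash_attn", "true"]
    return cmd

print(" ".join(build_sft_command("qwen-7b-chat", "ms-bench", use_flash_attn=True)))
# → swift sft --model_type qwen-7b-chat --dataset ms-bench --use_flash_attn true
```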
Model Type | Model ID | Default Lora Target Modules | Default Template | Support Flash Attn | Support VLLM | Requires | Tags | HF Model ID |
---|---|---|---|---|---|---|---|---|
qwen-1_8b | qwen/Qwen-1_8B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-1_8B |
qwen-1_8b-chat | qwen/Qwen-1_8B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-1_8B-Chat |
qwen-1_8b-chat-int4 | qwen/Qwen-1_8B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-1_8B-Chat-Int4 |
qwen-1_8b-chat-int8 | qwen/Qwen-1_8B-Chat-Int8 | c_attn | qwen | ✔ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-1_8B-Chat-Int8 |
qwen-7b | qwen/Qwen-7B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-7B |
qwen-7b-chat | qwen/Qwen-7B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-7B-Chat |
qwen-7b-chat-int4 | qwen/Qwen-7B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-7B-Chat-Int4 |
qwen-7b-chat-int8 | qwen/Qwen-7B-Chat-Int8 | c_attn | qwen | ✔ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-7B-Chat-Int8 |
qwen-14b | qwen/Qwen-14B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-14B |
qwen-14b-chat | qwen/Qwen-14B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-14B-Chat |
qwen-14b-chat-int4 | qwen/Qwen-14B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-14B-Chat-Int4 |
qwen-14b-chat-int8 | qwen/Qwen-14B-Chat-Int8 | c_attn | qwen | ✔ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-14B-Chat-Int8 |
qwen-72b | qwen/Qwen-72B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-72B |
qwen-72b-chat | qwen/Qwen-72B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-72B-Chat |
qwen-72b-chat-int4 | qwen/Qwen-72B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-72B-Chat-Int4 |
qwen-72b-chat-int8 | qwen/Qwen-72B-Chat-Int8 | c_attn | qwen | ✔ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-72B-Chat-Int8 |
qwen1half-0_5b | qwen/Qwen1.5-0.5B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B |
qwen1half-1_8b | qwen/Qwen1.5-1.8B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B |
qwen1half-4b | qwen/Qwen1.5-4B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-4B |
qwen1half-7b | qwen/Qwen1.5-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-7B |
qwen1half-14b | qwen/Qwen1.5-14B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-14B |
qwen1half-32b | qwen/Qwen1.5-32B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-32B |
qwen1half-72b | qwen/Qwen1.5-72B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-72B |
codeqwen1half-7b | qwen/CodeQwen1.5-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/CodeQwen1.5-7B |
qwen1half-moe-a2_7b | qwen/Qwen1.5-MoE-A2.7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-MoE-A2.7B |
qwen1half-0_5b-chat | qwen/Qwen1.5-0.5B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat |
qwen1half-1_8b-chat | qwen/Qwen1.5-1.8B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat |
qwen1half-4b-chat | qwen/Qwen1.5-4B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat |
qwen1half-7b-chat | qwen/Qwen1.5-7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat |
qwen1half-14b-chat | qwen/Qwen1.5-14B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat |
qwen1half-32b-chat | qwen/Qwen1.5-32B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-32B-Chat |
qwen1half-72b-chat | qwen/Qwen1.5-72B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat |
qwen1half-moe-a2_7b-chat | qwen/Qwen1.5-MoE-A2.7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-MoE-A2.7B-Chat |
codeqwen1half-7b-chat | qwen/CodeQwen1.5-7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/CodeQwen1.5-7B-Chat |
qwen1half-0_5b-chat-int4 | qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 |
qwen1half-1_8b-chat-int4 | qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 |
qwen1half-4b-chat-int4 | qwen/Qwen1.5-4B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int4 |
qwen1half-7b-chat-int4 | qwen/Qwen1.5-7B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int4 |
qwen1half-14b-chat-int4 | qwen/Qwen1.5-14B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int4 |
qwen1half-32b-chat-int4 | qwen/Qwen1.5-32B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-32B-Chat-GPTQ-Int4 |
qwen1half-72b-chat-int4 | qwen/Qwen1.5-72B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 |
qwen1half-0_5b-chat-int8 | qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 |
qwen1half-1_8b-chat-int8 | qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 |
qwen1half-4b-chat-int8 | qwen/Qwen1.5-4B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int8 |
qwen1half-7b-chat-int8 | qwen/Qwen1.5-7B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int8 |
qwen1half-14b-chat-int8 | qwen/Qwen1.5-14B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int8 |
qwen1half-72b-chat-int8 | qwen/Qwen1.5-72B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int8 |
qwen1half-moe-a2_7b-chat-int4 | qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 |
qwen1half-0_5b-chat-awq | qwen/Qwen1.5-0.5B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-0.5B-Chat-AWQ |
qwen1half-1_8b-chat-awq | qwen/Qwen1.5-1.8B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-1.8B-Chat-AWQ |
qwen1half-4b-chat-awq | qwen/Qwen1.5-4B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-4B-Chat-AWQ |
qwen1half-7b-chat-awq | qwen/Qwen1.5-7B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-7B-Chat-AWQ |
qwen1half-14b-chat-awq | qwen/Qwen1.5-14B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-14B-Chat-AWQ |
qwen1half-32b-chat-awq | qwen/Qwen1.5-32B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-32B-Chat-AWQ |
qwen1half-72b-chat-awq | qwen/Qwen1.5-72B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-72B-Chat-AWQ |
codeqwen1half-7b-chat-awq | qwen/CodeQwen1.5-7B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/CodeQwen1.5-7B-Chat-AWQ |
qwen-vl | qwen/Qwen-VL | c_attn | default-generation | ✔ | ✘ | - | multi-modal, vision | Qwen/Qwen-VL |
qwen-vl-chat | qwen/Qwen-VL-Chat | c_attn | qwen | ✔ | ✘ | - | multi-modal, vision | Qwen/Qwen-VL-Chat |
qwen-vl-chat-int4 | qwen/Qwen-VL-Chat-Int4 | c_attn | qwen | ✔ | ✘ | auto_gptq>=0.5 | multi-modal, vision | Qwen/Qwen-VL-Chat-Int4 |
qwen-audio | qwen/Qwen-Audio | c_attn | qwen-audio-generation | ✔ | ✘ | - | multi-modal, audio | Qwen/Qwen-Audio |
qwen-audio-chat | qwen/Qwen-Audio-Chat | c_attn | qwen-audio | ✔ | ✘ | - | multi-modal, audio | Qwen/Qwen-Audio-Chat |
chatglm2-6b | ZhipuAI/chatglm2-6b | query_key_value | chatglm2 | ✘ | ✔ | - | - | THUDM/chatglm2-6b |
chatglm2-6b-32k | ZhipuAI/chatglm2-6b-32k | query_key_value | chatglm2 | ✘ | ✔ | - | - | THUDM/chatglm2-6b-32k |
chatglm3-6b-base | ZhipuAI/chatglm3-6b-base | query_key_value | chatglm-generation | ✘ | ✔ | - | - | THUDM/chatglm3-6b-base |
chatglm3-6b | ZhipuAI/chatglm3-6b | query_key_value | chatglm3 | ✘ | ✔ | - | - | THUDM/chatglm3-6b |
chatglm3-6b-32k | ZhipuAI/chatglm3-6b-32k | query_key_value | chatglm3 | ✘ | ✔ | - | - | THUDM/chatglm3-6b-32k |
codegeex2-6b | ZhipuAI/codegeex2-6b | query_key_value | chatglm-generation | ✘ | ✔ | transformers<4.34 | coding | THUDM/codegeex2-6b |
llama2-7b | modelscope/Llama-2-7b-ms | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | meta-llama/Llama-2-7b-hf |
llama2-7b-chat | modelscope/Llama-2-7b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | meta-llama/Llama-2-7b-chat-hf |
llama2-13b | modelscope/Llama-2-13b-ms | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | meta-llama/Llama-2-13b-hf |
llama2-13b-chat | modelscope/Llama-2-13b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | meta-llama/Llama-2-13b-chat-hf |
llama2-70b | modelscope/Llama-2-70b-ms | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | meta-llama/Llama-2-70b-hf |
llama2-70b-chat | modelscope/Llama-2-70b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | meta-llama/Llama-2-70b-chat-hf |
llama2-7b-aqlm-2bit-1x16 | AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✘ | transformers>=4.38, aqlm, torch>=2.2.0 | - | ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf |
llama3-8b | LLM-Research/Meta-Llama-3-8B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-8B |
llama3-8b-instruct | LLM-Research/Meta-Llama-3-8B-Instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-8B-Instruct |
llama3-70b | LLM-Research/Meta-Llama-3-70B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-70B |
llama3-70b-instruct | LLM-Research/Meta-Llama-3-70B-Instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-70B-Instruct |
llava1d6-mistral-7b-instruct | AI-ModelScope/llava-v1.6-mistral-7b | q_proj, k_proj, v_proj | llava-mistral-instruct | ✔ | ✘ | transformers>=4.34 | multi-modal, vision | liuhaotian/llava-v1.6-mistral-7b |
llava1d6-yi-34b-instruct | AI-ModelScope/llava-v1.6-34b | q_proj, k_proj, v_proj | llava-yi-instruct | ✔ | ✘ | - | multi-modal, vision | liuhaotian/llava-v1.6-34b |
yi-6b | 01ai/Yi-6B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-6B |
yi-6b-200k | 01ai/Yi-6B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-6B-200K |
yi-6b-chat | 01ai/Yi-6B-Chat | q_proj, k_proj, v_proj | yi | ✔ | ✔ | - | - | 01-ai/Yi-6B-Chat |
yi-6b-chat-awq | 01ai/Yi-6B-Chat-4bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | autoawq | - | 01-ai/Yi-6B-Chat-4bits |
yi-6b-chat-int8 | 01ai/Yi-6B-Chat-8bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | auto_gptq | - | 01-ai/Yi-6B-Chat-8bits |
yi-9b | 01ai/Yi-9B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-9B |
yi-9b-200k | 01ai/Yi-9B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-9B-200K |
yi-34b | 01ai/Yi-34B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-34B |
yi-34b-200k | 01ai/Yi-34B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-34B-200K |
yi-34b-chat | 01ai/Yi-34B-Chat | q_proj, k_proj, v_proj | yi | ✔ | ✔ | - | - | 01-ai/Yi-34B-Chat |
yi-34b-chat-awq | 01ai/Yi-34B-Chat-4bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | autoawq | - | 01-ai/Yi-34B-Chat-4bits |
yi-34b-chat-int8 | 01ai/Yi-34B-Chat-8bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | auto_gptq | - | 01-ai/Yi-34B-Chat-8bits |
yi-vl-6b-chat | 01ai/Yi-VL-6B | q_proj, k_proj, v_proj | yi-vl | ✔ | ✘ | transformers>=4.34 | multi-modal, vision | 01-ai/Yi-VL-6B |
yi-vl-34b-chat | 01ai/Yi-VL-34B | q_proj, k_proj, v_proj | yi-vl | ✔ | ✘ | transformers>=4.34 | multi-modal, vision | 01-ai/Yi-VL-34B |
internlm-7b | Shanghai_AI_Laboratory/internlm-7b | q_proj, k_proj, v_proj | default-generation-bos | ✘ | ✔ | - | - | internlm/internlm-7b |
internlm-7b-chat | Shanghai_AI_Laboratory/internlm-chat-7b | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | - | - | internlm/internlm-chat-7b |
internlm-7b-chat-8k | Shanghai_AI_Laboratory/internlm-chat-7b-8k | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | - | - | - |
internlm-20b | Shanghai_AI_Laboratory/internlm-20b | q_proj, k_proj, v_proj | default-generation-bos | ✘ | ✔ | - | - | internlm/internlm-20b |
internlm-20b-chat | Shanghai_AI_Laboratory/internlm-chat-20b | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | - | - | internlm/internlm-chat-20b |
internlm2-1_8b | Shanghai_AI_Laboratory/internlm2-1_8b | wqkv | default-generation-bos | ✔ | ✔ | - | - | internlm/internlm2-1_8b |
internlm2-1_8b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-1_8b-sft |
internlm2-1_8b-chat | Shanghai_AI_Laboratory/internlm2-chat-1_8b | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-1_8b |
internlm2-7b-base | Shanghai_AI_Laboratory/internlm2-base-7b | wqkv | default-generation-bos | ✔ | ✔ | - | - | internlm/internlm2-base-7b |
internlm2-7b | Shanghai_AI_Laboratory/internlm2-7b | wqkv | default-generation-bos | ✔ | ✔ | - | - | internlm/internlm2-7b |
internlm2-7b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-7b-sft | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-7b-sft |
internlm2-7b-chat | Shanghai_AI_Laboratory/internlm2-chat-7b | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-7b |
internlm2-20b-base | Shanghai_AI_Laboratory/internlm2-base-20b | wqkv | default-generation-bos | ✔ | ✔ | - | - | internlm/internlm2-base-20b |
internlm2-20b | Shanghai_AI_Laboratory/internlm2-20b | wqkv | default-generation-bos | ✔ | ✔ | - | - | internlm/internlm2-20b |
internlm2-20b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-20b-sft | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-20b-sft |
internlm2-20b-chat | Shanghai_AI_Laboratory/internlm2-chat-20b | wqkv | internlm2 | ✔ | ✔ | - | - | internlm/internlm2-chat-20b |
internlm2-math-7b | Shanghai_AI_Laboratory/internlm2-math-base-7b | wqkv | default-generation-bos | ✔ | ✔ | - | math | internlm/internlm2-math-base-7b |
internlm2-math-7b-chat | Shanghai_AI_Laboratory/internlm2-math-7b | wqkv | internlm2 | ✔ | ✔ | - | math | internlm/internlm2-math-7b |
internlm2-math-20b | Shanghai_AI_Laboratory/internlm2-math-base-20b | wqkv | default-generation-bos | ✔ | ✔ | - | math | internlm/internlm2-math-base-20b |
internlm2-math-20b-chat | Shanghai_AI_Laboratory/internlm2-math-20b | wqkv | internlm2 | ✔ | ✔ | - | math | internlm/internlm2-math-20b |
internlm-xcomposer2-7b-chat | Shanghai_AI_Laboratory/internlm-xcomposer2-7b | wqkv | internlm-xcomposer2 | ✔ | ✘ | - | multi-modal, vision | internlm/internlm-xcomposer2-7b |
deepseek-7b | deepseek-ai/deepseek-llm-7b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-7b-base |
deepseek-7b-chat | deepseek-ai/deepseek-llm-7b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-7b-chat |
deepseek-moe-16b | deepseek-ai/deepseek-moe-16b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | deepseek-ai/deepseek-moe-16b-base |
deepseek-moe-16b-chat | deepseek-ai/deepseek-moe-16b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | - | deepseek-ai/deepseek-moe-16b-chat |
deepseek-67b | deepseek-ai/deepseek-llm-67b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-67b-base |
deepseek-67b-chat | deepseek-ai/deepseek-llm-67b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-67b-chat |
deepseek-coder-1_3b | deepseek-ai/deepseek-coder-1.3b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-1.3b-base |
deepseek-coder-1_3b-instruct | deepseek-ai/deepseek-coder-1.3b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-1.3b-instruct |
deepseek-coder-6_7b | deepseek-ai/deepseek-coder-6.7b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-6.7b-base |
deepseek-coder-6_7b-instruct | deepseek-ai/deepseek-coder-6.7b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-6.7b-instruct |
deepseek-coder-33b | deepseek-ai/deepseek-coder-33b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-33b-base |
deepseek-coder-33b-instruct | deepseek-ai/deepseek-coder-33b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-33b-instruct |
deepseek-math-7b | deepseek-ai/deepseek-math-7b-base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | math | deepseek-ai/deepseek-math-7b-base |
deepseek-math-7b-instruct | deepseek-ai/deepseek-math-7b-instruct | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | math | deepseek-ai/deepseek-math-7b-instruct |
deepseek-math-7b-chat | deepseek-ai/deepseek-math-7b-rl | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | math | deepseek-ai/deepseek-math-7b-rl |
deepseek-vl-1_3b-chat | deepseek-ai/deepseek-vl-1.3b-chat | q_proj, k_proj, v_proj | deepseek-vl | ✔ | ✘ | - | multi-modal, vision | deepseek-ai/deepseek-vl-1.3b-chat |
deepseek-vl-7b-chat | deepseek-ai/deepseek-vl-7b-chat | q_proj, k_proj, v_proj | deepseek-vl | ✔ | ✘ | - | multi-modal, vision | deepseek-ai/deepseek-vl-7b-chat |
gemma-2b | AI-ModelScope/gemma-2b | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.38 | - | google/gemma-2b |
gemma-7b | AI-ModelScope/gemma-7b | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.38 | - | google/gemma-7b |
gemma-2b-instruct | AI-ModelScope/gemma-2b-it | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | transformers>=4.38 | - | google/gemma-2b-it |
gemma-7b-instruct | AI-ModelScope/gemma-7b-it | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | transformers>=4.38 | - | google/gemma-7b-it |
minicpm-1b-sft-chat | OpenBMB/MiniCPM-1B-sft-bf16 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | transformers>=4.36.0 | - | openbmb/MiniCPM-1B-sft-bf16 |
minicpm-2b-sft-chat | OpenBMB/MiniCPM-2B-sft-fp32 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | - | - | openbmb/MiniCPM-2B-sft-fp32 |
minicpm-2b-chat | OpenBMB/MiniCPM-2B-dpo-fp32 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | - | - | openbmb/MiniCPM-2B-dpo-fp32 |
minicpm-2b-128k | OpenBMB/MiniCPM-2B-128k | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | transformers>=4.36.0 | - | openbmb/MiniCPM-2B-128k |
minicpm-moe-8x2b | OpenBMB/MiniCPM-MoE-8x2B | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | transformers>=4.36.0 | - | openbmb/MiniCPM-MoE-8x2B |
minicpm-v-3b-chat | OpenBMB/MiniCPM-V | q_proj, k_proj, v_proj | minicpm-v | ✔ | ✘ | - | multi-modal, vision | openbmb/MiniCPM-V |
minicpm-v-v2 | OpenBMB/MiniCPM-V-2 | q_proj, k_proj, v_proj | minicpm-v | ✔ | ✘ | - | multi-modal, vision | openbmb/MiniCPM-V-2 |
openbuddy-llama2-13b-chat | OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 |
openbuddy-llama-65b-chat | OpenBuddy/openbuddy-llama-65b-v8-bf16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-llama-65b-v8-bf16 |
openbuddy-llama2-70b-chat | OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 |
openbuddy-mistral-7b-chat | OpenBuddy/openbuddy-mistral-7b-v17.1-32k | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | transformers>=4.34 | - | OpenBuddy/openbuddy-mistral-7b-v17.1-32k |
openbuddy-zephyr-7b-chat | OpenBuddy/openbuddy-zephyr-7b-v14.1 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | transformers>=4.34 | - | OpenBuddy/openbuddy-zephyr-7b-v14.1 |
openbuddy-deepseek-67b-chat | OpenBuddy/openbuddy-deepseek-67b-v15.2 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-deepseek-67b-v15.2 |
openbuddy-mixtral-moe-7b-chat | OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | transformers>=4.36 | - | OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k |
mistral-7b | AI-ModelScope/Mistral-7B-v0.1 | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.34 | - | mistralai/Mistral-7B-v0.1 |
mistral-7b-v2 | AI-ModelScope/Mistral-7B-v0.2-hf | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.34 | - | alpindale/Mistral-7B-v0.2-hf |
mistral-7b-instruct | AI-ModelScope/Mistral-7B-Instruct-v0.1 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.1 |
mistral-7b-instruct-v2 | AI-ModelScope/Mistral-7B-Instruct-v0.2 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.2 |
mixtral-moe-7b | AI-ModelScope/Mixtral-8x7B-v0.1 | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.36 | - | mistralai/Mixtral-8x7B-v0.1 |
mixtral-moe-7b-instruct | AI-ModelScope/Mixtral-8x7B-Instruct-v0.1 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | transformers>=4.36 | - | mistralai/Mixtral-8x7B-Instruct-v0.1 |
mixtral-moe-7b-aqlm-2bit-1x16 | AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✘ | transformers>=4.38, aqlm, torch>=2.2.0 | - | ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf |
mixtral-moe-8x22b-v1 | AI-ModelScope/Mixtral-8x22B-v0.1 | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | transformers>=4.36 | - | mistral-community/Mixtral-8x22B-v0.1 |
wizardlm2-7b-awq | AI-ModelScope/WizardLM-2-7B-AWQ | q_proj, k_proj, v_proj | wizardlm2-awq | ✔ | ✔ | transformers>=4.34 | - | MaziyarPanahi/WizardLM-2-7B-AWQ |
wizardlm2-8x22b | AI-ModelScope/WizardLM-2-8x22B | q_proj, k_proj, v_proj | wizardlm2 | ✔ | ✔ | transformers>=4.36 | - | alpindale/WizardLM-2-8x22B |
baichuan-7b | baichuan-inc/baichuan-7B | W_pack | default-generation | ✘ | ✔ | transformers<4.34 | - | baichuan-inc/Baichuan-7B |
baichuan-13b | baichuan-inc/Baichuan-13B-Base | W_pack | default-generation | ✘ | ✔ | transformers<4.34 | - | baichuan-inc/Baichuan-13B-Base |
baichuan-13b-chat | baichuan-inc/Baichuan-13B-Chat | W_pack | baichuan | ✘ | ✔ | transformers<4.34 | - | baichuan-inc/Baichuan-13B-Chat |
baichuan2-7b | baichuan-inc/Baichuan2-7B-Base | W_pack | default-generation | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-7B-Base |
baichuan2-7b-chat | baichuan-inc/Baichuan2-7B-Chat | W_pack | baichuan | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-7B-Chat |
baichuan2-7b-chat-int4 | baichuan-inc/Baichuan2-7B-Chat-4bits | W_pack | baichuan | ✘ | ✘ | bitsandbytes<0.41.2, accelerate<0.26 | - | baichuan-inc/Baichuan2-7B-Chat-4bits |
baichuan2-13b | baichuan-inc/Baichuan2-13B-Base | W_pack | default-generation | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-13B-Base |
baichuan2-13b-chat | baichuan-inc/Baichuan2-13B-Chat | W_pack | baichuan | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-13B-Chat |
baichuan2-13b-chat-int4 | baichuan-inc/Baichuan2-13B-Chat-4bits | W_pack | baichuan | ✘ | ✘ | bitsandbytes<0.41.2, accelerate<0.26 | - | baichuan-inc/Baichuan2-13B-Chat-4bits |
mplug-owl2-chat | iic/mPLUG-Owl2 | q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1 | mplug-owl2 | ✔ | ✘ | transformers<4.35, icecream | - | MAGAer13/mplug-owl2-llama2-7b |
mplug-owl2d1-chat | iic/mPLUG-Owl2.1 | c_attn.multiway.0, c_attn.multiway.1 | mplug-owl2 | ✔ | ✘ | transformers<4.35, icecream | - | Mizukiluke/mplug_owl_2_1 |
yuan2-2b-instruct | YuanLLM/Yuan2.0-2B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-2B-hf |
yuan2-2b-janus-instruct | YuanLLM/Yuan2-2B-Janus-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-2B-Janus-hf |
yuan2-51b-instruct | YuanLLM/Yuan2.0-51B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-51B-hf |
yuan2-102b-instruct | YuanLLM/Yuan2.0-102B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-102B-hf |
xverse-7b | xverse/XVERSE-7B | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-7B |
xverse-7b-chat | xverse/XVERSE-7B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✘ | - | - | xverse/XVERSE-7B-Chat |
xverse-13b | xverse/XVERSE-13B | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-13B |
xverse-13b-chat | xverse/XVERSE-13B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✘ | - | - | xverse/XVERSE-13B-Chat |
xverse-65b | xverse/XVERSE-65B | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-65B |
xverse-65b-v2 | xverse/XVERSE-65B-2 | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-65B-2 |
xverse-65b-chat | xverse/XVERSE-65B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✘ | - | - | xverse/XVERSE-65B-Chat |
xverse-13b-256k | xverse/XVERSE-13B-256K | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-13B-256K |
xverse-moe-a4_2b | xverse/XVERSE-MoE-A4.2B | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-MoE-A4.2B |
orion-14b | OrionStarAI/Orion-14B-Base | q_proj, k_proj, v_proj | default-generation | ✔ | ✘ | - | - | OrionStarAI/Orion-14B-Base |
orion-14b-chat | OrionStarAI/Orion-14B-Chat | q_proj, k_proj, v_proj | orion | ✔ | ✘ | - | - | OrionStarAI/Orion-14B-Chat |
bluelm-7b | vivo-ai/BlueLM-7B-Base | q_proj, k_proj, v_proj | default-generation-bos | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Base |
bluelm-7b-32k | vivo-ai/BlueLM-7B-Base-32K | q_proj, k_proj, v_proj | default-generation-bos | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Base-32K |
bluelm-7b-chat | vivo-ai/BlueLM-7B-Chat | q_proj, k_proj, v_proj | bluelm | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Chat |
bluelm-7b-chat-32k | vivo-ai/BlueLM-7B-Chat-32K | q_proj, k_proj, v_proj | bluelm | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Chat-32K |
ziya2-13b | Fengshenbang/Ziya2-13B-Base | q_proj, k_proj, v_proj | default-generation-bos | ✔ | ✔ | - | - | IDEA-CCNL/Ziya2-13B-Base |
ziya2-13b-chat | Fengshenbang/Ziya2-13B-Chat | q_proj, k_proj, v_proj | ziya | ✔ | ✔ | - | - | IDEA-CCNL/Ziya2-13B-Chat |
skywork-13b | skywork/Skywork-13B-base | q_proj, k_proj, v_proj | default-generation-bos | ✘ | ✘ | - | - | Skywork/Skywork-13B-base |
skywork-13b-chat | skywork/Skywork-13B-chat | q_proj, k_proj, v_proj | skywork | ✘ | ✘ | - | - | - |
zephyr-7b-beta-chat | modelscope/zephyr-7b-beta | q_proj, k_proj, v_proj | zephyr | ✔ | ✔ | transformers>=4.34 | - | HuggingFaceH4/zephyr-7b-beta |
polylm-13b | damo/nlp_polylm_13b_text_generation | c_attn | default-generation | ✘ | ✘ | - | - | DAMO-NLP-MT/polylm-13b |
seqgpt-560m | damo/nlp_seqgpt-560m | query_key_value | default-generation | ✘ | ✔ | - | - | DAMO-NLP/SeqGPT-560M |
sus-34b-chat | SUSTC/SUS-Chat-34B | q_proj, k_proj, v_proj | sus | ✔ | ✔ | - | - | SUSTech/SUS-Chat-34B |
tongyi-finance-14b | TongyiFinance/Tongyi-Finance-14B | c_attn | default-generation | ✔ | ✔ | - | financial | - |
tongyi-finance-14b-chat | TongyiFinance/Tongyi-Finance-14B-Chat | c_attn | qwen | ✔ | ✔ | - | financial | jxy/Tongyi-Finance-14B-Chat |
tongyi-finance-14b-chat-int4 | TongyiFinance/Tongyi-Finance-14B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | financial | jxy/Tongyi-Finance-14B-Chat-Int4 |
codefuse-codellama-34b-chat | codefuse-ai/CodeFuse-CodeLlama-34B | q_proj, k_proj, v_proj | codefuse-codellama | ✔ | ✔ | - | coding | codefuse-ai/CodeFuse-CodeLlama-34B |
codefuse-codegeex2-6b-chat | codefuse-ai/CodeFuse-CodeGeeX2-6B | query_key_value | codefuse | ✘ | ✔ | transformers<4.34 | coding | codefuse-ai/CodeFuse-CodeGeeX2-6B |
codefuse-qwen-14b-chat | codefuse-ai/CodeFuse-QWen-14B | c_attn | codefuse | ✔ | ✔ | - | coding | codefuse-ai/CodeFuse-QWen-14B |
phi2-3b | AI-ModelScope/phi-2 | Wqkv | default-generation | ✔ | ✔ | - | coding | microsoft/phi-2 |
cogvlm-17b-instruct | ZhipuAI/cogvlm-chat | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense | cogvlm-instruct | ✘ | ✘ | - | multi-modal, vision | THUDM/cogvlm-chat-hf |
cogagent-18b-chat | ZhipuAI/cogagent-chat | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense | cogagent-chat | ✘ | ✘ | - | multi-modal, vision | THUDM/cogagent-chat-hf |
cogagent-18b-instruct | ZhipuAI/cogagent-vqa | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense | cogagent-instruct | ✘ | ✘ | - | multi-modal, vision | THUDM/cogagent-vqa-hf |
mamba-130m | AI-ModelScope/mamba-130m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-130m-hf |
mamba-370m | AI-ModelScope/mamba-370m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-370m-hf |
mamba-390m | AI-ModelScope/mamba-390m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-390m-hf |
mamba-790m | AI-ModelScope/mamba-790m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-790m-hf |
mamba-1.4b | AI-ModelScope/mamba-1.4b-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-1.4b-hf |
mamba-2.8b | AI-ModelScope/mamba-2.8b-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-2.8b-hf |
telechat-7b | TeleAI/TeleChat-7B | key_value, query | telechat | ✔ | ✘ | - | - | Tele-AI/telechat-7B |
telechat-12b | TeleAI/TeleChat-12B | key_value, query | telechat | ✔ | ✘ | - | - | Tele-AI/TeleChat-12B |
grok-1 | colossalai/grok-1-pytorch | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | hpcai-tech/grok-1 |
dbrx-instruct | AI-ModelScope/dbrx-instruct | attn.Wqkv | dbrx | ✔ | ✔ | transformers>=4.36 | - | databricks/dbrx-instruct |
dbrx-base | AI-ModelScope/dbrx-base | attn.Wqkv | dbrx | ✔ | ✔ | transformers>=4.36 | - | databricks/dbrx-base |
mengzi3-13b-base | langboat/Mengzi3-13B-Base | q_proj, k_proj, v_proj | mengzi | ✔ | ✔ | - | - | Langboat/Mengzi3-13B-Base |
c4ai-command-r-v01 | AI-ModelScope/c4ai-command-r-v01 | q_proj, k_proj, v_proj | c4ai | ✔ | ✘ | transformers>=4.39.1 | - | CohereForAI/c4ai-command-r-v01 |
c4ai-command-r-plus | AI-ModelScope/c4ai-command-r-plus | q_proj, k_proj, v_proj | c4ai | ✔ | ✘ | transformers>4.39 | - | CohereForAI/c4ai-command-r-plus |
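The "Support VLLM" column can also be checked programmatically. The sketch below (an illustration, not part of SWIFT) parses rows of the markdown table above and returns the model types whose vLLM column is ✔:

```python
# Sketch: given raw markdown rows from the table above, return the
# model_type values whose "Support VLLM" column (the 6th cell) is ✔.
def vllm_supported(table_rows: list[str]) -> list[str]:
    supported = []
    for row in table_rows:
        # Trim surrounding whitespace and the leading/trailing pipes,
        # then split into the 9 cells of the model table.
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) >= 6 and cells[5] == "✔":
            supported.append(cells[0])
    return supported
```

For example, feeding it the `qwen-7b` and `qwen-vl` rows returns only `qwen-7b`, since qwen-vl's vLLM column is ✘.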
The table below introduces the datasets supported by SWIFT:
- Dataset Name: The dataset name registered in SWIFT.
- Dataset ID: The dataset ID in ModelScope.
- Train Size / Val Size: The row counts of the dataset's training and validation splits.
- Statistic (token): Token-count statistics, which are useful for tuning the max_length hyperparameter. We concatenate the training and validation sets and compute the statistics using qwen's tokenizer; different tokenizers produce different statistics. If you want token statistics for another model's tokenizer, you can run the script yourself.
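The statistics above can be reproduced with a few lines of Python. The sketch below is illustrative, not the actual SWIFT script: `token_stats` and the whitespace "tokenizer" in the example are hypothetical stand-ins; in practice you would pass the `encode` method of a real tokenizer (e.g. one loaded via `AutoTokenizer.from_pretrained`).

```python
import statistics

def token_stats(texts, tokenize):
    """Per-sample token-count statistics in the table's format:
    mean +/- population std, plus min and max over all samples."""
    lengths = [len(tokenize(t)) for t in texts]
    mean = statistics.mean(lengths)
    std = statistics.pstdev(lengths)  # population std, as for a full dataset
    return mean, std, min(lengths), max(lengths)

# Illustrative only: str.split stands in for a real tokenizer's encode().
mean, std, lo, hi = token_stats(["a b c", "a b", "a b c d e"], str.split)
print(f"{mean:.1f}\u00b1{std:.1f}, min={lo}, max={hi}")  # 3.3±1.2, min=2, max=5
```

To match the table, apply this to the concatenation of a dataset's train and validation splits.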
Dataset Name | Dataset ID | Train Size | Val Size | Statistic (token) | Tags | HF Dataset ID |
---|---|---|---|---|---|---|
🔥ms-bench | iic/ms_bench | 316228 | 0 | 345.0±441.3, min=22, max=30960 | chat, general, multi-round | - |
🔥ms-bench-mini | iic/ms_bench | 19492 | 0 | 353.9±439.4, min=29, max=12078 | chat, general, multi-round | - |
🔥alpaca-en | AI-ModelScope/alpaca-gpt4-data-en | 52002 | 0 | 176.2±125.8, min=26, max=740 | chat, general | vicgalle/alpaca-gpt4 |
🔥alpaca-zh | AI-ModelScope/alpaca-gpt4-data-zh | 48818 | 0 | 162.1±93.9, min=26, max=856 | chat, general | c-s-ale/alpaca-gpt4-data-zh |
multi-alpaca-all | damo/nlp_polylm_multialpaca_sft | 131867 | 0 | 112.9±50.6, min=26, max=1226 | chat, general, multilingual | - |
instinwild-en | wyj123456/instinwild | 52191 | 0 | 160.2±69.7, min=33, max=763 | chat, general | - |
instinwild-zh | wyj123456/instinwild | 51504 | 0 | 130.3±45.1, min=28, max=1434 | chat, general | - |
cot-en | YorickHe/CoT | 74771 | 0 | 122.7±64.8, min=51, max=8320 | chat, general | - |
cot-zh | YorickHe/CoT_zh | 74771 | 0 | 117.5±70.8, min=43, max=9636 | chat, general | - |
firefly-all-zh | wyj123456/firefly | 1649399 | 0 | 178.1±260.4, min=26, max=12516 | chat, general | - |
instruct-en | wyj123456/instruct | 888970 | 0 | 268.9±331.2, min=26, max=7252 | chat, general | - |
gpt4all-en | wyj123456/GPT4all | 806199 | 0 | 302.5±384.1, min=27, max=7391 | chat, general | - |
sharegpt-en | huangjintao/sharegpt | 99799 | 0 | 1045.7±431.9, min=22, max=7907 | chat, general, multi-round | - |
sharegpt-zh | huangjintao/sharegpt | 135399 | 0 | 806.3±771.7, min=21, max=65318 | chat, general, multi-round | - |
tulu-v2-sft-mixture | AI-ModelScope/tulu-v2-sft-mixture | 326154 | 0 | 867.8±996.4, min=22, max=12111 | chat, multilingual, general, multi-round | allenai/tulu-v2-sft-mixture |
wikipedia-zh | AI-ModelScope/wikipedia-cn-20230720-filtered | 254547 | 0 | 568.4±713.2, min=37, max=78678 | text-generation, general, pretrained | pleisto/wikipedia-cn-20230720-filtered |
open-orca | AI-ModelScope/OpenOrca | 3239027 | 0 | 360.4±402.9, min=27, max=8672 | chat, multilingual, general | - |
open-orca-gpt4 | AI-ModelScope/OpenOrca | 994896 | 0 | 382.3±417.4, min=31, max=8740 | chat, multilingual, general | - |
sharegpt-gpt4 | AI-ModelScope/sharegpt_gpt4 | 103063 | 0 | 1286.2±2089.4, min=22, max=221080 | chat, multilingual, general, multi-round | - |
🔥sharegpt-gpt4-mini | AI-ModelScope/sharegpt_gpt4 | 6205 | 0 | 3511.6±6068.5, min=33, max=116018 | chat, multilingual, general, multi-round, gpt4 | - |
🔥ms-agent | iic/ms_agent | 30000 | 0 | 647.7±217.1, min=199, max=2722 | chat, agent, multi-round | - |
ms-agent-for-agentfabric-default | AI-ModelScope/ms_agent_for_agentfabric | 30000 | 0 | 617.8±199.1, min=251, max=2657 | chat, agent, multi-round | - |
ms-agent-for-agentfabric-addition | AI-ModelScope/ms_agent_for_agentfabric | 488 | 0 | 2084.9±1514.8, min=489, max=7354 | chat, agent, multi-round | - |
ms-agent-multirole | iic/MSAgent-MultiRole | 8425 | 0 | 443.2±84.7, min=201, max=1101 | chat, agent, multi-round, role-play, multi-agent | - |
damo-agent-zh | damo/MSAgent-Bench | 422115 | 161 | 965.7±440.9, min=321, max=31535 | chat, agent, multi-round | - |
damo-agent-mini-zh | damo/MSAgent-Bench | 39964 | 152 | 1230.9±350.1, min=558, max=4982 | chat, agent, multi-round | - |
agent-instruct-all-en | huangjintao/AgentInstruct_copy | 1866 | 0 | 1144.3±635.5, min=206, max=6412 | chat, agent, multi-round | - |
code-alpaca-en | wyj123456/code_alpaca_en | 20016 | 0 | 100.1±60.1, min=29, max=1776 | chat, coding | sahil2801/CodeAlpaca-20k |
🔥leetcode-python-en | AI-ModelScope/leetcode-solutions-python | 2359 | 0 | 723.8±233.5, min=259, max=2117 | chat, coding | - |
🔥codefuse-python-en | codefuse-ai/CodeExercise-Python-27k | 27224 | 0 | 483.6±193.9, min=45, max=3082 | chat, coding | - |
🔥codefuse-evol-instruction-zh | codefuse-ai/Evol-instruction-66k | 66862 | 0 | 439.6±206.3, min=37, max=2983 | chat, coding | - |
medical-en | huangjintao/medical_zh | 117117 | 500 | 257.4±89.1, min=36, max=2564 | chat, medical | - |
medical-zh | huangjintao/medical_zh | 1950472 | 500 | 167.2±219.7, min=26, max=27351 | chat, medical | - |
medical-mini-zh | huangjintao/medical_zh | 50000 | 500 | 168.1±220.8, min=26, max=12320 | chat, medical | - |
🔥disc-med-sft-zh | AI-ModelScope/DISC-Med-SFT | 441767 | 0 | 354.1±193.1, min=25, max=2231 | chat, medical | Flmc/DISC-Med-SFT |
lawyer-llama-zh | AI-ModelScope/lawyer_llama_data | 21476 | 0 | 194.4±91.7, min=27, max=924 | chat, law | Skepsun/lawyer_llama_data |
tigerbot-law-zh | AI-ModelScope/tigerbot-law-plugin | 55895 | 0 | 109.9±126.4, min=37, max=18878 | text-generation, law, pretrained | TigerResearch/tigerbot-law-plugin |
🔥disc-law-sft-zh | AI-ModelScope/DISC-Law-SFT | 166758 | 0 | 533.7±495.4, min=30, max=15169 | chat, law | - |
🔥blossom-math-zh | AI-ModelScope/blossom-math-v2 | 10000 | 0 | 169.3±58.7, min=35, max=563 | chat, math | Azure99/blossom-math-v2 |
school-math-zh | AI-ModelScope/school_math_0.25M | 248480 | 0 | 157.6±72.1, min=33, max=3450 | chat, math | BelleGroup/school_math_0.25M |
open-platypus-en | AI-ModelScope/Open-Platypus | 24926 | 0 | 367.9±254.8, min=30, max=3951 | chat, math | garage-bAInd/Open-Platypus |
text2sql-en | AI-ModelScope/texttosqlv2_25000_v2 | 25000 | 0 | 274.6±326.4, min=38, max=1975 | chat, sql | Clinton/texttosqlv2_25000_v2 |
🔥sql-create-context-en | AI-ModelScope/sql-create-context | 78577 | 0 | 80.2±17.8, min=36, max=456 | chat, sql | b-mc2/sql-create-context |
🔥advertise-gen-zh | lvjianjin/AdvertiseGen | 97484 | 915 | 131.6±21.7, min=52, max=242 | text-generation | shibing624/AdvertiseGen |
🔥dureader-robust-zh | modelscope/DuReader_robust-QG | 15937 | 1962 | 242.1±137.4, min=61, max=1417 | text-generation | - |
cmnli-zh | clue | 391783 | 12241 | 83.6±16.6, min=52, max=200 | text-generation, classification | clue |
🔥cmnli-mini-zh | clue | 20000 | 200 | 82.9±16.3, min=52, max=188 | text-generation, classification | clue |
🔥jd-sentiment-zh | DAMO_NLP/jd | 45012 | 4988 | 67.0±83.2, min=40, max=4040 | text-generation, classification | - |
🔥hc3-zh | simpleai/HC3-Chinese | 39781 | 0 | 177.8±81.5, min=58, max=3052 | text-generation, classification | Hello-SimpleAI/HC3-Chinese |
🔥hc3-en | simpleai/HC3 | 11021 | 0 | 299.3±138.7, min=66, max=2268 | text-generation, classification | Hello-SimpleAI/HC3 |
finance-en | wyj123456/finance_en | 68911 | 0 | 135.6±134.3, min=26, max=3525 | chat, financial | ssbuild/alpaca_finance_en |
poetry-zh | modelscope/chinese-poetry-collection | 388599 | 1710 | 55.2±9.4, min=23, max=83 | text-generation, poetry | - |
webnovel-zh | AI-ModelScope/webnovel_cn | 50000 | 0 | 1478.9±11526.1, min=100, max=490484 | chat, novel | zxbsmk/webnovel_cn |
generated-chat-zh | AI-ModelScope/generated_chat_0.4M | 396004 | 0 | 273.3±52.0, min=32, max=873 | chat, character-dialogue | BelleGroup/generated_chat_0.4M |
cls-fudan-news-zh | damo/zh_cls_fudan-news | 4959 | 0 | 3234.4±2547.5, min=91, max=19548 | chat, classification | - |
ner-jave-zh | damo/zh_ner-JAVE | 1266 | 0 | 118.3±45.5, min=44, max=223 | chat, ner | - |
long-alpaca-12k | AI-ModelScope/LongAlpaca-12k | 11998 | 0 | 9619.0±8295.8, min=36, max=78925 | longlora, QA | Yukang/LongAlpaca-12k |
coco-en | modelscope/coco_2014_caption | 414113 | 40504 | 298.8±2.8, min=294, max=351 | chat, multi-modal, vision | - |
🔥coco-mini-en | modelscope/coco_2014_caption | 20000 | 200 | 298.8±2.8, min=294, max=339 | chat, multi-modal, vision | - |
🔥coco-mini-en-2 | modelscope/coco_2014_caption | 20000 | 200 | 36.8±2.8, min=32, max=77 | chat, multi-modal, vision | - |
capcha-images | AI-ModelScope/captcha-images | 6000 | 2000 | 29.0±0.0, min=29, max=29 | chat, multi-modal, vision | - |
aishell1-zh | speech_asr/speech_asr_aishell1_trainsets | 134424 | 7176 | 152.2±36.8, min=63, max=419 | chat, multi-modal, audio | - |
🔥aishell1-mini-zh | speech_asr/speech_asr_aishell1_trainsets | 14326 | 200 | 152.0±35.5, min=74, max=359 | chat, multi-modal, audio | - |
hh-rlhf-harmless-base | AI-ModelScope/hh-rlhf | 42462 | 2308 | 167.2±123.1, min=22, max=986 | rlhf, dpo, pairwise | - |
hh-rlhf-helpful-base | AI-ModelScope/hh-rlhf | 43777 | 2348 | 201.9±135.2, min=25, max=1070 | rlhf, dpo, pairwise | - |
hh-rlhf-helpful-online | AI-ModelScope/hh-rlhf | 10150 | 1137 | 401.5±278.7, min=32, max=1987 | rlhf, dpo, pairwise | - |
hh-rlhf-helpful-rejection-sampled | AI-ModelScope/hh-rlhf | 52413 | 2749 | 247.0±152.6, min=26, max=1300 | rlhf, dpo, pairwise | - |
hh-rlhf-red-team-attempts | AI-ModelScope/hh-rlhf | 52413 | 2749 | 247.0±152.6, min=26, max=1300 | rlhf, dpo, pairwise | - |
🔥hh-rlhf-cn | AI-ModelScope/hh_rlhf_cn | 172085 | 9292 | 172.8±124.0, min=22, max=1638 | rlhf, dpo, pairwise | - |
hh-rlhf-cn-harmless-base-cn | AI-ModelScope/hh_rlhf_cn | 42394 | 2304 | 143.9±109.4, min=24, max=3078 | rlhf, dpo, pairwise | - |
hh-rlhf-cn-helpful-base-cn | AI-ModelScope/hh_rlhf_cn | 43722 | 2346 | 176.8±120.0, min=26, max=1420 | rlhf, dpo, pairwise | - |
hh-rlhf-cn-harmless-base-en | AI-ModelScope/hh_rlhf_cn | 42394 | 2304 | 167.5±123.2, min=22, max=986 | rlhf, dpo, pairwise | - |
hh-rlhf-cn-helpful-base-en | AI-ModelScope/hh_rlhf_cn | 43722 | 2346 | 202.2±135.3, min=25, max=1070 | rlhf, dpo, pairwise | - |
stack-exchange-paired | AI-ModelScope/stack-exchange-paired | 4483004 | 0 | 534.5±594.6, min=31, max=56588 | rlhf, dpo, pairwise | - |
pileval | huangjintao/pile-val-backup | 214670 | 0 | 1612.3±8856.2, min=11, max=1208955 | text-generation, awq | mit-han-lab/pile-val-backup |
🔥coig-cqia-chinese-traditional | AI-ModelScope/COIG-CQIA | 1111 | 0 | 172.6±59.9, min=55, max=856 | general | - |
🔥coig-cqia-coig-pc | AI-ModelScope/COIG-CQIA | 3000 | 0 | 353.5±859.6, min=34, max=19288 | general | - |
🔥coig-cqia-exam | AI-ModelScope/COIG-CQIA | 4856 | 0 | 275.0±240.0, min=45, max=4932 | general | - |
🔥coig-cqia-finance | AI-ModelScope/COIG-CQIA | 11288 | 0 | 1266.4±561.1, min=60, max=10582 | general | - |
🔥coig-cqia-douban | AI-ModelScope/COIG-CQIA | 3086 | 0 | 402.9±544.7, min=88, max=10870 | general | - |
🔥coig-cqia-human-value | AI-ModelScope/COIG-CQIA | 1007 | 0 | 151.2±77.3, min=39, max=656 | general | - |
🔥coig-cqia-logi-qa | AI-ModelScope/COIG-CQIA | 421 | 0 | 309.8±188.8, min=43, max=1306 | general | - |
🔥coig-cqia-ruozhiba | AI-ModelScope/COIG-CQIA | 240 | 0 | 189.8±62.2, min=33, max=505 | general | - |
🔥coig-cqia-segmentfault | AI-ModelScope/COIG-CQIA | 458 | 0 | 449.0±495.8, min=87, max=6342 | general | - |
🔥coig-cqia-wiki | AI-ModelScope/COIG-CQIA | 10603 | 0 | 619.2±515.8, min=73, max=10140 | general | - |
🔥coig-cqia-wikihow | AI-ModelScope/COIG-CQIA | 1485 | 0 | 1700.0±790.9, min=260, max=6371 | general | - |
🔥coig-cqia-xhs | AI-ModelScope/COIG-CQIA | 1508 | 0 | 438.0±179.6, min=129, max=2191 | general | - |
🔥coig-cqia-zhihu | AI-ModelScope/COIG-CQIA | 5631 | 0 | 540.7±306.7, min=161, max=3036 | general | - |
🔥ruozhiba-post-annual | AI-ModelScope/ruozhiba | 1361 | 0 | 36.6±15.3, min=24, max=559 | pretrain | - |
🔥ruozhiba-title-good | AI-ModelScope/ruozhiba | 2597 | 0 | 41.9±19.3, min=22, max=246 | pretrain | - |
🔥ruozhiba-title-norm | AI-ModelScope/ruozhiba | 81700 | 0 | 39.9±12.8, min=21, max=386 | pretrain | - |