# Supported models and datasets


## Models

The table below introduces all models supported by SWIFT:

  • Model Type: The model_type identifier registered in SWIFT.
  • Default Lora Target Modules: The default lora_target_modules used by the model.
  • Default Template: The default template used by the model.
  • Support Flash Attn: Whether the model supports flash attention to accelerate SFT and inference.
  • Support VLLM: Whether the model supports vLLM to accelerate inference and deployment.
  • Requires: Extra package requirements for the model.
Model Type Model ID Default Lora Target Modules Default Template Support Flash Attn Support VLLM Requires Tags HF Model ID
qwen-1_8b qwen/Qwen-1_8B c_attn default-generation - Qwen/Qwen1.5-1.8B
qwen-1_8b-chat qwen/Qwen-1_8B-Chat c_attn qwen - Qwen/Qwen-1_8B-Chat
qwen-1_8b-chat-int4 qwen/Qwen-1_8B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-1_8B-Chat-Int4
qwen-1_8b-chat-int8 qwen/Qwen-1_8B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-1_8B-Chat-Int8
qwen-7b qwen/Qwen-7B c_attn default-generation - Qwen/Qwen-7B
qwen-7b-chat qwen/Qwen-7B-Chat c_attn qwen - Qwen/Qwen-7B-Chat
qwen-7b-chat-int4 qwen/Qwen-7B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-7B-Chat-Int4
qwen-7b-chat-int8 qwen/Qwen-7B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-7B-Chat-Int8
qwen-14b qwen/Qwen-14B c_attn default-generation - Qwen/Qwen-14B
qwen-14b-chat qwen/Qwen-14B-Chat c_attn qwen - Qwen/Qwen-14B-Chat
qwen-14b-chat-int4 qwen/Qwen-14B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-14B-Chat-Int4
qwen-14b-chat-int8 qwen/Qwen-14B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-14B-Chat-Int8
qwen-72b qwen/Qwen-72B c_attn default-generation - Qwen/Qwen-72B
qwen-72b-chat qwen/Qwen-72B-Chat c_attn qwen - Qwen/Qwen-72B-Chat
qwen-72b-chat-int4 qwen/Qwen-72B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-72B-Chat-Int4
qwen-72b-chat-int8 qwen/Qwen-72B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-72B-Chat-Int8
qwen1half-0_5b qwen/Qwen1.5-0.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-0.5B
qwen1half-1_8b qwen/Qwen1.5-1.8B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-1.8B
qwen1half-4b qwen/Qwen1.5-4B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-4B
qwen1half-7b qwen/Qwen1.5-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-7B
qwen1half-14b qwen/Qwen1.5-14B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-14B
qwen1half-32b qwen/Qwen1.5-32B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-32B
qwen1half-72b qwen/Qwen1.5-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-72B
codeqwen1half-7b qwen/CodeQwen1.5-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/CodeQwen1.5-7B
qwen1half-moe-a2_7b qwen/Qwen1.5-MoE-A2.7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-MoE-A2.7B
qwen1half-0_5b-chat qwen/Qwen1.5-0.5B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat
qwen1half-1_8b-chat qwen/Qwen1.5-1.8B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat
qwen1half-4b-chat qwen/Qwen1.5-4B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-4B-Chat
qwen1half-7b-chat qwen/Qwen1.5-7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-7B-Chat
qwen1half-14b-chat qwen/Qwen1.5-14B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-14B-Chat
qwen1half-32b-chat qwen/Qwen1.5-32B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-32B-Chat
qwen1half-72b-chat qwen/Qwen1.5-72B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-72B-Chat
qwen1half-moe-a2_7b-chat qwen/Qwen1.5-MoE-A2.7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-MoE-A2.7B-Chat
codeqwen1half-7b-chat qwen/CodeQwen1.5-7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/CodeQwen1.5-7B-Chat
qwen1half-0_5b-chat-int4 qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4
qwen1half-1_8b-chat-int4 qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4
qwen1half-4b-chat-int4 qwen/Qwen1.5-4B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-4B-Chat-GPTQ-Int4
qwen1half-7b-chat-int4 qwen/Qwen1.5-7B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-7B-Chat-GPTQ-Int4
qwen1half-14b-chat-int4 qwen/Qwen1.5-14B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-14B-Chat-GPTQ-Int4
qwen1half-32b-chat-int4 qwen/Qwen1.5-32B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-32B-Chat-GPTQ-Int4
qwen1half-72b-chat-int4 qwen/Qwen1.5-72B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-72B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-int8 qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8
qwen1half-1_8b-chat-int8 qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8
qwen1half-4b-chat-int8 qwen/Qwen1.5-4B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-4B-Chat-GPTQ-Int8
qwen1half-7b-chat-int8 qwen/Qwen1.5-7B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-7B-Chat-GPTQ-Int8
qwen1half-14b-chat-int8 qwen/Qwen1.5-14B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-14B-Chat-GPTQ-Int8
qwen1half-72b-chat-int8 qwen/Qwen1.5-72B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-72B-Chat-GPTQ-Int8
qwen1half-moe-a2_7b-chat-int4 qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-awq qwen/Qwen1.5-0.5B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-0.5B-Chat-AWQ
qwen1half-1_8b-chat-awq qwen/Qwen1.5-1.8B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-1.8B-Chat-AWQ
qwen1half-4b-chat-awq qwen/Qwen1.5-4B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-4B-Chat-AWQ
qwen1half-7b-chat-awq qwen/Qwen1.5-7B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-7B-Chat-AWQ
qwen1half-14b-chat-awq qwen/Qwen1.5-14B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-14B-Chat-AWQ
qwen1half-32b-chat-awq qwen/Qwen1.5-32B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-32B-Chat-AWQ
qwen1half-72b-chat-awq qwen/Qwen1.5-72B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-72B-Chat-AWQ
codeqwen1half-7b-chat-awq qwen/CodeQwen1.5-7B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/CodeQwen1.5-7B-Chat-AWQ
qwen-vl qwen/Qwen-VL c_attn default-generation multi-modal, vision Qwen/Qwen-VL
qwen-vl-chat qwen/Qwen-VL-Chat c_attn qwen multi-modal, vision Qwen/Qwen-VL-Chat
qwen-vl-chat-int4 qwen/Qwen-VL-Chat-Int4 c_attn qwen auto_gptq>=0.5 multi-modal, vision Qwen/Qwen-VL-Chat-Int4
qwen-audio qwen/Qwen-Audio c_attn qwen-audio-generation multi-modal, audio Qwen/Qwen-Audio
qwen-audio-chat qwen/Qwen-Audio-Chat c_attn qwen-audio multi-modal, audio Qwen/Qwen-Audio-Chat
chatglm2-6b ZhipuAI/chatglm2-6b query_key_value chatglm2 - THUDM/chatglm2-6b
chatglm2-6b-32k ZhipuAI/chatglm2-6b-32k query_key_value chatglm2 - THUDM/chatglm2-6b-32k
chatglm3-6b-base ZhipuAI/chatglm3-6b-base query_key_value chatglm-generation - THUDM/chatglm3-6b-base
chatglm3-6b ZhipuAI/chatglm3-6b query_key_value chatglm3 - THUDM/chatglm3-6b
chatglm3-6b-32k ZhipuAI/chatglm3-6b-32k query_key_value chatglm3 - THUDM/chatglm3-6b-32k
codegeex2-6b ZhipuAI/codegeex2-6b query_key_value chatglm-generation transformers<4.34 coding THUDM/codegeex2-6b
llama2-7b modelscope/Llama-2-7b-ms q_proj, k_proj, v_proj default-generation-bos - meta-llama/Llama-2-7b-hf
llama2-7b-chat modelscope/Llama-2-7b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-7b-chat-hf
llama2-13b modelscope/Llama-2-13b-ms q_proj, k_proj, v_proj default-generation-bos - meta-llama/Llama-2-13b-hf
llama2-13b-chat modelscope/Llama-2-13b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-13b-chat-hf
llama2-70b modelscope/Llama-2-70b-ms q_proj, k_proj, v_proj default-generation-bos - meta-llama/Llama-2-70b-hf
llama2-70b-chat modelscope/Llama-2-70b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-70b-chat-hf
llama2-7b-aqlm-2bit-1x16 AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf q_proj, k_proj, v_proj default-generation-bos transformers>=4.38, aqlm, torch>=2.2.0 - ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf
llama3-8b LLM-Research/Meta-Llama-3-8B q_proj, k_proj, v_proj default-generation - meta-llama/Meta-Llama-3-8B
llama3-8b-instruct LLM-Research/Meta-Llama-3-8B-Instruct q_proj, k_proj, v_proj llama3 - meta-llama/Meta-Llama-3-8B-Instruct
llama3-70b LLM-Research/Meta-Llama-3-70B q_proj, k_proj, v_proj default-generation - meta-llama/Meta-Llama-3-70B
llama3-70b-instruct LLM-Research/Meta-Llama-3-70B-Instruct q_proj, k_proj, v_proj llama3 - meta-llama/Meta-Llama-3-70B-Instruct
llava1d6-mistral-7b-instruct AI-ModelScope/llava-v1.6-mistral-7b q_proj, k_proj, v_proj llava-mistral-instruct transformers>=4.34 multi-modal, vision liuhaotian/llava-v1.6-mistral-7b
llava1d6-yi-34b-instruct AI-ModelScope/llava-v1.6-34b q_proj, k_proj, v_proj llava-yi-instruct multi-modal, vision liuhaotian/llava-v1.6-34b
yi-6b 01ai/Yi-6B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-6B
yi-6b-200k 01ai/Yi-6B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-6B-200K
yi-6b-chat 01ai/Yi-6B-Chat q_proj, k_proj, v_proj yi - 01-ai/Yi-6B-Chat
yi-6b-chat-awq 01ai/Yi-6B-Chat-4bits q_proj, k_proj, v_proj yi autoawq - 01-ai/Yi-6B-Chat-4bits
yi-6b-chat-int8 01ai/Yi-6B-Chat-8bits q_proj, k_proj, v_proj yi auto_gptq - 01-ai/Yi-6B-Chat-8bits
yi-9b 01ai/Yi-9B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-9B
yi-9b-200k 01ai/Yi-9B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-9B-200K
yi-34b 01ai/Yi-34B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-34B
yi-34b-200k 01ai/Yi-34B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-34B-200K
yi-34b-chat 01ai/Yi-34B-Chat q_proj, k_proj, v_proj yi - 01-ai/Yi-34B-Chat
yi-34b-chat-awq 01ai/Yi-34B-Chat-4bits q_proj, k_proj, v_proj yi autoawq - 01-ai/Yi-34B-Chat-4bits
yi-34b-chat-int8 01ai/Yi-34B-Chat-8bits q_proj, k_proj, v_proj yi auto_gptq - 01-ai/Yi-34B-Chat-8bits
yi-vl-6b-chat 01ai/Yi-VL-6B q_proj, k_proj, v_proj yi-vl transformers>=4.34 multi-modal, vision 01-ai/Yi-VL-6B
yi-vl-34b-chat 01ai/Yi-VL-34B q_proj, k_proj, v_proj yi-vl transformers>=4.34 multi-modal, vision 01-ai/Yi-VL-34B
internlm-7b Shanghai_AI_Laboratory/internlm-7b q_proj, k_proj, v_proj default-generation-bos - internlm/internlm-7b
internlm-7b-chat Shanghai_AI_Laboratory/internlm-chat-7b q_proj, k_proj, v_proj internlm - internlm/internlm-chat-7b
internlm-7b-chat-8k Shanghai_AI_Laboratory/internlm-chat-7b-8k q_proj, k_proj, v_proj internlm - -
internlm-20b Shanghai_AI_Laboratory/internlm-20b q_proj, k_proj, v_proj default-generation-bos - internlm/internlm2-20b
internlm-20b-chat Shanghai_AI_Laboratory/internlm-chat-20b q_proj, k_proj, v_proj internlm - internlm/internlm2-chat-20b
internlm2-1_8b Shanghai_AI_Laboratory/internlm2-1_8b wqkv default-generation-bos - internlm/internlm2-1_8b
internlm2-1_8b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft wqkv internlm2 - internlm/internlm2-chat-1_8b-sft
internlm2-1_8b-chat Shanghai_AI_Laboratory/internlm2-chat-1_8b wqkv internlm2 - internlm/internlm2-chat-1_8b
internlm2-7b-base Shanghai_AI_Laboratory/internlm2-base-7b wqkv default-generation-bos - internlm/internlm2-base-7b
internlm2-7b Shanghai_AI_Laboratory/internlm2-7b wqkv default-generation-bos - internlm/internlm2-7b
internlm2-7b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-7b-sft wqkv internlm2 - internlm/internlm2-chat-7b-sft
internlm2-7b-chat Shanghai_AI_Laboratory/internlm2-chat-7b wqkv internlm2 - internlm/internlm2-chat-7b
internlm2-20b-base Shanghai_AI_Laboratory/internlm2-base-20b wqkv default-generation-bos - internlm/internlm2-base-20b
internlm2-20b Shanghai_AI_Laboratory/internlm2-20b wqkv default-generation-bos - internlm/internlm2-20b
internlm2-20b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-20b-sft wqkv internlm2 - internlm/internlm2-chat-20b-sft
internlm2-20b-chat Shanghai_AI_Laboratory/internlm2-chat-20b wqkv internlm2 - internlm/internlm2-chat-20b
internlm2-math-7b Shanghai_AI_Laboratory/internlm2-math-base-7b wqkv default-generation-bos math internlm/internlm2-math-base-7b
internlm2-math-7b-chat Shanghai_AI_Laboratory/internlm2-math-7b wqkv internlm2 math internlm/internlm2-math-7b
internlm2-math-20b Shanghai_AI_Laboratory/internlm2-math-base-20b wqkv default-generation-bos math internlm/internlm2-math-base-20b
internlm2-math-20b-chat Shanghai_AI_Laboratory/internlm2-math-20b wqkv internlm2 math internlm/internlm2-math-20b
internlm-xcomposer2-7b-chat Shanghai_AI_Laboratory/internlm-xcomposer2-7b wqkv internlm-xcomposer2 multi-modal, vision internlm/internlm-xcomposer2-7b
deepseek-7b deepseek-ai/deepseek-llm-7b-base q_proj, k_proj, v_proj default-generation-bos - deepseek-ai/deepseek-llm-7b-base
deepseek-7b-chat deepseek-ai/deepseek-llm-7b-chat q_proj, k_proj, v_proj deepseek - deepseek-ai/deepseek-llm-7b-chat
deepseek-moe-16b deepseek-ai/deepseek-moe-16b-base q_proj, k_proj, v_proj default-generation-bos - deepseek-ai/deepseek-moe-16b-base
deepseek-moe-16b-chat deepseek-ai/deepseek-moe-16b-chat q_proj, k_proj, v_proj deepseek - deepseek-ai/deepseek-moe-16b-chat
deepseek-67b deepseek-ai/deepseek-llm-67b-base q_proj, k_proj, v_proj default-generation-bos - deepseek-ai/deepseek-llm-67b-base
deepseek-67b-chat deepseek-ai/deepseek-llm-67b-chat q_proj, k_proj, v_proj deepseek - deepseek-ai/deepseek-llm-67b-chat
deepseek-coder-1_3b deepseek-ai/deepseek-coder-1.3b-base q_proj, k_proj, v_proj default-generation-bos coding deepseek-ai/deepseek-coder-1.3b-base
deepseek-coder-1_3b-instruct deepseek-ai/deepseek-coder-1.3b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-1.3b-instruct
deepseek-coder-6_7b deepseek-ai/deepseek-coder-6.7b-base q_proj, k_proj, v_proj default-generation-bos coding deepseek-ai/deepseek-coder-6.7b-base
deepseek-coder-6_7b-instruct deepseek-ai/deepseek-coder-6.7b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-6.7b-instruct
deepseek-coder-33b deepseek-ai/deepseek-coder-33b-base q_proj, k_proj, v_proj default-generation-bos coding deepseek-ai/deepseek-coder-33b-base
deepseek-coder-33b-instruct deepseek-ai/deepseek-coder-33b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-33b-instruct
deepseek-math-7b deepseek-ai/deepseek-math-7b-base q_proj, k_proj, v_proj default-generation-bos math deepseek-ai/deepseek-math-7b-base
deepseek-math-7b-instruct deepseek-ai/deepseek-math-7b-instruct q_proj, k_proj, v_proj deepseek math deepseek-ai/deepseek-math-7b-instruct
deepseek-math-7b-chat deepseek-ai/deepseek-math-7b-rl q_proj, k_proj, v_proj deepseek math deepseek-ai/deepseek-math-7b-rl
deepseek-vl-1_3b-chat deepseek-ai/deepseek-vl-1.3b-chat q_proj, k_proj, v_proj deepseek-vl multi-modal, vision deepseek-ai/deepseek-vl-1.3b-chat
deepseek-vl-7b-chat deepseek-ai/deepseek-vl-7b-chat q_proj, k_proj, v_proj deepseek-vl multi-modal, vision deepseek-ai/deepseek-vl-7b-chat
gemma-2b AI-ModelScope/gemma-2b q_proj, k_proj, v_proj default-generation-bos transformers>=4.38 - google/gemma-2b
gemma-7b AI-ModelScope/gemma-7b q_proj, k_proj, v_proj default-generation-bos transformers>=4.38 - google/gemma-7b
gemma-2b-instruct AI-ModelScope/gemma-2b-it q_proj, k_proj, v_proj gemma transformers>=4.38 - google/gemma-2b-it
gemma-7b-instruct AI-ModelScope/gemma-7b-it q_proj, k_proj, v_proj gemma transformers>=4.38 - google/gemma-7b-it
minicpm-1b-sft-chat OpenBMB/MiniCPM-1B-sft-bf16 q_proj, k_proj, v_proj minicpm transformers>=4.36.0 - openbmb/MiniCPM-1B-sft-bf16
minicpm-2b-sft-chat OpenBMB/MiniCPM-2B-sft-fp32 q_proj, k_proj, v_proj minicpm - openbmb/MiniCPM-2B-sft-fp32
minicpm-2b-chat OpenBMB/MiniCPM-2B-dpo-fp32 q_proj, k_proj, v_proj minicpm - openbmb/MiniCPM-2B-dpo-fp32
minicpm-2b-128k OpenBMB/MiniCPM-2B-128k q_proj, k_proj, v_proj chatml transformers>=4.36.0 - openbmb/MiniCPM-2B-128k
minicpm-moe-8x2b OpenBMB/MiniCPM-MoE-8x2B q_proj, k_proj, v_proj minicpm transformers>=4.36.0 - openbmb/MiniCPM-MoE-8x2B
minicpm-v-3b-chat OpenBMB/MiniCPM-V q_proj, k_proj, v_proj minicpm-v - openbmb/MiniCPM-V
minicpm-v-v2 OpenBMB/MiniCPM-V-2 q_proj, k_proj, v_proj minicpm-v - openbmb/MiniCPM-V-2
openbuddy-llama2-13b-chat OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama2-13b-v8.1-fp16
openbuddy-llama-65b-chat OpenBuddy/openbuddy-llama-65b-v8-bf16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama-65b-v8-bf16
openbuddy-llama2-70b-chat OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama2-70b-v10.1-bf16
openbuddy-mistral-7b-chat OpenBuddy/openbuddy-mistral-7b-v17.1-32k q_proj, k_proj, v_proj openbuddy transformers>=4.34 - OpenBuddy/openbuddy-mistral-7b-v17.1-32k
openbuddy-zephyr-7b-chat OpenBuddy/openbuddy-zephyr-7b-v14.1 q_proj, k_proj, v_proj openbuddy transformers>=4.34 - OpenBuddy/openbuddy-zephyr-7b-v14.1
openbuddy-deepseek-67b-chat OpenBuddy/openbuddy-deepseek-67b-v15.2 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-deepseek-67b-v15.2
openbuddy-mixtral-moe-7b-chat OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k q_proj, k_proj, v_proj openbuddy transformers>=4.36 - OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k
mistral-7b AI-ModelScope/Mistral-7B-v0.1 q_proj, k_proj, v_proj default-generation-bos transformers>=4.34 - mistralai/Mistral-7B-v0.1
mistral-7b-v2 AI-ModelScope/Mistral-7B-v0.2-hf q_proj, k_proj, v_proj default-generation-bos transformers>=4.34 - alpindale/Mistral-7B-v0.2-hf
mistral-7b-instruct AI-ModelScope/Mistral-7B-Instruct-v0.1 q_proj, k_proj, v_proj llama transformers>=4.34 - mistralai/Mistral-7B-Instruct-v0.1
mistral-7b-instruct-v2 AI-ModelScope/Mistral-7B-Instruct-v0.2 q_proj, k_proj, v_proj llama transformers>=4.34 - mistralai/Mistral-7B-Instruct-v0.2
mixtral-moe-7b AI-ModelScope/Mixtral-8x7B-v0.1 q_proj, k_proj, v_proj default-generation-bos transformers>=4.36 - mistralai/Mixtral-8x7B-v0.1
mixtral-moe-7b-instruct AI-ModelScope/Mixtral-8x7B-Instruct-v0.1 q_proj, k_proj, v_proj llama transformers>=4.36 - mistralai/Mixtral-8x7B-Instruct-v0.1
mixtral-moe-7b-aqlm-2bit-1x16 AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf q_proj, k_proj, v_proj default-generation-bos transformers>=4.38, aqlm, torch>=2.2.0 - ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf
mixtral-moe-8x22b-v1 AI-ModelScope/Mixtral-8x22B-v0.1 q_proj, k_proj, v_proj default-generation-bos transformers>=4.36 - mistral-community/Mixtral-8x22B-v0.1
wizardlm2-7b-awq AI-ModelScope/WizardLM-2-7B-AWQ q_proj, k_proj, v_proj wizardlm2-awq transformers>=4.34 - MaziyarPanahi/WizardLM-2-7B-AWQ
wizardlm2-8x22b AI-ModelScope/WizardLM-2-8x22B q_proj, k_proj, v_proj wizardlm2 transformers>=4.36 - alpindale/WizardLM-2-8x22B
baichuan-7b baichuan-inc/baichuan-7B W_pack default-generation transformers<4.34 - baichuan-inc/Baichuan-7B
baichuan-13b baichuan-inc/Baichuan-13B-Base W_pack default-generation transformers<4.34 - baichuan-inc/Baichuan-13B-Base
baichuan-13b-chat baichuan-inc/Baichuan-13B-Chat W_pack baichuan transformers<4.34 - baichuan-inc/Baichuan-13B-Chat
baichuan2-7b baichuan-inc/Baichuan2-7B-Base W_pack default-generation - baichuan-inc/Baichuan2-7B-Base
baichuan2-7b-chat baichuan-inc/Baichuan2-7B-Chat W_pack baichuan - baichuan-inc/Baichuan2-7B-Chat
baichuan2-7b-chat-int4 baichuan-inc/Baichuan2-7B-Chat-4bits W_pack baichuan bitsandbytes<0.41.2, accelerate<0.26 - baichuan-inc/Baichuan2-7B-Chat-4bits
baichuan2-13b baichuan-inc/Baichuan2-13B-Base W_pack default-generation - baichuan-inc/Baichuan2-13B-Base
baichuan2-13b-chat baichuan-inc/Baichuan2-13B-Chat W_pack baichuan - baichuan-inc/Baichuan2-13B-Chat
baichuan2-13b-chat-int4 baichuan-inc/Baichuan2-13B-Chat-4bits W_pack baichuan bitsandbytes<0.41.2, accelerate<0.26 - baichuan-inc/Baichuan2-13B-Chat-4bits
mplug-owl2-chat iic/mPLUG-Owl2 q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1 mplug-owl2 transformers<4.35, icecream - MAGAer13/mplug-owl2-llama2-7b
mplug-owl2d1-chat iic/mPLUG-Owl2.1 c_attn.multiway.0, c_attn.multiway.1 mplug-owl2 transformers<4.35, icecream - Mizukiluke/mplug_owl_2_1
yuan2-2b-instruct YuanLLM/Yuan2.0-2B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-2B-hf
yuan2-2b-janus-instruct YuanLLM/Yuan2-2B-Janus-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-2B-Janus-hf
yuan2-51b-instruct YuanLLM/Yuan2.0-51B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-51B-hf
yuan2-102b-instruct YuanLLM/Yuan2.0-102B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-102B-hf
xverse-7b xverse/XVERSE-7B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-7B
xverse-7b-chat xverse/XVERSE-7B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-7B-Chat
xverse-13b xverse/XVERSE-13B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-13B
xverse-13b-chat xverse/XVERSE-13B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-13B-Chat
xverse-65b xverse/XVERSE-65B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-65B
xverse-65b-v2 xverse/XVERSE-65B-2 q_proj, k_proj, v_proj default-generation - xverse/XVERSE-65B-2
xverse-65b-chat xverse/XVERSE-65B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-65B-Chat
xverse-13b-256k xverse/XVERSE-13B-256K q_proj, k_proj, v_proj default-generation - xverse/XVERSE-13B-256K
xverse-moe-a4_2b xverse/XVERSE-MoE-A4.2B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-MoE-A4.2B
orion-14b OrionStarAI/Orion-14B-Base q_proj, k_proj, v_proj default-generation - OrionStarAI/Orion-14B-Base
orion-14b-chat OrionStarAI/Orion-14B-Chat q_proj, k_proj, v_proj orion - OrionStarAI/Orion-14B-Chat
bluelm-7b vivo-ai/BlueLM-7B-Base q_proj, k_proj, v_proj default-generation-bos - vivo-ai/BlueLM-7B-Base
bluelm-7b-32k vivo-ai/BlueLM-7B-Base-32K q_proj, k_proj, v_proj default-generation-bos - vivo-ai/BlueLM-7B-Base-32K
bluelm-7b-chat vivo-ai/BlueLM-7B-Chat q_proj, k_proj, v_proj bluelm - vivo-ai/BlueLM-7B-Chat
bluelm-7b-chat-32k vivo-ai/BlueLM-7B-Chat-32K q_proj, k_proj, v_proj bluelm - vivo-ai/BlueLM-7B-Chat-32K
ziya2-13b Fengshenbang/Ziya2-13B-Base q_proj, k_proj, v_proj default-generation-bos - IDEA-CCNL/Ziya2-13B-Base
ziya2-13b-chat Fengshenbang/Ziya2-13B-Chat q_proj, k_proj, v_proj ziya - IDEA-CCNL/Ziya2-13B-Chat
skywork-13b skywork/Skywork-13B-base q_proj, k_proj, v_proj default-generation-bos - Skywork/Skywork-13B-base
skywork-13b-chat skywork/Skywork-13B-chat q_proj, k_proj, v_proj skywork - -
zephyr-7b-beta-chat modelscope/zephyr-7b-beta q_proj, k_proj, v_proj zephyr transformers>=4.34 - HuggingFaceH4/zephyr-7b-beta
polylm-13b damo/nlp_polylm_13b_text_generation c_attn default-generation - DAMO-NLP-MT/polylm-13b
seqgpt-560m damo/nlp_seqgpt-560m query_key_value default-generation - DAMO-NLP/SeqGPT-560M
sus-34b-chat SUSTC/SUS-Chat-34B q_proj, k_proj, v_proj sus - SUSTech/SUS-Chat-34B
tongyi-finance-14b TongyiFinance/Tongyi-Finance-14B c_attn default-generation financial -
tongyi-finance-14b-chat TongyiFinance/Tongyi-Finance-14B-Chat c_attn qwen financial jxy/Tongyi-Finance-14B-Chat
tongyi-finance-14b-chat-int4 TongyiFinance/Tongyi-Finance-14B-Chat-Int4 c_attn qwen auto_gptq>=0.5 financial jxy/Tongyi-Finance-14B-Chat-Int4
codefuse-codellama-34b-chat codefuse-ai/CodeFuse-CodeLlama-34B q_proj, k_proj, v_proj codefuse-codellama coding codefuse-ai/CodeFuse-CodeLlama-34B
codefuse-codegeex2-6b-chat codefuse-ai/CodeFuse-CodeGeeX2-6B query_key_value codefuse transformers<4.34 coding codefuse-ai/CodeFuse-CodeGeeX2-6B
codefuse-qwen-14b-chat codefuse-ai/CodeFuse-QWen-14B c_attn codefuse coding codefuse-ai/CodeFuse-QWen-14B
phi2-3b AI-ModelScope/phi-2 Wqkv default-generation coding microsoft/phi-2
cogvlm-17b-instruct ZhipuAI/cogvlm-chat vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense cogvlm-instruct multi-modal, vision THUDM/cogvlm-chat-hf
cogagent-18b-chat ZhipuAI/cogagent-chat vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense cogagent-chat multi-modal, vision THUDM/cogagent-chat-hf
cogagent-18b-instruct ZhipuAI/cogagent-vqa vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense cogagent-instruct multi-modal, vision THUDM/cogagent-vqa-hf
mamba-130m AI-ModelScope/mamba-130m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-130m-hf
mamba-370m AI-ModelScope/mamba-370m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-370m-hf
mamba-390m AI-ModelScope/mamba-390m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-390m-hf
mamba-790m AI-ModelScope/mamba-790m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-790m-hf
mamba-1.4b AI-ModelScope/mamba-1.4b-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-1.4b-hf
mamba-2.8b AI-ModelScope/mamba-2.8b-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-2.8b-hf
telechat-7b TeleAI/TeleChat-7B key_value, query telechat - Tele-AI/telechat-7B
telechat-12b TeleAI/TeleChat-12B key_value, query telechat - Tele-AI/TeleChat-12B
grok-1 colossalai/grok-1-pytorch q_proj, k_proj, v_proj default-generation - hpcai-tech/grok-1
dbrx-instruct AI-ModelScope/dbrx-instruct attn.Wqkv dbrx transformers>=4.36 - databricks/dbrx-instruct
dbrx-base AI-ModelScope/dbrx-base attn.Wqkv dbrx transformers>=4.36 - databricks/dbrx-base
mengzi3-13b-base langboat/Mengzi3-13B-Base q_proj, k_proj, v_proj mengzi - Langboat/Mengzi3-13B-Base
c4ai-command-r-v01 AI-ModelScope/c4ai-command-r-v01 q_proj, k_proj, v_proj c4ai transformers>=4.39.1 - CohereForAI/c4ai-command-r-v01
c4ai-command-r-plus AI-ModelScope/c4ai-command-r-plus q_proj, k_proj, v_proj c4ai transformers>4.39 - CohereForAI/c4ai-command-r-plus
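To illustrate how the fields above fit together, here is a minimal sketch of a model registry lookup. This is not SWIFT's actual implementation: `MODEL_REGISTRY`, `ModelInfo`, and `get_model_info` are hypothetical stand-ins, populated with two rows from the table, showing how a model_type resolves to its ModelScope ID, default LoRA target modules, template, and requirements.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelInfo:
    model_id: str                   # ModelScope model ID
    lora_target_modules: List[str]  # default modules LoRA attaches to
    template: str                   # default chat/generation template
    requires: List[str] = field(default_factory=list)  # extra pip requirements


# Hypothetical registry holding two rows from the table, keyed by model_type.
MODEL_REGISTRY = {
    "qwen-7b-chat": ModelInfo("qwen/Qwen-7B-Chat", ["c_attn"], "qwen"),
    "qwen1half-7b-chat": ModelInfo(
        "qwen/Qwen1.5-7B-Chat", ["q_proj", "k_proj", "v_proj"], "qwen",
        requires=["transformers>=4.37"],
    ),
}


def get_model_info(model_type: str) -> ModelInfo:
    """Look up the registered defaults for a model_type."""
    try:
        return MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"model_type {model_type!r} is not registered") from None


info = get_model_info("qwen1half-7b-chat")
print(info.lora_target_modules)  # ['q_proj', 'k_proj', 'v_proj']
print(info.requires)             # ['transformers>=4.37']
```

When a model_type is passed on the command line, this kind of lookup is what supplies the defaults, so they only need to be specified explicitly when overriding them.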

## Datasets

The table below introduces the datasets supported by SWIFT:

  • Dataset Name: The dataset name registered in SWIFT.
  • Dataset ID: The dataset id in ModelScope.
  • Size: The data row count of the dataset.
  • Statistic: Dataset statistics, computed over token counts; they help in adjusting the max_length hyperparameter. The training and validation sets are concatenated before the statistics are computed, using the qwen tokenizer; different tokenizers produce different statistics. To obtain token statistics for another model's tokenizer, you can run the script yourself.
| Dataset Name | Dataset ID | Train Size | Val Size | Statistic (token) | Tags | HF Dataset ID |
| --- | --- | --- | --- | --- | --- | --- |
| 🔥ms-bench | iic/ms_bench | 316228 | 0 | 345.0±441.3, min=22, max=30960 | chat, general, multi-round | - |
| 🔥ms-bench-mini | iic/ms_bench | 19492 | 0 | 353.9±439.4, min=29, max=12078 | chat, general, multi-round | - |
| 🔥alpaca-en | AI-ModelScope/alpaca-gpt4-data-en | 52002 | 0 | 176.2±125.8, min=26, max=740 | chat, general | vicgalle/alpaca-gpt4 |
| 🔥alpaca-zh | AI-ModelScope/alpaca-gpt4-data-zh | 48818 | 0 | 162.1±93.9, min=26, max=856 | chat, general | c-s-ale/alpaca-gpt4-data-zh |
| multi-alpaca-all | damo/nlp_polylm_multialpaca_sft | 131867 | 0 | 112.9±50.6, min=26, max=1226 | chat, general, multilingual | - |
| instinwild-en | wyj123456/instinwild | 52191 | 0 | 160.2±69.7, min=33, max=763 | chat, general | - |
| instinwild-zh | wyj123456/instinwild | 51504 | 0 | 130.3±45.1, min=28, max=1434 | chat, general | - |
| cot-en | YorickHe/CoT | 74771 | 0 | 122.7±64.8, min=51, max=8320 | chat, general | - |
| cot-zh | YorickHe/CoT_zh | 74771 | 0 | 117.5±70.8, min=43, max=9636 | chat, general | - |
| firefly-all-zh | wyj123456/firefly | 1649399 | 0 | 178.1±260.4, min=26, max=12516 | chat, general | - |
| instruct-en | wyj123456/instruct | 888970 | 0 | 268.9±331.2, min=26, max=7252 | chat, general | - |
| gpt4all-en | wyj123456/GPT4all | 806199 | 0 | 302.5±384.1, min=27, max=7391 | chat, general | - |
| sharegpt-en | huangjintao/sharegpt | 99799 | 0 | 1045.7±431.9, min=22, max=7907 | chat, general, multi-round | - |
| sharegpt-zh | huangjintao/sharegpt | 135399 | 0 | 806.3±771.7, min=21, max=65318 | chat, general, multi-round | - |
| tulu-v2-sft-mixture | AI-ModelScope/tulu-v2-sft-mixture | 326154 | 0 | 867.8±996.4, min=22, max=12111 | chat, multilingual, general, multi-round | allenai/tulu-v2-sft-mixture |
| wikipedia-zh | AI-ModelScope/wikipedia-cn-20230720-filtered | 254547 | 0 | 568.4±713.2, min=37, max=78678 | text-generation, general, pretrained | pleisto/wikipedia-cn-20230720-filtered |
| open-orca | AI-ModelScope/OpenOrca | 3239027 | 0 | 360.4±402.9, min=27, max=8672 | chat, multilingual, general | - |
| open-orca-gpt4 | AI-ModelScope/OpenOrca | 994896 | 0 | 382.3±417.4, min=31, max=8740 | chat, multilingual, general | - |
| sharegpt-gpt4 | AI-ModelScope/sharegpt_gpt4 | 103063 | 0 | 1286.2±2089.4, min=22, max=221080 | chat, multilingual, general, multi-round | - |
| 🔥sharegpt-gpt4-mini | AI-ModelScope/sharegpt_gpt4 | 6205 | 0 | 3511.6±6068.5, min=33, max=116018 | chat, multilingual, general, multi-round, gpt4 | - |
| 🔥ms-agent | iic/ms_agent | 30000 | 0 | 647.7±217.1, min=199, max=2722 | chat, agent, multi-round | - |
| ms-agent-for-agentfabric-default | AI-ModelScope/ms_agent_for_agentfabric | 30000 | 0 | 617.8±199.1, min=251, max=2657 | chat, agent, multi-round | - |
| ms-agent-for-agentfabric-addition | AI-ModelScope/ms_agent_for_agentfabric | 488 | 0 | 2084.9±1514.8, min=489, max=7354 | chat, agent, multi-round | - |
| ms-agent-multirole | iic/MSAgent-MultiRole | 8425 | 0 | 443.2±84.7, min=201, max=1101 | chat, agent, multi-round, role-play, multi-agent | - |
| damo-agent-zh | damo/MSAgent-Bench | 422115 | 161 | 965.7±440.9, min=321, max=31535 | chat, agent, multi-round | - |
| damo-agent-mini-zh | damo/MSAgent-Bench | 39964 | 152 | 1230.9±350.1, min=558, max=4982 | chat, agent, multi-round | - |
| agent-instruct-all-en | huangjintao/AgentInstruct_copy | 1866 | 0 | 1144.3±635.5, min=206, max=6412 | chat, agent, multi-round | - |
| code-alpaca-en | wyj123456/code_alpaca_en | 20016 | 0 | 100.1±60.1, min=29, max=1776 | chat, coding | sahil2801/CodeAlpaca-20k |
| 🔥leetcode-python-en | AI-ModelScope/leetcode-solutions-python | 2359 | 0 | 723.8±233.5, min=259, max=2117 | chat, coding | - |
| 🔥codefuse-python-en | codefuse-ai/CodeExercise-Python-27k | 27224 | 0 | 483.6±193.9, min=45, max=3082 | chat, coding | - |
| 🔥codefuse-evol-instruction-zh | codefuse-ai/Evol-instruction-66k | 66862 | 0 | 439.6±206.3, min=37, max=2983 | chat, coding | - |
| medical-en | huangjintao/medical_zh | 117117 | 500 | 257.4±89.1, min=36, max=2564 | chat, medical | - |
| medical-zh | huangjintao/medical_zh | 1950472 | 500 | 167.2±219.7, min=26, max=27351 | chat, medical | - |
| medical-mini-zh | huangjintao/medical_zh | 50000 | 500 | 168.1±220.8, min=26, max=12320 | chat, medical | - |
| 🔥disc-med-sft-zh | AI-ModelScope/DISC-Med-SFT | 441767 | 0 | 354.1±193.1, min=25, max=2231 | chat, medical | Flmc/DISC-Med-SFT |
| lawyer-llama-zh | AI-ModelScope/lawyer_llama_data | 21476 | 0 | 194.4±91.7, min=27, max=924 | chat, law | Skepsun/lawyer_llama_data |
| tigerbot-law-zh | AI-ModelScope/tigerbot-law-plugin | 55895 | 0 | 109.9±126.4, min=37, max=18878 | text-generation, law, pretrained | TigerResearch/tigerbot-law-plugin |
| 🔥disc-law-sft-zh | AI-ModelScope/DISC-Law-SFT | 166758 | 0 | 533.7±495.4, min=30, max=15169 | chat, law | - |
| 🔥blossom-math-zh | AI-ModelScope/blossom-math-v2 | 10000 | 0 | 169.3±58.7, min=35, max=563 | chat, math | Azure99/blossom-math-v2 |
| school-math-zh | AI-ModelScope/school_math_0.25M | 248480 | 0 | 157.6±72.1, min=33, max=3450 | chat, math | BelleGroup/school_math_0.25M |
| open-platypus-en | AI-ModelScope/Open-Platypus | 24926 | 0 | 367.9±254.8, min=30, max=3951 | chat, math | garage-bAInd/Open-Platypus |
| text2sql-en | AI-ModelScope/texttosqlv2_25000_v2 | 25000 | 0 | 274.6±326.4, min=38, max=1975 | chat, sql | Clinton/texttosqlv2_25000_v2 |
| 🔥sql-create-context-en | AI-ModelScope/sql-create-context | 78577 | 0 | 80.2±17.8, min=36, max=456 | chat, sql | b-mc2/sql-create-context |
| 🔥advertise-gen-zh | lvjianjin/AdvertiseGen | 97484 | 915 | 131.6±21.7, min=52, max=242 | text-generation | shibing624/AdvertiseGen |
| 🔥dureader-robust-zh | modelscope/DuReader_robust-QG | 15937 | 1962 | 242.1±137.4, min=61, max=1417 | text-generation | - |
| cmnli-zh | clue | 391783 | 12241 | 83.6±16.6, min=52, max=200 | text-generation, classification | clue |
| 🔥cmnli-mini-zh | clue | 20000 | 200 | 82.9±16.3, min=52, max=188 | text-generation, classification | clue |
| 🔥jd-sentiment-zh | DAMO_NLP/jd | 45012 | 4988 | 67.0±83.2, min=40, max=4040 | text-generation, classification | - |
| 🔥hc3-zh | simpleai/HC3-Chinese | 39781 | 0 | 177.8±81.5, min=58, max=3052 | text-generation, classification | Hello-SimpleAI/HC3-Chinese |
| 🔥hc3-en | simpleai/HC3 | 11021 | 0 | 299.3±138.7, min=66, max=2268 | text-generation, classification | Hello-SimpleAI/HC3 |
| finance-en | wyj123456/finance_en | 68911 | 0 | 135.6±134.3, min=26, max=3525 | chat, financial | ssbuild/alpaca_finance_en |
| poetry-zh | modelscope/chinese-poetry-collection | 388599 | 1710 | 55.2±9.4, min=23, max=83 | text-generation, poetry | - |
| webnovel-zh | AI-ModelScope/webnovel_cn | 50000 | 0 | 1478.9±11526.1, min=100, max=490484 | chat, novel | zxbsmk/webnovel_cn |
| generated-chat-zh | AI-ModelScope/generated_chat_0.4M | 396004 | 0 | 273.3±52.0, min=32, max=873 | chat, character-dialogue | BelleGroup/generated_chat_0.4M |
| cls-fudan-news-zh | damo/zh_cls_fudan-news | 4959 | 0 | 3234.4±2547.5, min=91, max=19548 | chat, classification | - |
| ner-jave-zh | damo/zh_ner-JAVE | 1266 | 0 | 118.3±45.5, min=44, max=223 | chat, ner | - |
long-alpaca-12k AI-ModelScope/LongAlpaca-12k 11998 0 9619.0±8295.8, min=36, max=78925 longlora, QA Yukang/LongAlpaca-12k
coco-en modelscope/coco_2014_caption 414113 40504 298.8±2.8, min=294, max=351 chat, multi-modal, vision -
🔥coco-mini-en modelscope/coco_2014_caption 20000 200 298.8±2.8, min=294, max=339 chat, multi-modal, vision -
🔥coco-mini-en-2 modelscope/coco_2014_caption 20000 200 36.8±2.8, min=32, max=77 chat, multi-modal, vision -
capcha-images AI-ModelScope/captcha-images 6000 2000 29.0±0.0, min=29, max=29 chat, multi-modal, vision -
aishell1-zh speech_asr/speech_asr_aishell1_trainsets 134424 7176 152.2±36.8, min=63, max=419 chat, multi-modal, audio -
🔥aishell1-mini-zh speech_asr/speech_asr_aishell1_trainsets 14326 200 152.0±35.5, min=74, max=359 chat, multi-modal, audio -
hh-rlhf-harmless-base AI-ModelScope/hh-rlhf 42462 2308 167.2±123.1, min=22, max=986 rlhf, dpo, pairwise -
hh-rlhf-helpful-base AI-ModelScope/hh-rlhf 43777 2348 201.9±135.2, min=25, max=1070 rlhf, dpo, pairwise -
hh-rlhf-helpful-online AI-ModelScope/hh-rlhf 10150 1137 401.5±278.7, min=32, max=1987 rlhf, dpo, pairwise -
hh-rlhf-helpful-rejection-sampled AI-ModelScope/hh-rlhf 52413 2749 247.0±152.6, min=26, max=1300 rlhf, dpo, pairwise -
hh-rlhf-red-team-attempts AI-ModelScope/hh-rlhf 52413 2749 247.0±152.6, min=26, max=1300 rlhf, dpo, pairwise -
🔥hh-rlhf-cn AI-ModelScope/hh_rlhf_cn 172085 9292 172.8±124.0, min=22, max=1638 rlhf, dpo, pairwise -
hh-rlhf-cn-harmless-base-cn AI-ModelScope/hh_rlhf_cn 42394 2304 143.9±109.4, min=24, max=3078 rlhf, dpo, pairwise -
hh-rlhf-cn-helpful-base-cn AI-ModelScope/hh_rlhf_cn 43722 2346 176.8±120.0, min=26, max=1420 rlhf, dpo, pairwise -
hh-rlhf-cn-harmless-base-en AI-ModelScope/hh_rlhf_cn 42394 2304 167.5±123.2, min=22, max=986 rlhf, dpo, pairwise -
hh-rlhf-cn-helpful-base-en AI-ModelScope/hh_rlhf_cn 43722 2346 202.2±135.3, min=25, max=1070 rlhf, dpo, pairwise -
|stack-exchange-paired|AI-ModelScope/stack-exchange-paired|4483004|0|534.5±594.6, min=31, max=56588|rlhf, dpo, pairwise|-|
|pileval|huangjintao/pile-val-backup|214670|0|1612.3±8856.2, min=11, max=1208955|text-generation, awq|mit-han-lab/pile-val-backup|
|🔥coig-cqia-chinese-traditional|AI-ModelScope/COIG-CQIA|1111|0|172.6±59.9, min=55, max=856|general|-|
|🔥coig-cqia-coig-pc|AI-ModelScope/COIG-CQIA|3000|0|353.5±859.6, min=34, max=19288|general|-|
|🔥coig-cqia-exam|AI-ModelScope/COIG-CQIA|4856|0|275.0±240.0, min=45, max=4932|general|-|
|🔥coig-cqia-finance|AI-ModelScope/COIG-CQIA|11288|0|1266.4±561.1, min=60, max=10582|general|-|
|🔥coig-cqia-douban|AI-ModelScope/COIG-CQIA|3086|0|402.9±544.7, min=88, max=10870|general|-|
|🔥coig-cqia-human-value|AI-ModelScope/COIG-CQIA|1007|0|151.2±77.3, min=39, max=656|general|-|
|🔥coig-cqia-logi-qa|AI-ModelScope/COIG-CQIA|421|0|309.8±188.8, min=43, max=1306|general|-|
|🔥coig-cqia-ruozhiba|AI-ModelScope/COIG-CQIA|240|0|189.8±62.2, min=33, max=505|general|-|
|🔥coig-cqia-segmentfault|AI-ModelScope/COIG-CQIA|458|0|449.0±495.8, min=87, max=6342|general|-|
|🔥coig-cqia-wiki|AI-ModelScope/COIG-CQIA|10603|0|619.2±515.8, min=73, max=10140|general|-|
|🔥coig-cqia-wikihow|AI-ModelScope/COIG-CQIA|1485|0|1700.0±790.9, min=260, max=6371|general|-|
|🔥coig-cqia-xhs|AI-ModelScope/COIG-CQIA|1508|0|438.0±179.6, min=129, max=2191|general|-|
|🔥coig-cqia-zhihu|AI-ModelScope/COIG-CQIA|5631|0|540.7±306.7, min=161, max=3036|general|-|
|🔥ruozhiba-post-annual|AI-ModelScope/ruozhiba|1361|0|36.6±15.3, min=24, max=559|pretrain|-|
|🔥ruozhiba-title-good|AI-ModelScope/ruozhiba|2597|0|41.9±19.3, min=22, max=246|pretrain|-|
|🔥ruozhiba-title-norm|AI-ModelScope/ruozhiba|81700|0|39.9±12.8, min=21, max=386|pretrain|-|
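The Statistic column above summarizes per-sample token counts as `mean±std, min=…, max=…`. A minimal sketch of how such a summary could be produced from a list of token lengths (the helper name and the sample lengths are hypothetical, for illustration only; the exact standard-deviation convention used by the docs is not specified):

```python
import statistics

def summarize_token_lengths(lengths):
    """Format token-length stats in the table's 'mean±std, min=…, max=…' style."""
    mean = statistics.mean(lengths)
    std = statistics.pstdev(lengths)  # population std; assumption, may differ from the docs
    return f"{mean:.1f}\u00b1{std:.1f}, min={min(lengths)}, max={max(lengths)}"

# Hypothetical token counts for five samples
print(summarize_token_lengths([100, 150, 200, 250, 300]))
# → 200.0±70.7, min=100, max=300
```

Reading the column this way, a row like `9619.0±8295.8, min=36, max=78925` (long-alpaca-12k) signals a highly skewed length distribution, which matters when choosing `max_length` for training.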