The table below introduces the models integrated with ms-swift:
- Model ID: ModelScope model ID
- HF Model ID: Hugging Face model ID
- Model Type: Type of the model
- Default Template: Default chat template
- Requires: Additional dependencies required to use the model (see the usage sketch after this list)
- Tags: Tags associated with the model
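As a quick orientation, the sketch below shows how these fields typically come together: install the packages listed under Requires for the chosen model, then pass its ModelScope Model ID to ms-swift, with the Model Type and Default Template normally resolved automatically. The `PtEngine`/`InferRequest`/`RequestConfig` names assume the `swift.llm` Python inference API; treat this as a minimal sketch rather than the definitive interface.

```python
# Minimal sketch, assuming the swift.llm inference API (PtEngine, InferRequest, RequestConfig).
# For Qwen2-VL models the "Requires" column asks for:
#   pip install "transformers>=4.45" qwen_vl_utils pyav
from swift.llm import PtEngine, InferRequest, RequestConfig

# "Model ID" column -> checkpoint to load; model_type/template follow the table defaults.
engine = PtEngine('Qwen/Qwen2-VL-2B-Instruct')

request = InferRequest(
    messages=[{'role': 'user', 'content': '<image>Describe the image briefly.'}],
    images=['path/or/url/to/image.jpg'],  # replace with a real image
)
resp = engine.infer([request], RequestConfig(max_tokens=128, temperature=0))[0]
print(resp.choices[0].message.content)
```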
Model ID | Model Type | Default Template | Requires | Tags | HF Model ID |
---|---|---|---|---|---|
Qwen/Qwen-VL-Chat | qwen_vl | qwen_vl | - | vision | Qwen/Qwen-VL-Chat |
Qwen/Qwen-VL | qwen_vl | qwen_vl | - | vision | Qwen/Qwen-VL |
Qwen/Qwen-VL-Chat-Int4 | qwen_vl | qwen_vl | - | vision | Qwen/Qwen-VL-Chat-Int4 |
Qwen/Qwen-Audio-Chat | qwen_audio | qwen_audio | - | audio | Qwen/Qwen-Audio-Chat |
Qwen/Qwen-Audio | qwen_audio | qwen_audio | - | audio | Qwen/Qwen-Audio |
Qwen/Qwen2-VL-2B-Instruct | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-2B-Instruct |
Qwen/Qwen2-VL-7B-Instruct | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-7B-Instruct |
Qwen/Qwen2-VL-72B-Instruct | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-72B-Instruct |
Qwen/Qwen2-VL-2B | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-2B |
Qwen/Qwen2-VL-7B | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-7B |
Qwen/Qwen2-VL-72B | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-72B |
Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 |
Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 |
Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 |
Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8 | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8 |
Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8 | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8 |
Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8 | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8 |
Qwen/Qwen2-VL-2B-Instruct-AWQ | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-2B-Instruct-AWQ |
Qwen/Qwen2-VL-7B-Instruct-AWQ | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-7B-Instruct-AWQ |
Qwen/Qwen2-VL-72B-Instruct-AWQ | qwen2_vl | qwen2_vl | transformers>=4.45, qwen_vl_utils, pyav | vision, video | Qwen/Qwen2-VL-72B-Instruct-AWQ |
Qwen/Qwen2-Audio-7B-Instruct | qwen2_audio | qwen2_audio | transformers>=4.45, librosa | audio | Qwen/Qwen2-Audio-7B-Instruct |
Qwen/Qwen2-Audio-7B | qwen2_audio | qwen2_audio | transformers>=4.45, librosa | audio | Qwen/Qwen2-Audio-7B |
AIDC-AI/Ovis1.6-Gemma2-9B | ovis1_6 | ovis1_6 | transformers>=4.42 | vision | AIDC-AI/Ovis1.6-Gemma2-9B |
ZhipuAI/glm-4v-9b | glm4v | glm4v | transformers>=4.42 | - | THUDM/glm-4v-9b |
ZhipuAI/glm-edge-v-2b | glm_edge_v | glm_edge_v | transformers>=4.46 | vision | THUDM/glm-edge-v-2b |
ZhipuAI/glm-edge-4b-chat | glm_edge_v | glm_edge_v | transformers>=4.46 | vision | THUDM/glm-edge-4b-chat |
ZhipuAI/cogvlm-chat | cogvlm | cogvlm | transformers<4.42 | - | THUDM/cogvlm-chat-hf |
ZhipuAI/cogagent-vqa | cogagent_vqa | cogagent_vqa | transformers<4.42 | - | THUDM/cogagent-vqa-hf |
ZhipuAI/cogagent-chat | cogagent_chat | cogagent_chat | transformers<4.42, timm | - | THUDM/cogagent-chat-hf |
ZhipuAI/cogvlm2-llama3-chat-19B | cogvlm2 | cogvlm2 | transformers<4.42 | - | THUDM/cogvlm2-llama3-chat-19B |
ZhipuAI/cogvlm2-llama3-chinese-chat-19B | cogvlm2 | cogvlm2 | transformers<4.42 | - | THUDM/cogvlm2-llama3-chinese-chat-19B |
ZhipuAI/cogvlm2-video-llama3-chat | cogvlm2_video | cogvlm2_video | decord, pytorchvideo, transformers>=4.42 | video | THUDM/cogvlm2-video-llama3-chat |
OpenGVLab/Mini-InternVL-Chat-2B-V1-5 | internvl | internvl | transformers>=4.35, timm | vision | OpenGVLab/Mini-InternVL-Chat-2B-V1-5 |
AI-ModelScope/InternVL-Chat-V1-5 | internvl | internvl | transformers>=4.35, timm | vision | OpenGVLab/InternVL-Chat-V1-5 |
AI-ModelScope/InternVL-Chat-V1-5-int8 | internvl | internvl | transformers>=4.35, timm | vision | OpenGVLab/InternVL-Chat-V1-5-int8 |
OpenGVLab/Mini-InternVL-Chat-4B-V1-5 | internvl_phi3 | internvl_phi3 | transformers>=4.35,<4.42, timm | vision | OpenGVLab/Mini-InternVL-Chat-4B-V1-5 |
OpenGVLab/InternVL2-1B | internvl2 | internvl2 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-1B |
OpenGVLab/InternVL2-2B | internvl2 | internvl2 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-2B |
OpenGVLab/InternVL2-8B | internvl2 | internvl2 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-8B |
OpenGVLab/InternVL2-26B | internvl2 | internvl2 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-26B |
OpenGVLab/InternVL2-40B | internvl2 | internvl2 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-40B |
OpenGVLab/InternVL2-Llama3-76B | internvl2 | internvl2 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-Llama3-76B |
OpenGVLab/InternVL2-2B-AWQ | internvl2 | internvl2 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-2B-AWQ |
OpenGVLab/InternVL2-8B-AWQ | internvl2 | internvl2 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-8B-AWQ |
OpenGVLab/InternVL2-26B-AWQ | internvl2 | internvl2 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-26B-AWQ |
OpenGVLab/InternVL2-40B-AWQ | internvl2 | internvl2 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-40B-AWQ |
OpenGVLab/InternVL2-Llama3-76B-AWQ | internvl2 | internvl2 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-Llama3-76B-AWQ |
OpenGVLab/InternVL2-4B | internvl2_phi3 | internvl2_phi3 | transformers>=4.36,<4.42, timm | vision, video | OpenGVLab/InternVL2-4B |
OpenGVLab/InternVL2_5-1B | internvl2_5 | internvl2_5 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2_5-1B |
OpenGVLab/InternVL2_5-2B | internvl2_5 | internvl2_5 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2_5-2B |
OpenGVLab/InternVL2_5-4B | internvl2_5 | internvl2_5 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2_5-4B |
OpenGVLab/InternVL2_5-8B | internvl2_5 | internvl2_5 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2_5-8B |
OpenGVLab/InternVL2_5-26B | internvl2_5 | internvl2_5 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2_5-26B |
OpenGVLab/InternVL2_5-38B | internvl2_5 | internvl2_5 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2_5-38B |
OpenGVLab/InternVL2_5-78B | internvl2_5 | internvl2_5 | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2_5-78B |
Shanghai_AI_Laboratory/internlm-xcomposer2-7b | xcomposer2 | ixcomposer2 | - | vision | internlm/internlm-xcomposer2-7b |
Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b | xcomposer2_4khd | ixcomposer2 | - | vision | internlm/internlm-xcomposer2-4khd-7b |
Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b | xcomposer2_5 | xcomposer2_5 | decord | vision | internlm/internlm-xcomposer2d5-7b |
LLM-Research/Llama-3.2-11B-Vision-Instruct | llama3_2_vision | llama3_2_vision | transformers>=4.45 | vision | meta-llama/Llama-3.2-11B-Vision-Instruct |
LLM-Research/Llama-3.2-90B-Vision-Instruct | llama3_2_vision | llama3_2_vision | transformers>=4.45 | vision | meta-llama/Llama-3.2-90B-Vision-Instruct |
LLM-Research/Llama-3.2-11B-Vision | llama3_2_vision | llama3_2_vision | transformers>=4.45 | vision | meta-llama/Llama-3.2-11B-Vision |
LLM-Research/Llama-3.2-90B-Vision | llama3_2_vision | llama3_2_vision | transformers>=4.45 | vision | meta-llama/Llama-3.2-90B-Vision |
ICTNLP/Llama-3.1-8B-Omni | llama3_1_omni | llama3_1_omni | whisper, openai-whisper | audio | ICTNLP/Llama-3.1-8B-Omni |
swift/llava-1.5-7b-hf | llava1_5_hf | llava1_5_hf | transformers>=4.36 | vision | llava-hf/llava-1.5-7b-hf |
swift/llava-1.5-13b-hf | llava1_5_hf | llava1_5_hf | transformers>=4.36 | vision | llava-hf/llava-1.5-13b-hf |
swift/llava-v1.6-mistral-7b-hf | llava1_6_mistral_hf | llava1_6_mistral_hf | transformers>=4.39 | vision | llava-hf/llava-v1.6-mistral-7b-hf |
swift/llava-v1.6-vicuna-7b-hf | llava1_6_vicuna_hf | llava1_6_vicuna_hf | transformers>=4.39 | vision | llava-hf/llava-v1.6-vicuna-7b-hf |
swift/llava-v1.6-vicuna-13b-hf | llava1_6_vicuna_hf | llava1_6_vicuna_hf | transformers>=4.39 | vision | llava-hf/llava-v1.6-vicuna-13b-hf |
swift/llava-v1.6-34b-hf | llava1_6_yi_hf | llava1_6_yi_hf | transformers>=4.39 | vision | llava-hf/llava-v1.6-34b-hf |
swift/llama3-llava-next-8b-hf | llama3_llava_next_hf | llama3_llava_next_hf | transformers>=4.39 | vision | llava-hf/llama3-llava-next-8b-hf |
AI-ModelScope/llava-next-72b-hf | llava_next_qwen_hf | llava_next_qwen_hf | transformers>=4.39 | vision | llava-hf/llava-next-72b-hf |
AI-ModelScope/llava-next-110b-hf | llava_next_qwen_hf | llava_next_qwen_hf | transformers>=4.39 | vision | llava-hf/llava-next-110b-hf |
swift/LLaVA-NeXT-Video-7B-DPO-hf | llava_next_video_hf | llava_next_video_hf | transformers>=4.42, av | video | llava-hf/LLaVA-NeXT-Video-7B-DPO-hf |
swift/LLaVA-NeXT-Video-7B-32K-hf | llava_next_video_hf | llava_next_video_hf | transformers>=4.42, av | video | llava-hf/LLaVA-NeXT-Video-7B-32K-hf |
swift/LLaVA-NeXT-Video-7B-hf | llava_next_video_hf | llava_next_video_hf | transformers>=4.42, av | video | llava-hf/LLaVA-NeXT-Video-7B-hf |
swift/LLaVA-NeXT-Video-34B-hf | llava_next_video_yi_hf | llava_next_video_hf | transformers>=4.42, av | video | llava-hf/LLaVA-NeXT-Video-34B-hf |
AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf | llava_onevision_hf | llava_onevision_hf | transformers>=4.45 | vision, video | llava-hf/llava-onevision-qwen2-0.5b-ov-hf |
AI-ModelScope/llava-onevision-qwen2-7b-ov-hf | llava_onevision_hf | llava_onevision_hf | transformers>=4.45 | vision, video | llava-hf/llava-onevision-qwen2-7b-ov-hf |
AI-ModelScope/llava-onevision-qwen2-72b-ov-hf | llava_onevision_hf | llava_onevision_hf | transformers>=4.45 | vision, video | llava-hf/llava-onevision-qwen2-72b-ov-hf |
01ai/Yi-VL-6B | yi_vl | yi_vl | transformers>=4.34 | vision | 01-ai/Yi-VL-6B |
01ai/Yi-VL-34B | yi_vl | yi_vl | transformers>=4.34 | vision | 01-ai/Yi-VL-34B |
swift/llava-llama3.1-8b | llava_llama3_1_hf | llava_llama3_1_hf | transformers>=4.41 | vision | - |
AI-ModelScope/llava-llama-3-8b-v1_1-transformers | llava_llama3_hf | llava_llama3_hf | transformers>=4.36 | vision | xtuner/llava-llama-3-8b-v1_1-transformers |
AI-ModelScope/llava-v1.6-mistral-7b | llava1_6_mistral | llava1_6_mistral | transformers>=4.34 | vision | liuhaotian/llava-v1.6-mistral-7b |
AI-ModelScope/llava-v1.6-34b | llava1_6_yi | llava1_6_yi | transformers>=4.34 | vision | liuhaotian/llava-v1.6-34b |
AI-Modelscope/llava-next-72b | llava_next_qwen | llava_next_qwen | transformers>=4.42, av | vision | lmms-lab/llava-next-72b |
AI-Modelscope/llava-next-110b | llava_next_qwen | llava_next_qwen | transformers>=4.42, av | vision | lmms-lab/llava-next-110b |
AI-Modelscope/llama3-llava-next-8b | llama3_llava_next | llama3_llava_next | transformers>=4.42, av | vision | lmms-lab/llama3-llava-next-8b |
deepseek-ai/deepseek-vl-1.3b-chat | deepseek_vl | deepseek_vl | - | vision | deepseek-ai/deepseek-vl-1.3b-chat |
deepseek-ai/deepseek-vl-7b-chat | deepseek_vl | deepseek_vl | - | vision | deepseek-ai/deepseek-vl-7b-chat |
deepseek-ai/Janus-1.3B | deepseek_janus | deepseek_janus | - | vision | deepseek-ai/Janus-1.3B |
OpenBMB/MiniCPM-V | minicpmv | minicpmv | timm, transformers<4.42 | vision | openbmb/MiniCPM-V |
OpenBMB/MiniCPM-V-2 | minicpmv | minicpmv | timm, transformers<4.42 | vision | openbmb/MiniCPM-V-2 |
OpenBMB/MiniCPM-V-2_6 | minicpmv2_6 | minicpmv2_6 | timm, transformers>=4.36, decord | vision, video | openbmb/MiniCPM-V-2_6 |
OpenBMB/MiniCPM-Llama3-V-2_5 | minicpmv2_5 | minicpmv2_5 | timm, transformers>=4.36 | vision | openbmb/MiniCPM-Llama3-V-2_5 |
iic/mPLUG-Owl2 | mplug_owl2 | mplug_owl2 | transformers<4.35, icecream | vision | MAGAer13/mplug-owl2-llama2-7b |
iic/mPLUG-Owl2.1 | mplug_owl2_1 | mplug_owl2 | transformers<4.35, icecream | vision | Mizukiluke/mplug_owl_2_1 |
iic/mPLUG-Owl3-1B-241014 | mplug_owl3 | mplug_owl3 | transformers>=4.36, icecream, decord | vision, video | mPLUG/mPLUG-Owl3-1B-241014 |
iic/mPLUG-Owl3-2B-241014 | mplug_owl3 | mplug_owl3 | transformers>=4.36, icecream, decord | vision, video | mPLUG/mPLUG-Owl3-2B-241014 |
iic/mPLUG-Owl3-7B-240728 | mplug_owl3 | mplug_owl3 | transformers>=4.36, icecream, decord | vision, video | mPLUG/mPLUG-Owl3-7B-240728 |
iic/mPLUG-Owl3-7B-241101 | mplug_owl3_241101 | mplug_owl3_241101 | transformers>=4.36, icecream | vision, video | mPLUG/mPLUG-Owl3-7B-241101 |
BAAI/Emu3-Gen | emu3_gen | emu3_gen | - | t2i | BAAI/Emu3-Gen |
BAAI/Emu3-Chat | emu3_chat | emu3_chat | transformers>=4.44.0 | vision | BAAI/Emu3-Chat |
stepfun-ai/GOT-OCR2_0 | got_ocr2 | got_ocr2 | - | vision | stepfun-ai/GOT-OCR2_0 |
LLM-Research/Phi-3-vision-128k-instruct | phi3_vision | phi3_vision | transformers>=4.36 | vision | microsoft/Phi-3-vision-128k-instruct |
LLM-Research/Phi-3.5-vision-instruct | phi3_vision | phi3_vision | transformers>=4.36 | vision | microsoft/Phi-3.5-vision-instruct |
AI-ModelScope/Florence-2-base-ft | florence | florence | - | vision | microsoft/Florence-2-base-ft |
AI-ModelScope/Florence-2-base | florence | florence | - | vision | microsoft/Florence-2-base |
AI-ModelScope/Florence-2-large | florence | florence | - | vision | microsoft/Florence-2-large |
AI-ModelScope/Florence-2-large-ft | florence | florence | - | vision | microsoft/Florence-2-large-ft |
AI-ModelScope/Idefics3-8B-Llama3 | idefics3 | idefics3 | transformers>=4.45 | vision | HuggingFaceM4/Idefics3-8B-Llama3 |
AI-ModelScope/paligemma-3b-pt-224 | paligemma | paligemma | transformers>=4.41 | vision | google/paligemma-3b-pt-224 |
AI-ModelScope/paligemma-3b-pt-448 | paligemma | paligemma | transformers>=4.41 | vision | google/paligemma-3b-pt-448 |
AI-ModelScope/paligemma-3b-pt-896 | paligemma | paligemma | transformers>=4.41 | vision | google/paligemma-3b-pt-896 |
AI-ModelScope/paligemma-3b-mix-224 | paligemma | paligemma | transformers>=4.41 | vision | google/paligemma-3b-mix-224 |
AI-ModelScope/paligemma-3b-mix-448 | paligemma | paligemma | transformers>=4.41 | vision | google/paligemma-3b-mix-448 |
LLM-Research/Molmo-7B-O-0924 | molmo | molmo | transformers>=4.45 | vision | allenai/Molmo-7B-O-0924 |
LLM-Research/Molmo-7B-D-0924 | molmo | molmo | transformers>=4.45 | vision | allenai/Molmo-7B-D-0924 |
LLM-Research/Molmo-72B-0924 | molmo | molmo | transformers>=4.45 | vision | allenai/Molmo-72B-0924 |
LLM-Research/MolmoE-1B-0924 | molmoe | molmo | transformers>=4.45 | vision | allenai/MolmoE-1B-0924 |
AI-ModelScope/pixtral-12b | pixtral | pixtral | transformers>=4.45 | vision | mistral-community/pixtral-12b |
The table below introduces the datasets integrated with ms-swift:
- Dataset ID: ModelScope dataset ID
- HF Dataset ID: Hugging Face dataset ID
- Subset Name: Name of the subset
- Dataset Size: Size of the dataset
- Statistic: Dataset statistics, reported as token counts, which are useful when adjusting the `max_length` hyperparameter. The datasets are tokenized with the qwen2.5 tokenizer, so the counts vary across tokenizers; if you need token statistics for other models' tokenizers, you can obtain them using the script (a minimal sketch of such a computation follows this list).
- Tags: Tags associated with the dataset
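To reproduce such statistics for another tokenizer, the sketch below is one possible approach, assuming the ModelScope `MsDataset` loading API and a Hugging Face tokenizer; it approximates the per-sample token counts behind the Statistic column and is not the exact script ms-swift uses.

```python
# Minimal sketch: per-sample token statistics (mean±std, min, max) with a chosen tokenizer.
# Assumes MsDataset.load from modelscope; concatenating the text fields is a simplification,
# since the reported statistics are computed on fully templated samples.
import numpy as np
from modelscope.msdatasets import MsDataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen2.5-7B-Instruct')
dataset = MsDataset.load('AI-ModelScope/alpaca-gpt4-data-zh', split='train')  # any Dataset ID below

lengths = []
for row in dataset:
    text = ''.join(str(v) for v in row.values() if isinstance(v, str))
    lengths.append(len(tokenizer(text)['input_ids']))

lengths = np.array(lengths)
print(f'{lengths.mean():.1f}±{lengths.std():.1f}, min={lengths.min()}, max={lengths.max()}')
```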
Dataset ID | Subset Name | Dataset Size | Statistic (token) | Tags | HF Dataset ID |
---|---|---|---|---|---|
AI-ModelScope/COIG-CQIA | chinese_traditional coig_pc exam finance douban human_value logi_qa ruozhiba segmentfault wiki wikihow xhs zhihu | 44694 | 331.2±693.8, min=34, max=19288 | general, 🔥 | - |
AI-ModelScope/CodeAlpaca-20k | default | 20022 | 99.3±57.6, min=30, max=857 | code, en | HuggingFaceH4/CodeAlpaca_20K |
AI-ModelScope/DISC-Law-SFT | default | 166758 | 1799.0±474.9, min=769, max=3151 | chat, law, 🔥 | ShengbinYue/DISC-Law-SFT |
AI-ModelScope/DISC-Med-SFT | default | 464885 | 426.5±178.7, min=110, max=1383 | chat, medical, 🔥 | Flmc/DISC-Med-SFT |
AI-ModelScope/Duet-v0.5 | default | 5000 | 1157.4±189.3, min=657, max=2344 | CoT, en | G-reen/Duet-v0.5 |
AI-ModelScope/GuanacoDataset | default | 31563 | 250.3±70.6, min=95, max=987 | chat, zh | JosephusCheung/GuanacoDataset |
AI-ModelScope/LLaVA-Instruct-150K | default | 623302 | 630.7±143.0, min=301, max=1166 | chat, multi-modal, vision | - |
AI-ModelScope/LLaVA-Pretrain | default | huge dataset | - | chat, multi-modal, quality | liuhaotian/LLaVA-Pretrain |
AI-ModelScope/LaTeX_OCR | default synthetic_handwrite | 162149 | 117.6±44.9, min=41, max=312 | chat, ocr, multi-modal, vision | linxy/LaTeX_OCR |
AI-ModelScope/LongAlpaca-12k | default | 11998 | 9941.8±3417.1, min=4695, max=25826 | long-sequence, QA | Yukang/LongAlpaca-12k |
AI-ModelScope/M3IT | coco vqa-v2 shapes shapes-rephrased coco-goi-rephrased snli-ve snli-ve-rephrased okvqa a-okvqa viquae textcap docvqa science-qa imagenet imagenet-open-ended imagenet-rephrased coco-goi clevr clevr-rephrased nlvr coco-itm coco-itm-rephrased vsr vsr-rephrased mocheg mocheg-rephrased coco-text fm-iqa activitynet-qa msrvtt ss coco-cn refcoco refcoco-rephrased multi30k image-paragraph-captioning visual-dialog visual-dialog-rephrased iqa vcr visual-mrc ivqa msrvtt-qa msvd-qa gqa text-vqa ocr-vqa st-vqa flickr8k-cn | huge dataset | - | chat, multi-modal, vision | - |
AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese | default | 200000 | 448.4±223.5, min=87, max=4098 | chat, sft, 🔥, zh | Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese |
AI-ModelScope/Magpie-Qwen2-Pro-200K-English | default | 200000 | 609.9±277.1, min=257, max=4098 | chat, sft, 🔥, en | Magpie-Align/Magpie-Qwen2-Pro-200K-English |
AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered | default | 300000 | 556.6±288.6, min=175, max=4098 | chat, sft, 🔥 | Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered |
AI-ModelScope/MathInstruct | default | 262040 | 253.3±177.4, min=42, max=2193 | math, cot, en, quality | TIGER-Lab/MathInstruct |
AI-ModelScope/MovieChat-1K-test | default | 162 | 39.7±2.0, min=32, max=43 | chat, multi-modal, video | Enxin/MovieChat-1K-test |
AI-ModelScope/Open-Platypus | default | 24926 | 389.0±256.4, min=55, max=3153 | chat, math, quality | garage-bAInd/Open-Platypus |
AI-ModelScope/OpenO1-SFT | default | 125894 | 1080.7±622.9, min=145, max=11637 | chat, general, o1 | O1-OPEN/OpenO1-SFT |
AI-ModelScope/OpenOrca | default 3_5M | huge dataset | - | chat, multilingual, general | - |
AI-ModelScope/OpenOrca-Chinese | default | huge dataset | - | QA, zh, general, quality | yys/OpenOrca-Chinese |
AI-ModelScope/SFT-Nectar | default | 131201 | 441.9±307.0, min=45, max=3136 | cot, en, quality | AstraMindAI/SFT-Nectar |
AI-ModelScope/ShareGPT-4o | image_caption | 57289 | 599.8±140.4, min=214, max=1932 | vqa, multi-modal | OpenGVLab/ShareGPT-4o |
AI-ModelScope/ShareGPT4V | ShareGPT4V ShareGPT4V-PT | huge dataset | - | chat, multi-modal, vision | - |
AI-ModelScope/SkyPile-150B | default | huge dataset | - | pretrain, quality, zh | Skywork/SkyPile-150B |
AI-ModelScope/WizardLM_evol_instruct_V2_196k | default | 109184 | 483.3±338.4, min=27, max=3735 | chat, en | WizardLM/WizardLM_evol_instruct_V2_196k |
AI-ModelScope/alpaca-cleaned | default | 51760 | 170.1±122.9, min=29, max=1028 | chat, general, bench, quality | yahma/alpaca-cleaned |
AI-ModelScope/alpaca-gpt4-data-en | default | 52002 | 167.6±123.9, min=29, max=607 | chat, general, 🔥 | vicgalle/alpaca-gpt4 |
AI-ModelScope/alpaca-gpt4-data-zh | default | 48818 | 157.2±93.2, min=27, max=544 | chat, general, 🔥 | llm-wizard/alpaca-gpt4-data-zh |
AI-ModelScope/blossom-math-v2 | default | 10000 | 175.4±59.1, min=35, max=563 | chat, math, 🔥 | Azure99/blossom-math-v2 |
AI-ModelScope/captcha-images | default | 8000 | 47.0±0.0, min=47, max=47 | chat, multi-modal, vision | - |
AI-ModelScope/databricks-dolly-15k | default | 15011 | 199.0±268.8, min=26, max=5987 | multi-task, en, quality | databricks/databricks-dolly-15k |
AI-ModelScope/deepctrl-sft-data | default en | huge dataset | - | chat, general, sft, multi-round | - |
AI-ModelScope/egoschema | Subset | 101 | 191.6±80.7, min=96, max=435 | chat, multi-modal, video | lmms-lab/egoschema |
AI-ModelScope/firefly-train-1.1M | default | 1649399 | 204.3±365.3, min=28, max=9306 | chat, general | YeungNLP/firefly-train-1.1M |
AI-ModelScope/generated_chat_0.4M | default | 396004 | 272.7±51.1, min=78, max=579 | chat, character-dialogue | BelleGroup/generated_chat_0.4M |
AI-ModelScope/guanaco_belle_merge_v1.0 | default | 693987 | 133.8±93.5, min=30, max=1872 | QA, zh | Chinese-Vicuna/guanaco_belle_merge_v1.0 |
AI-ModelScope/hh-rlhf | helpful-base helpful-online helpful-rejection-sampled | huge dataset | - | rlhf, dpo | - |
AI-ModelScope/hh_rlhf_cn | hh_rlhf harmless_base_cn harmless_base_en helpful_base_cn helpful_base_en | 362909 | 142.3±107.5, min=25, max=1571 | rlhf, dpo, 🔥 | - |
AI-ModelScope/lawyer_llama_data | default | 21476 | 224.4±83.9, min=69, max=832 | chat, law | Skepsun/lawyer_llama_data |
AI-ModelScope/leetcode-solutions-python | default | 2359 | 723.8±233.5, min=259, max=2117 | chat, coding, 🔥 | - |
AI-ModelScope/lmsys-chat-1m | default | 166211 | 545.8±3272.8, min=22, max=219116 | chat, em | lmsys/lmsys-chat-1m |
AI-ModelScope/ms_agent_for_agentfabric | default addition | 30000 | 615.7±198.7, min=251, max=2055 | chat, agent, multi-round, 🔥 | - |
AI-ModelScope/orpo-dpo-mix-40k | default | 43666 | 938.1±694.2, min=36, max=8483 | dpo, orpo, en, quality | mlabonne/orpo-dpo-mix-40k |
AI-ModelScope/pile | default | huge dataset | - | pretrain | EleutherAI/pile |
AI-ModelScope/ruozhiba | post-annual title-good title-norm | 85658 | 40.0±18.3, min=22, max=559 | pretrain, 🔥 | - |
AI-ModelScope/school_math_0.25M | default | 248481 | 158.8±73.4, min=39, max=980 | chat, math, quality | BelleGroup/school_math_0.25M |
AI-ModelScope/sharegpt_gpt4 | default V3_format zh_38K_format | 103329 | 3476.6±5959.0, min=33, max=115132 | chat, multilingual, general, multi-round, gpt4, 🔥 | - |
AI-ModelScope/sql-create-context | default | 78577 | 82.7±31.5, min=36, max=282 | chat, sql, 🔥 | b-mc2/sql-create-context |
AI-ModelScope/stack-exchange-paired | default | huge dataset | - | hfrl, dpo, pairwise | lvwerra/stack-exchange-paired |
AI-ModelScope/starcoderdata | default | huge dataset | - | pretrain, quality | bigcode/starcoderdata |
AI-ModelScope/synthetic_text_to_sql | default | 100000 | 221.8±69.9, min=64, max=616 | nl2sql, en | gretelai/synthetic_text_to_sql |
AI-ModelScope/texttosqlv2_25000_v2 | default | 25000 | 277.3±328.3, min=40, max=1971 | chat, sql | Clinton/texttosqlv2_25000_v2 |
AI-ModelScope/the-stack | default | huge dataset | - | pretrain, quality | bigcode/the-stack |
AI-ModelScope/tigerbot-law-plugin | default | 55895 | 104.9±51.0, min=43, max=1087 | text-generation, law, pretrained | TigerResearch/tigerbot-law-plugin |
AI-ModelScope/train_0.5M_CN | default | 519255 | 128.4±87.4, min=31, max=936 | common, zh, quality | BelleGroup/train_0.5M_CN |
AI-ModelScope/train_1M_CN | default | huge dataset | - | common, zh, quality | BelleGroup/train_1M_CN |
AI-ModelScope/train_2M_CN | default | huge dataset | - | common, zh, quality | BelleGroup/train_2M_CN |
AI-ModelScope/tulu-v2-sft-mixture | default | 326154 | 523.3±439.3, min=68, max=2549 | chat, multilingual, general, multi-round | allenai/tulu-v2-sft-mixture |
AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto | default | 230720 | 471.5±274.3, min=27, max=2232 | rlhf, kto | - |
AI-ModelScope/webnovel_cn | default | 50000 | 1455.2±12489.4, min=524, max=490480 | chat, novel | zxbsmk/webnovel_cn |
AI-ModelScope/wikipedia-cn-20230720-filtered | default | huge dataset | - | pretrain, quality | pleisto/wikipedia-cn-20230720-filtered |
AI-ModelScope/zhihu_rlhf_3k | default | 3460 | 594.5±365.9, min=31, max=1716 | rlhf, dpo, zh | liyucheng/zhihu_rlhf_3k |
DAMO_NLP/jd | default | 45012 | 66.9±87.0, min=41, max=1699 | text-generation, classification, 🔥 | - |
- | default | huge dataset | - | pretrain, quality | HuggingFaceFW/fineweb |
- | auto_math_text khanacademy openstax stanford stories web_samples_v1 web_samples_v2 wikihow | huge dataset | - | multi-domain, en, qa | HuggingFaceTB/cosmopedia |
OmniData/Zhihu-KOL | default | huge dataset | - | zhihu, qa | wangrui6/Zhihu-KOL |
OmniData/Zhihu-KOL-More-Than-100-Upvotes | default | 271261 | 1003.4±1826.1, min=28, max=52541 | zhihu, qa | bzb2023/Zhihu-KOL-More-Than-100-Upvotes |
TIGER-Lab/MATH-plus | train | 893929 | 301.4±196.7, min=50, max=1162 | qa, math, en, quality | TIGER-Lab/MATH-plus |
Tongyi-DataEngine/SA1B-Dense-Caption | default | huge dataset | - | zh, multi-modal, vqa | - |
Tongyi-DataEngine/SA1B-Paired-Captions-Images | default | 7736284 | 106.4±18.5, min=48, max=193 | zh, multi-modal, vqa | - |
YorickHe/CoT | default | 74771 | 141.6±45.5, min=58, max=410 | chat, general | - |
YorickHe/CoT_zh | default | 74771 | 129.1±53.2, min=51, max=401 | chat, general | - |
ZhipuAI/LongWriter-6k | default | 6000 | 5009.0±2932.8, min=117, max=30354 | long, chat, sft, 🔥 | THUDM/LongWriter-6k |
- | default | huge dataset | - | pretrain, quality | allenai/c4 |
- | default | huge dataset | - | pretrain, quality | cerebras/SlimPajama-627B |
codefuse-ai/CodeExercise-Python-27k | default | 27224 | 337.3±154.2, min=90, max=2826 | chat, coding, 🔥 | - |
codefuse-ai/Evol-instruction-66k | default | 66862 | 440.1±208.4, min=46, max=2661 | chat, coding, 🔥 | - |
damo/MSAgent-Bench | default mini | 638149 | 859.2±460.1, min=38, max=3479 | chat, agent, multi-round | - |
damo/nlp_polylm_multialpaca_sft | ar de es fr id ja ko pt ru th vi | 131867 | 101.6±42.5, min=30, max=1029 | chat, general, multilingual | - |
damo/zh_cls_fudan-news | default | 4959 | 3234.4±2547.5, min=91, max=19548 | chat, classification | - |
damo/zh_ner-JAVE | default | 1266 | 118.3±45.5, min=44, max=223 | chat, ner | - |
hjh0119/shareAI-Llama3-DPO-zh-en-emoji | zh en | 2449 | 334.0±162.8, min=36, max=1801 | rlhf, dpo | - |
huangjintao/AgentInstruct_copy | alfworld db kg mind2web os webshop | 1866 | 1144.3±635.5, min=206, max=6412 | chat, agent, multi-round | - |
iic/100PoisonMpts | default | 906 | 150.6±80.8, min=39, max=656 | poison-management, zh | - |
iic/MSAgent-MultiRole | default | 543 | 413.0±79.7, min=70, max=936 | chat, agent, multi-round, role-play, multi-agent | - |
iic/MSAgent-Pro | default | 21910 | 1978.1±747.9, min=339, max=8064 | chat, agent, multi-round, 🔥 | - |
iic/ms_agent | default | 30000 | 645.8±218.0, min=199, max=2070 | chat, agent, multi-round, 🔥 | - |
iic/ms_bench | default | 316820 | 353.4±424.5, min=29, max=2924 | chat, general, multi-round, 🔥 | - |
- | default | huge dataset | - | multi-modal, en, vqa, quality | lmms-lab/GQA |
- | 0_30_s_academic_v0_1 0_30_s_youtube_v0_1 1_2_m_academic_v0_1 1_2_m_youtube_v0_1 2_3_m_academic_v0_1 2_3_m_youtube_v0_1 30_60_s_academic_v0_1 30_60_s_youtube_v0_1 | 1335486 | 273.7±78.8, min=107, max=638 | chat, multi-modal, video | lmms-lab/LLaVA-Video-178K |
lvjianjin/AdvertiseGen | default | 97484 | 130.9±21.9, min=73, max=232 | text-generation, 🔥 | shibing624/AdvertiseGen |
mapjack/openwebtext_dataset | default | huge dataset | - | pretrain, zh, quality | - |
modelscope/DuReader_robust-QG | default | 17899 | 242.0±143.1, min=75, max=1416 | text-generation, 🔥 | - |
modelscope/chinese-poetry-collection | default | 1710 | 58.1±8.1, min=31, max=71 | text-generation, poetry | - |
modelscope/clue | cmnli | 391783 | 81.6±16.0, min=54, max=157 | text-generation, classification | clue |
modelscope/coco_2014_caption | train validation | 454617 | 389.6±68.4, min=70, max=587 | chat, multi-modal, vision, 🔥 | - |
shenweizhou/alpha-umi-toolbench-processed-v2 | backbone caller planner summarizer | huge dataset | - | chat, agent, 🔥 | - |
simpleai/HC3 | finance medicine | 11021 | 296.0±153.3, min=65, max=2267 | text-generation, classification, 🔥 | Hello-SimpleAI/HC3 |
simpleai/HC3-Chinese | baike baike_cls open_qa open_qa_cls nlpcc_dbqa nlpcc_dbqa_cls finance finance_cls medicine medicine_cls law law_cls psychology psychology_cls | 39781 | 179.9±70.2, min=90, max=1070 | text-generation, classification, 🔥 | Hello-SimpleAI/HC3-Chinese |
speech_asr/speech_asr_aishell1_trainsets | train validation test | 141600 | 40.8±3.3, min=33, max=53 | chat, multi-modal, audio | - |
swift/A-OKVQA | default | 18201 | 43.5±7.9, min=27, max=94 | multi-modal, en, vqa, quality | HuggingFaceM4/A-OKVQA |
swift/ChartQA | default | 28299 | 36.8±6.5, min=26, max=74 | en, vqa, quality | HuggingFaceM4/ChartQA |
swift/GRIT | caption grounding vqa | huge dataset | - | multi-modal, en, caption-grounding, vqa, quality | zzliang/GRIT |
swift/GenQA | default | huge dataset | - | qa, quality, multi-task | tomg-group-umd/GenQA |
swift/Infinity-Instruct | default | huge dataset | - | qa, quality, multi-task | BAAI/Infinity-Instruct |
swift/Mantis-Instruct | birds-to-words chartqa coinstruct contrastive_caption docvqa dreamsim dvqa iconqa imagecode llava_665k_multi lrv_multi multi_vqa nextqa nlvr2 spot-the-diff star visual_story_telling | 988115 | 619.9±156.6, min=243, max=1926 | chat, multi-modal, vision | - |
swift/MideficsDataset | default | 3800 | 201.3±70.2, min=60, max=454 | medical, en, vqa | WinterSchool/MideficsDataset |
swift/Multimodal-Mind2Web | default | 1009 | 293855.4±331149.5, min=11301, max=3577519 | agent, multi-modal | osunlp/Multimodal-Mind2Web |
swift/OCR-VQA | default | 186753 | 32.3±5.8, min=27, max=80 | multi-modal, en, ocr-vqa | howard-hou/OCR-VQA |
swift/OK-VQA_train | default | 9009 | 31.7±3.4, min=25, max=56 | multi-modal, en, vqa, quality | Multimodal-Fatima/OK-VQA_train |
swift/OpenHermes-2.5 | default | huge dataset | - | cot, en, quality | teknium/OpenHermes-2.5 |
swift/RLAIF-V-Dataset | default | 83132 | 99.6±54.8, min=30, max=362 | rlhf, dpo, multi-modal, en | openbmb/RLAIF-V-Dataset |
swift/RedPajama-Data-1T | default | huge dataset | - | pretrain, quality | togethercomputer/RedPajama-Data-1T |
swift/RedPajama-Data-V2 | default | huge dataset | - | pretrain, quality | togethercomputer/RedPajama-Data-V2 |
swift/ScienceQA | default | 16967 | 101.7±55.8, min=32, max=620 | multi-modal, science, vqa, quality | derek-thomas/ScienceQA |
swift/SlimOrca | default | 517982 | 405.5±442.1, min=47, max=8312 | quality, en | Open-Orca/SlimOrca |
swift/TextCaps | default | huge dataset | - | multi-modal, en, caption, quality | HuggingFaceM4/TextCaps |
swift/ToolBench | default | 124345 | 2251.7±1039.8, min=641, max=9451 | chat, agent, multi-round | - |
swift/VQAv2 | default | huge dataset | - | en, vqa, quality | HuggingFaceM4/VQAv2 |
swift/VideoChatGPT | Generic Temporal Consistency | 3206 | 87.4±48.3, min=31, max=398 | chat, multi-modal, video, 🔥 | lmms-lab/VideoChatGPT |
swift/WebInstructSub | default | huge dataset | - | qa, en, math, quality, multi-domain, science | TIGER-Lab/WebInstructSub |
swift/aya_collection | aya_dataset | 202364 | 474.6±1539.1, min=25, max=71312 | multi-lingual, qa | CohereForAI/aya_collection |
swift/chinese-c4 | default | huge dataset | - | pretrain, zh, quality | shjwudp/chinese-c4 |
swift/cinepile | default | huge dataset | - | vqa, en, youtube, video | tomg-group-umd/cinepile |
swift/classical_chinese_translate | default | 6655 | 349.3±77.1, min=61, max=815 | chat, play-ground | - |
swift/cosmopedia-100k | default | 100000 | 1037.0±254.8, min=339, max=2818 | multi-domain, en, qa | HuggingFaceTB/cosmopedia-100k |
swift/dolma | v1_7 | huge dataset | - | pretrain, quality | allenai/dolma |
swift/dolphin | flan1m-alpaca-uncensored flan5m-alpaca-uncensored | huge dataset | - | en | cognitivecomputations/dolphin |
swift/github-code | default | huge dataset | - | pretrain, quality | codeparrot/github-code |
swift/gpt4v-dataset | default | huge dataset | - | en, caption, multi-modal, quality | laion/gpt4v-dataset |
swift/llava-data | llava_instruct | 624255 | 369.7±143.0, min=40, max=905 | sft, multi-modal, quality | TIGER-Lab/llava-data |
swift/llava-instruct-mix-vsft | default | 13640 | 178.8±119.8, min=34, max=951 | multi-modal, en, vqa, quality | HuggingFaceH4/llava-instruct-mix-vsft |
swift/llava-med-zh-instruct-60k | default | 56649 | 207.9±67.7, min=42, max=594 | zh, medical, vqa, multi-modal | BUAADreamer/llava-med-zh-instruct-60k |
swift/lnqa | default | huge dataset | - | multi-modal, en, ocr-vqa, quality | vikhyatk/lnqa |
swift/longwriter-6k-filtered | default | 666 | 4108.9±2636.9, min=1190, max=17050 | long, chat, sft, 🔥 | - |
swift/medical_zh | en zh | 2068589 | 256.4±87.3, min=39, max=1167 | chat, medical | - |
swift/moondream2-coyo-5M-captions | default | huge dataset | - | caption, pretrain, quality | isidentical/moondream2-coyo-5M-captions |
swift/no_robots | default | 9485 | 300.0±246.2, min=40, max=6739 | multi-task, quality, human-annotated | HuggingFaceH4/no_robots |
swift/orca_dpo_pairs | default | 12859 | 364.9±248.2, min=36, max=2010 | rlhf, quality | Intel/orca_dpo_pairs |
swift/path-vqa | default | 19654 | 34.2±6.8, min=28, max=85 | multi-modal, vqa, medical | flaviagiammarino/path-vqa |
swift/pile-val-backup | default | 214661 | 1831.4±11087.5, min=21, max=516620 | text-generation, awq | mit-han-lab/pile-val-backup |
swift/pixelprose | default | huge dataset | - | caption, multi-modal, vision | tomg-group-umd/pixelprose |
swift/refcoco | caption grounding | 92430 | 45.4±3.0, min=37, max=63 | multi-modal, en, grounding | jxu124/refcoco |
swift/refcocog | caption grounding | 89598 | 50.3±4.6, min=39, max=91 | multi-modal, en, grounding | jxu124/refcocog |
swift/self-cognition | default | 108 | 58.9±20.3, min=32, max=131 | chat, self-cognition, 🔥 | modelscope/self-cognition |
swift/sharegpt | common-zh unknow-zh common-en | 194063 | 820.5±366.1, min=25, max=2221 | chat, general, multi-round | - |
swift/swift-sft-mixture | sharegpt firefly codefuse metamathqa | huge dataset | - | chat, sft, general, 🔥 | - |
swift/tagengo-gpt4 | default | 76437 | 468.1±276.8, min=28, max=1726 | chat, multi-lingual, quality | lightblue/tagengo-gpt4 |
swift/train_3.5M_CN | default | huge dataset | - | common, zh, quality | BelleGroup/train_3.5M_CN |
swift/ultrachat_200k | default | 207843 | 1188.0±571.1, min=170, max=4068 | chat, en, quality | HuggingFaceH4/ultrachat_200k |
swift/wikipedia | default | huge dataset | - | pretrain, quality | wikipedia |
- | default | huge dataset | - | pretrain, quality | tiiuae/falcon-refinedweb |
wyj123456/GPT4all | default | 806199 | 97.3±20.9, min=62, max=414 | chat, general | - |
wyj123456/code_alpaca_en | default | 20022 | 99.3±57.6, min=30, max=857 | chat, coding | sahil2801/CodeAlpaca-20k |
wyj123456/finance_en | default | 68912 | 264.5±207.1, min=30, max=2268 | chat, financial | ssbuild/alpaca_finance_en |
wyj123456/instinwild | default subset | 103695 | 125.1±43.7, min=35, max=801 | chat, general | - |
wyj123456/instruct | default | 888970 | 271.0±333.6, min=34, max=3967 | chat, general | - |