
support internlm2.5 1.8b 20b #1551

Merged · 6 commits · Aug 6, 2024
5 changes: 3 additions & 2 deletions README.md
@@ -55,6 +55,7 @@ You can contact us and communicate with us by adding our group:
<img src="asset/discord_qr.jpg" width="200" height="200"> | <img src="asset/wechat.png" width="200" height="200">

## 🎉 News
- 2024.07.31: Support for the internlm2.5 series in 1.8b and 20b sizes. Try it with `swift infer --model_type internlm2_5-1_8b-chat`.
- 🔥2024.07.29: Support the use of lmdeploy for inference acceleration of LLM and VLM models. Documentation can be found [here](docs/source_en/Multi-Modal/LmDeploy-inference-acceleration.md).
- 🔥2024.07.24: Support DPO/ORPO/SimPO/CPO alignment algorithms for vision MLLMs; training scripts can be found in the [Document](docs/source_en/Multi-Modal/human-preference-alignment-training-documentation.md). The RLAIF-V dataset is supported.
- 🔥2024.07.24: Support using Megatron for CPT and SFT on the Qwen2 series. You can refer to the [Megatron training documentation](docs/source_en/LLM/Megatron-training.md).
@@ -73,6 +74,8 @@ You can contact us and communicate with us by adding our group:
- 2024.07.02: Support for using vLLM for accelerating inference and deployment of multimodal large models such as the llava series and phi3-vision models. You can refer to the [Multimodal & vLLM Inference Acceleration Documentation](docs/source_en/Multi-Modal/vllm-inference-acceleration.md) for more information.
- 2024.07.02: Support for `llava1_6-vicuna-7b-instruct`, `llava1_6-vicuna-13b-instruct` and other llava-hf models. For best practices, refer to [here](docs/source_en/Multi-Modal/llava-best-practice.md).
- 🔥2024.06.29: Support [eval-scope](https://github.com/modelscope/eval-scope)&[open-compass](https://github.com/open-compass/opencompass) for evaluation! We now support over 50 eval datasets such as `BoolQ, ocnli, humaneval, math, ceval, mmlu, gsm8k, ARC_e`; please check our [Eval Doc](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/LLM-eval.md) to begin! Next sprint we will support multimodal and Agent evaluation, so remember to follow us : )
<details><summary>More</summary>

- 🔥2024.06.28: Support for **Florence** series models! See the [document](docs/source_en/Multi-Modal/florence-best-pratice.md).
- 🔥2024.06.28: Support for Gemma2 series models: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct.
- 🔥2024.06.18: Supports **DeepSeek-Coder-v2** series models! Use model_type `deepseek-coder-v2-instruct` and `deepseek-coder-v2-lite-instruct` to begin.
@@ -83,8 +86,6 @@ You can contact us and communicate with us by adding our group:
- 🔥2024.06.01: Supports **SimPO** training! See [document](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/SimPO.md) to start training!
- 🔥2024.06.01: Support for deploying large multimodal models, please refer to the [Multimodal Deployment Documentation](docs/source_en/Multi-Modal/mutlimodal-deployment.md) for more information.
- 2024.05.31: Supports the Mini-Internvl model. Use model_type `mini-internvl-chat-2b-v1_5` and `mini-internvl-chat-4b-v1_5` to train.
<details><summary>More</summary>

- 2024.05.24: Supports the Phi3-vision model. Use model_type `phi3-vision-128k-instruct` to train.
- 2024.05.22: Supports the DeepSeek-V2-Lite series models; model_type values are `deepseek-v2-lite` and `deepseek-v2-lite-chat`.
- 2024.05.22: Supports the TeleChat-12B-v2 model and its quantized version; model_type values are `telechat-12b-v2` and `telechat-12b-v2-gptq-int4`.
5 changes: 3 additions & 2 deletions README_CN.md
@@ -56,6 +56,7 @@ SWIFT has rich and comprehensive documentation; please visit our documentation site:


## 🎉 News
- 2024.07.31: Support for the internlm2.5 series in 1.8b and 20b sizes. Try it with `swift infer --model_type internlm2_5-1_8b-chat`.
- 🔥2024.07.29: Support for using lmdeploy to accelerate inference for LLM and VLM models. See the documentation [here](docs/source/Multi-Modal/LmDeploy推理加速文档.md).
- 🔥2024.07.24: Human-preference alignment algorithms (DPO/ORPO/SimPO/CPO) now support vision multimodal models; for training, see the [documentation](docs/source/Multi-Modal/人类偏好对齐训练文档.md). The RLAIF-V dataset is supported.
- 🔥2024.07.24: Support for using Megatron for CPT and SFT on the qwen2 series. See the [Megatron training documentation](docs/source/LLM/Megatron训练文档.md).
@@ -74,6 +75,8 @@ SWIFT has rich and comprehensive documentation; please visit our documentation site:
- 2024.07.02: Support for using vllm to accelerate inference and deployment of multimodal models such as the llava series and phi3-vision. See the [Multimodal & vLLM Inference Acceleration Documentation](docs/source/Multi-Modal/vLLM推理加速文档.md) for more information.
- 2024.07.02: Support for `llava1_6-vicuna-7b-instruct`, `llava1_6-vicuna-13b-instruct` and other llava-hf models. For best practices, see [here](docs/source/Multi-Modal/llava最佳实践.md).
- 🔥2024.06.29: Support for [eval-scope](https://github.com/modelscope/eval-scope)&[open-compass](https://github.com/open-compass/opencompass) evaluation! We support an evaluation pipeline covering 50+ standard datasets including `BoolQ, ocnli, humaneval, math, ceval, mmlu, gsm8k, ARC_e`; see our [evaluation documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM评测文档.md) to get started. The next iteration will add multimodal and Agent evaluation, so stay tuned : )
<details><summary>More</summary>

- 🔥2024.06.28: Support for the **Florence** series models; see the [Florence best practice](docs/source/Multi-Modal/florence最佳实践.md).
- 🔥2024.06.28: Support for the **Gemma2** series models: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct.
- 🔥2024.06.18: Support for the **DeepSeek-Coder-v2** series models! Use model_type `deepseek-coder-v2-instruct` and `deepseek-coder-v2-lite-instruct` for training and inference.
@@ -84,8 +87,6 @@ SWIFT has rich and comprehensive documentation; please visit our documentation site:
- 🔥2024.06.01: Support for **SimPO** training; use `swift simpo` to start. Best practices can be found [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/SimPO算法最佳实践.md).
- 🔥2024.06.01: Support for deploying large multimodal models; see the [Multimodal Deployment Documentation](docs/source/Multi-Modal/MLLM部署文档.md).
- 2024.05.31: Supports the Mini-Internvl multimodal model. Use model_type `mini-internvl-chat-2b-v1_5` and `mini-internvl-chat-4b-v1_5` to train.
<details><summary>More</summary>

- 2024.05.24: Supports the Phi3 multimodal model. Use model_type `phi3-vision-128k-instruct` to train.
- 2024.05.22: Supports the DeepSeek-V2-Lite series models; model_type values are `deepseek-v2-lite` and `deepseek-v2-lite-chat`.
- 2024.05.22: Supports the TeleChat-12b-v2 model and its quantized version; model_type values are `telechat-12b-v2` and `telechat-12b-v2-gptq-int4`.
4 changes: 4 additions & 0 deletions docs/source/LLM/支持的模型和数据集.md
@@ -203,9 +203,13 @@
|internlm2-20b|[Shanghai_AI_Laboratory/internlm2-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-20b](https://huggingface.co/internlm/internlm2-20b)|
|internlm2-20b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b-sft/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-chat-20b-sft](https://huggingface.co/internlm/internlm2-chat-20b-sft)|
|internlm2-20b-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-chat-20b](https://huggingface.co/internlm/internlm2-chat-20b)|
|internlm2_5-1_8b|[Shanghai_AI_Laboratory/internlm2_5-1_8b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-1_8b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-1_8b](https://huggingface.co/internlm/internlm2_5-1_8b)|
|internlm2_5-1_8b-chat|[Shanghai_AI_Laboratory/internlm2_5-1_8b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-1_8b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-1_8b-chat](https://huggingface.co/internlm/internlm2_5-1_8b-chat)|
|internlm2_5-7b|[Shanghai_AI_Laboratory/internlm2_5-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b](https://huggingface.co/internlm/internlm2_5-7b)|
|internlm2_5-7b-chat|[Shanghai_AI_Laboratory/internlm2_5-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat)|
|internlm2_5-7b-chat-1m|[Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b-chat-1m](https://huggingface.co/internlm/internlm2_5-7b-chat-1m)|
|internlm2_5-20b|[Shanghai_AI_Laboratory/internlm2_5-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-20b](https://huggingface.co/internlm/internlm2_5-20b)|
|internlm2_5-20b-chat|[Shanghai_AI_Laboratory/internlm2_5-20b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-20b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-20b-chat](https://huggingface.co/internlm/internlm2_5-20b-chat)|
|internlm2-math-7b|[Shanghai_AI_Laboratory/internlm2-math-base-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-7b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-base-7b](https://huggingface.co/internlm/internlm2-math-base-7b)|
|internlm2-math-7b-chat|[Shanghai_AI_Laboratory/internlm2-math-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-7b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-7b](https://huggingface.co/internlm/internlm2-math-7b)|
|internlm2-math-20b|[Shanghai_AI_Laboratory/internlm2-math-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-base-20b](https://huggingface.co/internlm/internlm2-math-base-20b)|
4 changes: 4 additions & 0 deletions docs/source_en/LLM/Supported-models-datasets.md
@@ -203,9 +203,13 @@ The table below introduces all models supported by SWIFT:
|internlm2-20b|[Shanghai_AI_Laboratory/internlm2-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-20b](https://huggingface.co/internlm/internlm2-20b)|
|internlm2-20b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b-sft/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-chat-20b-sft](https://huggingface.co/internlm/internlm2-chat-20b-sft)|
|internlm2-20b-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-chat-20b](https://huggingface.co/internlm/internlm2-chat-20b)|
|internlm2_5-1_8b|[Shanghai_AI_Laboratory/internlm2_5-1_8b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-1_8b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-1_8b](https://huggingface.co/internlm/internlm2_5-1_8b)|
|internlm2_5-1_8b-chat|[Shanghai_AI_Laboratory/internlm2_5-1_8b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-1_8b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-1_8b-chat](https://huggingface.co/internlm/internlm2_5-1_8b-chat)|
|internlm2_5-7b|[Shanghai_AI_Laboratory/internlm2_5-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b](https://huggingface.co/internlm/internlm2_5-7b)|
|internlm2_5-7b-chat|[Shanghai_AI_Laboratory/internlm2_5-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat)|
|internlm2_5-7b-chat-1m|[Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b-chat-1m](https://huggingface.co/internlm/internlm2_5-7b-chat-1m)|
|internlm2_5-20b|[Shanghai_AI_Laboratory/internlm2_5-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-20b](https://huggingface.co/internlm/internlm2_5-20b)|
|internlm2_5-20b-chat|[Shanghai_AI_Laboratory/internlm2_5-20b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-20b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-20b-chat](https://huggingface.co/internlm/internlm2_5-20b-chat)|
|internlm2-math-7b|[Shanghai_AI_Laboratory/internlm2-math-base-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-7b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-base-7b](https://huggingface.co/internlm/internlm2-math-base-7b)|
|internlm2-math-7b-chat|[Shanghai_AI_Laboratory/internlm2-math-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-7b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-7b](https://huggingface.co/internlm/internlm2-math-7b)|
|internlm2-math-20b|[Shanghai_AI_Laboratory/internlm2-math-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-base-20b](https://huggingface.co/internlm/internlm2-math-base-20b)|
46 changes: 46 additions & 0 deletions swift/llm/utils/model.py
@@ -271,9 +271,13 @@ class ModelType:
internlm2_20b_sft_chat = 'internlm2-20b-sft-chat'
internlm2_20b_chat = 'internlm2-20b-chat'
# internlm2.5
internlm2_5_1_8b = 'internlm2_5-1_8b'
internlm2_5_1_8b_chat = 'internlm2_5-1_8b-chat'
internlm2_5_7b = 'internlm2_5-7b'
internlm2_5_7b_chat = 'internlm2_5-7b-chat'
internlm2_5_7b_chat_1m = 'internlm2_5-7b-chat-1m'
internlm2_5_20b = 'internlm2_5-20b'
internlm2_5_20b_chat = 'internlm2_5-20b-chat'
# internlm2-math
internlm2_math_7b = 'internlm2-math-7b'
internlm2_math_7b_chat = 'internlm2-math-7b-chat'
@@ -3498,6 +3502,27 @@ def get_model_tokenizer_qwen2_intx(model_dir: str,
return get_model_tokenizer_qwen_intx(model_dir, torch_dtype, model_kwargs, load_model, **kwargs)


@register_model(
ModelType.internlm2_5_1_8b,
'Shanghai_AI_Laboratory/internlm2_5-1_8b',
LoRATM.internlm2,
TemplateType.default_generation,
requires=['transformers>=4.38'],
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
hf_model_id='internlm/internlm2_5-1_8b')
@register_model(
ModelType.internlm2_5_1_8b_chat,
'Shanghai_AI_Laboratory/internlm2_5-1_8b-chat',
LoRATM.internlm2,
TemplateType.internlm2,
eos_token='<|im_end|>',
requires=['transformers>=4.38'],
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
hf_model_id='internlm/internlm2_5-1_8b-chat')
@register_model(
ModelType.internlm2_5_7b,
'Shanghai_AI_Laboratory/internlm2_5-7b',
@@ -3530,6 +3555,27 @@ def get_model_tokenizer_qwen2_intx(model_dir: str,
support_vllm=True,
support_lmdeploy=True,
hf_model_id='internlm/internlm2_5-7b-chat-1m')
@register_model(
ModelType.internlm2_5_20b,
'Shanghai_AI_Laboratory/internlm2_5-20b',
LoRATM.internlm2,
TemplateType.default_generation,
requires=['transformers>=4.38'],
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
hf_model_id='internlm/internlm2_5-20b')
@register_model(
ModelType.internlm2_5_20b_chat,
'Shanghai_AI_Laboratory/internlm2_5-20b-chat',
LoRATM.internlm2,
TemplateType.internlm2,
eos_token='<|im_end|>',
requires=['transformers>=4.38'],
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
hf_model_id='internlm/internlm2_5-20b-chat')
@register_model(
ModelType.internlm2_1_8b,
'Shanghai_AI_Laboratory/internlm2-1_8b',
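The stacked `@register_model` decorators in this diff all wrap the same loader function: each call records one model's metadata (ModelScope id, LoRA target modules, template, backend support flags) under its `model_type` key, then returns the loader unchanged so the decorators can stack. A minimal, hypothetical sketch of that registry pattern follows — the real `register_model` in `swift/llm/utils/model.py` takes more parameters and performs validation, and the names below are illustrative, not SWIFT's actual internals:

```python
# Hypothetical sketch of a decorator-based model registry.
# Each @register_model call stores one model's metadata under its
# model_type key and returns the loader unchanged, so several
# registrations can share a single get-function.
MODEL_MAPPING = {}

def register_model(model_type, model_id, template, **kwargs):
    def decorator(get_function):
        MODEL_MAPPING[model_type] = {
            'model_id': model_id,          # ModelScope repo id
            'template': template,          # chat/generation template name
            'get_function': get_function,  # shared loader
            **kwargs,                      # e.g. eos_token, hf_model_id
        }
        return get_function
    return decorator

@register_model('internlm2_5-20b', 'Shanghai_AI_Laboratory/internlm2_5-20b',
                'default-generation', hf_model_id='internlm/internlm2_5-20b')
@register_model('internlm2_5-20b-chat', 'Shanghai_AI_Laboratory/internlm2_5-20b-chat',
                'internlm2', eos_token='<|im_end|>',
                hf_model_id='internlm/internlm2_5-20b-chat')
def get_model_tokenizer_internlm2(model_dir, **kwargs):
    """Placeholder loader; the real one builds the model and tokenizer."""
    return model_dir, kwargs
```

Under this sketch, resolving `MODEL_MAPPING['internlm2_5-20b-chat']` yields the ModelScope id, template, and extra kwargs that a command like `swift infer --model_type internlm2_5-20b-chat` would need to locate and configure the checkpoint.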