
support internlm2.5 1.8b 20b #1551

Merged · 6 commits · Aug 6, 2024
5 changes: 3 additions & 2 deletions README.md
@@ -55,6 +55,7 @@ You can contact us and communicate with us by adding our group:
<img src="asset/discord_qr.jpg" width="200" height="200"> | <img src="asset/wechat.png" width="200" height="200">

## 🎉 News
- 2024.07.31: Support for the internlm2.5 series in 1.8b and 20b sizes. Try it with `swift infer --model_type internlm2_5-1_8b-chat`.
- 🔥2024.07.29: Support the use of lmdeploy for inference acceleration of LLM and VLM models. Documentation can be found [here](docs/source_en/Multi-Modal/LmDeploy-inference-acceleration.md).
- 🔥2024.07.24: Support DPO/ORPO/SimPO/CPO alignment algorithms for vision MLLMs; training scripts can be found in the [Document](docs/source_en/Multi-Modal/human-preference-alignment-training-documentation.md). The RLAIF-V dataset is supported.
- 🔥2024.07.24: Support using Megatron for CPT and SFT on the Qwen2 series. You can refer to the [Megatron training documentation](docs/source_en/LLM/Megatron-training.md).
@@ -73,6 +74,8 @@ You can contact us and communicate with us by adding our group:
- 2024.07.02: Support for using vLLM for accelerating inference and deployment of multimodal large models such as the llava series and phi3-vision models. You can refer to the [Multimodal & vLLM Inference Acceleration Documentation](docs/source_en/Multi-Modal/vllm-inference-acceleration.md) for more information.
- 2024.07.02: Support for `llava1_6-vicuna-7b-instruct`, `llava1_6-vicuna-13b-instruct` and other llava-hf models. For best practices, refer to [here](docs/source_en/Multi-Modal/llava-best-practice.md).
- 🔥2024.06.29: Support [eval-scope](https://github.com/modelscope/eval-scope)&[open-compass](https://github.com/open-compass/opencompass) for evaluation! We now support over 50 eval datasets such as `BoolQ, ocnli, humaneval, math, ceval, mmlu, gsm8k, ARC_e`; please check our [Eval Doc](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/LLM-eval.md) to begin! Next sprint we will support multimodal and Agent evaluation, so remember to follow us : )
<details><summary>More</summary>

- 🔥2024.06.28: Support for **Florence** series models! See the [document](docs/source_en/Multi-Modal/florence-best-pratice.md).
- 🔥2024.06.28: Support for Gemma2 series models: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct.
- 🔥2024.06.18: Supports **DeepSeek-Coder-v2** series models! Use model_type `deepseek-coder-v2-instruct` and `deepseek-coder-v2-lite-instruct` to begin.
@@ -83,8 +86,6 @@ You can contact us and communicate with us by adding our group:
- 🔥2024.06.01: Supports **SimPO** training! See [document](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/SimPO.md) to start training!
- 🔥2024.06.01: Support for deploying large multimodal models, please refer to the [Multimodal Deployment Documentation](docs/source_en/Multi-Modal/mutlimodal-deployment.md) for more information.
- 2024.05.31: Supports the Mini-Internvl model. Use model_type `mini-internvl-chat-2b-v1_5` and `mini-internvl-chat-4b-v1_5` to train.
<details><summary>More</summary>

- 2024.05.24: Supports the Phi3-vision model. Use model_type `phi3-vision-128k-instruct` to train.
- 2024.05.22: Supports the DeepSeek-V2-Lite series models; model_type values are `deepseek-v2-lite` and `deepseek-v2-lite-chat`.
- 2024.05.22: Supports the TeleChat-12B-v2 model and its quantized version; model_type values are `telechat-12b-v2` and `telechat-12b-v2-gptq-int4`.
5 changes: 3 additions & 2 deletions README_CN.md
@@ -56,6 +56,7 @@ SWIFT has rich and comprehensive documentation; please visit our documentation site:


## 🎉 News
- 2024.07.31: Support for the internlm2.5 series in 1.8b and 20b sizes. Try it with `swift infer --model_type internlm2_5-1_8b-chat`.
- 🔥2024.07.29: Support for using lmdeploy to accelerate inference for LLM and VLM models. See the documentation [here](docs/source/Multi-Modal/LmDeploy推理加速文档.md).
- 🔥2024.07.24: Human-preference alignment algorithms (DPO/ORPO/SimPO/CPO) now support vision multimodal models; for training, see the [documentation](docs/source/Multi-Modal/人类偏好对齐训练文档.md). The RLAIF-V dataset is supported.
- 🔥2024.07.24: Support for using Megatron for CPT and SFT on the qwen2 series. See the [Megatron training documentation](docs/source/LLM/Megatron训练文档.md).
@@ -74,6 +75,8 @@ SWIFT has rich and comprehensive documentation; please visit our documentation site:
- 2024.07.02: Support for using vllm to accelerate inference and deployment of multimodal models such as the llava series and phi3-vision. See the [Multimodal & vLLM Inference Acceleration Documentation](docs/source/Multi-Modal/vLLM推理加速文档.md) for more information.
- 2024.07.02: Support for `llava1_6-vicuna-7b-instruct`, `llava1_6-vicuna-13b-instruct` and other llava-hf models. For best practices, see [here](docs/source/Multi-Modal/llava最佳实践.md).
- 🔥2024.06.29: Support for [eval-scope](https://github.com/modelscope/eval-scope)&[open-compass](https://github.com/open-compass/opencompass) evaluation! We support an evaluation pipeline covering 50+ standard datasets including `BoolQ, ocnli, humaneval, math, ceval, mmlu, gsm8k, ARC_e`; see our [evaluation documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM评测文档.md) to get started. The next iteration will add multimodal and Agent evaluation, so stay tuned : )
<details><summary>More</summary>

- 🔥2024.06.28: Support for the **Florence** series models; see the [Florence best practice](docs/source/Multi-Modal/florence最佳实践.md).
- 🔥2024.06.28: Support for the **Gemma2** series models: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct.
- 🔥2024.06.18: Support for the **DeepSeek-Coder-v2** series models! Use model_type `deepseek-coder-v2-instruct` and `deepseek-coder-v2-lite-instruct` for training and inference.
@@ -84,8 +87,6 @@ SWIFT has rich and comprehensive documentation; please visit our documentation site:
- 🔥2024.06.01: Support for **SimPO** training; use `swift simpo` to start. Best practices can be found [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/SimPO算法最佳实践.md).
- 🔥2024.06.01: Support for deploying large multimodal models; see the [Multimodal Deployment Documentation](docs/source/Multi-Modal/MLLM部署文档.md).
- 2024.05.31: Supports the Mini-Internvl multimodal model. Use model_type `mini-internvl-chat-2b-v1_5` and `mini-internvl-chat-4b-v1_5` to train.
<details><summary>More</summary>

- 2024.05.24: Supports the Phi3 multimodal model. Use model_type `phi3-vision-128k-instruct` to train.
- 2024.05.22: Supports the DeepSeek-V2-Lite series models; model_type values are `deepseek-v2-lite` and `deepseek-v2-lite-chat`.
- 2024.05.22: Supports the TeleChat-12b-v2 model and its quantized version; model_type values are `telechat-12b-v2` and `telechat-12b-v2-gptq-int4`.
4 changes: 4 additions & 0 deletions docs/source/LLM/支持的模型和数据集.md
@@ -203,9 +203,13 @@
|internlm2-20b|[Shanghai_AI_Laboratory/internlm2-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-20b](https://huggingface.co/internlm/internlm2-20b)|
|internlm2-20b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b-sft/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-chat-20b-sft](https://huggingface.co/internlm/internlm2-chat-20b-sft)|
|internlm2-20b-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-chat-20b](https://huggingface.co/internlm/internlm2-chat-20b)|
|internlm2_5-1_8b|[Shanghai_AI_Laboratory/internlm2_5-1_8b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-1_8b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-1_8b](https://huggingface.co/internlm/internlm2_5-1_8b)|
|internlm2_5-1_8b-chat|[Shanghai_AI_Laboratory/internlm2_5-1_8b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-1_8b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-1_8b-chat](https://huggingface.co/internlm/internlm2_5-1_8b-chat)|
|internlm2_5-7b|[Shanghai_AI_Laboratory/internlm2_5-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b](https://huggingface.co/internlm/internlm2_5-7b)|
|internlm2_5-7b-chat|[Shanghai_AI_Laboratory/internlm2_5-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat)|
|internlm2_5-7b-chat-1m|[Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b-chat-1m](https://huggingface.co/internlm/internlm2_5-7b-chat-1m)|
|internlm2_5-20b|[Shanghai_AI_Laboratory/internlm2_5-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-20b](https://huggingface.co/internlm/internlm2_5-20b)|
|internlm2_5-20b-chat|[Shanghai_AI_Laboratory/internlm2_5-20b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-20b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-20b-chat](https://huggingface.co/internlm/internlm2_5-20b-chat)|
|internlm2-math-7b|[Shanghai_AI_Laboratory/internlm2-math-base-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-7b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-base-7b](https://huggingface.co/internlm/internlm2-math-base-7b)|
|internlm2-math-7b-chat|[Shanghai_AI_Laboratory/internlm2-math-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-7b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-7b](https://huggingface.co/internlm/internlm2-math-7b)|
|internlm2-math-20b|[Shanghai_AI_Laboratory/internlm2-math-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-base-20b](https://huggingface.co/internlm/internlm2-math-base-20b)|
4 changes: 4 additions & 0 deletions docs/source_en/LLM/Supported-models-datasets.md
@@ -203,9 +203,13 @@ The table below introduces all models supported by SWIFT:
|internlm2-20b|[Shanghai_AI_Laboratory/internlm2-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-20b](https://huggingface.co/internlm/internlm2-20b)|
|internlm2-20b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b-sft/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-chat-20b-sft](https://huggingface.co/internlm/internlm2-chat-20b-sft)|
|internlm2-20b-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2-chat-20b](https://huggingface.co/internlm/internlm2-chat-20b)|
|internlm2_5-1_8b|[Shanghai_AI_Laboratory/internlm2_5-1_8b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-1_8b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-1_8b](https://huggingface.co/internlm/internlm2_5-1_8b)|
|internlm2_5-1_8b-chat|[Shanghai_AI_Laboratory/internlm2_5-1_8b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-1_8b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-1_8b-chat](https://huggingface.co/internlm/internlm2_5-1_8b-chat)|
|internlm2_5-7b|[Shanghai_AI_Laboratory/internlm2_5-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b](https://huggingface.co/internlm/internlm2_5-7b)|
|internlm2_5-7b-chat|[Shanghai_AI_Laboratory/internlm2_5-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat)|
|internlm2_5-7b-chat-1m|[Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-7b-chat-1m](https://huggingface.co/internlm/internlm2_5-7b-chat-1m)|
|internlm2_5-20b|[Shanghai_AI_Laboratory/internlm2_5-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-20b](https://huggingface.co/internlm/internlm2_5-20b)|
|internlm2_5-20b-chat|[Shanghai_AI_Laboratory/internlm2_5-20b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-20b-chat/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|-|[internlm/internlm2_5-20b-chat](https://huggingface.co/internlm/internlm2_5-20b-chat)|
|internlm2-math-7b|[Shanghai_AI_Laboratory/internlm2-math-base-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-7b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-base-7b](https://huggingface.co/internlm/internlm2-math-base-7b)|
|internlm2-math-7b-chat|[Shanghai_AI_Laboratory/internlm2-math-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-7b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-7b](https://huggingface.co/internlm/internlm2-math-7b)|
|internlm2-math-20b|[Shanghai_AI_Laboratory/internlm2-math-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|&#x2714;|transformers>=4.38|math|[internlm/internlm2-math-base-20b](https://huggingface.co/internlm/internlm2-math-base-20b)|
46 changes: 46 additions & 0 deletions swift/llm/utils/model.py
@@ -271,9 +271,13 @@ class ModelType:
internlm2_20b_sft_chat = 'internlm2-20b-sft-chat'
internlm2_20b_chat = 'internlm2-20b-chat'
# internlm2.5
internlm2_5_1_8b = 'internlm2_5-1_8b'
internlm2_5_1_8b_chat = 'internlm2_5-1_8b-chat'
internlm2_5_7b = 'internlm2_5-7b'
internlm2_5_7b_chat = 'internlm2_5-7b-chat'
internlm2_5_7b_chat_1m = 'internlm2_5-7b-chat-1m'
internlm2_5_20b = 'internlm2_5-20b'
internlm2_5_20b_chat = 'internlm2_5-20b-chat'
# internlm2-math
internlm2_math_7b = 'internlm2-math-7b'
internlm2_math_7b_chat = 'internlm2-math-7b-chat'
@@ -3498,6 +3502,27 @@ def get_model_tokenizer_qwen2_intx(model_dir: str,
return get_model_tokenizer_qwen_intx(model_dir, torch_dtype, model_kwargs, load_model, **kwargs)


@register_model(
ModelType.internlm2_5_1_8b,
'Shanghai_AI_Laboratory/internlm2_5-1_8b',
LoRATM.internlm2,
TemplateType.default_generation,
requires=['transformers>=4.38'],
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
hf_model_id='internlm/internlm2_5-1_8b')
@register_model(
ModelType.internlm2_5_1_8b_chat,
'Shanghai_AI_Laboratory/internlm2_5-1_8b-chat',
LoRATM.internlm2,
TemplateType.internlm2,
eos_token='<|im_end|>',
requires=['transformers>=4.38'],
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
hf_model_id='internlm/internlm2_5-1_8b-chat')
@register_model(
ModelType.internlm2_5_7b,
'Shanghai_AI_Laboratory/internlm2_5-7b',
@@ -3530,6 +3555,27 @@ def get_model_tokenizer_qwen2_intx(model_dir: str,
support_vllm=True,
support_lmdeploy=True,
hf_model_id='internlm/internlm2_5-7b-chat-1m')
@register_model(
ModelType.internlm2_5_20b,
'Shanghai_AI_Laboratory/internlm2_5-20b',
LoRATM.internlm2,
TemplateType.default_generation,
requires=['transformers>=4.38'],
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
hf_model_id='internlm/internlm2_5-20b')
@register_model(
ModelType.internlm2_5_20b_chat,
'Shanghai_AI_Laboratory/internlm2_5-20b-chat',
LoRATM.internlm2,
TemplateType.internlm2,
eos_token='<|im_end|>',
requires=['transformers>=4.38'],
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
hf_model_id='internlm/internlm2_5-20b-chat')
@register_model(
ModelType.internlm2_1_8b,
'Shanghai_AI_Laboratory/internlm2-1_8b',
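The stacked `@register_model` decorators in this diff all wrap the same loader function: each call records one model's metadata (ModelScope id, LoRA target modules, template, backend support flags) under its `model_type` key, then returns the loader unchanged so the decorators can stack. A minimal, hypothetical sketch of that registry pattern follows — the real `register_model` in `swift/llm/utils/model.py` takes more parameters and performs validation, and the names below are illustrative, not SWIFT's actual internals:

```python
# Hypothetical sketch of a decorator-based model registry.
# Each @register_model call stores one model's metadata under its
# model_type key and returns the loader unchanged, so several
# registrations can share a single get-function.
MODEL_MAPPING = {}

def register_model(model_type, model_id, template, **kwargs):
    def decorator(get_function):
        MODEL_MAPPING[model_type] = {
            'model_id': model_id,          # ModelScope repo id
            'template': template,          # chat/generation template name
            'get_function': get_function,  # shared loader
            **kwargs,                      # e.g. eos_token, hf_model_id
        }
        return get_function
    return decorator

@register_model('internlm2_5-20b', 'Shanghai_AI_Laboratory/internlm2_5-20b',
                'default-generation', hf_model_id='internlm/internlm2_5-20b')
@register_model('internlm2_5-20b-chat', 'Shanghai_AI_Laboratory/internlm2_5-20b-chat',
                'internlm2', eos_token='<|im_end|>',
                hf_model_id='internlm/internlm2_5-20b-chat')
def get_model_tokenizer_internlm2(model_dir, **kwargs):
    """Placeholder loader; the real one builds the model and tokenizer."""
    return model_dir, kwargs
```

Under this sketch, resolving `MODEL_MAPPING['internlm2_5-20b-chat']` yields the ModelScope id, template, and extra kwargs that a command like `swift infer --model_type internlm2_5-20b-chat` would need to locate and configure the checkpoint.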