Merge commit '3fecc8cfa2d0181589d711aff3da5b6904c291ac' into release/2.0
* commit '3fecc8cfa2d0181589d711aff3da5b6904c291ac':
  support Codeqwen-7b-chat model (#718)
  Fix bugs (#714)
  Fix many bug (#716)
  fix (#711)
  [doc] Update index.md (#709)
  support Llava-v1.6-34b model (#708)
  Support mPLUG-Owl2 (#706)
  fix minicpm-v-v2 bug (#703)
  fix readme (#704)
  Drop data by gradient_accumulation_steps (#626)
  Fix stream 0415 (#702)
  feat(model): support minicpm-v-2 (#699)
  bump version

# Conflicts:
#	docs/source/Multi-Modal/minicpm-v-2最佳实践.md
#	swift/llm/utils/template.py
#	swift/version.py
tastelikefeet committed Apr 17, 2024
2 parents b886406 + 3fecc8c commit 5cbaf3d
Showing 36 changed files with 716 additions and 135 deletions.
14 changes: 9 additions & 5 deletions README.md
@@ -39,10 +39,12 @@ To facilitate use by users unfamiliar with deep learning, we provide a Gradio we
 Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.

 ## 🎉 News
+- 🔥2024.04.17: Support **CodeQwen1.5-7B** series: CodeQwen1.5-7B, CodeQwen1.5-7B-Chat,CodeQwen1.5-7B-Chat-AWQ, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat/lora/sft.sh) to train.
+- 2024.04.16: Supports inference and fine-tuning of llava-v1.6-34b model. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source_en/Multi-Modal/llava-best-practice.md).
 - 2024.04.13: Support the fine-tuning and inference of Mixtral-8x22B-v0.1 model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mixtral_moe_8x22b_v1/lora_ddp_ds/sft.sh) to start training!
 - 2024.04.13: Support the newly launched **MiniCPM** series: MiniCPM-V-2.0、MiniCPM-2B-128k、MiniCPM-MoE-8x2B and MiniCPM-1B.use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/minicpm_moe_8x2b/lora_ddp/sft.sh) to start training!
 - 🔥2024.04.11: Support Model Evaluation with MMLU/ARC/CEval datasets(also user custom eval datasets) with one command! Check [this documentation](docs/source_en/LLM/LLM-eval.md) for details. Meanwhile, we support a trick way to do multiple ablation experiments, check [this documentation](docs/source_en/LLM/LLM-exp.md) to use.
-- 🔥2024.04.11: Support **c4ai-command-r** series: c4ai-command-r-plus, c4ai-command-r-v01, [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/c4ai-command-r-plus/lora_mp/sft.sh) to train.
+- 🔥2024.04.11: Support **c4ai-command-r** series: c4ai-command-r-plus, c4ai-command-r-v01, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/c4ai_command_r_plus/lora_mp/sft.sh) to train.
 - 2024.04.10: Use SWIFT to fine-tune the qwen-7b-chat model to enhance its function call capabilities, and combine it with [Modelscope-Agent](https://github.com/modelscope/modelscope-agent) for best practices, which can be found [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Agent-best-practice.md#Usage-with-Modelscope_Agent).
 - 🔥2024.04.09: Support ruozhiba dataset. Search `ruozhiba` in [this documentation](docs/source_en/LLM/Supported-models-datasets.md) to begin training!
 - 2024.04.08: Support the fine-tuning and inference of XVERSE-MoE-A4.2B model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/xverse_moe_a4_2b/lora/sft.sh) to start training!
@@ -53,7 +55,7 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
 - 🔥2024.03.29: Support **Qwen1.5-MoE** series: Qwen1.5-MoE-A2.7B, Qwen1.5-MoE-A2.7B-Chat, Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4.
 - 🔥2024.03.29: Support the fine-tuning and inference of **Grok-1** 300B MoE, please view details [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Grok-1-best-practice.md).
 - 🔥2024.03.25: Supports inference and fine-tuning of TeleChat-7b and TeleChat-12b model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
-- 🔥2024.03.20: Supports inference and fine-tuning for the **llava** series. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
+- 🔥2024.03.20: Supports inference and fine-tuning for the **llava** series. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source_en/Multi-Modal/llava-best-practice.md).
 - 🔥2024.03.12: Support inference and fine-tuning for **deepseek-vl** series. Best practices can be found [here](docs/source_en/Multi-Modal/deepseek-vl-best-practice.md).
 - 🔥2024.03.11: Support [GaLore](https://arxiv.org/abs/2403.03507) for effectively reducing memory usage to 1/2 of the original in full-parameter training.
 - 🔥2024.03.10: [End-to-end best practices](docs/source_en/LLM/Qwen1.5-best-practice.md) from fine-tuning to deployment for Qwen1.5-7B-Chat and Qwen1.5-72B-Chat.
@@ -85,7 +87,8 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
 - 🔥2024.01.17: Support internlm2 series: internlm2-7b-base, internlm2-7b, [internlm2-7b-sft-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/internlm2_7b_sft_chat), internlm2-7b-chat, internlm2-20b-base, internlm2-20b, internlm2-20b-sft-chat, internlm2-20b-chat.
 - 2024.01.15: Support yuan series: yuan2-2b-instruct, [yuan2-2b-janus-instruct](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yuan2_2b_janus_instruct), yuan2-51b-instruct, yuan2-102b-instruct.
 - 🔥2024.01.12: Support **deepseek-moe** series: deepseek-moe-16b, [deepseek-moe-16b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/deepseek_moe_16b_chat).
-- 🔥2024.01.04: Support **VLLM deployment**, compatible with **OpenAI API** style, see [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md#部署) for details.
+- 🔥2024.01.04: Support **VLLM deployment**, compatible with **OpenAI API** style, see [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/VLLM-inference-acceleration-and-deployment.md#Deployment) for details.
+
 - 2024.01.04: Update [Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) for convenient viewing of training speed and memory usage of different models.
 - 🔥2023.12.29: Support web-ui for sft training and inference, use `swift web-ui` after installing ms-swift to start.
 - 🔥2023.12.29: Support DPO RLHF (Reinforcement Learning from Human Feedback) and three datasets for this task: AI-ModelScope/stack-exchange-paired, AI-ModelScope/hh-rlhf and AI-ModelScope/hh_rlhf_cn. See [documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) to start training!
@@ -379,7 +382,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \

 | Model Type                                     | Model Introduction                                                     | Language           | Model Size                             | Model Type                                 |
 |------------------------------------------------|------------------------------------------------------------------------|--------------------|----------------------------------------|------------------------------------------- |
-| Qwen<br>Qwen1.5                                   | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM)  | Chinese<br>English    | 0.5B-72B<br>including quantized versions | base model<br>chat model<br>MoE model |
+| Qwen<br>Qwen1.5                                   | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM)  | Chinese<br>English    | 0.5B-72B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model                      |
 | ChatGLM2<br>ChatGLM3<br>Codegeex2 | [Zhipu ChatGLM series models](https://github.com/THUDM) | Chinese<br>English | 6B | base model<br>chat model<br>code model |
 | Baichuan/Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
 | Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
@@ -422,7 +425,8 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | DeepSeek-VL | [DeepSeek series vision models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-7B | chat model |
 | MiniCPM-V | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B | chat model |
 | CogVLM<br>CogAgent | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | English | 17B-18B | chat model |
-| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B | chat model |
+| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
+| mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |

 #### Diffusion Models

9 changes: 6 additions & 3 deletions README_CN.md
@@ -40,10 +40,12 @@ SWIFT supports nearly **200 LLMs and MLLMs** (multimodal large models) for training, inference,
 In addition, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.

 ## 🎉 News
+- 🔥2024.04.17: Support the **CodeQwen1.5-7B** series: CodeQwen1.5-7B, CodeQwen1.5-7B-Chat, CodeQwen1.5-7B-Chat-AWQ. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat/lora/sft.sh) to start training!
+- 2024.04.16: Support inference and fine-tuning of llava-v1.6-34b. For best practices, see [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 2024.04.13: Support inference and fine-tuning of the Mixtral-8x22B-v0.1 model. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mixtral_moe_8x22b_v1/lora_ddp_ds/sft.sh) to start training!
 - 2024.04.13: Support the newly launched **MiniCPM** series: MiniCPM-V-2.0, MiniCPM-2B-128k, MiniCPM-MoE-8x2B, and MiniCPM-1B. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/minicpm_moe_8x2b/lora_ddp/sft.sh) to start training!
 - 🔥2024.04.11: Support one-command model evaluation! The first batch of datasets includes MMLU, CEval, and ARC, and user-defined datasets are also supported; see [this documentation](docs/source/LLM/LLM评测文档.md) for details. We also support a trick for managing multiple ablation experiments; see [this documentation](docs/source/LLM/LLM实验文档.md) to use it.
-- 🔥2024.04.11: Support the **c4ai-command-r** series: c4ai-command-r-plus, c4ai-command-r-v01. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/c4ai-command-r-plus/lora_mp/sft.sh) to start training!
+- 🔥2024.04.11: Support the **c4ai-command-r** series: c4ai-command-r-plus, c4ai-command-r-v01. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/c4ai_command_r_plus/lora_mp/sft.sh) to start training!
 - 2024.04.10: Use swift to fine-tune the qwen-7b-chat model to enhance its function call capabilities, and use it together with [Modelscope-Agent](https://github.com/modelscope/modelscope-agent). For best practices, see [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/Agent微调最佳实践.md#搭配Modelscope-Agent使用).
 - 🔥2024.04.09: Support the `弱智吧` (ruozhiba) series of datasets. Search for `ruozhiba` in the [supported models and datasets documentation](docs/source/LLM/支持的模型和数据集.md) to find the datasets and start training!
 - 2024.04.08: Support inference and fine-tuning of the XVERSE-MoE-A4.2B model. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/xverse_moe_a4_2b/lora/sft.sh) to start training!
@@ -378,7 +380,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \

 | Model Type | Model Introduction | Language | Model Size | Model Type |
 | --------------------------------------------------- | ------------------------------------------------------------ |----------| ------------------------- |-------------------------------------------|
-| Qwen<br>Qwen1.5 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-72B<br>including quantized versions | base model<br>chat model<br>MoE model | |
+| Qwen<br>Qwen1.5 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-72B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model | |
 | ChatGLM2<br>ChatGLM3<br>Codegeex2 | [Zhipu ChatGLM series models](https://github.com/THUDM/) | Chinese<br>English | 6B | base model<br>chat model<br>code model |
 | Baichuan<br>Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
 | Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
@@ -421,7 +423,8 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | DeepSeek-VL | [DeepSeek series vision models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-7B | chat model |
 | MiniCPM-V | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B | chat model |
 | CogVLM<br>CogAgent | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | English | 17B-18B | chat model |
-| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B | chat model |
+| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
+| mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |

 #### Diffusion Models

1 change: 1 addition & 0 deletions docs/source/LLM/命令行参数.md
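@@ -242,6 +242,7 @@ eval parameters inherit from the infer parameters, and additionally include the following:
 - `--eval_limit`: The number of samples drawn from each sub-dataset of an evaluation set. Defaults to `None`, meaning the full dataset is evaluated.
 - `--eval_few_shot`: The number of few-shot examples for each sub-dataset of an evaluation set. Defaults to `None`, meaning the dataset's default configuration is used.
 - `--custom_eval_config`: Evaluate with a custom dataset. Must be a path to an existing local file; for the file format, see [Custom Evaluation Set](./LLM评测文档.md#自定义评测集).
+- `--eval_use_cache`: Whether to use the already generated evaluation cache, so that completed evaluations are not rerun and only the evaluation results are regenerated. Defaults to `False`.

 ## app-ui Parameters
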