Commit: fix docs

Jintao-Huang committed Aug 5, 2024
1 parent 2ea9b8e commit 341c067
Showing 34 changed files with 4 additions and 35 deletions.
2 changes: 1 addition & 1 deletion docs/source/LLM/命令行参数.md
@@ -62,7 +62,7 @@
- `--bnb_4bit_quant_type`: Quantization type used for 4-bit quantization, default `'nf4'`. Options: 'nf4', 'fp4'. Has no effect when quantization_bit is 0.
- `--bnb_4bit_use_double_quant`: Whether to enable double quantization for 4-bit quantization, default `True`. Has no effect when quantization_bit is 0.
- `--bnb_4bit_quant_storage`: Default `None`. Storage type for the quantized parameters. Has no effect when quantization_bit is 0.
-- `--target_modules`: Specifies the LoRA modules, default `['DEFAULT']`. If `'DEFAULT'` or `'AUTO'` is passed, `target_modules` is looked up in `MODEL_MAPPING` based on `model_type` (defaults to qkv). If `'ALL'` is passed, all Linear layers (excluding the head) are used as LoRA modules. If `'EMBEDDING'` is passed, the Embedding layer is used as a LoRA module. If memory allows, 'ALL' is recommended. You can also set `['ALL', 'EMBEDDING']` to target all Linear and Embedding layers. This argument takes effect when using lora/vera/boft/ia3/adalora/fourierft.
+- `--target_modules`: Specifies the LoRA modules, default `['DEFAULT']`. If `'DEFAULT'` or `'AUTO'` is passed, `target_modules` is looked up in `MODEL_MAPPING` based on `model_type` (for LLMs the default is qkv; for MLLMs, all Linear layers in the llm and projector). If `'ALL'` is passed, all Linear layers (excluding the head) are used as LoRA modules. If `'EMBEDDING'` is passed, the Embedding layer is used as a LoRA module. If memory allows, 'ALL' is recommended. You can also set `['ALL', 'EMBEDDING']` to target all Linear and Embedding layers. This argument takes effect when using lora/vera/boft/ia3/adalora/fourierft.
- `--target_regex`: Regex for selecting the LoRA modules, of type `Optional[str]`. Default `None`; if set, `target_modules` has no effect. This argument takes effect when using lora/vera/boft/ia3/adalora/fourierft.
- `--lora_rank`: Default `8`. Only takes effect when `sft_type` is 'lora'.
- `--lora_alpha`: Default `32`. Only takes effect when `sft_type` is 'lora'.
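
To make the `target_modules` values concrete, here is a minimal, hypothetical invocation; the model and dataset names are illustrative placeholders, not taken from this commit:

```shell
# Sketch: LoRA over all Linear layers plus the Embedding layer.
# model_type/dataset are placeholders; substitute your own.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-7b-chat \
    --dataset alpaca-zh \
    --sft_type lora \
    --target_modules ALL EMBEDDING
```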
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/cogvlm2-video最佳实践.md
@@ -104,7 +104,6 @@ response: The video shows a person lighting a fire in a backyard setting. The pe
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, LoRA fine-tuning is applied to the qkv of the LLM. If you want to fine-tune all linear layers, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A100
# 40GB GPU memory
...
```
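
The removed note above described a one-flag change; a hedged sketch of what it referred to (model and dataset names are illustrative placeholders):

```shell
# Sketch: extend LoRA from the default qkv to all Linear layers.
# model_type/dataset are placeholders, not taken from this commit.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type cogvlm2-video-13b-chat \
    --dataset video-chatgpt \
    --sft_type lora \
    --lora_target_modules ALL
```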
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/cogvlm2最佳实践.md
@@ -174,7 +174,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, LoRA fine-tuning is applied to the qkv of the language and vision models. If you want to fine-tune all linear layers, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A100
# 70GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/cogvlm最佳实践.md
@@ -136,7 +136,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, LoRA fine-tuning is applied to the qkv of the language and vision models. If you want to fine-tune all linear layers, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A100
# 50GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/deepseek-vl最佳实践.md
@@ -165,7 +165,6 @@ road:

LoRA fine-tuning:

-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A10, 3090, V100
# 20GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/glm4v最佳实践.md
@@ -161,7 +161,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, LoRA fine-tuning is applied to the qkv of the language and vision models. If you want to fine-tune all linear layers, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A100
# 40GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/internlm-xcomposer2最佳实践.md
@@ -135,7 +135,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. `--lora_target_modules ALL` is not supported. Full-parameter fine-tuning is supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 21GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/internvl最佳实践.md
@@ -296,7 +296,6 @@ road:
LoRA fine-tuning:

**Note**
-- By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`.
- If your GPU does not support flash attention, use the argument `--use_flash_attn false`.

```shell
...
```
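
A hedged sketch combining the two notes above (model and dataset names are illustrative placeholders):

```shell
# Sketch: LoRA over all Linear layers with flash attention disabled.
# model_type/dataset are placeholders, not taken from this commit.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type internvl-chat-v1_5 \
    --dataset coco-en-2-mini \
    --sft_type lora \
    --lora_target_modules ALL \
    --use_flash_attn false
```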
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/llava-video最佳实践.md
@@ -105,7 +105,6 @@ response: In this image, there are four sheep.
LoRA fine-tuning:
-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100...
# 21GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/llava最佳实践.md
@@ -195,7 +195,6 @@ road:

LoRA fine-tuning:

-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100...
# 21GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/minicpm-v-2.5最佳实践.md
@@ -158,7 +158,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: 3090
# 20GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/minicpm-v-2最佳实践.md
@@ -135,7 +135,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 10GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/minicpm-v最佳实践.md
@@ -139,7 +139,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 10GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/mplug-owl2最佳实践.md
@@ -138,7 +138,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100...
# 24GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/phi3-vision最佳实践.md
@@ -151,7 +151,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 16GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/qwen-audio最佳实践.md
@@ -99,7 +99,6 @@ history: [['Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/i

LoRA fine-tuning:

-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the audio model, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A10, 3090, V100...
# 22GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/qwen-vl最佳实践.md
@@ -141,7 +141,6 @@ road:

LoRA fine-tuning:

-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: 3090
# 23GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source/Multi-Modal/yi-vl最佳实践.md
@@ -156,7 +156,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, LoRA fine-tuning is applied only to the qkv of the LLM. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100...
# 19GB GPU memory
...
```
4 changes: 2 additions & 2 deletions docs/source_en/LLM/Command-line-parameters.md
@@ -63,7 +63,7 @@
- `--bnb_4bit_quant_type`: Quantization method for 4-bit quantization, default is `'nf4'`. Options: 'nf4', 'fp4'. Has no effect when quantization_bit is 0.
- `--bnb_4bit_use_double_quant`: Whether to enable double quantization for 4-bit quantization, default is `True`. Has no effect when quantization_bit is 0.
- `--bnb_4bit_quant_storage`: Default value `None`. This sets the storage type to pack the quantized 4-bit params. Has no effect when quantization_bit is 0.
-- `--target_modules`: Specify lora modules, default is `['DEFAULT']`. If target_modules is passed `'DEFAULT'` or `'AUTO'`, look up `target_modules` in `MODEL_MAPPING` based on `model_type` (default specifies qkv). If passed `'ALL'`, all Linear layers (excluding head) will be specified as lora modules. If passed `'EMBEDDING'`, the Embedding layer will be specified as a lora module. If memory allows, setting it to 'ALL' is recommended. You can also set `['ALL', 'EMBEDDING']` to specify all Linear and embedding layers as lora modules. This argument takes effect when sft_type is one of lora/vera/boft/ia3/adalora/fourierft.
+- `--target_modules`: Specify lora modules, default is `['DEFAULT']`. If target_modules is passed `'DEFAULT'` or `'AUTO'`, look up `target_modules` in `MODEL_MAPPING` based on `model_type` (LLMs default to qkv, while MLLMs default to all Linear layers in the llm and projector). If passed `'ALL'`, all Linear layers (excluding head) will be specified as lora modules. If passed `'EMBEDDING'`, the Embedding layer will be specified as a lora module. If memory allows, setting it to 'ALL' is recommended. You can also set `['ALL', 'EMBEDDING']` to specify all Linear and embedding layers as lora modules. This argument takes effect when sft_type is one of lora/vera/boft/ia3/adalora/fourierft.
- `--target_regex`: The lora target regex, of type `Optional[str]`. Default is `None`. If this argument is specified, `target_modules` has no effect. This argument takes effect when sft_type is one of lora/vera/boft/ia3/adalora/fourierft.
- `--lora_rank`: Default is `8`. Only takes effect when `sft_type` is 'lora'.
- `--lora_alpha`: Default is `32`. Only takes effect when `sft_type` is 'lora'.
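
To illustrate `--target_regex`, a minimal sketch; the regex assumes LLaMA-style module names, and the model and dataset are illustrative placeholders:

```shell
# Sketch: select LoRA modules by regex instead of target_modules.
# The pattern and model_type are assumptions, not from this commit.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type llama3-8b-instruct \
    --dataset alpaca-en \
    --sft_type lora \
    --target_regex '.*(q_proj|k_proj|v_proj)$'
```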
@@ -250,7 +250,7 @@ The following parameters take effect when `sft_type` is set to `ia3`.
PT parameters inherit from the SFT parameters with some modifications to the default values:

- `--sft_type`: Default value is `'full'`.
-- `--lora_target_modules`: Default value is `'ALL'`.
+- `--target_modules`: Default value is `'ALL'`.
- `--lazy_tokenize`: Default value is `True`.
- `--eval_steps`: Default value is `500`.

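
Given these defaults, a minimal pre-training invocation might look like the sketch below; since `sft_type` defaults to 'full' and `target_modules` to 'ALL', neither needs to be passed (model and dataset are placeholders):

```shell
# Sketch: swift pt with its inherited defaults left in place.
# model_type/dataset are placeholders, not taken from this commit.
CUDA_VISIBLE_DEVICES=0 swift pt \
    --model_type qwen-7b \
    --dataset alpaca-zh
```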
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/cogvlm-best-practice.md
@@ -125,7 +125,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, lora fine-tuning is performed on the qkv of the language and vision models. If you want to fine-tune all linears, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A100
# 50GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/cogvlm2-best-practice.md
@@ -156,7 +156,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, lora fine-tuning is performed on the qkv of the language and vision models. If you want to fine-tune all linears, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A100
# 70GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/cogvlm2-video-best-practice.md
@@ -103,7 +103,6 @@ response: The video shows a person lighting a fire in a backyard setting. The pe
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, lora fine-tuning is performed on the qkv of the LLM. If you want to fine-tune all linears, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A100
# 40GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/deepseek-vl-best-practice.md
@@ -158,7 +158,6 @@ Multi-modal large model fine-tuning usually uses **custom datasets**. Here is a

LoRA fine-tuning:

-(By default, lora fine-tuning is performed only on the qkv part of the LLM. If you want to fine-tune all linear parts including the vision model, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A10, 3090, V100
# 20GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/glm4v-best-practice.md
@@ -152,7 +152,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, lora fine-tuning is performed on the qkv of the language and vision models. If you want to fine-tune all linears, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A100
# 40GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/internlm-xcomposer2-best-practice.md
@@ -133,7 +133,6 @@ road:
## Fine-tuning
Fine-tuning of multimodal large models usually uses **custom datasets**. Here's a demo that can be run directly:

-(By default, only the qkv part of the LLM is fine-tuned using LoRA. `--lora_target_modules ALL` is not supported. Full-parameter fine-tuning is supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 21GB GPU memory
...
```
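
Since `--lora_target_modules ALL` is not supported for this model, full-parameter fine-tuning is the stated alternative; a hedged sketch (model and dataset names are illustrative placeholders):

```shell
# Sketch: full-parameter fine-tuning instead of LoRA over all Linears.
# model_type/dataset are placeholders, not taken from this commit.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type internlm-xcomposer2-7b-chat \
    --dataset coco-en-2-mini \
    --sft_type full
```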
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/internvl-best-practice.md
@@ -262,7 +262,6 @@ LoRA fine-tuning:

**Note**
- If your GPU does not support flash attention, use the argument `--use_flash_attn false`.
-- By default, only the qkv of the LLM part is fine-tuned using LoRA. If you want to fine-tune all linear layers including the vision model part, you can specify `--lora_target_modules ALL`.

```shell
# Experimental environment: A100
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/llava-best-practice.md
@@ -186,7 +186,6 @@ Multimodal large model fine-tuning usually uses **custom datasets** for fine-tun

LoRA fine-tuning:

-(By default, only the qkv of the LLM part is fine-tuned using LoRA. If you want to fine-tune all linear layers including the vision model part, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100...
# 21GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/llava-video-best-practice.md
@@ -103,7 +103,6 @@ Multimodal large model fine-tuning usually uses **custom datasets** for fine-tun

LoRA fine-tuning:

-(By default, only the qkv of the LLM part is fine-tuned using LoRA. If you want to fine-tune all linear layers including the vision model part, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100...
# 21GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/minicpm-v-best-practice.md
@@ -127,7 +127,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, only the qkv part of the LLM is fine-tuned using LoRA. If you want to fine-tune all linear parts including the vision model, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 10GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/phi3-vision-best-practice.md
@@ -145,7 +145,6 @@ road:
## Fine-tuning
Multimodal large model fine-tuning usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, only the qkv of the LLM part is fine-tuned with LoRA. If you want to fine-tune all linear modules including the vision model part, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 16GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/qwen-audio-best-practice.md
@@ -98,7 +98,6 @@ Multimodal large model fine-tuning usually uses **custom datasets** for fine-tun

LoRA fine-tuning:

-(By default, only the qkv of the LLM part is lora fine-tuned. If you want to fine-tune all linear layers including the audio model part, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: A10, 3090, V100...
# 22GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/qwen-vl-best-practice.md
@@ -130,7 +130,6 @@ Multimodal large model fine-tuning usually uses **custom datasets**. Here is a d

LoRA fine-tuning:

-(By default, only the qkv part of the LLM is lora fine-tuned. If you want to fine-tune all linear modules including the vision model, you can specify `--lora_target_modules ALL`)
```shell
# Experimental environment: 3090
# 23GB GPU memory
...
```
1 change: 0 additions & 1 deletion docs/source_en/Multi-Modal/yi-vl-best-practice.md
@@ -141,7 +141,6 @@ road:
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:

-(By default, only the qkv of the LLM part is lora fine-tuned. If you want to fine-tune all linears including the vision model part, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100...
# 19GB GPU memory
...
```
2 changes: 1 addition & 1 deletion swift/llm/utils/argument.py
@@ -1582,7 +1582,7 @@ def __post_init__(self):
@dataclass
class PtArguments(SftArguments):
    sft_type: Literal['lora', 'full', 'longlora', 'adalora', 'ia3', 'llamapro', 'vera', 'boft'] = 'full'
-    lora_target_modules: List[str] = field(default_factory=lambda: ['ALL'])
+    target_modules: List[str] = field(default_factory=lambda: ['ALL'])
    lazy_tokenize: Optional[bool] = True
    eval_steps: int = 500

