
[doc] Update adapters.md (#2621)
xyang16 authored Dec 4, 2024
1 parent 9f7bd54 commit c38d659
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion serving/docs/adapters.md
@@ -32,7 +32,7 @@ Here are the settings that are available when using LoRA Adapter.
| Item | Environment Variable | LMI Version | Configuration Type | Description | Example value |
|----------------------------------|----------------------------------|-------------|--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|
| option.enable_lora | OPTION_ENABLE_LORA | \>= 0.27.0 | Pass Through | This config enables support for LoRA adapters. | Default: `false` |
-| option.max_loras | OPTION_MAX_LORAS | \>= 0.27.0 | Pass Through | This config determines the maximum number of LoRA adapters that can be run at once. Allocates GPU memory for those number adapters. | Default: `4` |
+| option.max_loras | OPTION_MAX_LORAS | \>= 0.27.0 | Pass Through | This config determines the maximum number of unique LoRA adapters that can be run in a single batch. | Default: `4` |
| option.max_lora_rank | OPTION_MAX_LORA_RANK | \>= 0.27.0 | Pass Through | This config determines the maximum rank allowed for a LoRA adapter. Set this value to maximum rank of your adapters. Setting a larger value will enable more adapters at a greater memory usage cost. | Default: `16` |
| option.max_cpu_loras | OPTION_MAX_CPU_LORAS | \>= 0.27.0 | Pass Through | Maximum number of LoRAs to store in CPU memory. Must be >= than max_loras. Defaults to max_loras. | Default: `None` |
| option.fully_sharded_loras | OPTION_FULLY_SHARDED_LORAS | \>= 0.31.0 | Pass Through | By default, only half of the LoRA computation is sharded with tensor parallelism. Enabling this will use the fully sharded layers. At high sequence length, max rank or tensor parallel size, this is likely faster. | Default: `true` |
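For context, the settings in the table above are normally supplied through a `serving.properties` file (or the matching `OPTION_*` environment variables listed in the table). The snippet below is a minimal illustrative sketch, not part of this commit: the model id, the vLLM rolling-batch setup, and the specific values are placeholder assumptions.

```
# Illustrative serving.properties sketch (not part of this commit).
# model_id and the values below are placeholder assumptions.
engine=Python
option.model_id=meta-llama/Llama-2-7b-hf
option.rolling_batch=vllm
option.tensor_parallel_degree=1
# LoRA settings documented in the table above; each can also be set
# via its environment-variable form, e.g. OPTION_ENABLE_LORA=true.
option.enable_lora=true
option.max_loras=4
option.max_lora_rank=16
option.max_cpu_loras=8
option.fully_sharded_loras=true
```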
