
Commit

update
hjh0119 committed Jul 8, 2024
1 parent 153d18e commit e5a9e5e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/LLM/命令行参数.md
@@ -235,7 +235,7 @@ RLHF parameters inherit the sft parameters; in addition, the following parameters are added:
- `--loss_type`: the loss type, default `'sigmoid'`.
- `--sft_beta`: whether to add the sft loss in DPO, default 0.1, supporting the interval $[0, 1)$; the final loss is `(1-sft_beta)*KL_loss + sft_beta * sft_loss` (see the sketch after this list).
- `--simpo_gamma`: the reward margin term in the SimPO algorithm; the paper recommends a value of 0.5-1.5, default 1.0.
- `--cpo_alpha`: the nll loss mixed into the CPO loss, default 1.0.
- `--cpo_alpha`: the coefficient of the nll loss in the CPO loss, default 1.0; a mixed nll loss is used in SimPO to improve training stability.
- `--desirable_weight`: the loss weight $\lambda_D$ for desirable responses in the KTO algorithm, default 1.0.
- `--undesirable_weight`: the loss weight $\lambda_U$ for undesirable responses in the KTO paper, default 1.0. Let $n_D$ and $n_U$ denote the number of desirable and undesirable examples in the dataset; the paper recommends keeping $\frac{\lambda_D n_D}{\lambda_U n_U} \in [1,\frac{4}{3}]$ (see the sketch after this list).
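
The following is a minimal, self-contained Python sketch of the two formulas above: the `sft_beta` loss mix used in DPO and the KTO weight-ratio recommendation. The helper names (`mix_dpo_loss`, `kto_ratio_ok`) are hypothetical illustrations, not part of the library's API.

```python
# Illustrative sketch of the sft_beta loss mix and the KTO weight-ratio check.
# The helper names below are hypothetical and do not exist in the library itself.

def mix_dpo_loss(kl_loss: float, sft_loss: float, sft_beta: float = 0.1) -> float:
    """Final loss = (1 - sft_beta) * KL_loss + sft_beta * sft_loss."""
    assert 0.0 <= sft_beta < 1.0, "sft_beta must lie in [0, 1)"
    return (1.0 - sft_beta) * kl_loss + sft_beta * sft_loss


def kto_ratio_ok(desirable_weight: float, undesirable_weight: float,
                 n_desirable: int, n_undesirable: int) -> bool:
    """KTO recommendation: lambda_D * n_D / (lambda_U * n_U) should lie in [1, 4/3]."""
    ratio = (desirable_weight * n_desirable) / (undesirable_weight * n_undesirable)
    return 1.0 <= ratio <= 4.0 / 3.0


# Example: with the default weights of 1.0, 1200 desirable vs. 1000 undesirable
# examples give a ratio of 1.2, which falls inside the recommended [1, 4/3] range.
print(mix_dpo_loss(kl_loss=0.8, sft_loss=1.5))    # 0.87
print(kto_ratio_ok(1.0, 1.0, 1200, 1000))         # True
```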

