When I train with KTO, the KL value quickly drops to 0, is this normal?

The KL value is abnormally low here. This can be due to a few reasons:

1. The learning rate is too high. It should be between 5e-7 and 5e-6. If it's higher than that, the rewards of both the chosen and rejected examples become very negative (although rewards/chosen > rewards/rejected). Since the KL term is estimated by taking the average reward of randomly mismatched input-output pairs and then clamping it at 0, if all the rewards are negative, the KL estimate will be zero.

2. Related to the first point, the loss beta is too low. The lower beta is, the lower the learning rate must be to ensure that rewards/chosen stays positive and rewards/rejected stays negative, which in turn allows the unrelated input-output pairs used to estimate the KL term to have weakly positive rewards.

3. The model doesn't have enough capacity to learn why the chosen examples are good, so it just pushes down the probability of all the rejected examples to compensate. This leads to a collapse in rewards (they all become negative), and the KL estimate becomes zero.
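The clamping behavior described above is easy to see in isolation. Below is a minimal sketch (not the actual trl KTOTrainer code; the function name and reward values are hypothetical) of estimating the KL term as the mean reward over mismatched input-output pairs, clamped at 0 from below:

```python
# Illustrative sketch of the KTO KL estimate, NOT the trl implementation.
# "Rewards" here stand in for beta-scaled log-prob differences between
# the policy and the reference model on mismatched input-output pairs.

def kl_estimate(mismatched_rewards):
    """Average reward over mismatched pairs, clamped at 0 from below."""
    mean = sum(mismatched_rewards) / len(mismatched_rewards)
    return max(mean, 0.0)

# Healthy training: mismatched pairs carry weakly positive rewards,
# so the KL estimate is a small positive number.
print(kl_estimate([0.3, 0.1, 0.2, 0.05]))     # small positive value

# Collapsed training (learning rate too high or beta too low): every
# reward is negative, so the clamp pins the KL estimate to exactly 0.
print(kl_estimate([-4.2, -3.8, -5.1, -4.5]))  # prints 0.0
```

This is why a KL of exactly 0, held over many steps, is a symptom of reward collapse rather than a healthy training signal.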