Repeated tokens in generated sequences after DPO training #27
Comments
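@mst272 Thanks. I tried lowering the learning rate and setting beta to 0.9, and the repeated tokens are gone. The problem now is that neither val loss nor train loss converges (both keep oscillating). (I noticed that evaluate runs on a subset of the validation data; I think setting shuffle=True on the val dataloader would be more reasonable.)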
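To illustrate the point about evaluating on a subset, here is a minimal sketch in plain Python. It does not reproduce this repository's evaluate code; `sample_eval_indices` and its parameters are hypothetical names chosen for illustration. It assumes evaluation stops after a fixed number of batches, and shows that without shuffling the same leading examples are scored every time, while shuffling draws a different subset.

```python
import random


def sample_eval_indices(dataset_size, batch_size, num_batches, shuffle, seed=None):
    """Return the index batches a partial evaluation pass would visit.

    Hypothetical helper: assumes evaluation stops after `num_batches` batches.
    """
    indices = list(range(dataset_size))
    if shuffle:
        # A seeded generator keeps the shuffled subset reproducible.
        random.Random(seed).shuffle(indices)
    limit = min(dataset_size, batch_size * num_batches)
    selected = indices[:limit]
    return [selected[i:i + batch_size] for i in range(0, limit, batch_size)]


# Without shuffling, every evaluation sees the same leading examples.
fixed = sample_eval_indices(10, 2, 2, shuffle=False)
# With shuffling, the subset depends on the seed.
shuffled = sample_eval_indices(10, 2, 2, shuffle=True, seed=0)
```

Note that a shuffled subset changes between evaluations, so by itself it can make the reported val loss noisier; fixing the seed or evaluating on the full validation set avoids that.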
Has this been solved yet? I mean the oscillating, non-converging loss.
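This may be related to the dataset and the number of epochs. In my experiments, even when the loss does not converge, the model can still improve on the relevant metrics.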
I tried the repetition issue: adding "do not repeat" to the prompt stops the repetition.
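I am using a gpt2 model with the Dahoas/full-hh-rlhf dataset from Hugging Face. After 10 epochs, the generated sequences contain many repeated tokens. During training, the reported val reward margins are sometimes positive and sometimes negative.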
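For readers unfamiliar with the quantity being logged, the following is a minimal sketch of the standard DPO loss and reward margin for a single preference pair. It is not taken from this repository; the function name and arguments are hypothetical, and the inputs are assumed to be summed sequence log-probabilities from the policy and the frozen reference model.

```python
import math


def dpo_loss_and_margin(policy_chosen_logp, policy_rejected_logp,
                        ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss and reward margin for one (chosen, rejected) pair.

    Hypothetical helper; inputs are assumed to be summed log-probabilities.
    """
    # Implicit rewards: beta-scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # loss = -log(sigmoid(margin)), written as a numerically stable softplus.
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return loss, margin


loss, margin = dpo_loss_and_margin(-10.0, -15.0, -12.0, -13.0, beta=0.1)
```

A negative margin means the policy currently ranks the rejected response above the chosen one relative to the reference, which gives a loss above log 2. Because beta multiplies the margin directly, a larger beta makes the same log-probability differences produce larger swings in both the margin and the loss.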