Hi, I just followed recipes/zephyr-7b-beta/dpo/config_qlora.yaml hoping to replicate the experiments. I was training on a single A10G GPU, and the only modification I made was reducing `train_batch_size` from 4 to 1 (due to memory constraints). However, my output model zephyr-7b-dpo-qlora only gets an MT-Bench score of 1.88. I also ran the MT-Bench benchmark on the downloaded zephyr-7b-sft-qlora, and it scored 6.37 (which seems about normal). Has anyone else had difficulty replicating this DPO experiment with QLoRA? Or is the batch size a critical factor for training?
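For reference, cutting the per-device batch from 4 to 1 without any other change shrinks the effective batch size by 4x (more if the recipe assumes multiple GPUs), and DPO training can be sensitive to that. Below is a minimal sketch of how one might compensate via gradient accumulation, assuming the handbook's standard TrainingArguments-style keys; the accumulation value is a placeholder, not the recipe's actual number:

```yaml
# Sketch of a single-GPU adjustment, not the exact recipe values.
# Effective batch size =
#   per_device_train_batch_size * gradient_accumulation_steps * num_gpus,
# so scale accumulation up by the same factor the per-device batch went down.
per_device_train_batch_size: 1    # was 4 in the recipe
gradient_accumulation_steps: 8    # placeholder: 4x whatever the recipe uses
```

That keeps the per-step gradient statistics closer to the original run; it doesn't rule out other causes, but it's the cheapest variable to control first.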
Update: I used the MT-Bench master branch to run the benchmark on three models, with GPT-4 as the judge:
| Model | MT-Bench score |
| --- | --- |
| zephyr-7b-sft-qlora (downloaded) | 6.365625 |
| zephyr-7b-dpo-qlora (downloaded) | 4.443038 |
| zephyr-7b-dpo-qlora (trained by me) | 1.883648 |
Even the downloaded QLoRA DPO model scores worse than the SFT model. Has anyone else observed this?