Hi, I just followed recipes/zephyr-7b-beta/dpo/config_qlora.yaml hoping to replicate the experiments. I was training on a single A10G GPU, and the only modification I made was reducing `train_batch_size` from 4 to 1 (due to memory constraints). However, my output model zephyr-7b-dpo-qlora only gets an MT-Bench score of 1.88. I also ran the MT-Bench benchmark on the downloaded zephyr-7b-sft-qlora, and it scored 6.37 (which seems about normal). Has anyone else had difficulty replicating this DPO experiment with QLoRA? Or is the batch size a critical factor for training?
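For reference, cutting the per-device batch from 4 to 1 without any other change shrinks the effective batch size by 4x (more if the recipe assumes multiple GPUs), and DPO training can be sensitive to that. Below is a minimal sketch of how one might compensate via gradient accumulation, assuming the handbook's standard TrainingArguments-style keys; the accumulation value is a placeholder, not the recipe's actual number:

```yaml
# Sketch of a single-GPU adjustment, not the exact recipe values.
# Effective batch size =
#   per_device_train_batch_size * gradient_accumulation_steps * num_gpus,
# so scale accumulation up by the same factor the per-device batch went down.
per_device_train_batch_size: 1    # was 4 in the recipe
gradient_accumulation_steps: 8    # placeholder: 4x whatever the recipe uses
```

That keeps the per-step gradient statistics closer to the original run; it doesn't rule out other causes, but it's the cheapest variable to control first.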
Update: I used the MT-Bench master branch to run the benchmark on three models, with GPT-4 as the judge:
| Model | MT-Bench score |
| --- | --- |
| zephyr-7b-sft-qlora (downloaded) | 6.365625 |
| zephyr-7b-dpo-qlora (downloaded) | 4.443038 |
| zephyr-7b-dpo-qlora (trained by me) | 1.883648 |
Even the downloaded QLoRA DPO model scores worse than the SFT model. Has anyone else observed this?