Best practice recommendation update for dpo_trainer.mdx (#1325)
In the document as it is now, the best practice recommendations seem neither consistent nor correct.

For example, the documentation links a tweet recommending that adapters be merged into a quantized model, together with a script that supposedly illustrates how to apply that recommendation. But the script actually does the opposite of what the tweet recommends: it first dequantizes the model.

There are similar inconsistencies and ambiguities later in that paragraph. For example, it said that using an unquantized model would lead to lower performance (I changed this to "higher memory demand").

Overall, I updated the paragraph to improve consistency and provided links to slightly more evidence-based merging recommendations.
R-seny authored Feb 14, 2024
1 parent 6852097 commit 29f162b
Showing 1 changed file with 3 additions and 5 deletions.
8 changes: 3 additions & 5 deletions docs/source/dpo_trainer.mdx
@@ -167,15 +167,13 @@ You have three main options (plus several variants) for how the reference model

### Downsides to merging QLoRA before DPO (approach 2)

- As suggested by [Tim Dettmers](https://twitter.com/Tim_Dettmers/status/1694654191325573456), the best option for merging QLoRA adapters is to first quantize the base model, merge the adapter, then convert back to bf16. Something similar to [this script](https://github.com/jondurbin/qlora/blob/main/qmerge.py)
+ As suggested by [Benjamin Marie](https://medium.com/@bnjmn_marie/dont-merge-your-lora-adapter-into-a-4-bit-llm-65b6da287997), the best option for merging QLoRA adapters is to first dequantize the base model, then merge the adapter. Something similar to [this script](https://github.com/jondurbin/qlora/blob/main/qmerge.py).
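
For reference, a minimal sketch of merging an adapter into a full-precision copy of the base model with PEFT (the model name and adapter paths below are placeholders; the linked script goes further and dequantizes the exact 4-bit weights the adapter was trained against):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model unquantized (bf16) rather than in 4-bit.
base = AutoModelForCausalLM.from_pretrained(
    "base-model-name",            # placeholder
    torch_dtype=torch.bfloat16,
)

# Attach the QLoRA adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")  # placeholder path
model = model.merge_and_unload()

model.save_pretrained("path/to/merged-model")
AutoTokenizer.from_pretrained("base-model-name").save_pretrained("path/to/merged-model")
```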

You can also just merge the adapters the standard way without quantizing the base model, but then you have 1-2% reduced performance (and evidently, more issues with empty responses).

- If you use the recommended approach, which quantizes the model, you're now in a situation where to use QLoRA for DPO, you will need to re-quantize the merged model again or use an unquantized merge with lower overall performance.
+ However, after using this approach, you will have an unquantized base model. Therefore, to use QLoRA for DPO, you will need to re-quantize the merged model or use the unquantized merge (resulting in higher memory demand).
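
If you take the re-quantization route, a rough sketch (the merged-model path is a placeholder) is to reload the merged checkpoint in 4-bit with `bitsandbytes` before attaching a fresh LoRA adapter for DPO:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Reload the merged checkpoint in 4-bit; a new LoRA adapter can then be
# trained on top of it with DPOTrainer.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/merged-model",       # placeholder: the merged checkpoint from above
    quantization_config=bnb_config,
)
```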

### Using option 3 - load the adapter twice

- To avoid the downsides with option 2, at the expense of slightly increased VRAM, you can load your fine-tuned adapter into the model twice, with different names, and set the model/ref adapter names in DPOTrainer.
+ To avoid the downsides with option 2, you can load your fine-tuned adapter into the model twice, with different names, and set the model/ref adapter names in DPOTrainer.

For example:
(the code example itself is collapsed in this diff view)
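
A rough sketch of what such a double-adapter setup can look like, with placeholder model and adapter paths, and with `training_args`, `train_dataset`, and `tokenizer` assumed to be defined elsewhere:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM
from trl import DPOTrainer

base = AutoModelForCausalLM.from_pretrained("base-model-name", load_in_4bit=True)

# Load the same fine-tuned adapter twice, under two different names.
model = PeftModel.from_pretrained(base, "path/to/sft-adapter", adapter_name="train")
model.load_adapter("path/to/sft-adapter", adapter_name="reference")

dpo_trainer = DPOTrainer(
    model,
    ref_model=None,                  # the "reference" adapter stands in for the ref model
    args=training_args,              # assumed to be defined elsewhere
    beta=0.1,
    train_dataset=train_dataset,     # assumed to be defined elsewhere
    tokenizer=tokenizer,             # assumed to be defined elsewhere
    model_adapter_name="train",
    ref_adapter_name="reference",
)
```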
