Hi Team,

This is an amazing handbook. In the continued pre-training script (run_cpt.py), I saw that the "mlm" (masked language modeling) parameter is not used in the training process. I thought that the training objective, masked language modeling vs. next-token (causal) prediction, was the major differentiator between pre-training and supervised fine-tuning.
Has there been an assessment of the efficacy of continued pre-training with "mlm" compared to without it?
What advice or guidelines do you have for incorporating "mlm" into the continued pre-training process?
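For concreteness, here is a minimal sketch of the "mlm" flag I am referring to, using DataCollatorForLanguageModeling from Hugging Face transformers (I am assuming this is the relevant collator, since run_cpt.py does not set it explicitly; the tokenizer checkpoint below is just a placeholder):

```python
# Minimal sketch, not taken from run_cpt.py: how the `mlm` flag switches the
# collator between causal LM and masked LM label construction.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Placeholder tokenizer; bert-base-uncased is used only because it has a
# [MASK] token, which the MLM collator requires.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# mlm=False -> causal / next-token prediction: labels are a copy of input_ids
# (with padding set to -100), which is what decoder-only LLMs train on.
clm_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# mlm=True -> masked language modeling: ~15% of tokens are masked and only
# those positions are predicted; this requires an encoder-style model with an
# MLM head, not a decoder-only LLM.
mlm_collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

examples = [tokenizer("continued pre-training keeps the next-token objective")]
print(clm_collator(examples)["labels"])   # same ids as the input
print(mlm_collator(examples)["labels"])   # -100 everywhere except masked positions
```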
Thanks!
Li
tanliboy changed the title from "Questions on "mlm" in continued pre-training" to "Question on "mlm" in continued pre-training" on Jun 3, 2024.
In the case where we need to fine-tune on a small set of documents (<50M tokens), what would be the best strategy to integrate that knowledge into the LLM without causing significant regressions?

I have heard discussions weighing re-warming + re-sampling for continued pre-training against generating conversational data for instruction fine-tuning. Given that we use SFT for both continued pre-training and instruction fine-tuning (assuming we are not using a completion-only data loader), it seems unnecessary to generate conversational data for instruction fine-tuning. Thoughts?
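For reference, a rough sketch of what I mean by re-warming + re-sampling (file names, mixing ratio, and hyperparameters are illustrative placeholders, not values from the handbook):

```python
# Illustrative sketch only: mix the small domain corpus with replay data from
# a general corpus, and re-warm the learning rate to a lower peak than the
# original pre-training run. Paths and numbers are made-up placeholders.
from datasets import load_dataset, interleave_datasets
from transformers import TrainingArguments

domain = load_dataset("json", data_files="domain_docs.jsonl", split="train")
replay = load_dataset("json", data_files="general_replay.jsonl", split="train")

# Re-sampling: e.g. ~70% new domain tokens, ~30% replay to limit forgetting.
mixed = interleave_datasets(
    [domain, replay],
    probabilities=[0.7, 0.3],
    seed=42,
    stopping_strategy="first_exhausted",
)

# Re-warming: a short warmup back up to a peak LR below the original
# pre-training schedule, followed by another decay.
training_args = TrainingArguments(
    output_dir="cpt-output",
    learning_rate=2e-5,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    per_device_train_batch_size=4,
)
```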