I'm using the config below, and I load the base model as `torch.float16`:
```
--model_name_or_path llama_model
--data_path data.json
--bf16 True
--num_train_epochs $3
--per_device_train_batch_size 2
--per_device_eval_batch_size 2
--gradient_accumulation_steps 16
--evaluation_strategy "no"
--save_strategy "steps"
--save_steps 1200
--save_total_limit 3
--learning_rate 2e-5
--weight_decay 0.
--warmup_ratio 0.03
--lr_scheduler_type "cosine"
--logging_steps 1
--model_max_length 2048
--gradient_checkpointing True
--lazy_preprocess True
--report_to tensorboard
```
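For context, here is a minimal sketch of how the base model might be loaded in `torch.float16` using the Hugging Face transformers API (the model path is a placeholder matching `--model_name_or_path` above; note that `--bf16 True` makes the Trainer run mixed precision in bfloat16 regardless of the load dtype):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path, taken from --model_name_or_path in the config above.
model_path = "llama_model"

# Load the base weights in fp16; training precision is still governed by --bf16 True.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```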
@gjmulder thanks for getting back.
Without a plot it is difficult to say for certain, but you are probably overfitting. Don't train for more than one epoch.
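Since the run reports to TensorBoard (`--report_to tensorboard`), one way to produce such a plot is to read the scalar logs directly. This is a sketch assuming the default Hugging Face Trainer tag name and a hypothetical log directory; check `ea.Tags()["scalars"]` for the actual tags in your run:

```python
import matplotlib.pyplot as plt
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Hypothetical log directory; point this at the run's actual output_dir/runs path.
logdir = "output/runs"

ea = EventAccumulator(logdir)
ea.Reload()

# The HF Trainer typically logs training loss under "train/loss".
events = ea.Scalars("train/loss")
steps = [e.step for e in events]
loss = [e.value for e in events]

plt.plot(steps, loss)
plt.xlabel("step")
plt.ylabel("training loss")
plt.savefig("train_loss.png")
```

If the loss keeps dropping toward zero while generation quality degrades, that is consistent with overfitting on a small dataset, and capping training at one epoch is the usual fix.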