GPU usage increasing as training progresses #7
Hi,
We used an RTX 3090 with 24 GB of GPU memory.
32 GB GPUs should be enough for model training. But if you are still facing this problem, you could consider using Adafactor instead of Adam as the optimizer. Also, depending on where this overflow occurs, it may help to reduce the batch size of the dataloader for the evaluation set (see line 370 in a32b78e).
Please let me know if you have more questions.
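For illustration, here is a minimal sketch of those two suggestions. The function names, the learning rate, and the use of the Adafactor implementation from transformers are assumptions for the sketch, not the repository's actual code:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from transformers.optimization import Adafactor


def build_optimizer(model: torch.nn.Module, use_adafactor: bool = True):
    if use_adafactor:
        # Adafactor keeps factored second-moment estimates, so its optimizer
        # state uses far less memory than Adam's.
        return Adafactor(
            model.parameters(),
            lr=None,                 # let Adafactor use its relative step size
            scale_parameter=True,
            relative_step=True,
            warmup_init=True,
        )
    return torch.optim.Adam(model.parameters(), lr=2e-3)  # illustrative lr


def build_eval_dataloader(eval_set: Dataset, collate_fn, batch_size: int = 4):
    # a smaller batch size here lowers the peak memory of the validation pass
    return DataLoader(eval_set, batch_size=batch_size,
                      shuffle=False, collate_fn=collate_fn)
```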
Can I train the model with two 24 GB 3090 Ti GPUs?
Could you show me how to train the brio-cnndm-uncased model on a single 11 GB GPU?
On whether two 3090 Ti cards are enough for model training: yes, you should be able to train the model using 2 GPUs, but you will need to increase the number of gradient accumulation steps to keep the same effective batch size.
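To make the effective-batch-size point concrete, here is a small illustrative calculation; the numbers are made up for the example and are not the repository's defaults:

```python
def accumulation_steps_needed(target_effective_batch: int,
                              per_gpu_batch: int,
                              num_gpus: int) -> int:
    # effective batch = per_gpu_batch * num_gpus * accumulation_steps
    per_step = per_gpu_batch * num_gpus
    assert target_effective_batch % per_step == 0, "choose divisible sizes"
    return target_effective_batch // per_step


# e.g. if a 4-GPU run reached an effective batch of 8 with per-GPU batch 1
# and 2 accumulation steps, a 2-GPU run needs 4 accumulation steps instead:
print(accumulation_steps_needed(target_effective_batch=8,
                                per_gpu_batch=1, num_gpus=2))  # -> 4
```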
I'm not sure if there's a workaround for training the model on 11 GB. Unfortunately, an 11 GB GPU is barely enough for training the baseline model (BART). There are two things you could try:
Thank you for your reply. I tried reducing args.max_num from 16 to 2 and it worked well. Thank you again :)
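For readers hitting the same limit, a rough illustration of why reducing args.max_num helps: each training example carries max_num candidate summaries, so the candidate tensors (and the decoder passes that score them) grow roughly linearly with it. The shapes and sizes below are assumptions for illustration, not the repository's actual values:

```python
import torch

batch_size, cand_len = 4, 120  # illustrative sizes

for max_num in (16, 2):
    # one token-id tensor holding every candidate summary in the batch
    candidate_ids = torch.zeros(batch_size, max_num, cand_len, dtype=torch.long)
    print(f"max_num={max_num}: {candidate_ids.numel()} candidate tokens per batch")
```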
Thank you for the great work. Could you please explain why we should increase the gradient accumulation steps when training on multiple GPUs?
Hi, I'd like to ask how long it takes to train an epoch with an 11 GB GPU. Thanks.
Hi,
Thank you for the good work. I used del commands to remove tensors that are no longer needed. Is there anything else that also needs to be deleted? Thank you!
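Not the repository's actual loop, but a generic sketch of the pattern being asked about: besides del, a common cause of memory growing across steps is a logged value that still holds the computation graph. The model, optimizer, and dataloader arguments below are placeholders:

```python
import torch


def train_one_epoch(model, optimizer, dataloader, device="cuda"):
    running_loss = 0.0
    for step, batch in enumerate(dataloader):
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # assumes the model returns an object with .loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # accumulate a plain Python float, not the graph-attached tensor
        running_loss += loss.item()

        # drop references to large tensors before the next iteration
        del loss, batch

        if step % 100 == 0:
            # releases unused cached memory back to the GPU;
            # it does not free tensors that are still referenced
            torch.cuda.empty_cache()
    return running_loss / max(1, len(dataloader))
```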