Training resource requirements #35

Open
rainskyfyy opened this issue Jul 3, 2024 · 4 comments

@rainskyfyy

Is there a GPU memory requirement for fine-tuning? Which training parameters need to be adjusted to meet the minimum requirement?

@TTTdas (Contributor) commented Jul 5, 2024

> Is there a GPU memory requirement for fine-tuning? Which training parameters need to be adjusted to meet the minimum requirement?

Hi, the default fine-tuning parameters provided were used for training on a single 40 GB GPU. If your GPU has less memory, you can try reducing max_tokens; to keep the total batch size the same, you should correspondingly increase update_freq.
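For illustration, a minimal sketch of that adjustment in a fairseq-style Hydra yaml (the section names follow fairseq's config layout; the concrete values are hypothetical, not this repo's defaults):

```yaml
# Hypothetical example: halve max_tokens and double update_freq so that
# max_tokens * update_freq (tokens per effective batch) stays constant.
dataset:
  max_tokens: 1600000   # assumed default: 3200000 on a 40 GB GPU
optimization:
  update_freq: [2]      # assumed default: [1]; gradients accumulated over 2 steps
```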

@rainskyfyy (Author)

A few questions:

1. Do the "default fine-tuning parameters" refer to the 140h yaml file?
2. For multi-GPU training, is adjusting distributed_world_size the only change needed?
3. When fine-tuning with the base_audio_finetune_10h.yaml config and large.pt as the pretrained_model, the loss decreases slowly (staying above 1000 for the first 30 epochs). Is this normal? In your experience, how should I adjust?

@TTTdas (Contributor) commented Jul 8, 2024

1. There are 3 default fine-tuning yaml files, each corresponding to a different amount of labeled data. For more than 100h of data, the 140h one is fine.
2. Yes, multi-GPU training only requires adjusting distributed_world_size. However, if you want to match the training behavior of the given default parameters, you also need to reduce update_freq correspondingly, so that distributed_world_size * update_freq equals the effective number of GPUs (see the sketch after this list). For reference: https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec
3. That is not normal. If your data exceeds 10h, I recommend choosing one of the other two configs. Also, check whether the loss's downward trend changes once the learning rate reaches its peak. It looks like a batch size / learning rate issue (note that the actual batch size depends on max_tokens, update_freq, and distributed_world_size).
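A minimal sketch of the bookkeeping from point 2, again in fairseq-style yaml (the assumed effective GPU count of 8 is hypothetical, chosen only to illustrate the relationship):

```yaml
# Hypothetical example: suppose the defaults were tuned for an effective
# 8 GPUs, i.e. distributed_world_size * update_freq = 8. Then:
#
#   1 physical GPU : distributed_world_size: 1, update_freq: [8]
#   4 physical GPUs: distributed_world_size: 4, update_freq: [2]
#   8 physical GPUs: distributed_world_size: 8, update_freq: [1]
distributed_training:
  distributed_world_size: 4
optimization:
  update_freq: [2]
```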

@rainskyfyy (Author)

Thank you very much!
