
Why does fine-tuning qwen2-7B with LoRA + ZeRO-2 use roughly 120 GB of GPU memory? Is something wrong in my settings? #15

Open
xcl1231 opened this issue Aug 11, 2024 · 1 comment

Comments

@xcl1231

xcl1231 commented Aug 11, 2024

May I ask what sequence length the GPU-memory benchmark on the project homepage was measured at? The blog post says the loss dropped to 0.05 when DPO fine-tuning qwen2; how many epochs did that training run for?
Some of my parameter settings are as follows:
num_train_epochs: int = field(default=10, metadata={"help": "Number of training epochs"})
per_device_train_batch_size: int = field(default=1, metadata={"help": "Training batch size per device"})
gradient_checkpointing: bool = field(default=False, metadata={"help": "Whether to use gradient checkpointing"})
max_length: Optional[int] = 1024
max_prompt_length: Optional[int] = 512
max_target_length: Optional[int] = 1024
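
For reference, one common contributor to memory usage at this scale is leaving gradient_checkpointing at False, so every intermediate activation for the 1024-token sequences is kept in GPU memory. Below is a minimal, hypothetical sketch of enabling it through Hugging Face TrainingArguments alongside a DeepSpeed ZeRO-2 config; the output path and the file name ds_zero2.json are illustrative, not taken from this repo's training script.

from transformers import TrainingArguments

# Hypothetical settings for illustration only -- field names mirror the
# dataclass fields above, not this repo's actual script.
training_args = TrainingArguments(
    output_dir="./dpo_output",          # illustrative path
    num_train_epochs=10,
    per_device_train_batch_size=1,
    gradient_checkpointing=True,        # recompute activations instead of storing them
    bf16=True,                          # keep activations/gradients in bfloat16
    deepspeed="ds_zero2.json",          # hypothetical ZeRO-2 config file
)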

@mst272
Owner

mst272 commented Aug 11, 2024

Please post more detailed information. The memory-usage benchmark on the homepage is for SFT, but even DPO shouldn't take 120 GB, which is strange.
