Because `bitsandbytes` implements model quantization by overloading the `.cuda()` method, the model is quantized at the moment it is moved to the GPU (which changes the tensor dimensions). During fine-tuning, the pretrained weights being loaded are fp16, so you need to set `args.device='cpu'`, load the weights in first, and only then call `.cuda()`. Since this is how `bitsandbytes` is implemented, we have no way to control it and can only adapt to it.
So the dimension mismatch is a GPU-setup problem: the `.cuda()` call failed.
Originally posted by @1049451037 in #125 (comment)
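The quoted behavior comes from how `bitsandbytes` hooks device transfer: its parameter class overrides `.cuda()` so that quantization happens when the weight lands on the GPU. Below is a minimal toy sketch of that pattern; the class name and per-tensor scaling are illustrative only and are not the actual `bitsandbytes` implementation.

```python
import torch

class Int8OnCuda(torch.nn.Parameter):
    """Toy analogue of the bitsandbytes pattern: moving the weight to the
    GPU also quantizes it, changing its dtype (and, in the real library,
    its packed layout/shape as well)."""
    def cuda(self, device=None):
        # Quantize on device transfer: fp16 -> int8 with a per-tensor scale.
        scale = self.data.abs().max() / 127.0
        q = (self.data / scale).round().to(torch.int8)
        return q.cuda(device)

w = Int8OnCuda(torch.randn(4, 4, dtype=torch.float16))  # fp16 weight, loaded on CPU
w_gpu = w.cuda()  # quantization (the dtype/shape change) happens here
```

This is why the fp16 checkpoint must be loaded with `args.device='cpu'` first: if the parameters were already sitting on the GPU in quantized form, copying fp16 weights into them would hit exactly the dimension mismatch described in the quote.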
My GPU resources are limited (2× RTX 4070), and lora + model parallel still raises OOM (the author also mentions this in #209 (comment)).
Following issue #209, I modified finetune_qlora.sh and finetune_visualglm.py.
But with qlora, the model first has to be loaded on the CPU, and then `model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args, overwrite_args={'model_parallel_size':2})` fails to execute (I only have one CPU).
In that case, how can qlora + model parallel be achieved?
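For reference, here is the sequence the question describes, assembled from the snippets quoted in this thread. It is a sketch of the conflict rather than working code: `model_type` and `args` are assumed to come from the script's own argument parsing, and `FineTuneVisualGLMModel` from the modified finetune_visualglm.py.

```python
# qlora requires the fp16 weights to be loaded on CPU first
# (quantization only happens later, inside .cuda()):
args.device = 'cpu'

# ...but model parallelism is requested at load time (per #209), and this
# call fails while the model is being materialized on a single CPU:
model, args = FineTuneVisualGLMModel.from_pretrained(
    model_type, args,
    overwrite_args={'model_parallel_size': 2},
)

# Only if loading succeeded would .cuda() then trigger the bitsandbytes
# quantization described in the quoted comment:
model = model.cuda()
```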
Did you manage to implement this? Could we discuss it further?