Large gap between predictions before and after pruning in the demo? #8
Comments
Uh, you didn't load the pretrained weights, did you? I tried it with mine and it was fine. I'd suggest testing on a model whose accuracy is already decent, then running the validation set to measure how much accuracy the pruning actually costs. My machine runs torch==1.9.0+cu111 and torchvision==0.10.0+cu111. |
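To illustrate the point, a minimal sketch of comparing predictions before and after pruning on a model with trained weights; torchvision's resnet18 here is only a stand-in for the demo's model, and the pruning step itself is left as a comment:
# Sketch only: resnet18 is a stand-in for whatever model the demo prunes.
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)   # trained weights, not random init
model.eval()                                            # freeze BN/dropout for a fair comparison
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    out_before = model(x)
# ... prune the model here (e.g. with torchpruner) ...
with torch.no_grad():
    out_after = model(x)
print((out_before - out_after).abs().max())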
OK. I'm trying to use this project to prune YOLOv4, but it fails at the call below. Is there a quick way to fix it?
graph.build_graph(inputs=(torch.zeros(1, 3, self.input_shape[1], self.input_shape[0]),))
File "/home/scut/anaconda3/envs/torch18/lib/python3.7/site-packages/torchpruner-0.0.1-py3.7.egg/torchpruner/graph.py", line 492, in build_graph
File "/home/scut/anaconda3/envs/torch18/lib/python3.7/site-packages/torchpruner-0.0.1-py3.7.egg/torchpruner/graph.py", line 27, in create_operator
RuntimeError: Can not find operator Softplus |
You can follow https://github.com/THU-MIG/torch-model-compression/blob/main/torchpruner/DOCUMENT.md#operator-注册 and register a new operator so that Softplus can be parsed. |
Since Softplus is an element-wise operation, you can use onnx_operator.onnx_pw directly, i.e. write operator_reg.regist("Softplus", onnx_operator.onnx_pw) |
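Putting the two answers together, a minimal sketch of the fix; the import path for operator_reg and onnx_operator is an assumption and may differ in your torchpruner version, so check the DOCUMENT.md linked above:
# Sketch only: the module path below is an assumption, check your torchpruner install.
import torch
from torchpruner.operator import operator_reg, onnx_operator  # assumed import path

# Softplus is element-wise, so the generic point-wise ONNX handler is enough.
operator_reg.regist("Softplus", onnx_operator.onnx_pw)

# After registration, the build_graph call from the question should parse the model:
# graph.build_graph(inputs=(torch.zeros(1, 3, input_h, input_w),))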
Thanks, it works now. I pruned only 1% of the channels, but the mAP after pruning is 43 while the unpruned model gets 93. That seems like a big drop; is it normal? |
Try fine-tuning (retraining) after pruning and see. Ordinary classification models really don't drop that much, usually only a point or two per pruning pass; I'm not very familiar with detection/segmentation.
|
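For reference, a minimal sketch of the kind of post-pruning fine-tuning meant here; model, train_loader and compute_loss are hypothetical stand-ins for the pruned network, data loader and loss function, and the learning rate would normally be lower than in the original training:
# Hypothetical fine-tuning loop for a pruned model; all names are placeholders.
import torch

def finetune(model, train_loader, compute_loss, epochs=10, lr=1e-4, device="cuda"):
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):
        for images, targets in train_loader:
            images = images.to(device)
            optimizer.zero_grad()
            loss = compute_loss(model(images), targets)  # recover accuracy lost to pruning
            loss.backward()
            optimizer.step()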
@gdh1995 I'm now trying to quantize a detection model with QAT, following the example at https://github.com/THU-MIG/torch-model-compression/blob/main/examples/torchslim/pytorch-cifar/qat.py , and I've run into the following two problems.
First, I move each batch to the GPU with code along these lines:
# for each item in the batch: lists of annotation tensors vs. plain tensors
if isinstance(item, (tuple, list)):
    item = [ann.to(device) for ann in item]
    return_list.append(item)
else:
    return_list.append(item.to(device))
The program runs, but GPU memory keeps growing, and after a number of training iterations it runs out of memory. I can't tell where the problem is.
Second, saving the model fails:
Saving model...
Traceback (most recent call last):
File "/media/scut/dataD/XZY/Gitee/yolov4-pytorch/QAT_train.py", line 268, in <module>
solver.run()
File "/home/scut/anaconda3/envs/torch17/lib/python3.9/site-packages/torchpruner-0.0.1-py3.9.egg/torchslim/slim_solver.py", line 431, in run
File "/home/scut/anaconda3/envs/torch17/lib/python3.9/site-packages/torchpruner-0.0.1-py3.9.egg/torchslim/quantizing/qat.py", line 148, in save_model
File "/home/scut/anaconda3/envs/torch17/lib/python3.9/site-packages/torchpruner-0.0.1-py3.9.egg/torchslim/quantizing/qat_tools.py", line 316, in export_onnx
File "/home/scut/anaconda3/envs/torch17/lib/python3.9/site-packages/torchpruner-0.0.1-py3.9.egg/torchslim/quantizing/qat_tools.py", line 223, in onnx_post_process
RuntimeError: the number of children node must be 1
What causes this one? I only swapped in a different model; the classification model from the example saves fine, but the detection model hits this error. Also, in https://github.com/THU-MIG/torch-model-compression/blob/main/torchslim/quantizing/qat.py the code forgets to import the os module. |
Maybe that RuntimeError at the bottom can simply be skipped over; I've committed new code for it, please update and try again.
|
@gdh1995 Thanks~ What I did was split the function at https://github.com/bubbliiiing/yolov4-pytorch/blob/a046ae47a01ccaae9ee8c7ea936ef311c74ce766/utils/utils_fit.py#L6 into the hook functions that qat needs; the details are below. In the original program, training without pruning or quantization keeps GPU memory stable, but inside qat the memory blows up.
def predict_function(model, data):
    images, targets = data
    outputs = model(images)
    return outputs

def calculate_loss(predict, data):
    _, targets = data
    loss_value_all = 0
    num_pos_all = 0
    # ----------------------#
    #   compute the loss
    # ----------------------#
    for l in range(len(predict)):
        loss_item, num_pos = yolo_loss(l, predict[l], targets)
        loss_value_all += loss_item
        num_pos_all += num_pos
    loss_value = loss_value_all / num_pos_all
    return loss_value

def evaluate(predict, data):
    _, targets = data
    loss_value_all = 0
    num_pos_all = 0
    # ----------------------#
    #   compute the loss
    # ----------------------#
    for l in range(len(predict)):
        loss_item, num_pos = yolo_loss(l, predict[l], targets)
        loss_value_all += loss_item
        num_pos_all += num_pos
    loss_value = loss_value_all / num_pos_all
    return {"negative_loss": -loss_value}

def dataset_generator():
    return train_dataset, val_dataset
One more question: the model was saved successfully, but the trt file is about the same size as the original model, so is the trt model still stored at fp32 precision? If I want int8 inference, do I load the trt model with TensorRT at fp32 and then re-calibrate and quantize it with TensorRT's built-in int8 quantization? |
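One guess about the growing memory, not something confirmed in this thread: evaluate above returns -loss_value while it is still attached to the autograd graph, so if the solver stores that metric for every iteration, each stored value keeps its whole graph alive. A hedged variant that avoids building a graph during evaluation and returns a plain float (same yolo_loss as above):
# Sketch under the assumption that retained loss graphs cause the memory growth.
import torch

def evaluate(predict, data):
    _, targets = data
    loss_value_all = 0
    num_pos_all = 0
    with torch.no_grad():                      # no gradients are needed for evaluation
        for l in range(len(predict)):
            loss_item, num_pos = yolo_loss(l, predict[l], targets)
            loss_value_all += loss_item
            num_pos_all += num_pos
    loss_value = loss_value_all / num_pos_all
    return {"negative_loss": -float(loss_value)}   # plain float, no graph kept alive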
Can you get through 2 batches, or does it crash after just 1 batch?
|
@gdh1995 It runs for quite a few batches before the error. While it's running I can see GPU memory grow by a few megabytes with every batch, until it finally blows up. |
|
GPU memory no longer keeps growing. But there's another problem: after finishing one training epoch, memory blows up as soon as it enters evaluate, as shown below. It looks like a sudden large allocation triggers the OOM; is this normal?
| epoch:0 | iteration:485 | negative_loss:-1.1794 | loss:1.1794 |
| epoch:0 | iteration:486 | negative_loss:-1.1790 | loss:1.1790 |
evaluating...
| epoch:0 | iteration:1 | negative_loss:-0.6776 | loss:0.6776 |
Traceback (most recent call last):
File "/media/scut/dataD/XZY/Gitee/yolov4-pytorch/QAT_train.py", line 268, in <module>
.........
RuntimeError: CUDA out of memory. |
That's normal, I'd say; evaluation also needs a lot of GPU memory. Try a smaller batch size and run again; sometimes even the training batch size has to be reduced to leave headroom. You can also test with a very small dataset, e.g. treat 5 training batches as one epoch and then run evaluation right away, which makes these experiments much faster. |
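As an illustration of the small-dataset suggestion, a sketch using torch.utils.data.Subset; train_dataset, val_dataset and batch_size refer to the names used earlier, and the sizes are arbitrary:
# Sketch: shrink both datasets so an "epoch" is only a few batches,
# which makes it quick to reach the evaluation phase and reproduce issues.
from torch.utils.data import Subset

batch_size = 8                                                    # placeholder value
tiny_train = Subset(train_dataset, list(range(5 * batch_size)))   # about 5 training batches
tiny_val = Subset(val_dataset, list(range(2 * batch_size)))       # about 2 evaluation batches

def dataset_generator():
    return tiny_train, tiny_val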
|
In prune_by_class.py I compared the model's predictions before and after pruning and the results differ a lot. Is this expected, or am I misunderstanding something? My test code is as follows: