mxnet.base.MXNetError: [16:31:18] src/operator/nn/./cudnn/cudnn_convolution-inl.h:449: Check failed: e == CUDNN_STATUS_SUCCESS (3 vs. 0) cuDNN: CUDNN_STATUS_BAD_PARAM #147

CPFelix opened this issue Dec 27, 2018 · 1 comment


CPFelix commented Dec 27, 2018

I built MXNet from source, and when I ran FCIS I got the following error. I suspect it is caused by the cuDNN version. Environment: CUDA 8.0, cuDNN 6.
loading annotations into memory...
Done (t=0.52s)
creating index...
index created!
num_images 6048
prepare gt_sdsdb using 1.19558501244 seconds
generate cache_seg_inst using 0.203454017639 seconds
append flipped images to roidb
filtered 0 roidb entries: 12096 -> 12096
providing maximum shape [('data', (1, 3, 756, 1008)), ('gt_boxes', (1, 100, 5)), ('gt_masks', (1, 100, 756, 1008))] [('proposal_label', (1, 45360)), ('proposal_bbox_target', (1, 60, 48, 63)), ('proposal_bbox_weight', (1, 60, 48, 63))]
data shape:
{'data': (1L, 3L, 756L, 1008L),
'gt_boxes': (1L, 9L, 5L),
'gt_masks': (9L, 756L, 1008L),
'im_info': (1L, 3L),
'proposal_bbox_target': (1L, 60L, 48L, 63L),
'proposal_bbox_weight': (1L, 60L, 48L, 63L),
'proposal_label': (1L, 45360L)}
lr 0.0001 lr_epoch_diff [5.0] lr_iters [60480]
Traceback (most recent call last):
File "experiments/fcis/fcis_end2end_train_test.py", line 13, in
train_end2end.main()
File "experiments/fcis/../../fcis/train_end2end.py", line 187, in main
config.TRAIN.lr, config.TRAIN.lr_step)
File "experiments/fcis/../../fcis/train_end2end.py", line 179, in train_net
arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
File "experiments/fcis/../../fcis/core/module.py", line 961, in fit
self.update_metric(eval_metric, data_batch.label)
File "experiments/fcis/../../fcis/core/module.py", line 1055, in update_metric
self._curr_module.update_metric(eval_metric, labels)
File "experiments/fcis/../../fcis/core/module.py", line 669, in update_metric
self._exec_group.update_metric(eval_metric, labels)
File "experiments/fcis/../../fcis/core/DataParallelExecutorGroup.py", line 529, in update_metric
eval_metric.update(labels, texec.outputs)
File "/home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/metric.py", line 318, in update
metric.update(labels, preds)
File "experiments/fcis/../../fcis/core/metric.py", line 45, in update
pred_label = mx.ndarray.argmax_channel(pred).asnumpy().astype('int32')
File "/home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 1980, in asnumpy
ctypes.c_size_t(data.size)))
File "/home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/base.py", line 252, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [16:31:18] src/operator/nn/./cudnn/cudnn_convolution-inl.h:449: Check failed: e == CUDNN_STATUS_SUCCESS (3 vs. 0) cuDNN: CUDNN_STATUS_BAD_PARAM

Stack trace returned 10 entries:
[bt] (0) /home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7fd55ea8c2bb]
[bt] (1) /home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7fd55ea8ce28]
[bt] (2) /home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::op::CuDNNConvolutionOp::InitDescriptors(std::vector<nnvm::TShape, std::allocator<nnvm::TShape> > const&, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> > const&, cudnnDataType_t, cudnnDataType_t)+0x149a) [0x7fd56393523a]
[bt] (3) /home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x5bd0d7b) [0x7fd563912d7b]
[bt] (4) /home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(void mxnet::op::ConvolutionCompute<mshadow::gpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0xe30) [0x7fd563917910]
[bt] (5) /home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::exec::FComputeExecutor::Run(mxnet::RunContext, bool)+0x59) [0x7fd561caa239]
[bt] (6) /home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x3f32f60) [0x7fd561c74f60]
[bt] (7) /home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x8f5) [0x7fd561bc39c5]
[bt] (8) /home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0xeb) [0x7fd561bda10b]
[bt] (9) /home/scau2/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x4e) [0x7fd561bda37e]

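Note (not from the original report): CUDNN_STATUS_BAD_PARAM from InitDescriptors means cuDNN rejected the convolution descriptors themselves (an unsupported data type, layout, or shape for that cuDNN version), so one way to narrow it down is to run a tiny cuDNN-backed convolution outside FCIS. A minimal sketch, assuming a single visible GPU at mx.gpu(0); the shapes and names below are illustrative only:

```python
# Hypothetical diagnostic, not from the original report: exercise a tiny
# cuDNN-backed convolution to test the MXNet + CUDA 8.0 + cuDNN 6 stack
# in isolation from FCIS.
import mxnet as mx

print(mx.__version__)  # the MXNet build under test

ctx = mx.gpu(0)  # assumes one visible GPU
data = mx.nd.random.uniform(shape=(1, 3, 32, 32), ctx=ctx)
weight = mx.nd.random.uniform(shape=(8, 3, 3, 3), ctx=ctx)
bias = mx.nd.zeros((8,), ctx=ctx)

# On GPU this dispatches to cuDNN; asnumpy() forces execution, so a broken
# cuDNN setup raises MXNetError here instead of deep inside training.
out = mx.nd.Convolution(data=data, weight=weight, bias=bias,
                        kernel=(3, 3), num_filter=8)
print(out.asnumpy().shape)  # expect (1, 8, 30, 30)
```

If this also fails, the MXNet/CUDA/cuDNN combination itself is broken and rebuilding against a cuDNN version supported by that MXNet release is the first thing to try; if it succeeds, the BAD_PARAM more likely comes from a shape FCIS feeds to one of its convolutions.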

ccxietao commented Jan 4, 2019

Hello, I still cannot figure out why my mAP is 0. I see you ran into the same problem. How did you solve it, and how should the hyperparameters be tuned?
