This repository has been archived by the owner on May 3, 2024. It is now read-only.

error: 'hipErrorMemoryAllocation'(1002) at src/caffe/syncedmem.cpp:56 #15

Open
xxgtxx opened this issue Sep 14, 2017 · 4 comments
@xxgtxx

xxgtxx commented Sep 14, 2017

Issue summary

Hello everyone,
I get a memory allocation error when benchmarking execution time with hipCaffe. It only occurs with large input data: shape: { dim: 10 dim: 3 dim: 1024 dim: 2048 }. There is no error with the default input size: shape: { dim: 10 dim: 3 dim: 224 dim: 224 }.

Error

```
I0914 12:29:28.475584 218425 net.cpp:228] conv2/3x3 does not need backward computation.
I0914 12:29:28.475591 218425 net.cpp:228] conv2/relu_3x3_reduce does not need backward computation.
I0914 12:29:28.475597 218425 net.cpp:228] conv2/3x3_reduce does not need backward computation.
I0914 12:29:28.475603 218425 net.cpp:228] pool1/norm1 does not need backward computation.
I0914 12:29:28.475610 218425 net.cpp:228] pool1/3x3_s2 does not need backward computation.
I0914 12:29:28.475616 218425 net.cpp:228] conv1/relu_7x7 does not need backward computation.
I0914 12:29:28.475622 218425 net.cpp:228] conv1/7x7_s2 does not need backward computation.
I0914 12:29:28.475628 218425 net.cpp:228] data does not need backward computation.
I0914 12:29:28.475632 218425 net.cpp:270] This network produces output prob
I0914 12:29:28.475728 218425 net.cpp:283] Network initialization done.
I0914 12:29:28.476884 218425 caffe.cpp:355] Performing Forward
I0914 12:30:55.425948 218425 caffe.cpp:360] Initial loss: 0
I0914 12:30:55.426497 218425 caffe.cpp:361] Performing Backward
I0914 12:30:55.426555 218425 caffe.cpp:369] *** Benchmark begins ***
I0914 12:30:55.426565 218425 caffe.cpp:370] Testing for 2 iterations.
error: 'hipErrorMemoryAllocation'(1002) at src/caffe/syncedmem.cpp:56
```

Steps to reproduce

Build hipCaffe with these Makefile.config parameters:

```
USE_MIOPEN := 1
USE_ROCBLAS := 1
```

Change the data input size to:

```
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 10 dim: 3 dim: 1024 dim: 2048 } }
}
```

Then execute the network:

```
/home/intel/hipCaffe/build/tools/caffe time -gpu 0 -iterations 2 -model /home/intel/hipCaffe/models/bvlc_googlenet/deploy.prototxt
```

Your system configuration

Operating system: Ubuntu 16.04
Kernel: 4.11.0-kfd-compute-rocm-rel-1.6-148
CPU: Intel Skylake
GPU: AMD Radeon Vega Frontier Edition @ 16GB

@parallelo
Contributor

@xxgtxx - Thanks for the error report. It looks like it is running out of memory due to the larger chosen data size. Could you please try dropping the batch size (specifically this parameter: dim: 10) and see if that helps?
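For a rough sense of scale (my own back-of-the-envelope arithmetic, not from the original logs): even a single FP32 blob at the reported input shape is large, and Caffe allocates activation blobs for every layer, so the total footprint at this resolution multiplies quickly.

```python
# Back-of-the-envelope memory estimate for one FP32 blob with the
# reported input shape (N, C, H, W) = (10, 3, 1024, 2048).
# Illustrative only; actual usage depends on the whole network.
def blob_bytes(n, c, h, w, dtype_bytes=4):
    """Bytes needed for a dense NCHW blob of the given dtype size."""
    return n * c * h * w * dtype_bytes

mib = blob_bytes(10, 3, 1024, 2048) / (1024 ** 2)
print(f"input blob alone: {mib:.0f} MiB")  # 240 MiB
```

GoogLeNet's early convolution layers have far more channels than the 3-channel input, so at 1024x2048 the per-layer activations dwarf this 240 MiB figure and can plausibly exhaust even 16 GB.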

@xxgtxx
Author

xxgtxx commented Sep 21, 2017

Hey, I'm also seeing this error with batch size 1. { dim: 1 dim: 3 dim: 1024 dim: 2048 }.

@parallelo
Contributor

Based on a standard bvlc/caffe GoogLeNet test I ran today on another HW vendor's platform, it didn't look like your specific configuration fit into that device's (quite large) memory either. Have you observed something different?

@dhzhd1

dhzhd1 commented Oct 31, 2017

I agree with parallelo. When the training images are too large to fit in GPU memory, there are a few options: 1) reduce the batch size (which you have already tried), 2) resize the images to smaller dimensions, 3) if the object features would vanish after resizing, crop the images into smaller tiles. I think this is a data pre-processing issue, not an issue in Caffe itself.
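A minimal sketch of option 3, tiling a large frame into fixed-size crops. This is plain Python that only computes the crop boxes; the function name and the 512x512 tile size are my own illustration, not part of hipCaffe:

```python
# Compute crop boxes (left, top, right, bottom) that tile an image
# into fixed-size patches, so each patch can be fed to the network
# individually instead of the full-resolution frame.
def tile_boxes(width, height, tile_w, tile_h):
    boxes = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            # Clamp the last row/column of tiles to the image border.
            boxes.append((left, top,
                          min(left + tile_w, width),
                          min(top + tile_h, height)))
    return boxes

# A 2048x1024 frame split into 512x512 tiles -> 4 x 2 = 8 crops.
print(len(tile_boxes(2048, 1024, 512, 512)))  # 8
```

Each resulting crop has the same memory footprint as a small input blob, so the per-iteration GPU allocation stays bounded regardless of the source image size.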
