This repository has been archived by the owner on May 3, 2024. It is now read-only.

error: 'hipErrorMemoryAllocation'(1002) at src/caffe/syncedmem.cpp:56 #15

Open
xxgtxx opened this issue Sep 14, 2017 · 4 comments
@xxgtxx

xxgtxx commented Sep 14, 2017

Issue summary

Hello everyone,
I get a memory allocation error when benchmarking execution time with hipCaffe. It only occurs with large input data: shape: { dim: 10 dim: 3 dim: 1024 dim: 2048 }. There is no error with the default input size: shape: { dim: 10 dim: 3 dim: 224 dim: 224 }.

Error

```
I0914 12:29:28.475584 218425 net.cpp:228] conv2/3x3 does not need backward computation.
I0914 12:29:28.475591 218425 net.cpp:228] conv2/relu_3x3_reduce does not need backward computation.
I0914 12:29:28.475597 218425 net.cpp:228] conv2/3x3_reduce does not need backward computation.
I0914 12:29:28.475603 218425 net.cpp:228] pool1/norm1 does not need backward computation.
I0914 12:29:28.475610 218425 net.cpp:228] pool1/3x3_s2 does not need backward computation.
I0914 12:29:28.475616 218425 net.cpp:228] conv1/relu_7x7 does not need backward computation.
I0914 12:29:28.475622 218425 net.cpp:228] conv1/7x7_s2 does not need backward computation.
I0914 12:29:28.475628 218425 net.cpp:228] data does not need backward computation.
I0914 12:29:28.475632 218425 net.cpp:270] This network produces output prob
I0914 12:29:28.475728 218425 net.cpp:283] Network initialization done.
I0914 12:29:28.476884 218425 caffe.cpp:355] Performing Forward
I0914 12:30:55.425948 218425 caffe.cpp:360] Initial loss: 0
I0914 12:30:55.426497 218425 caffe.cpp:361] Performing Backward
I0914 12:30:55.426555 218425 caffe.cpp:369] *** Benchmark begins ***
I0914 12:30:55.426565 218425 caffe.cpp:370] Testing for 2 iterations.
error: 'hipErrorMemoryAllocation'(1002) at src/caffe/syncedmem.cpp:56
```

Steps to reproduce

Build hipCaffe with these Makefile.config parameters:

```
USE_MIOPEN := 1
USE_ROCBLAS := 1
```

Change the data input size to:

```
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 10 dim: 3 dim: 1024 dim: 2048 } }
}
```

Then execute the network:

```
/home/intel/hipCaffe/build/tools/caffe time -gpu 0 -iterations 2 -model /home/intel/hipCaffe/models/bvlc_googlenet/deploy.prototxt
```

Your system configuration

Operating system: Ubuntu 16.04
Kernel: 4.11.0-kfd-compute-rocm-rel-1.6-148
CPU: Intel Skylake
GPU: AMD Radeon Vega Frontier Edition @ 16GB

@parallelo
Contributor

@xxgtxx - Thanks for the error report. It looks like it is running out of memory due to the larger chosen data size. Could you please try dropping the batch size (specifically this parameter: dim: 10) and see if that helps?
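For a rough sense of scale (my own back-of-the-envelope arithmetic, not from the original logs): even a single FP32 blob at the reported input shape is large, and Caffe allocates activation blobs for every layer, so the total footprint at this resolution multiplies quickly.

```python
# Back-of-the-envelope memory estimate for one FP32 blob with the
# reported input shape (N, C, H, W) = (10, 3, 1024, 2048).
# Illustrative only; actual usage depends on the whole network.
def blob_bytes(n, c, h, w, dtype_bytes=4):
    """Bytes needed for a dense NCHW blob of the given dtype size."""
    return n * c * h * w * dtype_bytes

mib = blob_bytes(10, 3, 1024, 2048) / (1024 ** 2)
print(f"input blob alone: {mib:.0f} MiB")  # 240 MiB
```

GoogLeNet's early convolution layers have far more channels than the 3-channel input, so at 1024x2048 the per-layer activations dwarf this 240 MiB figure and can plausibly exhaust even 16 GB.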

@xxgtxx
Author

xxgtxx commented Sep 21, 2017

Hey, I'm also seeing this error with batch size 1. { dim: 1 dim: 3 dim: 1024 dim: 2048 }.

@parallelo
Contributor

Based on a standard bvlc/caffe GoogLeNet test I ran today on another HW vendor's platform, it didn't look like your specific configuration fit into that device's (quite large) memory either. Have you observed something different?

@dhzhd1

dhzhd1 commented Oct 31, 2017

I agree with parallelo. When the training images are too large to fit in GPU memory, there are a few options: 1) reduce the batch size (which you have already tried), 2) resize the images to smaller dimensions, 3) if the object features would vanish after resizing, crop the images into smaller tiles. I think this is a data pre-processing issue, not an issue in Caffe itself.
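A minimal sketch of option 3, tiling a large frame into fixed-size crops. This is plain Python that only computes the crop boxes; the function name and the 512x512 tile size are my own illustration, not part of hipCaffe:

```python
# Compute crop boxes (left, top, right, bottom) that tile an image
# into fixed-size patches, so each patch can be fed to the network
# individually instead of the full-resolution frame.
def tile_boxes(width, height, tile_w, tile_h):
    boxes = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            # Clamp the last row/column of tiles to the image border.
            boxes.append((left, top,
                          min(left + tile_w, width),
                          min(top + tile_h, height)))
    return boxes

# A 2048x1024 frame split into 512x512 tiles -> 4 x 2 = 8 crops.
print(len(tile_boxes(2048, 1024, 512, 512)))  # 8
```

Each resulting crop has the same memory footprint as a small input blob, so the per-iteration GPU allocation stays bounded regardless of the source image size.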
