This repository has been archived by the owner on May 3, 2024. It is now read-only.

./build/test/test_all.testbin drops core #50

Open
emerth opened this issue Jan 16, 2019 · 0 comments

Issue summary

I'm reposting this after closing my original issue because I am now quite confident the install was completely canonical.

Running ./build/test/test_all.testbin ultimately drops core. It drops core in the same place with either an RX 470 or an RX Vega 64.

A few tests fail but at a certain point it always drops core.

Steps to reproduce

  • Clean install Ubuntu 18.04.1 LTS Server.
  • Use only Ubuntu's stock kernels as provided by the apt-get update / upgrade mechanism (i.e. the kernel in use is the one Ubuntu provides after apt-get dist-upgrade; I have not installed an upstream kernel).
  • Install ROCm 2.0 & hipCaffe per the hipCaffe instructions.
  • Run several of the examples without error.
  • Run ./build/test/test_all.testbin: it drops core.

Problem:

...
[ RUN      ] NetTest/2.TestForcePropagateDown
[       OK ] NetTest/2.TestForcePropagateDown (2 ms)
[ RUN      ] NetTest/2.TestAllInOneNetTrain
[       OK ] NetTest/2.TestAllInOneNetTrain (3 ms)
[ RUN      ] NetTest/2.TestAllInOneNetVal
[       OK ] NetTest/2.TestAllInOneNetVal (4 ms)
[ RUN      ] NetTest/2.TestAllInOneNetDeploy
[       OK ] NetTest/2.TestAllInOneNetDeploy (1 ms)
[----------] 26 tests from NetTest/2 (772 ms total)

[----------] 26 tests from NetTest/3, where TypeParam = caffe::GPUDevice<double>
[ RUN      ] NetTest/3.TestHasBlob
[       OK ] NetTest/3.TestHasBlob (4 ms)
[ RUN      ] NetTest/3.TestGetBlob
[       OK ] NetTest/3.TestGetBlob (4 ms)
[ RUN      ] NetTest/3.TestHasLayer
[       OK ] NetTest/3.TestHasLayer (4 ms)
[ RUN      ] NetTest/3.TestGetLayerByName
[       OK ] NetTest/3.TestGetLayerByName (4 ms)
[ RUN      ] NetTest/3.TestBottomNeedBackward
[       OK ] NetTest/3.TestBottomNeedBackward (4 ms)
[ RUN      ] NetTest/3.TestBottomNeedBackwardForce
[       OK ] NetTest/3.TestBottomNeedBackwardForce (4 ms)
[ RUN      ] NetTest/3.TestBottomNeedBackwardEuclideanForce
[       OK ] NetTest/3.TestBottomNeedBackwardEuclideanForce (1 ms)
[ RUN      ] NetTest/3.TestBottomNeedBackwardTricky
[       OK ] NetTest/3.TestBottomNeedBackwardTricky (5 ms)
[ RUN      ] NetTest/3.TestLossWeight
[       OK ] NetTest/3.TestLossWeight (21 ms)
[ RUN      ] NetTest/3.TestLossWeightMidNet
[       OK ] NetTest/3.TestLossWeightMidNet (16 ms)
[ RUN      ] NetTest/3.TestComboLossWeight
[       OK ] NetTest/3.TestComboLossWeight (18 ms)
[ RUN      ] NetTest/3.TestBackwardWithAccuracyLayer
MIOpen Error: /home/dlowell/MIOpenPrivate/src/ocl/softmaxocl.cpp:59: Only alpha=1 and beta=0 is supported
F0116 04:57:34.752313 24321 cudnn_softmax_layer_hip.cpp:27] Check failed: status == miopenStatusSuccess (7 vs. 0)  miopenStatusUnknownError
*** Check failure stack trace: ***
    @     0x7f2ab9f720cd  google::LogMessage::Fail()
    @     0x7f2ab9f73f33  google::LogMessage::SendToLog()
    @     0x7f2ab9f71c28  google::LogMessage::Flush()
    @     0x7f2ab9f74999  google::LogMessageFatal::~LogMessageFatal()
    @          0x15364ce  caffe::CuDNNSoftmaxLayer<>::Forward_gpu()
    @           0x4cb540  caffe::Layer<>::Forward()
    @          0x1ea2073  caffe::SoftmaxWithLossLayer<>::Forward_gpu()
    @           0x4cb540  caffe::Layer<>::Forward()
    @          0x1b459d7  caffe::Net<>::ForwardFromTo()
    @          0x1b458f0  caffe::Net<>::Forward()
    @           0x967796  caffe::NetTest_TestBackwardWithAccuracyLayer_Test<>::TestBody()
    @          0x108be34  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @          0x108bcf6  testing::Test::Run()
    @          0x108ceb1  testing::TestInfo::Run()
    @          0x108d5c7  testing::TestCase::Run()
    @          0x1093967  testing::internal::UnitTestImpl::RunAllTests()
    @          0x10933a4  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @          0x1093359  testing::UnitTest::Run()
    @          0x201545a  main
    @     0x7f2ab4b00b97  __libc_start_main
    @          0x20148fa  _start
Aborted (core dumped)
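
To pin down which test was running when the process aborted, the log above can be scanned for a `[ RUN ]` line that has no matching `[ OK ]` or `[ FAILED ]` line. The following is only a minimal sketch: the function name `unfinished_tests` and the `sample` string are hypothetical helpers for illustration (they are not part of hipCaffe or Google Test), and it assumes the usual Google Test console format shown in this report.

```python
import re

# Google Test prints "[ RUN      ] Name" when a test starts and
# "[       OK ] Name (N ms)" or "[  FAILED  ] Name (N ms)" when it ends.
RUN_RE = re.compile(r"^\[\s*RUN\s*\] (\S+)")
DONE_RE = re.compile(r"^\[\s*(OK|FAILED)\s*\] (\S+)")


def unfinished_tests(log_text):
    """Return names of tests that started but never reported a result."""
    started = []
    for line in log_text.splitlines():
        run = RUN_RE.match(line)
        if run:
            started.append(run.group(1))
            continue
        done = DONE_RE.match(line)
        if done and done.group(2) in started:
            started.remove(done.group(2))
    return started


# Hypothetical excerpt built from lines in the log above.
sample = """\
[ RUN      ] NetTest/3.TestComboLossWeight
[       OK ] NetTest/3.TestComboLossWeight (18 ms)
[ RUN      ] NetTest/3.TestBackwardWithAccuracyLayer
Aborted (core dumped)
"""

print(unfinished_tests(sample))
# prints ['NetTest/3.TestBackwardWithAccuracyLayer']
```

The reported name can then be passed to Google Test's `--gtest_filter` option, either to rerun only that test or, with a leading `-`, to skip it and see whether later tests also abort.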

Your system configuration

I can provide whatever information you need; just tell me what you want.

Operating system: Ubuntu 18.04.1 LTS
Compiler: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CUDA version (if applicable): N/A
CUDNN version (if applicable): N/A
BLAS: rocblas 2.0.0.0
Python or MATLAB version (for pycaffe and matcaffe respectively): Python 2.7.15rc1

Hardware:
RX Vega 64, or RX 470 4GB.
Ryzen 5 2600X
16 GB RAM
X470 mobo
SR-IOV is turned off
IOMMU enabled or disabled - same result.
