Python: Illegal instruction (core dumped) #81

gcgloven opened this issue May 18, 2021 · 5 comments


@gcgloven

gcgloven commented May 18, 2021

System: Freshly installed Ubuntu 18.04 with required packages installed
Python: python3.6.9
cmake: cmake version 3.10.2
g++/gcc: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

I successfully built the IRIS package and obtained iris_wrapper.cpython-36m-x86_64-linux-gnu.so in iris-distro/build/install/lib/python3.6/dist-packages/irispy.

Running Test Code

python -m irispy.test.test_iris_2d
## The Response
Loading cdd global constants
Illegal instruction (core dumped)

I added print statements to the built irispy folder to find which part of the code triggers this error.

It happens when the c_inflate_region function is called. The function is imported as follows:

from .iris_wrapper import inflate_region as c_inflate_region
## some code and function
    if return_debug_data:
        debug = IRISDebugData()
        region = c_inflate_region(problem, options, debug)
        return region, debug
    else:
        region = c_inflate_region(problem, options)
        return region

I believe this is an issue with the compiled file iris_wrapper.cpython-36m-x86_64-linux-gnu.so
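As an alternative to print statements, Python's standard faulthandler module can dump the Python-level stack when the process receives a fatal signal such as SIGILL. The sketch below is an illustration only; the helper name enable_crash_traceback is hypothetical and not part of irispy.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Ask CPython to dump the Python stack when a fatal signal arrives.

    Hypothetical helper for illustration; not part of irispy.
    """
    # faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT,
    # SIGBUS and SIGILL. The stream must be backed by a real file descriptor.
    target = stream if stream is not None else sys.stderr
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()
```

Calling enable_crash_traceback() at the top of the test script, or running python -X faulthandler -m irispy.test.test_iris_2d, should print the Python frame that was active when the crash occurred. It does not show the native frame inside the .so file; a debugger is still needed for that.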

How should I proceed? I re-installed Ubuntu to build IRIS on a clean OS, and the same error persists.

Thank you.

@gcgloven

gcgloven commented May 19, 2021

I also tried building with Python 3.7; the same error persists.
Here is the Python debug output. It points to iris-distro/build/install/libmosek64.so:
[image: Python debug output]

I now suspect a CPU issue, having read up on the SIGILL fault.

What is SIGILL?

The SIGILL signal is raised when an attempt is made to execute an invalid, privileged, or ill-formed instruction. SIGILL is usually caused by a program error that overlays code with data or by a call to a function that is not linked into the program load module.

After digging further, I exported the core dump from the Python run and opened it with gdb iris_wrapper.cpython-37m-x86_64-linux-gnu.so core

This is what I obtained in my terminal:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python -m irispy.test.test_iris_2d'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00007f528b5eeb12 in mkl_blas_def_dgemm_kernel_bdz () from /home/gloven/supports/iris-distro/build/install/lib/libmosek64.so.7.1
[Current thread is 1 (Thread 0x7f52ba0ad740 (LWP 1645))]

So now I understand that the faulting function is
mkl_blas_def_dgemm_kernel_bdz, an Intel MKL BLAS kernel inside libmosek64.so.7.1.

@gcgloven

gcgloven commented May 19, 2021

Conclusion: SOLVED!

After a long and painful search, the cause turned out to be Intel's Math Kernel Library (MKL), which is bundled with Mosek (libmosek64.so.7.1 in the backtrace above): one of its hardware-optimised routines executes an instruction that my AMD CPU rejects.

The program finally ran on another machine with an Intel CPU, while on my home setup (AMD® Ryzen 7 3700x 8-core processor × 16) the error remains.

TLDR: IRIS-DISTRO works out of the box on Intel CPUs but not on AMD CPUs. On AMD you need to install some compatibility and optimisation packages.
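To check which case applies before building, the CPU vendor can be read from /proc/cpuinfo. This is a minimal sketch assuming Linux on x86; the function names are made up for illustration and are not part of IRIS.

```python
def cpu_vendor(cpuinfo_text):
    """Return the first vendor_id value found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def read_cpu_vendor(path="/proc/cpuinfo"):
    """Read the vendor string of the local machine (Linux only)."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return cpu_vendor(fh.read())
```

On x86 Linux the vendor string is normally "AuthenticAMD" for AMD processors and "GenuineIntel" for Intel ones, so read_cpu_vendor() == "AuthenticAMD" suggests the workaround in this thread is worth trying.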

SOLUTION

Why does this happen?
https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/illegal-instruction-error-with-newer-AMD-CPU/td-p/1147393
AMD package
Install https://developer.amd.com/amd-aocl/
Intel package
Follow this guide to add the MKL package to your Ubuntu:
https://github.com/eddelbuettel/mkl4deb

Add this to your shell startup file (e.g. ~/.bashrc):

export MKL_DEBUG_CPU_TYPE=5

Then reboot and rebuild the program.
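If editing the shell configuration is inconvenient, the same variable can be set from Python before the wrapper is imported. This sketch assumes that the MKL code bundled in libmosek64.so reads MKL_DEBUG_CPU_TYPE when the library is first loaded into the process; the thread does not confirm that, and the helper name is hypothetical.

```python
import os


def enable_mkl_amd_workaround(env=None):
    """Set MKL_DEBUG_CPU_TYPE=5 unless a value is already present.

    Hypothetical helper for illustration; not part of irispy.
    """
    env = os.environ if env is None else env
    env.setdefault("MKL_DEBUG_CPU_TYPE", "5")
    return env["MKL_DEBUG_CPU_TYPE"]


# Intended use (assumption: must run before the first import that
# loads libmosek64.so):
#   enable_mkl_amd_workaround()
#   import irispy
```

Setting the variable after the library has already been loaded is unlikely to have any effect, which is why the call comes before the import.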

@reso1

reso1 commented Apr 18, 2022

I still get this fault after installing the amd-aocl package on my Ryzen Threadripper 3990X PC. Could you please provide more details about this workaround? Thanks a lot!

@gcgloven

If you are running the Python test file and hit this error,
you can use GDB to debug the compiled .so file:

gdb iris_wrapper.cpython-37m-x86_64-linux-gnu.so core

That shows where the compiled .so actually fails.

It was quite a while ago, but I believe the command above is what I eventually used to find the issue with my compiled program.
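The gdb command above needs a core file, and many systems do not write one by default because the core-size limit is zero. A minimal Python sketch, assuming Linux, raises the soft limit to the hard limit for the current process and its children; the function name is made up for illustration.

```python
import resource


def allow_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Returns the (soft, hard) limits in effect afterwards.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling allow_core_dumps() at the start of the test script has roughly the effect of running ulimit -c in the shell first. Where the core file ends up depends on /proc/sys/kernel/core_pattern; on Ubuntu the apport crash handler may capture it instead of leaving a file named core in the working directory.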

@reso1

reso1 commented Apr 19, 2022

It seems I forgot to export MKL_DEBUG_CPU_TYPE=5 after installing the AMD AOCL package. Anyway, it works now. Thanks :)
