OpenMP Conflicts #26

Closed
tskisner opened this issue Mar 14, 2024 · 4 comments

@tskisner
Member

In our environments, we build packages such as so3g and toast, whose compiled extensions link to OpenMP libraries both directly and indirectly (through a dependency on OpenBLAS). We pin the runtime BLAS / LAPACK to the OpenMP "flavor" of OpenBLAS. In a newly created environment, running ldd on these compiled extensions shows that they consistently link against the same OpenBLAS libraries used by SciPy.
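A minimal sketch of automating that ldd check, assuming Linux and that `ldd` is on PATH. The helper names (`parse_ldd`, `runtime_deps`, `ldd_runtimes`, `inconsistent`) and the name substrings used to recognize BLAS / OpenMP runtimes are illustrative assumptions, not part of any package:

```python
import re
import subprocess

# One ldd line, e.g. "libgomp.so.1 => /env/lib/libgomp.so.1 (0x...)" or "libfoo.so => not found".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")

# Assumed substrings that identify OpenBLAS and OpenMP runtimes; extend as needed.
RUNTIME_PATTERNS = ("openblas", "gomp", "libomp", "libiomp")


def parse_ldd(text):
    """Map each library name in ldd output to its resolved path (or "not found")."""
    deps = {}
    for line in text.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            deps[match.group(1)] = match.group(2)
    return deps


def runtime_deps(deps, patterns=RUNTIME_PATTERNS):
    """Keep only the BLAS / OpenMP entries."""
    return {name: path for name, path in deps.items() if any(p in name for p in patterns)}


def ldd_runtimes(extension_path):
    """Run ldd on one compiled extension and return its BLAS / OpenMP dependencies."""
    out = subprocess.run(["ldd", extension_path], capture_output=True, text=True, check=True).stdout
    return runtime_deps(parse_ldd(out))


def inconsistent(per_extension):
    """Given {extension: {lib: path}}, return libs that resolve to more than one path."""
    seen = {}
    for libs in per_extension.values():
        for name, path in libs.items():
            seen.setdefault(name, set()).add(path)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```

For example, collect `ldd_runtimes(path)` for every `*.so` under the scipy and toast install directories and pass the results to `inconsistent`; an empty result means every extension resolves each runtime to the same file. Note that consistent ldd output was not enough to avoid the crash in this issue.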

Despite this, at least on some systems, running import scipy; import toast triggers a segfault inside the toast compiled extension at its first call to omp_get_num_threads(). Reversing the import order avoids the segfault, but there is an obvious concern that later scipy calls could then fail silently.
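To check both import orders without crashing the interactive session, each order can be run in a fresh interpreter. A minimal sketch, assuming a POSIX system (where a child killed by a signal reports a negative return code); `run_snippet` and `compare_orders` are hypothetical helper names:

```python
import signal
import subprocess
import sys


def run_snippet(code, timeout=300):
    """Run `code` in a fresh interpreter; return (return code, signal name or None)."""
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True, timeout=timeout)
    rc = proc.returncode
    # On POSIX, a negative return code -N means the child was killed by signal N.
    return rc, (signal.Signals(-rc).name if rc < 0 else None)


def compare_orders(first, second):
    """Try `import first; import second` and the reverse order, each in its own process."""
    return {
        f"{first}, {second}": run_snippet(f"import {first}; import {second}"),
        f"{second}, {first}": run_snippet(f"import {second}; import {first}"),
    }
```

On an affected system, `compare_orders("scipy", "toast")` would be expected to report `SIGSEGV` for the scipy-first entry and a zero return code for the reverse order.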

Documenting a few more observations:

  • Installing both scipy and toast as wheels avoids this particular problem, since both packages vendor (link and ship) their own copies of libopenblas and of the OpenMP libraries. However, this puts us back into the original problem, where thread affinity can be broken and all threads oversubscribe the resources of a single process.
  • Some months ago, the conda-forge maintainers set up a system in which the LLVM OpenMP library is made ABI compatible with libgomp and is used at runtime even when a package was built against libgomp. In theory this should be fine, but it may be an interesting avenue to pursue.
  • Interestingly, so3g does not show the same behavior. Since scipy contains numerous compiled extensions, the problem may be triggered by loading a specific scipy submodule whose extension links to OpenMP (directly or indirectly), for example the signal submodule.
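One way to see which OpenMP / BLAS runtimes actually end up in a process (for example after `import scipy.signal`) is to read its memory map. A minimal, Linux-only sketch, assuming `/proc/self/maps` is available; the basename substrings are an assumption, and seeing more than one OpenMP runtime in the output is only a hint, not proof of a conflict:

```python
import os
import re

# Assumed basename substrings for OpenBLAS and the GNU / LLVM / Intel OpenMP runtimes.
RUNTIME_RE = re.compile(r"openblas|gomp|libomp|libiomp")


def parse_maps(text):
    """Return the sorted, de-duplicated shared-library paths in a /proc/<pid>/maps dump."""
    paths = set()
    for line in text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and fields[5].startswith("/") and ".so" in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_runtimes(maps_text=None):
    """List mapped libraries whose basename looks like a BLAS or OpenMP runtime."""
    if maps_text is None:
        with open("/proc/self/maps") as f:
            maps_text = f.read()
    return [p for p in parse_maps(maps_text) if RUNTIME_RE.search(os.path.basename(p))]
```

For example, run `import scipy.signal` and `print(loaded_runtimes())`, then `import toast` and print again; comparing the two lists shows whether the second import pulled in another OpenMP library.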

The purpose of this issue is to track notes and ideas while exploring options to fix this.

@tskisner
Member Author

Although the toast workflows in sotodlib do not hit this issue, the preprocess_tod.py script in the site pipeline does. That is a clue, since that script specifically imports scipy.signal, which makes it the next place to look for threading model collisions. Because scipy contains many compiled extensions, finding the source of the problem has been challenging.
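Rather than checking each extension by hand, one option is to bisect over scipy subpackages: import each one before toast in a fresh interpreter and record which ones crash. A minimal sketch, assuming a POSIX system; the subpackage list is a hand-picked subset of scipy's public subpackages, and `crashes` / `find_triggers` are hypothetical helper names:

```python
import subprocess
import sys

# Hand-picked subset of scipy's public subpackages (an assumption; extend as needed).
SCIPY_SUBMODULES = ["cluster", "fft", "integrate", "interpolate", "io", "linalg",
                    "ndimage", "optimize", "signal", "sparse", "spatial", "special", "stats"]


def crashes(code, timeout=300):
    """True if `code` kills a fresh interpreter with a signal (negative return code on POSIX)."""
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, timeout=timeout)
    return proc.returncode < 0


def find_triggers(candidates, then_import):
    """Return the candidates for which `import <candidate>; import <then_import>` crashes."""
    return [c for c in candidates if crashes(f"import {c}; import {then_import}")]
```

For example, `find_triggers([f"scipy.{name}" for name in SCIPY_SUBMODULES], "toast")`. Any submodule it reports narrows the search to that submodule's compiled extensions, which can then be inspected with ldd.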

@tskisner
Member Author

Another data point: if I build toast with the conda compilers and conda dependencies outside of the conda-bld environment (in a regular environment with all of those tools installed, using the normal CMake build), there is no segfault. This points to an incompatibility introduced by the build environment, which is the next thing to investigate.

@tskisner
Member Author

tskisner commented Apr 4, 2024

I have tracked this down to a single shared library (libarcher.so) installed by the libactpol package. Using the changes in #31, if I install all packages in soconda and run:

python -c 'import scipy; import toast'

(or, for example, import the site pipeline), I get a segfault. If I remove the libactpol package, the segfault goes away. If I reinstall libactpol and then manually delete every other library and executable installed by that package, I still get the segfault; once I also delete libarcher.so, everything works. I think this is actually an LLVM utility that is being picked up and bundled during the libactpol package build, and on installation it seems to overwrite an existing copy of libarcher.so. With #31, all other packages now build without further problems, so what remains is figuring out why this library gets bundled and preventing it.
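To find which installed package owns a given file, and whether more than one package claims the same path, one can read the per-package records that conda keeps in `<prefix>/conda-meta/*.json`, each of which lists its installed paths (relative to the prefix) under a `files` key. A minimal sketch under that assumption; the helper names are hypothetical, and a path claimed by two packages is only a candidate for the kind of overwrite described above:

```python
import json
import os
from collections import defaultdict
from pathlib import Path


def file_owners(prefix):
    """Map each installed relative path to the packages whose conda-meta record lists it."""
    owners = defaultdict(list)
    # Only *.json records are package metadata; other files (e.g. "history") are skipped.
    for record in sorted(Path(prefix, "conda-meta").glob("*.json")):
        meta = json.loads(record.read_text())
        for path in meta.get("files", []):
            owners[path].append(meta.get("name", record.stem))
    return owners


def who_installs(prefix, basename):
    """Return {relative path: [package names]} for every installed file with this basename."""
    return {p: pkgs for p, pkgs in file_owners(prefix).items() if os.path.basename(p) == basename}


def clobbered(prefix):
    """Paths claimed by more than one installed package (candidates for being overwritten)."""
    return {p: pkgs for p, pkgs in file_owners(prefix).items() if len(pkgs) > 1}
```

For example, `who_installs(os.environ["CONDA_PREFIX"], "libarcher.so")` lists every package that claims a libarcher.so in the active environment, and `clobbered(...)` lists all shared paths at once.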

@tskisner
Member Author

tskisner commented Apr 4, 2024

Fixed by #31

@tskisner tskisner closed this as completed Apr 4, 2024