Segfault in Spack build of trilinos-for-albany with intel #8

Open
xylar opened this issue Nov 11, 2022 · 25 comments
@xylar
Collaborator

xylar commented Nov 11, 2022

This is not a new issue but one I want to revisit. When I try to build trilinos-for-albany on Chrysalis with intel and OpenMPI, I see:

A long error message with a segfault
cd /lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/belos/epetra/src && /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/gcc-9.3.0/cmake-3.19.1-yisciec/bin/cmake -E cmake_link_script CMakeFiles/belosepetra.dir/link.txt --verbose=1
/gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/openmpi-4.1.3-pin4k7o/bin/mpic++ -fPIC -O2 -g -DNDEBUG -shared -Wl,-soname,libbelosepetra.so.13 -o libbelosepetra.so.13.5 CMakeFiles/belosepetra.dir/BelosEpetraAdapter.cpp.o CMakeFiles/belosepetra.dir/BelosEpetraOperator.cpp.o CMakeFiles/belosepetra.dir/BelosEpetraUtils.cpp.o CMakeFiles/belosepetra.dir/Belos_Details_Epetra_registerLinearSolverFactory.cpp.o CMakeFiles/belosepetra.dir/Belos_Details_Epetra_registerSolverFactory.cpp.o  -Wl,-rpath,/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/belos/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/xpetra/sup:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/xpetra/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/thyra/adapters/epetraext/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/epetraext/src:/gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/hdf5-1.10.7-eewgp6v/lib:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/thyra/adapters/tpetra/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/tpetra/core/ext:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-deve
lop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/tpetra/core/inout:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/tpetra/core/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/tpetra/core/compat:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/tpetra/tsqr/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/thyra/adapters/epetra/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/thyra/core/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/rtop/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/aztecoo/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/triutils/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/epetra/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/numerics/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-de
velop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/remainder/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/kokkoscomm/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/comm/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/kokkoscompat/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/parameterlist/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/parser/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/core/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/kokkos-kernels/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/kokkos/algorithms/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/kokkos/containers/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/kokkos/core/src:/lcrc/soft/climate/compass/chrysalis/spack/s
pack_for_mache_1.8.0/opt/spack/linux-rhel8-zen2/intel-20.0.4/metis-5.1.0-fvpnjgznlef67rs2jblxnjoxjaue2iyj/lib: ../../src/libbelos.so.13.5 ../../../xpetra/sup/libxpetra-sup.so.13.5 ../../../xpetra/src/libxpetra.so.13.5 ../../../thyra/adapters/epetraext/src/libthyraepetraext.so.13.5 ../../../epetraext/src/libepetraext.so.13.5 /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/hdf5-1.10.7-eewgp6v/lib/libhdf5.so /usr/lib64/libz.so /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/hdf5-1.10.7-eewgp6v/lib/libhdf5_hl.so ../../../thyra/adapters/tpetra/src/libthyratpetra.so.13.5 ../../../tpetra/core/ext/libtpetraext.so.13.5 ../../../tpetra/core/inout/libtpetrainout.so.13.5 ../../../tpetra/core/src/libtpetra.so.13.5 ../../../tpetra/core/compat/libtpetraclassic.so.13.5 ../../../tpetra/tsqr/src/libkokkostsqr.so.13.5 ../../../thyra/adapters/epetra/src/libthyraepetra.so.13.5 ../../../thyra/core/src/libthyracore.so.13.5 ../../../rtop/src/librtop.so.13.5 ../../../aztecoo/src/libaztecoo.so.13.5 ../../../triutils/src/libtriutils.so.13.5 ../../../epetra/src/libepetra.so.13.5 ../../../teuchos/numerics/src/libteuchosnumerics.so.13.5 ../../../teuchos/remainder/src/libteuchosremainder.so.13.5 ../../../teuchos/kokkoscomm/src/libteuchoskokkoscomm.so.13.5 ../../../teuchos/comm/src/libteuchoscomm.so.13.5 ../../../teuchos/kokkoscompat/src/libteuchoskokkoscompat.so.13.5 ../../../teuchos/parameterlist/src/libteuchosparameterlist.so.13.5 ../../../teuchos/parser/src/libteuchosparser.so.13.5 ../../../teuchos/core/src/libteuchoscore.so.13.5 ../../../kokkos-kernels/src/libkokkoskernels.so.13.5 /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_intel_lp64.so /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_sequential.so 
/gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_core.so /lib64/libpthread.so /lib64/libm.so /lib64/libdl.so /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_intel_lp64.so /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_sequential.so /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_core.so /lib64/libpthread.so /lib64/libm.so /lib64/libdl.so ../../../kokkos/algorithms/src/libkokkosalgorithms.so.13.5 ../../../kokkos/containers/src/libkokkoscontainers.so.13.5 ../../../kokkos/core/src/libkokkoscore.so.13.5 /usr/lib64/libdl.so /lcrc/soft/climate/compass/chrysalis/spack/spack_for_mache_1.8.0/opt/spack/linux-rhel8-zen2/intel-20.0.4/metis-5.1.0-fvpnjgznlef67rs2jblxnjoxjaue2iyj/lib/libmetis.so

          ": internal error: ** The compiler has encountered an unexpected problem.
** Segmentation violation signal raised. **
Access violation or stack overflow. Please contact Intel Support for assistance.

icpc: error #10105: /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/gcc-9.3.0/intel-20.0.4-kodw73g/compilers_and_libraries_2020.4.304/linux/bin/intel64/mcpcom: core dumped
icpc: warning #10102: unknown signal(1415383120)
icpc: error #10106: Fatal error in /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/gcc-9.3.0/intel-20.0.4-kodw73g/compilers_and_libraries_2020.4.304/linux/bin/intel64/mcpcom, terminated by unknown
icpc: error #10014: problem during multi-file optimization compilation (code 1)
make[2]: *** [packages/stk/stk_unit_test_utils/stk_unit_test_utils/stk_mesh_fixtures/CMakeFiles/stk_mesh_fixtures.dir/build.make:460: packages/stk/stk_unit_test_utils/stk_unit_test_utils/stk_mesh_fixtures/libstk_mesh_fixtures.so.13.5] Error 1
make[2]: Leaving directory '/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi'
make[1]: *** [CMakeFiles/Makefile2:14783: packages/stk/stk_unit_test_utils/stk_unit_test_utils/stk_mesh_fixtures/CMakeFiles/stk_mesh_fixtures.dir/all] Error 2
cd /lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/belos/epetra/src && /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/gcc-9.3.0/cmake-3.19.1-yisciec/bin/cmake -E cmake_symlink_library libbelosepetra.so.13.5 libbelosepetra.so.13 libbelosepetra.so
make[2]: Leaving directory '/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi'
[ 80%] Built target belosepetra
make[1]: Leaving directory '/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi'
make: *** [Makefile:174: all] Error 2

I believe this is the same issue I have seen previously with Intel on all machines I've tried.

It would be really great to get this resolved, since Intel and OpenMPI are the production compilers on Chrysalis for E3SM.
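
Because the log interleaves output from parallel make jobs, the failing target is easy to miss: the segfault is reported for `stk_mesh_fixtures`, while `belosepetra` links fine. A minimal sketch for pulling only the compiler errors and failed make targets out of such a log, assuming GNU grep; the function name `failed_lines` is hypothetical, and the sample lines are copied from the log above so the snippet runs on its own:

```shell
#!/usr/bin/env bash
# Sketch: filter a build log (read from stdin) down to the lines that matter.

failed_lines() {
    # Keep icpc errors and make's "*** [...] Error N" lines; drop the rest.
    grep -E '^(icpc: error|make(\[[0-9]+\])?: \*\*\*)'
}

# Small sample taken from the log above.
log=$(mktemp)
cat > "$log" <<'EOF'
icpc: error #10014: problem during multi-file optimization compilation (code 1)
[ 80%] Built target belosepetra
make: *** [Makefile:174: all] Error 2
EOF

failed_lines < "$log"
rm -f "$log"
```

On a real build, the same function could be fed the Spack build output instead of the sample file.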

@xylar
Collaborator Author

xylar commented Nov 11, 2022

@ikalash, @mperego, @bartgol, @jewatkins

I'm just reviving this issue in the hope that we can resolve it in the not-too-distant future. It's not super urgent, just bothersome.

@xylar
Collaborator Author

xylar commented Nov 11, 2022

I see the same issue on Chrysalis when I use Intel and Intel-MPI (instead of OpenMPI).

Here is a gist with a shell script and yaml file that can be used to reproduce the issue:
https://gist.github.com/xylar/4fa8c2a7a54335068bedbb38a64d584c

@bartgol

bartgol commented Nov 11, 2022

Ugh, ICEs are annoying, and often require seemingly ineffective changes (like splitting a cpp file in two, or removing a const), with lots of bisection iterations to identify the problematic code snippet.

That said, from the error msg, it appears it's trying to build the stk_unit_test_utils package, which I think we can do away with. Can you try to set Trilinos_ENABLE_STKUnit_test_utils:BOOL=OFF? This is always off in my trilinos builds, along with a few other stk sub-packages:

  -D Trilinos_ENABLE_STK:BOOL=ON                                  \
  -D Trilinos_ENABLE_STKDoc_tests:BOOL=OFF                        \
  -D Trilinos_ENABLE_STKIO:BOOL=ON                                \
  -D Trilinos_ENABLE_STKMesh:BOOL=ON                              \
  -D Trilinos_ENABLE_STKSearch:BOOL=ON                            \
  -D Trilinos_ENABLE_STKSearchUtil:BOOL=OFF                       \
  -D Trilinos_ENABLE_STKTopology:BOOL=ON                          \
  -D Trilinos_ENABLE_STKTransfer:BOOL=ON                          \
  -D Trilinos_ENABLE_STKUnit_tests:BOOL=OFF                       \
  -D Trilinos_ENABLE_STKUnit_test_utils:BOOL=OFF                  \
  -D Trilinos_ENABLE_STKUtil:BOOL=ON    
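
To confirm that an option such as `Trilinos_ENABLE_STKUnit_test_utils` actually reached the configure step, one can look it up in the build directory's `CMakeCache.txt`, where entries have the form `NAME:TYPE=VALUE`. A minimal sketch; the helper name `cache_value` and the sample cache contents are hypothetical, and the location of the cache inside the Spack stage directory is an assumption:

```shell
#!/usr/bin/env bash
# Sketch: read one option's value from CMakeCache.txt-style text on stdin.

cache_value() {
    # $1 is the option name; print whatever follows "NAME:TYPE=" on its line.
    sed -n "s/^$1:[A-Z]*=//p"
}

# Hypothetical sample cache contents (not taken from a real build).
sample='Trilinos_ENABLE_STKMesh:BOOL=ON
Trilinos_ENABLE_STKUnit_test_utils:BOOL=OFF'

printf '%s\n' "$sample" | cache_value Trilinos_ENABLE_STKUnit_test_utils
```

For the sample above this prints `OFF`. Against a real build it would be run as `cache_value Trilinos_ENABLE_STKUnit_test_utils < CMakeCache.txt` from the build directory.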

@jewatkins

> I see the same issue on Chrysalis when I use Intel and Intel-MPI (instead of OpenMPI).
>
> Here is a gist with a shell script and yaml file that can be used to reproduce the issue: https://gist.github.com/xylar/4fa8c2a7a54335068bedbb38a64d584c

Can this be used to reproduce on any machine? I remember there was some issue with intel + spack a while back but that may have been resolved. Part of the issue is that everyone has different Trilinos configure scripts. It would be nice to have an up-to-date master one that we can all contribute to and have all our nightly testing conform to.

We are currently testing up to intel 19 and it looks like trilinos is doing the same? https://testing.sandia.gov/cdash/index.php?subproject=STK&project=Trilinos

We do have intel 20.2.254 on blake, we could try to update our testing there. I do recall seeing some issues with oneapi but I think they were mostly tpl issues.

@xylar
Collaborator Author

xylar commented Nov 11, 2022

@jewatkins, we should have a discussion about this. This recipe is specific to Chrysalis. Similar recipes are needed for other supported machines. As I presented on Tuesday, we have a python package mache that handles generating these build scripts and yaml files (as well as snippets that can be part of activation scripts) to handle different machines. But there's not an easy way to make this particular type of recipe machine independent and indeed the point from my perspective is to mimic E3SM, which is decidedly not machine independent.

@xylar
Collaborator Author

xylar commented Nov 11, 2022

I can make similar recipes for Compy, Cori or Anvil. We don't currently support Intel compilers on any other machine for the software, compass, that I'm trying to build Albany for.

@xylar
Collaborator Author

xylar commented Nov 11, 2022

@ikalash and @jewatkins, I could potentially try to add one of the machines you test with to mache and compass so you could test on your machine the same way we build on ours. That would get us a lot closer to testing the workflow we ultimately want to have work without interruptions. Is there a machine at Sandia that you test on that's also supported by E3SM? See the following file for all machines that E3SM runs on:
https://github.com/E3SM-Project/E3SM/blob/master/cime_config/machines/config_machines.xml

@bartgol

bartgol commented Nov 11, 2022

snl-blake would be great, but I suspect that machine is not really maintained, since it was added by Micheal Deakin, who left SNL ~4 years ago, but is still listed as maintainer...

@jewatkins

Okay, we can discuss more later if this isn't urgent. To me it looks like a compiler error, possibly due to the intel version which is not tested in Trilinos/Albany. I was thinking the easiest thing would be to try to update one of our internal builds to use the intel version that you're using and see if the compiler error comes up.

There's much we can do with a compiler error. We first have to identify the problem code. Then we'll either have to tell the code owner to modify it (or we modify it ourselves and maintain a special Trilinos version), or send a reproducer to the vendor and hope the issue is fixed in another release (and that the release makes it onto the target machines). But maybe it's as simple as turning something off like Luca suggests.

Skybridge is probably the closest thing to use for Sandia internal since we build e3sm and trilinos/albany there.

@jewatkins

> snl-blake would be great, but I suspect that machine is not really maintained, since it was added by Micheal Deakin, who left SNL ~4 years ago, but is still listed as maintainer...

Oh I did not see blake. It would be nice if we could revive that.

@xylar
Collaborator Author

xylar commented Nov 11, 2022

Well, keep me posted, I'm happy to help support whatever is practical.

@ikalash

ikalash commented Nov 11, 2022

I suspect this is due to the compiler being so new. The newest intel compiler we are using in the Albany nightlies is 19.0.5. I could try building the code on one of our intel machines with the intel compiler that is there, just as a sanity check and to test this theory. I'd try that on blake, as suggested above. Another thing to try would be building Trilinos the usual cmake way using the newer intel 20 compiler, but that's probably harder.

@ikalash

ikalash commented Nov 11, 2022

Also, I have added this topic to the agenda for our 11/22 Albany meeting, just FYI.

@ikalash

ikalash commented Nov 25, 2022

So I have an update on this - my apologies for the delay. I tried building albany using spack with intel on my workstation mockba, using a sems module for the intel compiler, and the build completed. I used the following intel module there: module load sems-intel/2021.3. It therefore does not seem that intel is a fundamental problem for the spack build, at least with newer versions of intel. I can try with some other versions of intel just as a sanity check. If that goes well, I would suggest someone try to build Trilinos from scratch using cmake on the problematic machines to see if the same issue is encountered.

@ikalash

ikalash commented Nov 25, 2022

Looking in more detail at Xylar's original error, I think this might be a compiler bug: https://community.intel.com/t5/Intel-C-Compiler/Compiler-Error-quot-Segmentation-violation-signal-raised-quot/td-p/1075456 (this forum discussion is about a different version of the intel compiler, but same idea).

@jewatkins

@ikalash Just curious and probably not related to this issue but did you build your own mpi with sems-intel/2021.3 or did you use sems-openmpi/4.0.5? I was having issues with sems-openmpi/4.0.5 on cee machines.

@xylar
Collaborator Author

xylar commented Nov 28, 2022

@ikalash, okay, I had also seen indications that it might be a compiler bug. This is a tricky situation because in the long run we really won't be able to choose our compilers, we will need to use the E3SM ones. For now, that leaves me no choice but to only support Albany with Gnu on our HPC machines until E3SM chooses to update the Intel compiler modules.

@bartgol

bartgol commented Nov 28, 2022

@xylar Since Albany is pre-installed, isn't it possible to use a different compiler? I know this might open a can of worms, but so long as we don't use drastically different compilers, shouldn't E3SM+intelX be able to link against Albany+intelY?

Note: if you want to avoid using different compilers altogether, I completely understand; just checking though.

@jewatkins

As mentioned before, other options might include trying to turn off the problem code or modify Trilinos and have a "special" Trilinos for the spack build. I suppose the latter could get messy if the spack build is updated automatically with Trilinos develop.

@ikalash

ikalash commented Nov 29, 2022

How often do the compilers for E3SM get updated? I would advocate trying to get it switched if it is known to have compiler bugs, but I could see that this is easier said than done.

Trying what @bartgol and @jewatkins suggest, if we're stuck with the compiler, is a good idea.

@xylar
Collaborator Author

xylar commented Nov 29, 2022

They are updated pretty rarely. And at some HPC centers we have more control over that process than at others. I'll bring this up with the infrastructure group and see what they suggest.

@ikalash

ikalash commented Nov 29, 2022

> @ikalash Just curious and probably not related to this issue but did you build your own mpi with sems-intel/2021.3 or did you use sems-openmpi/4.0.5? I was having issues with sems-openmpi/4.0.5 on cee machines.

@jewatkins : I had spack build openmpi-4.1.4, rather than using the sems openmpi. Did not try with the sems openmpi.

@ikalash

ikalash commented Nov 29, 2022

> They are updated pretty rarely. And at some HPC centers we have more control over that process than at others. I'll bring this up with the infrastructure group and see what they suggest.

Sounds good, please keep us posted on what they say.

@xylar
Collaborator Author

xylar commented Jan 19, 2023

We agreed that this is a lower priority. We will try building trilinos-for-albany with Intel on Perlmutter if and when that compiler becomes available. In the meantime, we will focus on GPU support and other, higher priorities.

@ikalash

ikalash commented Jan 19, 2023

Sounds good, @xylar , thanks for the update.
