Segfault in Spack build of trilinos-for-albany with intel #8
Comments
@ikalash, @mperego, @bartgol, @jewatkins I'm just reviving this issue in the hope that we can resolve it in the not-too-distant future. It's not super urgent, just bothersome.
I see the same issue on Chrysalis when I use Intel and Intel-MPI (instead of OpenMPI). Here is a gist with a shell script and yaml file that can be used to reproduce the issue:
Ugh, ICEs are annoying, and often require seemingly ineffective changes (like splitting a cpp file in two, or removing a const), with lots of bisection iterations to identify the problematic code snippet. That said, from the error msg, it appears it's trying to build the
Can this be used to reproduce on any machine? I remember there was some issue with intel + spack a while back but that may have been resolved. Part of the issue is that everyone has different Trilinos configure scripts. It would be nice to have an up-to-date master one that we can all contribute to and have all our nightly testing conform to. We are currently testing up to intel 19 and it looks like trilinos is doing the same? https://testing.sandia.gov/cdash/index.php?subproject=STK&project=Trilinos We do have intel 20.2.254 on blake, we could try to update our testing there. I do recall seeing some issues with oneapi but I think they were mostly tpl issues.
@jewatkins, we should have a discussion about this. This recipe is specific to Chrysalis. Similar recipes are needed for other supported machines. As I presented on Tuesday, we have a python package mache that handles generating these build scripts and yaml files (as well as snippets that can be part of activation scripts) to handle different machines. But there's not an easy way to make this particular type of recipe machine independent and indeed the point from my perspective is to mimic E3SM, which is decidedly not machine independent.
I can make similar recipes for Compy, Cori or Anvil. We don't currently support Intel compilers on any other machine for the software, compass, that I'm trying to build Albany for.
@ikalash and @jewatkins, I could potentially try to add one of the machines you test with to
snl-blake would be great, but I suspect that machine is not really maintained, since it was added by Micheal Deakin, who left SNL ~4 years ago, but is still listed as maintainer...
Okay, we can discuss more later if this isn't urgent. To me it looks like a compiler error, possibly due to the intel version, which is not tested in Trilinos/Albany. I was thinking the easiest thing would be to try to update one of our internal builds to use the intel version that you're using and see if the compiler error comes up. There's much we can do with a compiler error. We first have to identify the problem code. Then we'll either have to tell the code owner to modify it (or we modify it and we have a special Trilinos version) or send a reproducer to the vendor and hope the issue is fixed in another release (and gets over to the target machines). But maybe it's as simple as turning something off like Luca suggests. Skybridge is probably the closest thing to use for Sandia internal since we build e3sm and trilinos/albany there.
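As a rough illustration of the "identify the problem code" step, here is a minimal bisection sketch. It is not something anyone in this thread has run: `first_failing_prefix` and the `crashes` callback are hypothetical names, and in practice `crashes` would wrap a compiler invocation and check for the segfault message. It also assumes the crash is monotonic, i.e. once the offending unit is included, adding more units never makes the crash go away.

```python
from typing import Callable, Sequence


def first_failing_prefix(
    units: Sequence[str],
    crashes: Callable[[Sequence[str]], bool],
) -> int:
    """Return the index of the unit whose inclusion first triggers the crash.

    `units` is an ordered list of code pieces (files, functions, ...).
    `crashes` is a hypothetical callback that returns True when compiling
    the given units reproduces the internal compiler error.
    Assumes that if units[:k] crashes, units[:k + 1] crashes as well.
    """
    if not crashes(units):
        raise ValueError("the full set of units does not reproduce the crash")

    # Invariant: units[:lo] compiles cleanly, units[:hi] crashes.
    lo, hi = 0, len(units)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(units[:mid]):
            hi = mid
        else:
            lo = mid
    return hi - 1


if __name__ == "__main__":
    # Stand-in predicate for illustration only: pretend "c.cpp" is the culprit.
    files = ["a.cpp", "b.cpp", "c.cpp", "d.cpp"]
    culprit = first_failing_prefix(files, lambda subset: "c.cpp" in subset)
    print(files[culprit])
```

The same loop applies at a finer grain (functions within one file) once the offending file is known, which is usually how a small reproducer for the vendor gets built.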
Oh I did not see blake. It would be nice if we could revive that.
Well, keep me posted, I'm happy to help support whatever is practical.
I suspect this is due to the compiler being so new. The newest intel compiler we are using in the Albany nightlies is 19.0.5. I could try building the code on one of our intel machines with the intel compiler that is there, just as a sanity check and to test this theory. I'd try that on blake, as suggested above. Another thing to try would be building Trilinos the usual cmake way using the newer intel 20 compiler, but that's probably harder.
Also, I have added this topic to the agenda for our 11/22 Albany meeting, just FYI.
So I have an update on this - my apologies for the delay. I tried building albany using spack with intel on my workstation mockba using a sems module for the intel compiler, and the build completed. I used the following intel module there:
Looking in more detail at Xylar's original error, I think this might be a compiler bug: https://community.intel.com/t5/Intel-C-Compiler/Compiler-Error-quot-Segmentation-violation-signal-raised-quot/td-p/1075456 (this forum discussion is about a different version of the intel compiler, but same idea).
@ikalash Just curious and probably not related to this issue but did you build your own mpi with
@ikalash, okay, I had also seen indications that it might be a compiler bug. This is a tricky situation because in the long run we really won't be able to choose our compilers; we will need to use the E3SM ones. For now, that leaves me no choice but to only support Albany with GNU on our HPC machines until E3SM chooses to update the Intel compiler modules.
@xylar Since Albany is pre-installed, isn't it possible to use a different compiler? I know this might open a can of worms, but so long as we don't use drastically different compilers, shouldn't E3SM+intelX be able to link against Albany+intelY? Note: if you want to avoid using different compilers altogether, I completely understand; just checking though.
As mentioned before, other options might include trying to turn off the problem code or modify Trilinos and have a "special" Trilinos for the spack build. I suppose the latter could get messy if the spack build is updated automatically with Trilinos develop.
How often do the compilers for E3SM get updated? I would advocate trying to get it switched if it is known to have compiler bugs, but I could see that this is easier said than done. Trying what @bartgol and @jewatkins suggest, if we're stuck with the compiler, is a good idea.
They are updated pretty rarely. And at some HPC centers we have more control over that process than at others. I'll bring this up with the infrastructure group and see what they suggest.
@jewatkins: I had spack build openmpi-4.1.4, rather than using the sems openmpi. I did not try with the sems openmpi.
Sounds good, please keep us posted on what they say.
We agreed that this is a lower priority. We will try building
Sounds good, @xylar, thanks for the update.
This is not a new issue but one I want to revisit. When I try to build trilinos-for-albany on Chrysalis with intel and OpenMPI, I see: A long error message with a segfault
I believe this is the same issue I have seen previously with Intel on all machines I've tried.
It would be really great to get this resolved, since Intel and OpenMPI are the production compilers on Chrysalis for E3SM.