Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numerous failures in nightly tests #985

Closed
ikalash opened this issue Sep 27, 2023 · 19 comments
Closed

Numerous failures in nightly tests #985

ikalash opened this issue Sep 27, 2023 · 19 comments
Assignees
Labels
build PyAlbany Testing Stuff related to testing Albany (including nightly tests)

Comments

@ikalash
Copy link
Collaborator

ikalash commented Sep 27, 2023

There are unfortunately numerous failures in the nightly tests due I think to the merging of this PR: #984 :

/usr/bin/ld: CMakeFiles/NullSpaceUtils.dir/NullSpaceUtils.cpp.o: undefined reference to symbol 'MPI_Barrier'
/projects/sems/install/rhel7-x86_64/sems/v2/tpl/openmpi/1.10.7/gcc/10.1.0/base/7jgrwmo/lib/libmpi.so.12: error adding symbols: DSO missing from command line
/nightlyCDash/repos/Albany/pyAlbany/src/Albany_PyUtils.cpp:60:48: error: ‘ALBANY_CXX_COMPILER_ID’ was not declared in this scope
c++: error: unrecognized command line option ‘-Wext-lambda-captures-this’
c++: error: unrecognized command line option ‘-arch=sm_70’

@bartgol can you please have a look and fix these issues? The last one doesn't seem like it would be from your changes actually, but I could be wrong about that.

@ikalash ikalash added build Testing Stuff related to testing Albany (including nightly tests) PyAlbany labels Sep 27, 2023
@bartgol
Copy link
Collaborator

bartgol commented Sep 27, 2023

Yes, I think these are definitely b/c of that PR. What I think is happening is that we're using serial compilers, rather than mpicxx/mpicc/mpifort.

Two possible solutions:

  1. Modify our testing scripts, passing stuff like -DCMAKE_CXX_COMPILER:STRING=mpicxx.
  2. Add a call to find_package (MPI REQUIRED), and then link albanyLib against MPI. I thought Trilinos already linked to MPI, but I guess it simply relies on being built with MPI wrappers.

@bartlettroscoe am I correct saying that trilinos does not link against MPI (in a CMake sense), but assumes mpi wrappers are used as compilers? Btw, I checked my local build of trilinos, and I found

set(Trilinos_MPI_LIBRARIES "") 
set(Trilinos_MPI_LIBRARY_DIRS "") 
set(Trilinos_MPI_INCLUDE_DIRS "") 
set(Trilinos_MPI_EXEC "/usr/lib64/openmpi/bin/mpiexec")
set(Trilinos_MPI_EXEC_PRE_NUMPROCS_FLAGS "") 
set(Trilinos_MPI_EXEC_MAX_NUMPROCS "4")
set(Trilinos_MPI_EXEC_POST_NUMPROCS_FLAGS "") 
set(Trilinos_MPI_EXEC_NUMPROCS_FLAG "-np")

so I could not even just link ${Trilinos_MPI_LIBRARIES} to Albany, and be done with it. Is it expected that those vars be empty? Note: I used mpicxx/mpicc/mpifort as CXX/C/F compilers when building trilinos.

@bartlettroscoe
Copy link
Contributor

bartlettroscoe commented Sep 27, 2023

  1. Modify our testing scripts, passing stuff like -DCMAKE_CXX_COMPILER:STRING=mpicxx.

That is what you need to do if you don't get compilers from Trilinos after calling find_package(Trilinos ...).

  1. Add a call to find_package (MPI REQUIRED), and then link albanyLib against MPI. I thought Trilinos already linked to MPI, but I guess it simply relies on being built with MPI wrappers.

Even if you override the Trilinos lib install dir to be <prefix>/lib, that does not work with CUDA builds (because you can't call find_package(CUDAToolkit) until after you define the compilers).

@bartlettroscoe am I correct saying that trilinos does not link against MPI (in a CMake sense), but assumes mpi wrappers are used as compilers?

Trilinos just uses the MPI compiler wrappers. If you are building shared libraries, then the shared Trilinos libraries should be linked to the shared MPI libraries (because that is what the mpicxx wrapper will do when building shared libs). But if you are using a static build of Trilinos, then the static Trilinos libraries are not linked to anything (because static libs are just dumb globs of object files).

Btw, I checked my local build of trilinos, and I found

set(Trilinos_MPI_LIBRARIES "") 
set(Trilinos_MPI_LIBRARY_DIRS "") 
set(Trilinos_MPI_INCLUDE_DIRS "") 

I don't the above vars were ever set when using MPI complier wrappers. Those should have been removed years ago. (We have lost the history for who put those in from the CVS days.)

set(Trilinos_MPI_EXEC "/usr/lib64/openmpi/bin/mpiexec")
set(Trilinos_MPI_EXEC_PRE_NUMPROCS_FLAGS "") 
set(Trilinos_MPI_EXEC_MAX_NUMPROCS "4")
set(Trilinos_MPI_EXEC_POST_NUMPROCS_FLAGS "") 
set(Trilinos_MPI_EXEC_NUMPROCS_FLAG "-np")

Yes, those do get set and you can count on those. (I know EMPIRE used to use those or still does.)

so I could not even just link ${Trilinos_MPI_LIBRARIES} to Albany, and be done with it. Is it expected that those vars be empty? Note: I used mpicxx/mpicc/mpifort as CXX/C/F compilers when building trilinos.

When doing MPI builds, you should use MPI compiler wrappers when building Albany.

NOTE: There has been some discussion of switching TriBITS and Trilinos to use the standard FindMPI.cmake module with find_package(MPI), see:

but that will be another major break in Trilinos backward compatibility.

@bartgol
Copy link
Collaborator

bartgol commented Sep 27, 2023

Thanks! We'll set the compilers to be the mpi wrappers, and be done with it.

It's interesting that Trilinos did not go the find_package way for MPI. It seems to mo that it would be the most cmake-y way, although the compiler wrappers is probably the easiest way.

@bartgol
Copy link
Collaborator

bartgol commented Sep 27, 2023

@ikalash I will try to fix the build scripts, though I'm not super expert on how our nightlies work (there are a lot of scripts in our dashboards folder). I will ping you on the PR, so you may take a second look and possibly fix what I did wrong.

@ikalash
Copy link
Collaborator Author

ikalash commented Sep 27, 2023

Quick question about your changes to the cmake files @bartgol : I assume they are backward compatible with

-D Trilinos_USE_GNUINSTALLDIRS=OFF

Is that right? In other words, if I have the above option in my Trilinos configure script, Albany should still build, correct?

@bartlettroscoe
Copy link
Contributor

It's interesting that Trilinos did not go the find_package way for MPI. It seems to mo that it would be the most cmake-y way, although the compiler wrappers is probably the easiest way.

@bartgol, way back in 2008 when Trilinos first adopted CMake, the standard FindMPI.cmake module did not work on a few critical platforms that Trilinos had to support (because the MPI compiler wrappers on those systems did not support a --show argument or whatever it was called to spit out the compiler and link options). It is unclear when those systems disappeared or when/if a version of CMake was released where FindMPI.cmake works on all modern systems.

This is just part of the legacy code in TriBITS that needs to be addressed as part of:

(and it is not clear if and when that will ever happen).

As you can see, even small changes in backward compatibility can have large shock waves with something like Trilinos.

@bartgol
Copy link
Collaborator

bartgol commented Sep 27, 2023

Thanks for the history background! :)

@bartgol
Copy link
Collaborator

bartgol commented Sep 27, 2023

Quick question about your changes to the cmake files @bartgol : I assume they are backward compatible with

-D Trilinos_USE_GNUINSTALLDIRS=OFF

Is that right? In other words, if I have the above option in my Trilinos configure script, Albany should still build, correct?

I think so? I do make one assumption on the trilinos install dir, and that is the bin subfolders (for all the seacas executables), but I think that subdir will be there with virtually every meaningful installation way, including whatever happens with GNUINSTALLDIRS=OFF.

That said, we should remove that as soon as these feailures are fixed. In fact, we should probably remove it from some builds, to ensure everything works as planned.

@ikalash
Copy link
Collaborator Author

ikalash commented Sep 27, 2023

@ikalash I will try to fix the build scripts, though I'm not super expert on how our nightlies work (there are a lot of scripts in our dashboards folder). I will ping you on the PR, so you may take a second look and possibly fix what I did wrong.

Sounds good. You should find configure scripts for most of the nightlies here: https://github.com/sandialabs/Albany/tree/master/doc/dashboards . There are module files too when relevant. I am happy to test some fixes as well as they come up. I was intending to test your PR before it was merged, but did not get the chance - sorry.

@ikalash
Copy link
Collaborator Author

ikalash commented Sep 28, 2023

Quick question about your changes to the cmake files @bartgol : I assume they are backward compatible with

-D Trilinos_USE_GNUINSTALLDIRS=OFF

Is that right? In other words, if I have the above option in my Trilinos configure script, Albany should still build, correct?

I think so? I do make one assumption on the trilinos install dir, and that is the bin subfolders (for all the seacas executables), but I think that subdir will be there with virtually every meaningful installation way, including whatever happens with GNUINSTALLDIRS=OFF.

That said, we should remove that as soon as these feailures are fixed. In fact, we should probably remove it from some builds, to ensure everything works as planned.

Ok, I will start changing this in the nightlies once you get the builds fixed. Let me know if that sounds reasonable.

@bartlettroscoe
Copy link
Contributor

FYI: Responding to above, configuring older versions of Trilinos with either:

-D Trilinos_USE_GNUINSTALLDIRS=OFF

(because that was the old default for that var) or

-D CMAKE_INSTALL_LIBDIR:STRING=lib

(because that var is ignored when not using GNUInstallDirs.cmake) will be backward compatible and yield the same behavior (i.e. having the Trilinos lib install dir named <trilinosInstallDir>/lib instead of <trilinosInstallDir>/lib64).

SIDENOTE: It is amazing how much of a shockwave this one seemingly small non-backward compatible change to adopt one modern CMake practice has made on even just the local the Trilinos user community. This does not bode well for all of the changes to switch to modern CMake listed in:

@bartgol
Copy link
Collaborator

bartgol commented Sep 28, 2023

@bartlettroscoe disruptive changes are welcome if they contribute to improve the quality of the package config/build system. I think CMake was adopted by many project without paying as much attention to software quality as it was payed to actual source code. Patches and hacks piled up through the years, all of them designed with a cmake2.0 eye. True switch to cmake3.0 practices is still far away in certain projects, and is sometimes viewed as a burden. Imho, it makes things simpler, cleaner, and faster to tweak (once the project is fully in cmake3.0-style). Alas, it takes effort (in certain projects) to convince ppl that we should switch, because "their scripts won't work anymore".

That said, in Albany we're happy to follow more modern approaches, mine was just a "general" rant. :)

@bartgol
Copy link
Collaborator

bartgol commented Sep 29, 2023

@jewatkins @ikalash I'm trying to update our nightly scripts, but I don't understand what scripts run. I do see lots of scripts in, say, doc/dashboards/weaver.sandia.gov, and I have a hard time figuring out how they are invoked. There seem to be a lot of hacky bash stuff happening, which I can't detangle. I thought I just had to modify do-cmake-albany, but I don't understand if that's correct. E.g., that file has the line

TRILINSTALLDIR=

Is some other script going to modify this line before we run this script? Also, do-cmake-weaver-trilinos only sets the CXX compiler. How are the C/Fortran mpi compilers detected?

@ikalash
Copy link
Collaborator Author

ikalash commented Sep 29, 2023

@bartgol : this gets set automatically to the right dir in the .cmake script that gets created from this do-cmake script during the running of the nightly tests. A user using just the do-cmake script is supposed to set that path themself.

@ikalash
Copy link
Collaborator Author

ikalash commented Sep 30, 2023

I believe this should be fixed starting tomorrow everywhere except perlmutter. On perlmutter, it looks like there are some issues pointing to the write mpicc / mpicxx compiler: https://my.cdash.org/build/2414115/configure . @mcarlson801 can you please have a look?

@bartgol
Copy link
Collaborator

bartgol commented Oct 2, 2023

Thanks for fixing all the testing scripts!

@ikalash
Copy link
Collaborator Author

ikalash commented Oct 2, 2023

Sure. Max still needs to fix the perlmutter ones. We can discuss your proposition to clean up the testing scripts at the next Albany meeting.

@mcarlson801
Copy link
Collaborator

Perlmutter scripts are updated in the dashboard and the builds are working. I'm adding some new features for the Perlmutter tests today and then I'm going to push all the changes to the repo together.

@ikalash
Copy link
Collaborator Author

ikalash commented Oct 2, 2023

I forgot to say, I haven't tried removing -D Trilinos_USE_GNUINSTALLDIRS=OFF yet from the nightly configure scripts, but I will do this next. Note that there are some failures now in Albany tests, namely #988 , #989 , #990 .

@ikalash ikalash closed this as completed Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build PyAlbany Testing Stuff related to testing Albany (including nightly tests)
Projects
None yet
Development

No branches or pull requests

4 participants