Update container build #266

Open
thomas-robinson opened this issue Nov 14, 2024 · 0 comments

Is your feature request related to a problem? Please describe.
Current container builds use a large Intel-compiler-based container that cannot be shared outside of our organization. The build needs to be updated so that the final model container does not include the Intel compilers and can be shared.

Describe the solution you'd like
background
There are two new containers available within the GFDL:

podman pull docker://gitlab.gfdl.noaa.gov:5050/fre/hpc-me/base-ubuntu20.04-intel:2024
podman pull docker://gitlab.gfdl.noaa.gov:5050/fre/hpc-me/base-ubuntu20.04-intel:2024rte

The first container has the libraries needed for compiling and the Intel 2024.2 compilers.
The second container has the same library dependencies, but only the Intel 2024 runtime environment.
These containers do not need any spack commands to work. The environment is all set up.

what needs to be done

  1. The container platform should be updated to allow users to specify the base container to use, whether a 2 step build is required, and the base container for the second step
  2. The base containers listed above should be used for building the models using fre make
  3. Any spack loads or spack commands should be removed from the container part of fre make
  4. Tests should be added to build the null model in a container
  5. frerun will need to be updated to handle the bindings.
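
As a rough illustration of items 1 and 2, the sketch below renders a Dockerfile for either a one-step or a two-step build: compile in the `:2024` container, then copy only the executable into the `:2024rte` container. This is not the fre make implementation; the function name `render_dockerfile`, its parameters, the build command, and the executable path are all hypothetical and chosen only for the example. The two image names are the ones listed above.

```python
# Image names taken from the issue text above.
BUILD_IMAGE = "gitlab.gfdl.noaa.gov:5050/fre/hpc-me/base-ubuntu20.04-intel:2024"
RUNTIME_IMAGE = "gitlab.gfdl.noaa.gov:5050/fre/hpc-me/base-ubuntu20.04-intel:2024rte"


def render_dockerfile(base_image, build_commands, exec_path, runtime_image=None):
    """Return Dockerfile text for a one-step or two-step build.

    Hypothetical helper: every argument name here is an assumption.
    If runtime_image is given, a second stage is added that copies only
    the built executable out of the first (compiler) stage.
    """
    lines = [f"FROM {base_image} AS builder"]
    lines += [f"RUN {cmd}" for cmd in build_commands]
    if runtime_image is not None:
        # Second step: start from the runtime-only container and copy
        # the executable, so the final image carries no compilers.
        lines.append(f"FROM {runtime_image}")
        lines.append(f"COPY --from=builder {exec_path} {exec_path}")
    lines.append(f'ENTRYPOINT ["{exec_path}"]')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The build command and executable path are placeholders.
    print(render_dockerfile(
        BUILD_IMAGE,
        ["make -C /apps/null/exec"],
        "/apps/null/exec/null.x",
        runtime_image=RUNTIME_IMAGE,
    ))
```

Leaving `runtime_image` unset yields a single-stage Dockerfile, which corresponds to the case where a 2 step build is not required.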

frerun updates
LD_LIBRARY_PATH needs to be updated by appending :\${LD_LIBRARY_PATH} to the end of its value. This is the main difference in frerun.
Here is an example of a frerun:

setenv MPICH_SMP_SINGLE_COPY_MODE "NONE"
setenv APPTAINERENV_LD_LIBRARY_PATH ${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}:/opt/cray/pe/lib64:/usr/lib64/libibverbs:/opt/cray/libfabric/1.20.1/lib64:/opt/cray/pals/1.4/lib:\${LD_LIBRARY_PATH}
setenv APPTAINER_CONTAINLIBS "/usr/lib64/libjansson.so.4,/usr/lib64/libjson-c.so.3,/usr/lib64/libdrm.so.2,/lib64/libtinfo.so.6,/usr/lib64/libnl-3.so.200,/usr/lib64/librdmacm.so.1,/usr/lib64/libibverbs.so.1,/usr/lib64/libibverbs/libmlx5-rdmav34.so,/usr/lib64/libnuma.so.1,/usr/lib64/libnl-cli-3.so.200,/usr/lib64/libnl-genl-3.so.200,/usr/lib64/libnl-nf-3.so.200,/usr/lib64/libnl-route-3.so.200,/usr/lib64/libnl-3.so.200,/usr/lib64/libnl-idiag-3.so.200,/usr/lib64/libnl-xfrm-3.so.200,/usr/lib64/libnl-genl-3.so.200"
setenv APPTAINERENV_I_MPI_PMI_LIBRARY /opt/cray/pe/lib64/libpmi2.so
setenv APPTAINER_BIND "/usr/share/libdrm,/var/spool/slurmd,/opt/cray,/opt/intel,${PWD},/etc/libibverbs.d,/usr/lib64/libibverbs,/usr/lib64/libnl3-200,${HOME},/gpfs/f5"
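
The settings above could be assembled programmatically. The following is a minimal sketch, not frerun code: both helper names are hypothetical. It joins the host library directories and appends the escaped `\${LD_LIBRARY_PATH}`, presumably so that the final variable is expanded inside the container rather than on the host. It also drops repeated entries from a comma-separated library list, since the example above lists `libnl-3.so.200` and `libnl-genl-3.so.200` twice.

```python
def container_ld_library_path(host_dirs):
    """Join library directories and append a literal \\${LD_LIBRARY_PATH}.

    Hypothetical helper. Empty entries are skipped so the result never
    contains '::'. The trailing element keeps its backslash so that csh
    does not expand it when the setenv line runs on the host.
    """
    parts = [d for d in host_dirs if d]
    parts.append(r"\${LD_LIBRARY_PATH}")
    return ":".join(parts)


def dedupe_libs(containlibs):
    """Remove repeated entries from a comma-separated list, keeping order."""
    return ",".join(dict.fromkeys(containlibs.split(",")))


if __name__ == "__main__":
    value = container_ld_library_path([
        "${CRAY_LD_LIBRARY_PATH}",
        "${LD_LIBRARY_PATH}",
        "/opt/cray/pe/lib64",
        "/usr/lib64/libibverbs",
    ])
    print(f"setenv APPTAINERENV_LD_LIBRARY_PATH {value}")
    print(dedupe_libs("/usr/lib64/libnl-3.so.200,/usr/lib64/libnuma.so.1,/usr/lib64/libnl-3.so.200"))
```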

Describe alternatives you've considered
none

Additional context
The base containers are available only within the GFDL because of the Intel compiler restrictions. The Dockerfiles for these containers can be found in the HPC-ME repository.

We need a plan for
