
Streamline dependencies of docker CI images #951

Merged: 11 commits merged into develop on Mar 5, 2024

Conversation

ahnaf-tahmid-chowdhury
Member

Description

This PR aims to optimize the dependencies of the docker CI images by:

  1. Removing unused dependencies.
  2. Updating Geant4 to version 11.2.1.
  3. Changing the download method of HDF5 and Geant4 to use git clone from the source.
  4. Switching solely to git clone to download dependencies, removing the dependency on wget.
  5. Creating a new stage to store only binary files, thereby reducing the docker image size.
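As an illustration of points 3 and 4, the wget-based downloads can be replaced with shallow, tag-pinned git clones. A hypothetical Dockerfile fragment (the repository URLs and tags below are assumptions, not necessarily the exact ones used in the PR):

```dockerfile
# Hypothetical fragment: fetch sources with shallow, tag-pinned git clones
# instead of wget tarballs. URLs and tags are illustrative assumptions.
RUN git clone --depth 1 --branch hdf5-1_14_3 https://github.com/HDFGroup/hdf5.git
RUN git clone --depth 1 --branch v11.2.1 https://github.com/Geant4/geant4.git
```

Using `--depth 1` keeps each clone small, which complements the size reduction from the binary-only final stage.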

Impact

These changes are expected to streamline the dependencies of the docker CI images, reducing the image size and potentially improving performance.

Related Issues

Closes #750.

@ahnaf-tahmid-chowdhury ahnaf-tahmid-chowdhury linked an issue Mar 2, 2024 that may be closed by this pull request
@ahnaf-tahmid-chowdhury
Member Author

Is it necessary to create an extra stage like HDF5 and MOAB, given that these are also external dependencies? I think we can remove these and instead update our workflow to create a binary/final stage to store only the binaries with minimal size, and we may choose the tag as dagmc.

Member

@gonuke left a comment


One question about installing things in the Dockerfile...


ARG INSTALL_DIR

COPY --from=dagmc_test ${INSTALL_DIR} ${INSTALL_DIR}
Member


What files are you copying here and why? If they aren't installed by make install then they probably aren't necessary.

Member Author


In this step, I am using an empty image and copying all the installed files from the DAGMC stage to the same locations. As a result, this image contains only the files under /opt/*. There are no extra files or dependencies in the new image, which keeps its size minimal, under 90 MB.
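A minimal sketch of the stage being described, assuming the build stage is named dagmc_test and installs under /opt (the base image and stage names here are assumptions, not necessarily the exact ones in the PR):

```dockerfile
# Hypothetical final stage: a small base image that receives only the
# installed tree from the build stage, leaving sources and toolchains behind.
FROM ubuntu:22.04 AS dagmc

ARG INSTALL_DIR=/opt
COPY --from=dagmc_test ${INSTALL_DIR} ${INSTALL_DIR}
```

Because nothing else is added to this stage, the resulting image carries only the installed artifacts, which is what keeps it under the ~90 MB figure mentioned above.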

Member Author


In this stage we have DAGMC, MOAB, HDF5, Double Down, Embree, and Geant4. However, we could make it even more minimal by copying only the /opt/dagmc folder. I chose to copy the full folder because I am not sure whether the others are runtime dependencies of DAGMC or only needed to build it, since the binaries we have appear to work without them.

Member Author


By applying this step, we can update our docker_publish workflow to add a new tag, dagmc:latest, on PR merge, and dagmc:version/stable on each new release. This could help streamline other projects as well, such as PyNE and NukeLab. We could also update our documentation so that general users can use the Docker version of DAGMC. Let me know if you find the idea useful.
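The tagging scheme proposed here might look like the following in a publish step (registry, image name, and the version tag are hypothetical placeholders, not values from the PR):

```shell
# Hypothetical docker_publish sketch: build only the binary stage, then
# tag it as latest on merge and as a stable/version tag on release.
docker build --target dagmc -t ghcr.io/example/dagmc:latest .
docker tag ghcr.io/example/dagmc:latest ghcr.io/example/dagmc:stable
docker push ghcr.io/example/dagmc:latest
docker push ghcr.io/example/dagmc:stable
```

The `--target` flag stops the multi-stage build at the named stage, so only the binary-only image is published.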

Member Author


Ah! You've already merged this PR; I was waiting for your response. The binary stage won't run until we update the docker_publish file, and there are currently no instructions for it. I think I should create a new PR for that.

@gonuke
Member

gonuke commented Mar 5, 2024

Is it necessary to create an extra stage like HDF5 and MOAB, given that these are also external dependencies? I think we can remove these and instead update our workflow to create a binary/final stage to store only the binaries with minimal size, and we may choose the tag as dagmc.

I think these stages mostly exist for historical reasons, to limit how often images must be rebuilt. The philosophy is that things that occur earlier in the Dockerfile change least often and are least under our control. Thus, installing them early in separate stages can mean fewer rebuilds of those sections of the file. Geant4 is the most resource-intensive to build, so it is one of the main things to avoid rebuilding.
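Under this philosophy, the stage ordering might be sketched as follows (stage names and chaining are illustrative assumptions, not the actual Dockerfile):

```dockerfile
# Hypothetical ordering: least-frequently-changing layers first, so edits
# near the bottom do not invalidate the expensive cached stages above.
FROM ubuntu:22.04 AS base     # system packages: rarely change
FROM base AS hdf5             # external dependency
FROM hdf5 AS moab             # external dependency
FROM moab AS geant4           # most expensive build; keep its cache warm
FROM geant4 AS dagmc_test     # our own code: changes most often
```

A change in the dagmc_test stage reuses every cached stage above it, which is exactly the rebuild-avoidance being described.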

@ahnaf-tahmid-chowdhury
Member Author

I think these stages mostly exist for historical reasons, to limit how often images must be rebuilt. The philosophy is that things that occur earlier in the Dockerfile change least often and are least under our control. Thus, installing them early in separate stages can mean fewer rebuilds of those sections of the file. Geant4 is the most resource-intensive to build, so it is one of the main things to avoid rebuilding.

I understand. We've included these stages to minimize the need to rebuild HDF5 and MOAB whenever we change the Dockerfile. For instance, if we modify the MOAB part, Docker will reuse the cached HDF5 stage and then rebuild MOAB. The GitHub workflow currently keeps a full cache log, so only the steps we change in the Dockerfile are rebuilt.

@gonuke gonuke merged commit 594f7f7 into develop Mar 5, 2024
1 check passed
cd build && \
../hdf5-${HDF5_VERSION}/configure --enable-shared \
--prefix=${hdf5_install_dir} \
../hdf5/configure --enable-shared \
Member


@ahnaf-tahmid-chowdhury - are you familiar with configure and autotools? The tarball that we retrieve with wget has the configure file already generated. If we pull from the repo, we need to do those steps ourselves. Do they support a CMake based build? That might be more generalizable.
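For context on the configure question: a release tarball ships a pre-generated ./configure script, while a git clone does not, so autotools must generate it first. A hypothetical build-from-clone sequence (the tag and install path are assumptions for illustration):

```shell
# Hypothetical: building HDF5 from a git clone instead of a release tarball.
git clone --depth 1 --branch hdf5-1_14_3 https://github.com/HDFGroup/hdf5.git
cd hdf5
./autogen.sh                  # generates ./configure (needs autoconf, automake, libtool)
mkdir build && cd build
../configure --enable-shared --prefix="${hdf5_install_dir}"
make -j"$(nproc)" && make install
```

The extra autogen step is the cost of switching from the tarball to the repository; a CMake-based build, if supported, would avoid it.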

Member Author


Not as much as I know CMake. I think autotools is based on autoconf and is similar to CMake; it handles some things in an easier way than CMake does.

Member Author


I noticed we already install autoconf and libtool, so I haven't tried CMake for this step. I think I can give it a try to simplify things.

Successfully merging this pull request may close these issues:

  - Streamline dependencies of docker CI images
  - System HDF5 on CI