OpenConext Docker Container Strategy
This document aims to define a strategy for using Docker containers, and the tools associated with them, to deploy local development environments for the OpenConext application.
The goal of this strategy is to help transition away from the current Vagrant based local development environment to a Docker one. It also aims to provide a unifying framework for each component of OpenConext to adhere to, as some of them have already started using Docker containers independently from each other. The end result should be a relatively simple and easy way to start an OpenConext development environment. It should be platform-agnostic and use open-source, ready-made tools. While there will be some customisations, they will be kept to a minimum.
This strategy has been designed with OpenConext developers in mind, but also for the open-source users that contribute to the project. The tools and artifacts created can be made public, as part of the open-source project. To make it easier to understand and discuss, this document has been split up into four sections, each one building on top of the previous.
The “base” images are a set of Docker container images that are built and maintained by SURF and provide the foundation on top of which the OpenConext application Docker images will be built. They contain the common operating system, tools and programming language frameworks that the applications need to run.
Developers should be able to write Dockerfiles for their applications that use these base images and just “plug in” the application code and configuration files in order to build their Docker images.
The recommended way to start is by creating a dedicated Github repository for all the Dockerfiles and other files required to build these images. Assuming the repository will be called ‘openconext-base-images’, a structure for this repo could look like:
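(The layout below is only an illustration; the actual directory names would follow whatever base images the teams decide to maintain.)

```
openconext-base-images/
├── php-fpm72-apache2/
│   ├── Dockerfile
│   └── start.sh             # common startup script baked into the image
├── maven-38-apache2/
│   ├── Dockerfile
│   └── start.sh
└── .github/
    └── workflows/
        └── build-push.yml   # CI pipeline that builds, tests and pushes the images
```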
Each directory contains the files needed to build a base image for a certain language or framework (PHP, Maven) and for a certain version of those tools. This system allows for an easy upgrade path, where newer versions could be built, stored and used when an update has been decided.
The Dockerfiles should be based on the official Docker images:
- For ‘php-fpm72-apache2’: docker image php:7.2-fpm
- For ‘maven-38-apache2’: docker image maven:3.8-jdk-8
- Etc.
On top of each of these official images, the respective Dockerfiles add all the other common tools that the OpenConext applications need. This can be done by using the operating system package managers available in the official images (apk, apt, yum, etc.):
- Apache2
- Nodejs, Npm, yarn
- Any other utils
Using Docker multi-stage builds in this process is recommended, if possible, so that the end result is the smallest possible image.
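A minimal sketch of what the ‘php-fpm72-apache2’ Dockerfile could look like, assuming the package names and the ‘start.sh’ script shown here (a multi-stage variant could trim the result further):

```dockerfile
# Start from the official PHP-FPM image mentioned above
FROM php:7.2-fpm

# Add the common tools on top of the official image and bring all
# packages up to date so the image carries the latest security patches
RUN apt-get update \
    && apt-get -y upgrade \
    && apt-get -y install --no-install-recommends apache2 nodejs npm \
    && npm install -g yarn \
    && rm -rf /var/lib/apt/lists/*

# Common startup script that runs Apache2 and PHP-FPM together;
# application images can reuse or override it
COPY start.sh /usr/local/bin/start.sh
RUN chmod +x /usr/local/bin/start.sh

CMD ["/usr/local/bin/start.sh"]
```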
These images will be automatically built by a CI / CD system, in this case, Github Actions. Two triggers are recommended for the pipeline to build, test and push the images:
- On any change pushed to the image files in the repository
- On a daily schedule (Github Actions supports scheduled, cron-like triggers)
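In Github Actions terms, the two triggers could be declared roughly as follows (the paths and the schedule are examples only):

```yaml
# .github/workflows/build-push.yml (excerpt)
on:
  push:
    paths:
      - 'php-fpm72-apache2/**'
      - 'maven-38-apache2/**'
  schedule:
    # daily rebuild so the images always contain the latest security patches
    - cron: '30 4 * * *'
```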
The daily build serves the purpose of always having an up-to-date image with the latest security patches. In each of the Dockerfiles, besides installing the tools, an ‘upgrade’ command can be run (‘apt-get upgrade’ or ‘apk upgrade’) to bring all packages up to date with the latest versions in the remote repositories.
At the end of the pipeline, the images are pushed to the Github Container Registry (GHCR). These registries can be made publicly available so they can be used in the open-source project. Each image directory in the code repository should have a corresponding GHCR registry; this makes the images easier to manage, use and browse, as they won’t be mixed together in one big registry. The Github Actions pipeline will tag each image with three tags:
- The (short) SHA1 of the latest repo commit
- The `latest` tag
- The current date
The `latest` tag is a movable tag that will always be changed to point to the latest image pushed to the registry. That way the applications can always use the `latest` tag in their Dockerfiles and be sure they pull the most up-to-date base image. The SHA1 and current date serve a historical / tracking purpose but can also be used by the applications to pin themselves to a certain version of the base image in case they need it.
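For illustration, the tag-and-push step at the end of the pipeline could be as simple as the following shell commands (the GHCR organization and image name are assumptions):

```bash
IMAGE=ghcr.io/openconext/php-fpm72-apache2
SHA=$(git rev-parse --short HEAD)   # short SHA1 of the latest commit
DATE=$(date +%Y%m%d)                # current date

docker build -t "$IMAGE:$SHA" -t "$IMAGE:$DATE" -t "$IMAGE:latest" php-fpm72-apache2/
docker push --all-tags "$IMAGE"
```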
With the Base images being built using the strategy in the previous section, the next step is to build the actual application Docker container images.
A good way of starting is to create a Dockerfile for each environment in each application Git repo: a ‘docker’ directory in each Git repo will contain a ‘Dockerfile.dev’ and a ‘Dockerfile.prod’. This directory can also contain any other files that are required at build time: config files, SSL certificates, etc. This structure gives development teams flexibility and control over how they build their Docker images, and adds a bit of structure to their repo.
The Dockerfiles will use as a foundation the “base” images we have previously built and shared with the organization, depending on the framework / programming language the application requires. For example, ‘OpenConext-engineblock’ will use the ‘php-fpm72-apache2’ base image. On top of this base image, the Dockerfiles will follow the usual setup procedure for the application, as outlined in the Readme and in the Ansible scripts in OpenConext-deploy: adding the config files, SSL certs, building the static assets via NodeJS, etc. The two different Dockerfiles (.dev and .prod) exist in case there are differences in how the application is built for local development and for production.
The end result should be an application Docker image that has PHP / Maven, Apache2, the application code, static assets and configuration files all “baked” into one artifact that is ready to run either in a local development environment or in a production one.
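A hedged sketch of what ‘docker/Dockerfile.dev’ for EngineBlock could look like, using the base image from GHCR; the paths, asset-build commands and config file names are illustrative:

```dockerfile
# Stage 1: build the static assets in a throw-away NodeJS image
FROM node:16 AS assets
WORKDIR /build
COPY . .
RUN yarn install && yarn build            # actual commands depend on the application

# Stage 2: the application image, based on the shared base image
FROM ghcr.io/openconext/php-fpm72-apache2:latest
WORKDIR /var/www/html
COPY . .
COPY --from=assets /build/web ./web       # keep only the generated assets
COPY docker/conf/engineblock.conf /etc/apache2/sites-enabled/
COPY docker/certs/ /etc/apache2/ssl/

# Reuse the startup script shipped with the base image
CMD ["/usr/local/bin/start.sh"]
```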
Multi-stage Docker builds are important in this building process if smaller Docker images are desired. For example, the Node modules could be installed in a “builder” NodeJS image where the static assets are generated, and from there only the resulting static assets are copied into the final image.
Running the base Docker images
As the base images already ship with a startup script (‘start.sh’), the application images can leverage it and set it as their CMD / ENTRYPOINT, to be used at runtime. If the application needs any modifications to the script, it can be overridden in the application Dockerfiles.
Image build automation, tagging and storing
Same as for the base images, the application images should always be built by the Github Actions CI / CD system. Two pipelines are recommended: one that builds the `dev` image based on `Dockerfile.dev` and one that builds the `prod` image based on `Dockerfile.prod`. Having these two pipelines separated allows for different build strategies to be defined. For example, any commit or merge to the “main” branch triggers the `dev` workflow, but only Git releases or certain Git tags trigger the `prod` workflow.
Each workflow pushes the resulting Docker images to the Github Container Registry (GHCR). A separation of the registries per application is recommended: each app should have its own Docker registry. These registries could also be made public as part of the open-source project. Before being pushed to the registry, the images can be tagged with:
- The `dev` or `prod` tag, a movable tag that is always moved to the latest image being built. Any system running these images will reference this tag to make sure it runs the latest version of the image.
- The short Git commit SHA1 of the commit that triggered the build. This creates a historical log of the builds and gives better visibility into what is inside the Docker image and which version of the code it runs. It will also allow for emergency rollbacks in production systems, or let a developer run a specific version of the code locally.
Note on different processor architectures
Other processor architectures are becoming popular for personal computers, such as the ARM architecture found in Apple’s M-series of processors. The Docker image build system needs to account for this, especially when these images are designed for use in local development environments. An easy way to do this is to use the newer Docker build client called “BuildX”. It allows users to build and push images for different processor architectures, and it can be used in Github Actions to automatically build the `dev` images for both x86 and ARM. Once these images are in GHCR, a local Docker client running natively on ARM will, by default and with no input from the user, ask for the ARM version of the image. If it cannot find it, it will fall back to requesting the x86 version and run it in emulation mode. A minimal sketch of such a multi-architecture workflow is shown after the diagram below.
Full Docker image building workflow
Putting both the base image building and the application image building workflows together looks like this in a diagram:
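As referenced in the note above, a minimal sketch of a multi-architecture `dev` build workflow, using the commonly available Docker Github Actions (the repository, image names and branch are illustrative):

```yaml
# .github/workflows/dev-image.yml (excerpt)
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-qemu-action@v2     # emulation for non-native platforms
      - uses: docker/setup-buildx-action@v2   # enables the BuildX client
      - uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v4
        with:
          context: .
          file: docker/Dockerfile.dev
          platforms: linux/amd64,linux/arm64  # x86 and ARM images in one build
          push: true
          tags: |
            ghcr.io/openconext/engineblock:dev
            ghcr.io/openconext/engineblock:${{ github.sha }}   # full SHA; a short SHA needs an extra step
```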
Using the Docker application images automatically built and pushed to the registry in the previous section, a local development environment can now be deployed. Two tools are required for the developers to install: the Docker daemon and Docker Compose. A ‘docker-compose.yml’ file will define how the application images are started and connected together into a coherent development environment.
In this strategy, these concepts apply:
- Docker Compose brings up all the containers and attaches them to a network.
- Application code is mounted from the host computer into the Docker container(s).
This has been realised in the OpenConext-devconf project. This repository acts as a central store of all the local development environment files. It relies on a big, central ‘docker-compose.yml’ file that brings up the entire OpenConext stack, similar to the way it was done in the Vagrant VM. A developer setting up would clone this Git repo and start the Docker environment from it. This approach requires a bit more customisation, because a script or instructions would be needed to tell Docker compose which application code to mount inside which Docker container, depending on which application is being worked on.
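One possible way to express such a mount is a small Compose override file that the developer adjusts for the application they are working on; the file name and paths below are hypothetical:

```yaml
# docker-compose.override.yml -- merged automatically by Docker Compose
services:
  openconext-engineblock:
    volumes:
      # mount the locally checked-out application code over the code baked
      # into the image, so changes on the host show up in the container
      - ../OpenConext-engineblock:/var/www/html
```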
A number of “service containers” have to be deployed alongside them for the entire stack to function correctly. What this document defines as “service containers” are the Haproxy, MariaDB, MongoDB and Shibboleth containers. The following paragraphs define the strategy for starting them in a Docker Compose context to serve the needs of the application containers.
A ‘haproxy’ directory will be created. This directory contains all the config files for Haproxy and any other files required for Haproxy to work correctly in the OpenConext stack. The Docker Compose file creates a “service” for Haproxy that pulls the official Docker image from Docker Hub, pinned to a certain version via the Docker image tag, for example “haproxy:2.6.8”. It then mounts all the required config files in the Haproxy container.
To facilitate the connection between Haproxy and all the application containers, Docker Compose attaches the Haproxy container to the same Docker network. Connecting them via a Docker network allows the containers to talk to each other via DNS, as the Docker daemon provides an internal DNS service. For example, an EngineBlock container defined as ‘openconext-engineblock’ in the compose file will be reachable by all the other containers, including Haproxy, just by using the DNS name ‘openconext-engineblock’. This allows for an easy connection between Haproxy and all the application containers it serves. By maintaining a well-defined naming scheme in the Docker compose file, the Haproxy config file can be written using the same service names. For example, the entry for EngineBlock in Haproxy will look like:
```
server eb openconext-engineblock:410 cookie eb check inter 8000 fall 5 rise 2 maxconn 35
```
This method allows for easy self-registration via DNS and removes the need for complex IP management for the Docker containers. Haproxy will also expose port(s) 80 / 443 to the host (the developer’s computer); the end result is that the user can access these ports locally in their browser, hit Haproxy in the container and have their request forwarded to the right application based on the Haproxy config and the hostname requested. The Haproxy container should be the last container started by Docker compose, as defined by both the order of services in the Compose file and by the ‘depends_on’ parameter.
The strategy for the database service containers is very similar to the one for the Haproxy container. Directories called ‘mariadb’ and ‘mongodb’, in either the dedicated Git repo or a ‘docker’ directory in the application Git repos, should contain the configuration files and, if needed, the seed data for the databases. The Docker compose file starts these database containers based on the official Docker images for them, pinned to a certain version via the Docker image tag. It then mounts the configuration files and the seed data in the containers. Both container images include a mechanism through which data in a certain format (.sql for example), mounted in a particular directory in the container, is automatically imported into the database when the container starts.
Docker compose uses the network mechanism to connect these database containers to the same network and make them available to the applications via a DNS name. For example, if the MariaDB container is defined as ‘openconext-mariadb’ in the compose file, it will be available under that hostname via DNS. This means that if the naming is defined and respected, all the application containers can set that DNS name in their configs as the database hostname to connect to, and Docker will do the rest. The database containers should be the first containers started by Docker compose, as defined by both the order of services in the Compose file and by the ‘depends_on’ parameter.
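As an illustration of the seed-data mechanism for the official MariaDB image (the host-side paths and credentials are assumptions):

```yaml
# docker-compose.yml (excerpt)
services:
  openconext-mariadb:
    image: mariadb:10.6
    environment:
      MARIADB_ROOT_PASSWORD: secret          # development-only credential
    volumes:
      - ./mariadb/conf:/etc/mysql/conf.d
      # .sql files in this directory are imported automatically the first
      # time the container starts with an empty data directory
      - ./mariadb/seed:/docker-entrypoint-initdb.d
```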
The strategy for the Apache Shibboleth container follows the same pattern as the database containers above: part of the Docker compose file, config file(s) mounted inside the container from a local directory, connected to the Docker network and available via its DNS name to the application containers. The only difference is that there is no official Shibboleth Docker image. This means that a custom Docker image needs to be built and pushed to GHCR. It can be part of the Docker base images repository or it could have its own dedicated Git repo and Actions workflow. The Shibboleth container should be one of the first containers started by Docker compose, as defined by both the order of services in the Compose file and by the ‘depends_on’ parameter.
Example
The easiest way to see how these components all fit together is by using a docker-compose.yml file example. This example assumes the first deployment strategy has been chosen, as that is the recommended one for OpenConext. Each application Git repository has a ‘docker’ directory inside which all the files for the development environment are hosted.
The ‘docker-compose.yml’ file in this example could look like this:
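(A condensed, illustrative sketch: service names, image references, ports and mounted paths are examples, and the real file would list every OpenConext application and its settings.)

```yaml
services:
  openconext-mariadb:
    image: mariadb:10.6
    volumes:
      - ./mariadb/seed:/docker-entrypoint-initdb.d

  openconext-mongodb:
    image: mongo:4.4

  openconext-shibboleth:
    image: ghcr.io/openconext/shibboleth:dev     # custom-built image, see above
    volumes:
      - ./shibboleth/conf:/etc/shibboleth

  openconext-engineblock:
    image: ghcr.io/openconext/engineblock:dev
    volumes:
      - ../OpenConext-engineblock:/var/www/html  # application code mounted from the host
    depends_on:
      - openconext-mariadb

  haproxy:
    image: haproxy:2.6.8
    volumes:
      - ./haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    ports:
      - "80:80"
      - "443:443"
    depends_on:
      - openconext-engineblock
      - openconext-shibboleth

# All services share the default network created by Compose, so they can
# reach each other by service name via Docker's internal DNS.
```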
With everything in place, a simple ‘docker compose up’ command brings up the entire stack as defined in the compose file. The command ‘docker compose logs -f openconext-engineblock’ follows the logs of the EngineBlock container, as output to STDOUT by both Apache2 and PHP-FPM. Running ‘docker compose exec openconext-engineblock <command>’ runs the given command inside the container, provided that the command is available inside the container. If a developer needs to connect to a running container for more debugging, that can be done by using ‘exec’ to start a new shell in the container: ‘docker compose exec openconext-engineblock /bin/bash’. Docker Compose offers many more commands that make it easy to work with the Docker environment, as described in its documentation.
The migration to a Docker based development environment opens up the possibility to move the production environment to a Docker based one as well. Docker images built using a Dockerfile dedicated to the production environment, like the ‘Dockerfile.prod’ mentioned in the previous sections, can be deployed using a Docker container orchestrator and scheduler. There are several popular, open-source options available. What follows is a list of three of them that could fit the needs of the OpenConext user very well, with recommendations.