diff --git a/084-services/proposal.md b/084-services/proposal.md
new file mode 100644
index 00000000..6fb13bc0
--- /dev/null
+++ b/084-services/proposal.md
@@ -0,0 +1,155 @@
+# Summary
+
+Provide a native way to expose local services to steps.
+
+# Motivation
+
+* Easier integration testing ([concourse/concourse#324](https://github.com/concourse/concourse/issues/324))
+  * The current recommended approach is to run a privileged `task` with a Docker daemon and `docker-compose` installed, which runs `docker-compose up` followed by the test suite
+
+# Proposal
+
+I propose adding a new `services` field to the `task` step (and eventually the `run` step), along with a special var source, `.svc`, e.g.
+
+```yaml
+task: integration-tests
+file: ci/tasks/test.yml
+params:
+  POSTGRES_ADDRESS: ((.svc:postgres.address))
+  # or
+  # POSTGRES_HOST: ((.svc:postgres.host))
+  # POSTGRES_PORT: ((.svc:postgres.port))
+  #
+  # Services can expose many ports, and each port is named.
+  # To access addresses/ports other than the one named 'default', use:
+  # ((.svc:postgres.addresses.some-port-name))
+  # ((.svc:postgres.ports.some-port-name))
+services:
+- name: postgres
+  file: ci/services/postgres.yml
+```
+
+When the `task` finishes (successfully or otherwise), the service is gracefully terminated: Concourse first sends it a `SIGTERM`, then a `SIGKILL` if it doesn't exit within a timeout.
+
+## With the `across` step
+
+Since services are bound to an individual `task`, you can use the `across` step to run tests against a matrix of dependency versions.
+
+```yaml
+across:
+- var: postgres_version
+  values: [9, 10, 11, 12, 13]
+  max_in_flight: 3
+task: integration-suite
+file: ci/tasks/integration.yml
+params:
+  POSTGRES_ADDRESS: ((.svc:postgres.address))
+services:
+- name: postgres
+  file: ci/services/postgres.yml
+  image: postgres-((.:postgres_version))
+```
+
+## Service Configuration
+
+Services are configured much like tasks, e.g.
+
+```yaml
+name: postgres
+config: # or "file:"
+  image_resource: # can specify a top-level "image:" instead of "image_resource:"
+    type: registry-image
+    source: {repository: postgres}
+  inputs:
+  - name: some-input
+  ports:
+  - name: default # optional if using default name
+    number: 5432
+  startup_probe: # By default, Concourse will wait for all the listed ports to be open
+    run: {path: pg_isready}
+    failure_threshold: 10
+    period_seconds: 5
+```
+
+Services can also run by sending a message to a [Prototype], similar to the `run` step, e.g.
+
+```yaml
+name: concourse
+type: docker-compose
+run: up # up is the default message for prototype-based services
+params:
+  files:
+  - concourse/docker-compose.yml
+  - ci/overrides/docker-compose.ci-containerd.yml
+inputs: [concourse, ci]
+ports:
+- name: web
+  number: 8080
+```
+
+### Startup Probe
+
+To ensure a service is ready to accept traffic, its `startup_probe` must succeed before the dependent step runs.
+
+`startup_probe.run` defines a process to run in the service container until it succeeds. The process runs every `startup_probe.period_seconds` seconds; if it fails `startup_probe.failure_threshold` times, the service errors and the dependent step does not run.
+
+If `startup_probe.run` is omitted, Concourse waits for each of the service's listed `ports` to be open.
+
+## Worker Placement
+
+Since services are bound to individual `task`s, the simplest approach is to place the service containers and the task container on the same worker. This avoids a more complex architecture that would have to route traffic through the TSA (since workers may not be directly reachable from one another).
+
+This hopefully isn't *too* restrictive, as anyone running e.g. `docker-compose` in a `task` for integration testing is effectively doing the same thing (just in one mega-container instead of two or more). It's also worth noting that with a native [Kubernetes Runtime], a single "worker" will likely correspond to an entire cluster rather than a single node in the cluster.
+
+However, it does mean that we can't provide services to tasks running on Windows/Darwin workers, though it's not clear there's much need for this.
+
+## Networking
+
+How we accomplish intra-worker, container-to-container networking depends on the runtime.
+
+### Guardian and containerd
+
+There are a couple of options here:
+
+1. Containers on the same host can communicate via the bridge network (created by a CNI plugin in our [containerd backend]; it's unclear whether Guardian provides the same)
+    * Could work with minimal changes to the runtime layer
+    * Needs extra architecture to prevent a malicious `task` from scanning the container subnet and interfering with running services (note: this is already possible today with any `task` that runs a server, e.g. one running `docker-compose`, but it is easy to prevent with some changes to firewall rules)
+2. With our containerd runtime, we have more flexibility: we could run the service and the `task` in the same network namespace
+    * This would allow communication over `localhost`
+    * Requires waiting until our containerd runtime is stable enough to replace Guardian
+    * If a `task` has multiple services, two services cannot use the same ports (even if they are not exposed)
+
+Regarding the second point under option 1, if this is a concern we *can* prevent such malicious `task`s by adding a Service Manager component to each worker that registers/unregisters services. This component could create/destroy firewall rules granting specific containers access to others.
+
+![Service Manager overview](./service-manager.png)
+
+### Kubernetes
+
+When we build a [Kubernetes Runtime], we have a couple of alternatives here as well, roughly mirroring the choices for containerd:
+
+1. Run the service as its own pod
+    * The service and `task` may run on different k8s nodes
+    * Can configure [network policies] to restrict access to the service's pod
+2. Run services as sidecar containers in the same pod as the `task`
+    * The service and `task` must run on the same k8s node
+    * If a `task` has multiple services, two services cannot use the same ports (even if they are not exposed)
+
+# Open Questions
+
+* Are there (sufficiently many) practical use cases for exposing a service to multiple steps? Or is a single `task` always sufficient?
+* Are there (sufficiently many) practical use cases for exposing a service to a `task` on a Windows/Darwin worker?
+* Would you ever need to run multiple services that use the same port?
+* How important is it to access the service over `localhost` vs. an arbitrary hostname or IP address?
+
+# Answered Questions
+
+# New Implications
+
+
+
+
+
+[Prototype]: https://github.com/concourse/rfcs/blob/master/037-prototypes/proposal.md
+[Kubernetes Runtime]: https://github.com/concourse/rfcs/blob/075-k8s-runtime/075-k8s-runtime/proposal.md
+[containerd backend]: https://github.com/concourse/concourse/blob/27e1d83fd3d24d22a1a8d9c83823d608fae63f4a/worker/runtime/cni_network.go#L66-L80
+[network policies]: https://kubernetes.io/docs/concepts/services-networking/network-policies/
diff --git a/084-services/service-manager.png b/084-services/service-manager.png
new file mode 100644
index 00000000..22a93e48
Binary files /dev/null and b/084-services/service-manager.png differ