POC: agent integration tests using test containers #322

Merged 3 commits into main on Aug 29, 2023

pablochacin
Collaborator

@pablochacin pablochacin commented Aug 25, 2023

Description

The agent tests usually require running the agent process and the SUT process, which is the target of the test requests. For example, for HTTP fault injection the SUT is usually httpbin. The agent uses iptables to redirect the traffic from the SUT to itself.

The main difficulty when testing the agent was how to set an environment in which the agent could safely modify the iptables.

A Kubernetes pod seemed a reasonable option as each pod can run multiple containers sharing the same network stack. Besides, the most common deployment of the agent is as an ephemeral container in a Pod.

However, implementing the integration tests in this way created several issues.

This PR is a proof of concept of using TestContainers for the agent integration tests. TestContainers allows spawning multiple containers with the components of the tests. It also offers a library of utilities for setting up the containers and retrieving information such as the container's exposed ports.

Under the philosophy of TestContainers, the agent and the SUT should run as two independent containers. However, as explained above, the agent needs to share the network stack with the SUT in order to inject the traffic redirection rules.

The test implemented in this POC exploits a poorly documented feature in Docker that allows attaching a container to the network stack (or network namespace) of another container.
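As a rough illustration of that feature (not the code in this PR), the sketch below builds the arguments for a `docker run` invocation that joins the network namespace of an already running SUT container through Docker's `container:<name|id>` network mode. The helper names, the container name `httpbin-sut`, and the image `example.local/agent:latest` are hypothetical, and adding the `NET_ADMIN` capability is an assumption about what the agent needs in order to modify iptables.

```go
package main

import (
	"fmt"
	"strings"
)

// sharedNetworkMode returns the Docker network mode that attaches a new
// container to the network namespace of an existing container.
func sharedNetworkMode(containerID string) string {
	return "container:" + containerID
}

// agentRunArgs builds the arguments for `docker run` so that the agent
// container shares the network stack of the SUT container.
// Assumption: NET_ADMIN is enough for the agent to change iptables rules.
func agentRunArgs(sutContainerID, agentImage string) []string {
	return []string{
		"run", "--detach",
		"--network", sharedNetworkMode(sutContainerID),
		"--cap-add", "NET_ADMIN",
		agentImage,
	}
}

func main() {
	// Hypothetical container name and image, for illustration only.
	args := agentRunArgs("httpbin-sut", "example.local/agent:latest")
	fmt.Println("docker " + strings.Join(args, " "))
}
```

Because both containers share one network namespace, ports published for the SUT must be declared on the SUT container; the attached container cannot publish ports of its own.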

This workaround could be avoided by creating a test image that includes not only the agent but also other components such as httpbin and grpcbin, and starting them as processes in the same container. However, this approach introduces several issues, such as creating the test image and launching each component as a process inside the container.

Known issues and limitations

  1. The tests do not seem to work on macOS workers. They fail with this error:
2023/08/25 12:06:05 failed getting information about docker server: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
  2. Currently the tests use the latest tag for the agent's image. As explained in detail in "Use the current branch's commit as the tag for Agent's integration tests" #324, doing so introduces the risk of testing the version from the main branch instead of the one from the current branch. This could be solved relatively easily when testing locally, but in the CI it requires adding steps for publishing the image with a tag that refers to the current branch.
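One possible shape for the second item, as a minimal sketch: derive the image tag from the commit under test and fall back to `latest` only when no commit is known. The function name, the repository `example.local/agent`, and the 7-character short hash are assumptions made for illustration; they are not taken from this PR or from #324.

```go
package main

import "fmt"

// agentImageRef returns the image reference to use in the integration tests.
// When a commit hash is given, its short form becomes the tag, so the tests
// run against the image built from the current branch. Without a commit it
// falls back to the "latest" tag, which is the current behavior.
func agentImageRef(repository, commit string) string {
	if commit == "" {
		return repository + ":latest"
	}
	if len(commit) > 7 {
		commit = commit[:7] // assumed short-hash length
	}
	return repository + ":" + commit
}

func main() {
	// Hypothetical repository and commit hash, for illustration only.
	fmt.Println(agentImageRef("example.local/agent", "0123456789abcdef"))
	fmt.Println(agentImageRef("example.local/agent", ""))
}
```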

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works.
  • I have run linter locally (make lint) and all checks pass.
  • I have run tests locally (make test) and all tests pass.
  • I have run relevant e2e test locally (make e2e-xxx for agent, disruptors, kubernetes or cluster related changes)
  • Any dependent changes have been merged and published in downstream modules

@pablochacin pablochacin marked this pull request as draft August 25, 2023 11:30
@pablochacin pablochacin force-pushed the agent-integration-tests branch 4 times, most recently from 857a26c to 181657c August 25, 2023 13:42
@pablochacin pablochacin marked this pull request as ready for review August 25, 2023 14:37
Member

@roobre roobre left a comment

These look very nice! I just gave them a couple of runs locally and they run very fast. The network mode workaround looks pretty neat, as the TestContainers API supports it nicely.

RE the tests not working on macOS, that's interesting, as the log seems to imply that TestContainers is attempting to connect to docker through the usual unix socket. If I recall correctly, on macOS DOCKER_HOST should be set to a tcp://somethingsomething address, and TestContainers claims to honor DOCKER_HOST.

Perhaps it would be worth checking the value of DOCKER_HOST on that machine to see whether the system is misconfigured or whether that's a bug in TestContainers.
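A small diagnostic along these lines could be run on the affected machine. It is a simplified sketch of the usual convention (use DOCKER_HOST when set, otherwise the default unix socket) and not TestContainers' actual resolution logic; `resolveDockerHost` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
)

// defaultDockerHost is the socket named in the error message above.
const defaultDockerHost = "unix:///var/run/docker.sock"

// resolveDockerHost returns the value of DOCKER_HOST when it is set and
// non-empty, and the default unix socket otherwise. The lookup function is
// injected so the logic can be exercised without changing the environment.
func resolveDockerHost(lookup func(string) (string, bool)) string {
	if host, ok := lookup("DOCKER_HOST"); ok && host != "" {
		return host
	}
	return defaultDockerHost
}

func main() {
	fmt.Println("docker host:", resolveDockerHost(os.LookupEnv))
}
```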

@pablochacin
Collaborator Author

For me, the tests not working on macOS in the CI is a minor issue, as we can run them only on Linux.

What really worries me is not having a workaround for developers trying to contribute to the project.

@roobre
Member

roobre commented Aug 25, 2023

I'll see if I can get my hands on an OSX laptop to see if I can reproduce the issue.

@roobre
Member

roobre commented Aug 28, 2023

I was able to successfully run these on a borrowed M1 MacBook Pro with Docker Desktop 4.21.1, without any additional changes (go test -tags integration -v -cover -race ./...).

On that machine, DOCKER_HOST was unset, and there was a unix socket file in the usual path (/var/run/docker.sock).

@pablochacin pablochacin merged commit 09fcf9e into main Aug 29, 2023
6 checks passed
@pablochacin pablochacin deleted the agent-integration-tests branch August 29, 2023 17:10
@pablochacin pablochacin mentioned this pull request Aug 30, 2023
8 tasks
2 participants