Fix EC2 failures when Github runner name is already taken #392

Merged: 1 commit into main from fix-ec2-ci, Nov 14, 2024

@mkannwischer (Contributor) commented on Nov 13, 2024

I suspect that our CI failures may be caused by runner registration failing because the runner name is already in use. By default, the GitHub runner name is the hostname, which on EC2 instances is derived from the private IP address. When instances are terminated, their runners are not deleted from GitHub, so we now have a lot of offline runners listed, and a new instance that ends up with the same private IP as a terminated one tries to register under a name that is already taken.

This may explain why our CI sometimes times out: the runner never gets registered, so the job is never picked up.

I patched ec2-github-runner to add the `--replace` flag when registering the runner:
mkannwischer/ec2-github-runner@d15c880

Fixes #391.
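For illustration only, here is a minimal Python sketch of the registration lines an EC2 user-data script might run, with `--replace` appended to the runner's `config.sh` call. The helper `build_registration_commands` and its parameters are hypothetical and not taken from ec2-github-runner's source; `--url`, `--token`, `--labels` and `--replace` are options of the GitHub Actions runner's `config.sh`. The values `my-org`, `my-repo`, `TOKEN` and `ec2-runner-1` are placeholders.

```python
import shlex


def build_registration_commands(owner: str, repo: str, token: str,
                                label: str, replace: bool = True) -> list[str]:
    """Hypothetical helper: return the shell lines that register and start
    a GitHub Actions runner on a freshly booted EC2 instance."""
    config_cmd = [
        "./config.sh",
        "--url", f"https://github.com/{owner}/{repo}",
        "--token", token,   # short-lived registration token
        "--labels", label,  # label the workflow job uses to target this runner
    ]
    if replace:
        # The runner name defaults to the hostname (private-IP based on EC2).
        # --replace takes over an existing registration with the same name,
        # e.g. a stale offline runner left behind by a terminated instance.
        config_cmd.append("--replace")
    # shlex.join quotes any argument that contains shell metacharacters.
    return [shlex.join(config_cmd), "./run.sh"]


if __name__ == "__main__":
    for line in build_registration_commands("my-org", "my-repo", "TOKEN", "ec2-runner-1"):
        print(line)
```

Without `--replace`, a non-interactive registration under an already-taken name is expected to fail, which is the failure mode suspected above. With it, the new instance takes over the stale registration; offline runners whose names are never reused still stay listed.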

@mkannwischer added the benchmark label (this PR should be benchmarked in CI) on Nov 13, 2024
@mkannwischer (Contributor, Author) commented:

Damn - it still happens

@mkannwischer removed and re-added the benchmark label three times on Nov 13, 2024
@mkannwischer marked this pull request as ready for review on November 13, 2024 08:27
@mkannwischer (Contributor, Author) commented:

After running this a few times, the only failures I am seeing now are related to the vCPU limit. I had only one failure that was not related to that, and I cannot explain it.
Still, this appears to be a huge improvement over what we had before.

@mkannwischer marked this pull request as draft on November 13, 2024 12:54
I suspect that our CI failures may be caused by runner
registration failing because the name is already in use.
By default, the GitHub runner name is the hostname, which
on EC2 instances is derived from the private IP address.
When instances are terminated, their runners are not
deleted from GitHub, which means we now have a lot of
offline runners listed.

This may explain why our CI sometimes times out due to
the runner not being registered.

I patched ec2-github-runner to add the `--replace` flag
when registering the runner.

Signed-off-by: Matthias J. Kannwischer <[email protected]>
@mkannwischer marked this pull request as ready for review on November 14, 2024 05:29
@hanno-becker (Contributor) left a comment:

Thank you @mkannwischer for investigating. Let's merge this and monitor whether it helps.

@hanno-becker merged commit 4e8db5d into main on Nov 14, 2024 (33 checks passed)
@hanno-becker deleted the fix-ec2-ci branch on November 14, 2024 05:52
Labels: benchmark (this PR should be benchmarked in CI)
Linked issue: CI timeouts due to EC2 runners sometimes not picking up the CI jobs
Participants: 2