don't reconfigure networkd on "stop" #107

nmeyerhans · 2024-03-05T21:30:11Z

Issue #, if available: n/a

Description of changes:

This makes some changes to the behavior of ec2-net-utils when the policy-routes service is stopped. The major change is that stop no longer removes the generated config. This reduces the amount of work done and eliminates reloading of systemd-networkd when doing so provides no meaningful benefit. Any routes and policy rules associated with an instance are deleted when an interface is removed, so the config removal is not meaningful.

This fixes an issue observed when stopping [email protected], in which forwarded connections (e.g. from a local Docker bridge network) were flushed from the conntrack tables, leading to dropped packets.

There are other smaller changes to the systemd unit files:

- Set KillMode to only signal the top-level process, rather than the default behavior of signalling all processes in the cgroup.
- Add Wants= and Also= relationships between [email protected] and [email protected] units to clarify the relationship.
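
As a rough illustration of the unit-file directives named above, a fragment along the following lines would express them. This is a sketch only: the unit names `example@.service` and `example-refresh@.timer` are hypothetical placeholders, not the names used in this change, and the placement of each directive is an assumption based on standard systemd conventions.

```ini
# example@.service (hypothetical name, for illustration only)

[Unit]
# Pull in the companion timer whenever an instance of this service starts.
# %i expands to the instance name, e.g. the interface name.
Wants=example-refresh@%i.timer

[Service]
# On stop, signal only the main process. The systemd default,
# control-group, signals every process in the unit's cgroup.
KillMode=process

[Install]
# Enable or disable the companion timer together with this service.
Also=example-refresh@%i.timer
```

`Wants=` belongs in `[Unit]`, `KillMode=` in `[Service]`, and `Also=` in `[Install]`; the latter only takes effect on `systemctl enable` and `systemctl disable`.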

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Previously, stopping [email protected] would delete the installed configuration for the foo interface and trigger a networkd configuration reload. Deleting the configuration reverted the interface to the default configuration, and the subsequent networkd reload reset any conntrack state for connections associated with that interface. As a result, traffic for any connection that relied on the RELATED or ESTABLISHED conntrack states was dropped, when the expectation is that it would continue to be passed.

The impact was particularly visible on systems running Docker in bridged networking mode, where the containers rely on the Docker-installed iptables rules for connectivity, including, by default, an ACCEPT rule based on established connections. In this case, any connections open from local containers to a remote service would see 100% packet loss after stopping [email protected] (where foo is the interface through which container-generated traffic would egress).

With this change, the generated config is left behind after stopping [email protected], even after an ENI is removed. In practice, this is not a problem because:

1. Re-attaching the same ENI will use the old configuration, with any configuration changes picked up by the policy-routes service.
2. Connecting a different ENI in the same slot (thus with the same name) will not match the MAC Address value, and will use the default configuration. The policy-routes service will then generate the correct ENI-specific configuration, overwriting any existing configuration left behind by the previously attached ENI.
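
To make the MAC address point concrete, a leftover per-interface networkd file might look roughly like the sketch below. The file path, the interface name `foo`, the MAC address, and the `[Network]` contents are all illustrative assumptions, not the configuration this package actually generates.

```ini
# /run/systemd/network/70-foo.network (hypothetical path and contents)

[Match]
Name=foo
# Ties this file to one specific ENI. All conditions in [Match] must hold,
# so a different ENI that is also named foo does not match this file.
MACAddress=02:00:00:00:00:01

[Network]
DHCP=yes
```

Because networkd applies a `.network` file only when every `[Match]` condition is satisfied, a new ENI in the same slot skips this stale file and falls through to whichever default file matches it next.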

vigh-m

LGTM!

rickwargo · 2024-04-05T17:02:37Z

@nmeyerhans When will this become available? I lost network connectivity last night and saw this service ultimately time out. I also hit the Systems Manager role issue (EC2RoleProvider Failed to connect to Systems Manager with instance profile role credentials), resulting, I think, in a 404 from a GET to http://169.254.169.254/latest. I have a newly installed image (Apr 3) and it is fairly vanilla. I am running gunicorn/uvicorn (I have seen that in another post with the same errors). It's odd, as I only have one network interface (enX0). I'd like to try this to see if my instance stays stable.

nmeyerhans · 2024-04-10T16:34:38Z

@rickwargo I'm no longer involved in Amazon Linux development and thus cannot answer your question. Maybe @vigh-m can help. I suspect this is blocked on #108

nmeyerhans force-pushed the no-stop branch 6 times, most recently from 023e493 to 01a933b on March 6, 2024 00:01

nmeyerhans added this to the 2.5.0 milestone Mar 6, 2024

nmeyerhans force-pushed the no-stop branch from 01a933b to 69085e1 on March 6, 2024 23:51

Noah Meyerhans added 2 commits March 6, 2024 16:03

Set KillMode on the systemd services

0ba9cf8

The systemd default of `control-group` for this value is more aggressive than we want.

nmeyerhans force-pushed the no-stop branch from 69085e1 to 66bb656 on March 7, 2024 00:07

nmeyerhans marked this pull request as ready for review March 7, 2024 00:29

nmeyerhans changed the title ~~WIP: don't reconfigure networkd on "stop"~~ don't reconfigure networkd on "stop" Mar 7, 2024

nmeyerhans force-pushed the no-stop branch from 66bb656 to 1424794 on March 7, 2024 00:51

Start the interface refresh timer as a dependency of the service

7bdd9dc

...rather than explicitly in the udev rules.

nmeyerhans force-pushed the no-stop branch from 1424794 to a3c92b3 on March 7, 2024 00:56

vigh-m reviewed Mar 8, 2024

Review comment on debian/patches/update-networkd-priorities.patch (outdated, resolved)

debian: refresh update-networkd-priorities.patch

ce669f5

nmeyerhans force-pushed the no-stop branch from a3c92b3 to ce669f5 on March 8, 2024 01:01

vigh-m approved these changes Mar 8, 2024

vigh-m merged commit d34a1d4 into amazonlinux:main Mar 8, 2024
4 checks passed

nmeyerhans deleted the no-stop branch April 10, 2024 16:34
