Define Linux Network Devices #1271

Open: wants to merge 18 commits into base: main


@aojea aojea commented Nov 7, 2024

The proposed "netdevices" field provides a declarative way to specify which host network devices should be moved into a container's network namespace.

This approach is similar to the existing "devices" field used for block devices, but uses a dictionary keyed by the interface name instead.

The proposed scheme is based on the existing representation of a network device by `struct net_device`:
https://docs.kernel.org/networking/netdevices.html.

This proposal focuses solely on moving existing network devices into the container namespace. It does not cover the complexities of network configuration or network interface creation, emphasizing the separation of device management and network configuration.

Fixes: #1239
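
To make the "dictionary keyed by the interface name" idea concrete, here is a minimal Go sketch of how such a field could be decoded. The type names, the `netDevices` JSON key, and the single `name` attribute are assumptions taken from the discussion in this thread, not the final spec or the actual `specs-go` types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LinuxNetDevice is a hypothetical shape for one entry. The field set is an
// assumption for illustration, not the merged spec.
type LinuxNetDevice struct {
	// Name is the interface name inside the container (optional).
	Name string `json:"name,omitempty"`
}

// Linux holds only the field relevant to this sketch.
type Linux struct {
	// NetDevices is keyed by the interface name in the runtime's namespace.
	NetDevices map[string]LinuxNetDevice `json:"netDevices,omitempty"`
}

// parseNetDevices decodes the device map from a "linux" config section.
func parseNetDevices(raw []byte) (map[string]LinuxNetDevice, error) {
	var l Linux
	if err := json.Unmarshal(raw, &l); err != nil {
		return nil, err
	}
	return l.NetDevices, nil
}

func main() {
	raw := []byte(`{"netDevices": {"eth0": {"name": "container_eth0"}, "ens4": {}}}`)
	devs, err := parseNetDevices(raw)
	if err != nil {
		panic(err)
	}
	// Map iteration order is random in Go, which is fine because the
	// entries are meant to be independent of each other.
	for host, d := range devs {
		fmt.Printf("%s -> %q\n", host, d.Name)
	}
}
```

An empty object such as `"ens4": {}` means the device keeps its host name inside the container.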

Author

aojea commented Nov 7, 2024

/assign @samuelkarp


**`netdevices`** (object, OPTIONAL) set of network devices that MUST be available in the container. The runtime MAY supply them however it likes.

The name of the network device is the entry key.
Member

@AkihiroSuda AkihiroSuda Nov 7, 2024

Does the map order matter? If so, implementation can be complicated for Go

Author

The Linux kernel guarantees that the name is unique within the runtime namespace, so a set is OK. Order is not important; each network device should be independent of the others.

Member

Should we recommend that a runtime perform a uniqueness check as well?

Contributor

Uniqueness inside the container should be checked, e.g. that the rename operation was successful.

Author

Added more text to clarify the runtime checks and the network device lifecycle, PTAL.
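
For the uniqueness question in this thread, a runtime-side pre-check could look roughly like the sketch below. It only detects collisions between entries of the config itself; collisions with interfaces that already exist in the container namespace would still surface when the rename fails. `NetDevice`, `containerName` and `checkUniqueNames` are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// NetDevice is a hypothetical config entry with an optional in-container name.
type NetDevice struct {
	Name string
}

// containerName returns the name the device will have inside the container:
// the configured name, or the host name when none is given.
func containerName(hostName string, d NetDevice) string {
	if d.Name != "" {
		return d.Name
	}
	return hostName
}

// checkUniqueNames reports an error if two entries would end up with the
// same interface name inside the container namespace.
func checkUniqueNames(devs map[string]NetDevice) error {
	hosts := make([]string, 0, len(devs))
	for h := range devs {
		hosts = append(hosts, h)
	}
	// Sort keys so the error message is deterministic despite map ordering.
	sort.Strings(hosts)

	seen := make(map[string]string, len(devs))
	for _, h := range hosts {
		target := containerName(h, devs[h])
		if prev, ok := seen[target]; ok {
			return fmt.Errorf("devices %q and %q both map to container name %q", prev, h, target)
		}
		seen[target] = h
	}
	return nil
}

func main() {
	devs := map[string]NetDevice{
		"eth0": {Name: "net0"},
		"eth1": {Name: "net0"},
	}
	fmt.Println(checkUniqueNames(devs))
}
```

Because the map keys are already unique, the only way to collide is through the optional rename, which is why the check works on the resolved in-container names.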

@AkihiroSuda
Member

@aojea aojea force-pushed the network-devices branch 2 times, most recently from 51e5104 to 3a666eb Compare November 12, 2024 12:26
Author

aojea commented Nov 12, 2024

https://github.com/opencontainers/runtime-spec/blob/main/features.md should be updated too

updated and addressed the comments


Author

aojea commented Nov 12, 2024

Action item for @aojea: document the cleanup and destruction of the network interfaces.

@samuelkarp
Member

From the in-person discussion today:

  • Net device lifecycle should follow the network namespace lifecycle
  • @aojea will follow up to determine whether any cleanup actions need to be taken by the OCI runtime on a container being deleted
  • @kad was concerned about restarts and error handling
  • Should we prohibit the new netdev addition to an existing netns? IOW only allow this for containers where a new netns is created? What about containers where the root netns is used?

config-linux.md Outdated

This schema focuses solely on moving existing network devices identified by name into the container namespace. It does not cover the complexities of network device creation or network configuration, such as IP address assignment, routing, and DNS setup.

**`netDevices`** (object, OPTIONAL) set of network devices that MUST be available in the container. The runtime is responsible for providing these devices; the underlying mechanism is implementation-defined.
Member

The spec says "MUST", but I don't think a rootless container can do this, because it doesn't have CAP_NET_ADMIN, right?

Member

I'm not sure we should take care of the rootless container.

Member

Could be an error in the case of a rootless container, if the runtime is not able to satisfy the MUST condition.

Member

> Could be an error in the case of a rootless container, if the runtime is not able to satisfy the MUST condition.

+1, but it'd be better to clarify it in the spec.

Author

Added more explanations about the runtime, the network device lifecycle and the runtime checks, PTAL.

Author

aojea commented Nov 19, 2024

From the in-person discussion today:

  • Net device lifecycle should follow the network namespace lifecycle
  • @aojea will follow up to determine whether any cleanup actions need to be taken by the OCI runtime on a container being deleted
  • @kad was concerned about restarts and error handling
  • Should we prohibit the new netdev addition to an existing netns? IOW only allow this for containers where a new netns is created? What about containers where the root netns is used?

Pushed a new commit addressing those comments. The changelog is:

  • The network namespace lifecycle will move migratable network devices and destroy virtual devices; the runtime MAY decide to do cleanup actions.
  • The runtime MUST check that the container has enough privileges and an associated network namespace, and fail if the check fails.
  • Removed the Mask field; the Address field now uses CIDR notation (IP/Prefix) to deal with both IPv4 and IPv6 addresses. Only one IP is allowed to be specified, on purpose, to simplify the operations and reduce risks.
  • Added a HardwareAddress field for use cases that require setting a
    specific MAC or InfiniBand address.
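
The "associated network namespace" check from the changelog above can be sketched as follows. This is only an illustration: the `Namespace` type is a simplified stand-in for the config's namespace entries, `checkNetDevices` is a hypothetical helper, and the privilege check is deliberately left out because it depends on how the runtime is invoked.

```go
package main

import (
	"errors"
	"fmt"
)

// Namespace is a simplified stand-in for a Linux namespace entry in the
// runtime config (type plus optional path).
type Namespace struct {
	Type string
	Path string
}

var errNoNetNS = errors.New("netDevices set but the container has no network namespace")

// checkNetDevices fails early when devices are requested but the container
// has no network namespace entry. Privilege checks are omitted here.
func checkNetDevices(namespaces []Namespace, deviceCount int) error {
	if deviceCount == 0 {
		return nil // nothing to move, nothing to check
	}
	for _, ns := range namespaces {
		if ns.Type == "network" {
			return nil
		}
	}
	return errNoNetNS
}

func main() {
	// A container with only a PID namespace shares the runtime's netns.
	err := checkNetDevices([]Namespace{{Type: "pid"}}, 1)
	fmt.Println(err)
}
```

Failing before any device is touched keeps the error path simple: there is nothing to roll back.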

Member

utam0k commented Dec 12, 2024

@aojea Thanks for your hard work! It looks good to me. Also, I agree that it should be included in 1.3.0.

config-linux.md Outdated
The runtime MUST revert back the original name to guarantee the idempotence of operations, so a container that moves an interface and renames it can be created and destroyed multiple times with the same result.
* **`addresses`** *(array of strings, OPTIONAL)* - the IP addresses, IPv4 and/or IPv6, of the device within the container in CIDR format (IP address / Prefix). All IPv4 addresses SHOULD be expressed in their decimal format, consisting of four decimal numbers separated by periods. Each number ranges from 0 to 255 and represents an octet of the address. IPv6 addresses SHOULD be represented in their canonical form as defined in RFC 5952.
The runtime MAY limit the number of addresses allowed.
The runtime MAY decide to revert back the original addresses.
Contributor

I think we need to clarify the expected behavior or add a field to ensure consistent behavior across runtimes.

Author

This also depends on whether the interface will be moved back or destroyed.

What about "The runtime MAY decide to revert back the original addresses or completely remove all the existing addresses"?


* **`name`** *(string, OPTIONAL)* - the name of the network device inside the container namespace. If not specified, the host name is used. The network device name is unique per network namespace; if a network device with the same name already exists, the rename operation will fail. The runtime MAY check that the name is unique before the rename operation.
The runtime MUST revert back the original name to guarantee the idempotence of operations, so a container that moves an interface and renames it can be created and destroyed multiple times with the same result.
* **`addresses`** *(array of strings, OPTIONAL)* - the IP addresses, IPv4 and/or IPv6, of the device within the container in CIDR format (IP address / Prefix). All IPv4 addresses SHOULD be expressed in their decimal format, consisting of four decimal numbers separated by periods. Each number ranges from 0 to 255 and represents an octet of the address. IPv6 addresses SHOULD be represented in their canonical form as defined in RFC 5952.
Contributor

Is the runtime expected to set this? It looks like it is. Let us say that in the spec.

Author

This is the input to the runtime; the runtime may choose how to set them as long as it is consistent.
The context is that we got bitten by this in Kubernetes, so it is a recommendation: we find it very hard to enforce this on the input, as it may break some clients. More context in https://daniel.haxx.se/blog/2021/04/19/curl-those-funny-ipv4-addresses/
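
As a sketch of what a strict reading of the address recommendation could look like, the snippet below uses Go's `net/netip` package, which rejects IPv4 octets with leading zeros and prints IPv6 in its compressed lowercase form. Treating a non-canonical address as an error is an assumption made for the example (the text under review only says SHOULD), and `checkAddress` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/netip"
)

// checkAddress accepts an address in CIDR form only if it parses and is
// already written the way netip would print it (dotted decimal for IPv4,
// compressed lowercase for IPv6).
func checkAddress(s string) error {
	p, err := netip.ParsePrefix(s)
	if err != nil {
		return fmt.Errorf("invalid CIDR %q: %w", s, err)
	}
	if canonical := p.String(); canonical != s {
		return fmt.Errorf("address %q is not canonical, want %q", s, canonical)
	}
	return nil
}

func main() {
	for _, a := range []string{
		"192.168.1.10/24",
		"2001:db8::1/64",
		"2001:DB8:0:0:0:0:0:1/64",
		"010.0.0.1/8",
	} {
		fmt.Printf("%-26s %v\n", a, checkAddress(a))
	}
}
```

A runtime that prefers to be lenient could instead normalize with `p.String()` and continue, which matches the "consistent, but not enforced on input" position above.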

aojea and others added 18 commits December 12, 2024 18:59
The proposed "netdevices" field provides a declarative way to
specify which host network devices should be moved into a container's
network namespace.

This approach is similar to the existing "devices" field used for block
devices but uses a dictionary keyed by the interface name instead.

The proposed scheme is based on the existing representation of network
device by the `struct net_device`
https://docs.kernel.org/networking/netdevices.html.

This proposal focuses solely on moving existing network devices into
the container namespace. It does not cover the complexities of
network configuration or network interface creation, emphasizing the
separation of device management and network configuration.

Signed-off-by: Antonio Ojea <[email protected]>
- Clarify network device lifecycle and runtime checks during creation
  and deletion of the container.
- Remove mask field and instead use the Address field with CIDR notation so that
  it can be used for both IPv4 and IPv6.
- Add a HardwareAddress field for use cases that require setting a
  specific MAC or InfiniBand address.

Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Co-authored-by: Albin Kerouanton <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
- Remove reference to rootless containers, the feature flag will be used
  by the corresponding runtime to indicate if the feature is supported.
- Clarify the runtime MUST set the interface UP when moving it to the
  container network namespace
- Clarify the runtime MUST revert back the original name if the
  interface is renamed to guarantee idempotence
- Clarify the runtime MAY choose to revert the other original attributes
  like addresses, mtu and hardware address.

Signed-off-by: Antonio Ojea <[email protected]>
Co-authored-by: Albin Kerouanton <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Co-authored-by: Mrunal Patel <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Co-authored-by: Mrunal Patel <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Co-authored-by: Mrunal Patel <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Co-authored-by: Mrunal Patel <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Author

aojea commented Dec 12, 2024

Changelog since the last review: only the part that was not clear, whether the runtime MUST or MAY bring the interface back to the host namespace, got updated. It is now MUST, since that is less ambiguous and covers all use cases. More on #1271 (comment)
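
To illustrate why restoring the interface and its original name makes create/delete idempotent, here is a toy in-memory model. It does not talk to the kernel: `netns` is just a set of names, and `attach` and `detach` are hypothetical helpers standing in for whatever netlink operations a real runtime would perform.

```go
package main

import "fmt"

// netns is a toy model of a network namespace: a set of interface names.
type netns map[string]bool

// attach moves hostName from the host namespace into the container
// namespace, renaming it to ctrName when one is given.
func attach(host, ctr netns, hostName, ctrName string) error {
	if ctrName == "" {
		ctrName = hostName
	}
	if !host[hostName] {
		return fmt.Errorf("device %q not found in host namespace", hostName)
	}
	if ctr[ctrName] {
		return fmt.Errorf("name %q already in use in container namespace", ctrName)
	}
	delete(host, hostName)
	ctr[ctrName] = true
	return nil
}

// detach moves the device back to the host namespace and restores its
// original name, so a later attach with the same config succeeds again.
func detach(host, ctr netns, hostName, ctrName string) error {
	if ctrName == "" {
		ctrName = hostName
	}
	if !ctr[ctrName] {
		return fmt.Errorf("device %q not found in container namespace", ctrName)
	}
	if host[hostName] {
		return fmt.Errorf("name %q already in use in host namespace", hostName)
	}
	delete(ctr, ctrName)
	host[hostName] = true
	return nil
}

func main() {
	host := netns{"eth1": true}
	// Create and destroy the "container" several times with the same config.
	for i := 0; i < 3; i++ {
		ctr := netns{"lo": true}
		if err := attach(host, ctr, "eth1", "net0"); err != nil {
			panic(err)
		}
		if err := detach(host, ctr, "eth1", "net0"); err != nil {
			panic(err)
		}
	}
	fmt.Println("eth1 back on host:", host["eth1"])
}
```

If `detach` left the device named `net0` on the host, the second `attach` would fail to find `eth1`, which is the non-idempotent behaviour the MUST is meant to rule out.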

Successfully merging this pull request may close these issues.

Proposal: Network Devices
10 participants