Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent modunload once underlay has been set #485

Merged
merged 5 commits into from
May 21, 2024
Merged

Conversation

FelixMcFelix
Copy link
Collaborator

This PR moves the underlay teardown logic from xde_detach into a new ioctl (and opteadm command) to unset the XDE underlay. Detach will now fail so long as the underlay is set -- this successfully prevents modunload from knocking us down after this state has been set.

The semantics are more or less what you'd expect:

  • Clearing the underlay will fail if there exists a port.
  • Clearing the underlay will fail if there is no underlay.
  • Driver detach now requires that you opteadm clear-xde-underlay.

This PR moves the underlay teardown logic from xde_detach into a new
ioctl (and opteadm command) to unset the XDE underlay. Detach will now
fail so long as the underlay is set -- this successfully prevents
`modunload` from knocking us down after this state has been set.

The semantics are more or less what you'd expect:
* Clearing the underlay will fail if there exists a port.
* Clearing the underlay will fail if there is no underlay.
* Driver detach now requires that you `opteadm clear-xde-underlay`.
@FelixMcFelix
Copy link
Collaborator Author

As rough proof of operation:

kyle@farme:~/gits/opte$ pfexec opteadm set-xde-underlay igb0 igb1
kyle@farme:~/gits/opte$ modinfo | grep xde
236 fffffffff44cb000 156450 290   1  xde (xde)

kyle@farme:~/gits/opte$ pfexec modunload -i 236
can't unload the module: Device busy

kyle@farme:~/gits/opte$ pfexec opteadm clear-xde-underlay
# succeeds

kyle@farme:~/gits/opte$ pfexec opteadm clear-xde-underlay
Error: command ClearXdeUnderlay failed: System { errno: 2, msg: "underlay not yet initialized" }

kyle@farme:~/gits/opte$ pfexec modunload -i 236
kyle@farme:~/gits/opte$

Naturally we need to change the virtual hardware teardown scripts in omicron's SoftNPU-based environment to call opteadm clear-xde-underlay once this is merged.

@@ -171,6 +172,13 @@ impl OpteHdl {
run_cmd_ioctl(self.device.as_raw_fd(), cmd, Some(&req))
}

/// Clear xde underlay devices.
Copy link

@mkeeter mkeeter Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vague architecture question: why is this code duplicated in opte-ioctl and opteadm? It looks like opteadm already has a dependency on opte-ioctl; is there a subtle difference between struct OpteAdm and struct OpteHdl, such that struct OpteAdm couldn't be implemented by calling into an OpteHdl handle?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Staring at it for a bit, I don't see that there is much reason for us to be doing this. We have a decent number of ioctls that appear in one but not the other. OpteAdm doesn't appear to have any methods which convert from CLI-specific datastructures, either: and if we did, there's no reason that OpteAdm couldn't just deref to an inner OpteHdl.

Opened #529.

xde/src/xde.rs Outdated
drop(u.muh);

// 2. Remove MAC client handle
if Arc::strong_count(&u.mch) > 1 {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't obviously correct; there could be a race between checking Arc::strong_count and someone else cloning the Arc (maybe this race is impossible due to other factors, but it's not obvious here).

Using Arc::into_inner (and removing drop(u.mch) below) would be more obviously correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this race is at all impossible -- xde_rx has another strong handle to the MAC client handle here, and it could be called into just before we deregister the promisc handle.

That gives two bad orderings between {detach (today), underlay clear} and xde_rx, which makes me a little nervous if the two happen concurrently: we either bail on cleaning up our extra clients and leak holds on the underlying NIC, or xde_rx increments the strong count on an invalid Arc (!!!).

I think the only place this should be a risk is dev environments today, but I'm going to have a think about how we can safely quiesce here. Removing all ports is not a safe precondition, as any underlay traffic (or MAC flow matches, in future) will still land in xde_rx.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a little time double checking promisc handle behaviour in illumos, and it does give us the guarantees we need. I've left a comment explaining its behaviour for the benefit of future readers. 😄

xde/src/xde.rs Outdated Show resolved Hide resolved
xde/src/xde.rs Outdated Show resolved Hide resolved
xde/src/xde.rs Outdated Show resolved Hide resolved
@FelixMcFelix FelixMcFelix modified the milestones: 8, 9 May 10, 2024
Following discussion, we need to figure out a safer quiesce on rx as
there is a theoretical bad ordering between `xde_rx` and underlay
removal.
After some research into MAC promiscuous callback behaviour, I've
verified that we don't need to do our own quiescing since the cleanup
operations will await any active promisc list walkers terminating. I've
explained this logic for the benefit of others and/or future Kyle.
Minor papercut I just ran into, this should keep the command
zero-friction.
@FelixMcFelix FelixMcFelix requested a review from mkeeter May 13, 2024 10:29
@FelixMcFelix FelixMcFelix merged commit 1e20821 into master May 21, 2024
10 checks passed
@FelixMcFelix FelixMcFelix deleted the underlay-lock branch May 21, 2024 08:41
FelixMcFelix added a commit that referenced this pull request Jun 3, 2024
As part of #485, we need to *inspect* the contents of the shared
`XdeState` before taking ownership and dropping its contents. However, I
had created a scenario where:

- The lock is acquired, based on a reference derived from the dev
  info ptr (`dip`),
- Ownership of the contents of `dip` is taken using `Box::new_raw`,
- These contents (`XdeState`) are freed,
- The `MutexGuard` pointing into the `XdeState` is dropped, and attempts
  to unlock a freed mutex.

This PR scopes the `&XdeState` and the derived `MutexGuard` in an inner
block, correcting this drop order issue by forcing the free to happen at
the end of a lexica scope.

Fixes #543.
FelixMcFelix added a commit that referenced this pull request Jun 4, 2024
As part of #485, we need to *inspect* the contents of the shared
`XdeState` before taking ownership and dropping its contents. However, I
had created a scenario where:

- The lock is acquired, based on a reference derived from the dev info
ptr (`dip`),
- Ownership of the contents of `dip` is taken using `Box::new_raw`,
- These contents (`XdeState`) are freed,
- The `MutexGuard` pointing into the `XdeState` is dropped, and attempts
to unlock a freed mutex.

This PR scopes the `&XdeState` and the derived `MutexGuard` in an inner
block, correcting this drop order issue by forcing the free to happen at
the end of a lexical scope.

Fixes #543.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants