Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly scope lock use in xde_detach #545

Merged
merged 2 commits into from
Jun 4, 2024
Merged

Correctly scope lock use in xde_detach #545

merged 2 commits into from
Jun 4, 2024

Conversation

FelixMcFelix
Copy link
Collaborator

As part of #485, we need to inspect the contents of the shared XdeState before taking ownership and dropping its contents. However, I had created a scenario where:

  • The lock is acquired, based on a reference derived from the dev info ptr (dip),
  • Ownership of the contents of dip is taken using Box::new_raw,
  • These contents (XdeState) are freed,
  • The MutexGuard pointing into the XdeState is dropped, and attempts to unlock a freed mutex.

This PR scopes the &XdeState and the derived MutexGuard in an inner block, correcting this drop order issue by forcing the free to happen at the end of a lexical scope.

Fixes #543.

As part of #485, we need to *inspect* the contents of the shared
`XdeState` before taking ownership and dropping its contents. However, I
had created a scenario where:

- The lock is acquired, based on a reference derived from the dev
  info ptr (`dip`),
- Ownership of the contents of `dip` is taken using `Box::new_raw`,
- These contents (`XdeState`) are freed,
- The `MutexGuard` pointing into the `XdeState` is dropped, and attempts
  to unlock a freed mutex.

This PR scopes the `&XdeState` and the derived `MutexGuard` in an inner
block, correcting this drop order issue by forcing the free to happen at
the end of a lexica scope.

Fixes #543.
@FelixMcFelix FelixMcFelix requested a review from luqmana June 3, 2024 09:46
@FelixMcFelix FelixMcFelix self-assigned this Jun 3, 2024
@FelixMcFelix
Copy link
Collaborator Author

FelixMcFelix commented Jun 3, 2024

I've tested that this works via:

pfexec bash -c "echo set kmem_flags = 0xf > /etc/system.d/kmem_debug"
pfexec reboot
#...
pfexec add_drv xde
pfexec modunload -i 0

For some more confirmation, we can compare the relevant MIR output with cargo +nightly-2024-05-12-x86_64-unknown-illumos rustc -- -Zunpretty=stable-mir. (Note: very wide table below).

BeforeAfter
fn xde::xde_detach(_1: *mut illumos_sys_hdrs::dev_info_2: illumos_sys_hdrs::ddi_detach_cmd_t) -> i32 {
    // ...
    let  _29: &XdeState;
    let  _30: KMutexGuard<'_, core::option::Option<UnderlayState>>;
    let mut _31: &KMutex<core::option::Option<UnderlayState>>;
    let mut _32: bool;
    let  _33: &core::option::Option<UnderlayState>;
    let mut _34: &KMutexGuard<'_, core::option::Option<UnderlayState>>;
    let mut _45: Box<XdeState>;
    // ...
    bb25: {
        _32 = core::option::Option::<UnderlayState>::is_some(_33) -> [return: bb26, unwind unreachable];
    }
    bb26: {
        switchInt(move _32) -> [0: bb35, otherwise: bb27];
    }
    // ...
    bb35: {
        _45 = Box::<XdeState>::from_raw(_23) -> [return: bb36, unwind unreachable];
    }
    bb36: {
        _44 = core::mem::drop::<Box<XdeState>>(move _45) -> [return: bb37, unwind unreachable];
    }
    bb37: {
        _48 = {alloc8: *mut *mut illumos_sys_hdrs::dev_info};
        _59 = _48 as *const ();
        _60 = _59 as usize;
        _61 = AlignOf *mut illumos_sys_hdrs::dev_info " ";
        _62 = Sub(_61, 1_usize);
        _63 = BitAnd(_60, _62);
        _64 = Eq(_63, 0_usize);
        assert(_64, "misaligned pointer dereference: address must be a multiple of {} but is {}",_61, _60) -> [success: bb43, unwind unreachable];
    }
    // reordered for clarity
    bb43: {
        _47 = (*_48);
        _46 = illumos_sys_hdrs::ddi_remove_minor_node(move _47, xde::XDE_STR) -> [return: bb38, unwind unreachable];
    }
    bb38: {
        _49 = null_mut::<illumos_sys_hdrs::dev_info>() -> [return: bb39, unwind unreachable];
    }
    bb39: {
        _50 = {alloc8: *mut *mut illumos_sys_hdrs::dev_info};
        _53 = _50 as *const ();
        _54 = _53 as usize;
        _55 = AlignOf *mut illumos_sys_hdrs::dev_info " ";
        _56 = Sub(_55, 1_usize);
        _57 = BitAnd(_54, _56);
        _58 = Eq(_57, 0_usize);
        assert(_58, "misaligned pointer dereference: address must be a multiple of {} but is {}",_55, _54) -> [success: bb42, unwind unreachable];
    }
    bb42: {
        (*_50) = move _49;
        _0 = illumos_sys_hdrs::DDI_SUCCESS;
        drop(_30) -> [return: bb41, unwind unreachable];
    }
    bb41: {
        return;
    }
fn xde::xde_detach(_1: *mut illumos_sys_hdrs::dev_info_2: illumos_sys_hdrs::ddi_detach_cmd_t) -> i32 {
    // ...
    let  _23: *mut XdeState;
    let  _29: &XdeState;
    let  _30: KMutexGuard<'_, core::option::Option<UnderlayState>>;
    let mut _31: &KMutex<core::option::Option<UnderlayState>>;
    let mut _32: bool;
    let  _33: &core::option::Option<UnderlayState>;
    let mut _34: &KMutexGuard<'_, core::option::Option<UnderlayState>>;
    let mut _45: Box<XdeState>;
    // ...
        bb25: {
        _32 = core::option::Option::<UnderlayState>::is_some(_33) -> [return: bb26, unwind unreachable];
    }
    bb26: {
        switchInt(move _32) -> [0: bb35, otherwise: bb27];
    }
    bb35: {
        drop(_30) -> [return: bb36, unwind unreachable];
    }
    bb36: {
        _45 = Box::<XdeState>::from_raw(_23) -> [return: bb37, unwind unreachable];
    }
    bb37: {
        _44 = core::mem::drop::<Box<XdeState>>(move _45) -> [return: bb38, unwind unreachable];
    }
    /// ...eventually returns.

XdeState is freed in bb36, then the Mutex guard which points into this state is incorrectly released in bb42.

The Mutex guard is dropped in bb35, and the XdeState is freed in bb37 as required.

Copy link
Contributor

@luqmana luqmana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing that Kyle (and Andy for catching it)!

I realized we would've caught this sooner if KMutexGuard actually had a Drop impl that called mutex_destroy–which is a thing we should be doing anyways:

  mutex_destroy() releases any resources that might have been allocated
  by mutex_init(). mutex_destroy() must be called before freeing the
  memory containing the mutex, and should be called with the mutex unheld
  (not owned by any thread). The caller must be sure that no other thread
  attempts to use the mutex.

That would've marked the mutex as destroyed and caused mutex_exit from the guard's drop to panic like in #543, just without relying on a debug kernel filling free'd memory with deadbeef.

The fix should work as-is but mind adding the Drop impls for KMutex and KRwLock (which is similarly missing rw_destroy).

Required a little creativity to satisfy borrowck around
`KMutex::into_inner`, but we got there.
@FelixMcFelix
Copy link
Collaborator Author

I've taken some laps spinning up and down omicron, and I'm not seeing any lingering issues unearthed by the mutex_destroy and rw_destroy additions. I think this is good to go.

@FelixMcFelix FelixMcFelix merged commit 417f74e into master Jun 4, 2024
10 checks passed
@FelixMcFelix FelixMcFelix deleted the fix-543 branch June 4, 2024 01:30
FelixMcFelix added a commit to oxidecomputer/omicron that referenced this pull request Jun 4, 2024
This PR includes one soundness fix over the current OPTE version, as
well as dependency updates.

* Correctly scope lock use in xde_detach (oxidecomputer/opte#545)
* Lock file maintenance (oxidecomputer/opte#544)
* Lock file maintenance (oxidecomputer/opte#541)
FelixMcFelix added a commit to oxidecomputer/omicron that referenced this pull request Jun 5, 2024
This PR includes one soundness fix over the current OPTE version, as
well as dependency updates.

* Correctly scope lock use in xde_detach (oxidecomputer/opte#545)
* Lock file maintenance (oxidecomputer/opte#544)
* Lock file maintenance (oxidecomputer/opte#541)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kernel panic on modunload with debug illumos
2 participants