Native eBPF program attach failure causes the next valid attempt to fail #2133

Open
mtfriesen opened this issue Feb 24, 2023 · 6 comments · May be fixed by #3794
Labels: blocked (blocked on another issue that must be done first), bug (something isn't working), P2, triaged (discussed in a triage meeting)

@mtfriesen
Collaborator

mtfriesen commented Feb 24, 2023

Describe the bug

If an application tries to open+load+attach the same native program immediately after it failed to attach, eBPF fails to load the program on the second attempt, seemingly because the first driver is still unloading.

[0]0C1C.0FD0::2023/02/24-10:04:25.996585800 [xdpfntest] TryAttachEbpfXdpProgram:3817 bpf_xdp_attach failed: -22
[0]0C1C.0FD0::2023/02/24-10:04:26.026993200 [xdpfntest] TryAttachEbpfXdpProgram:3796 bpf_object__load failed: -2

Separately, the same native eBPF program cannot be loaded concurrently, but it's unclear whether that is by design.

OS information

20348.859.amd64fre.fe_release_svc_prod2.220707-1832

Steps taken to reproduce bug

  1. Attach eBPF program A (native or JIT) to an XDP interface
  2. Open, load, attach native eBPF program B to the same XDP interface - this will be rejected by XDP
  3. bpf_object__close native program B.
  4. Attach native eBPF program B to the same XDP interface using XDP_FLAGS_REPLACE

Expected behavior

The attach in step 4 should succeed.

Actual outcome

The attach in step 4 fails unless the caller delays between the attach attempts in steps 2 and 4 long enough for the driver loaded in step 2 to unload.

Additional details

ebpf_native_load_fail.log

Concrete code:

using unique_bpf_object = wistd::unique_ptr<bpf_object, wil::function_deleter<decltype(&::bpf_object__close), ::bpf_object__close>>;

static
HRESULT
TryAttachEbpfXdpProgram(
    _Out_ unique_bpf_object &BpfObject,
    _In_ const TestInterface &If,
    _In_ const CHAR *BpfRelativeFileName,
    _In_ const CHAR *BpfProgramName,
    _In_ INT AttachFlags = 0
    )
{
    HRESULT Result;
    CHAR Path[MAX_PATH];
    std::string BpfAbsoluteFileName;
    bpf_program *Program;
    int ProgramFd;
    int ErrnoResult;

    Result = GetCurrentBinaryPath(Path, RTL_NUMBER_OF(Path));
    if (FAILED(Result)) {
        goto Exit;
    }

    BpfAbsoluteFileName = Path;
    BpfAbsoluteFileName += BpfRelativeFileName;

    BpfObject.reset(bpf_object__open(BpfAbsoluteFileName.c_str()));
    if (BpfObject.get() == NULL) {
        TraceError("bpf_object__open failed: %d", errno);
        Result = E_FAIL;
        goto Exit;
    }

    ErrnoResult = bpf_object__load(BpfObject.get());
    if (ErrnoResult != 0) {
        TraceError("bpf_object__load failed: %d, errno=%d", ErrnoResult, errno);
        Result = E_FAIL;
        goto Exit;
    }

    Program = bpf_object__find_program_by_name(BpfObject.get(), BpfProgramName);
    if (Program == NULL) {
        TraceError("bpf_object__find_program_by_name failed: %d", errno);
        Result = E_FAIL;
        goto Exit;
    }

    ProgramFd = bpf_program__fd(Program);
    if (ProgramFd < 0) {
        TraceError("bpf_program__fd failed: %d", errno);
        Result = E_FAIL;
        goto Exit;
    }

    ErrnoResult = bpf_xdp_attach(If.GetIfIndex(), ProgramFd, AttachFlags, NULL);
    if (ErrnoResult != 0) {
        TraceError("bpf_xdp_attach failed: %d, errno=%d", ErrnoResult, errno);
        Result = E_FAIL;
        goto Exit;
    }

    Result = S_OK;

Exit:

    if (FAILED(Result)) {
        BpfObject.reset();
    }

    return Result;
}

VOID
GenericRxEbpfAttach()
{
    auto If = FnMpIf;

    unique_bpf_object BpfObject = AttachEbpfXdpProgram(If, "\\bpf\\drop.o", "drop");

    unique_bpf_object BpfObjectReplacement;
    TEST_TRUE(FAILED(TryAttachEbpfXdpProgram(BpfObjectReplacement, If, "\\bpf\\pass.sys", "pass")));

    //
    // TODO: eBPF doesn't wait for the pass.sys driver to completely unload
    // after tearing down the object, so allow some time for that to happen
    // before retrying with the replace flag.
    //
    Sleep(TEST_TIMEOUT_ASYNC_MS);
    BpfObjectReplacement =
        AttachEbpfXdpProgram(If, "\\bpf\\pass.sys", "pass", XDP_FLAGS_REPLACE);
}
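
The fixed Sleep in the test above waits the full timeout even when the driver unloads quickly. A minimal sketch of a bounded retry that could stand in for it as a caller-side workaround is shown here. RetryUntil and its parameters are hypothetical names for illustration, not part of eBPF for Windows or the XDP test code, and the sketch does not address the underlying unload race.

```cpp
#include <chrono>
#include <functional>
#include <thread>

//
// Hypothetical helper: calls Attempt until it returns true or Timeout
// elapses, sleeping Interval between attempts. Attempt always runs at least
// once. Returns true if some attempt succeeded.
//
inline
bool
RetryUntil(
    const std::function<bool()> &Attempt,
    std::chrono::milliseconds Timeout,
    std::chrono::milliseconds Interval
    )
{
    const auto Deadline = std::chrono::steady_clock::now() + Timeout;

    for (;;) {
        if (Attempt()) {
            return true;
        }

        // Give up if the next sleep would run past the deadline.
        if (std::chrono::steady_clock::now() + Interval > Deadline) {
            return false;
        }

        std::this_thread::sleep_for(Interval);
    }
}

//
// Possible use in GenericRxEbpfAttach instead of Sleep (the 10 ms interval
// is an arbitrary choice):
//
//   TEST_TRUE(RetryUntil(
//       [&] {
//           return SUCCEEDED(TryAttachEbpfXdpProgram(
//               BpfObjectReplacement, If, "\\bpf\\pass.sys", "pass",
//               XDP_FLAGS_REPLACE));
//       },
//       std::chrono::milliseconds(TEST_TIMEOUT_ASYNC_MS),
//       std::chrono::milliseconds(10)));
//
```

Polling keeps the worst-case wait the same as the Sleep while letting the test continue as soon as the load succeeds.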
@mtfriesen mtfriesen added the bug label Feb 24, 2023
@dahavey dahavey added the triaged label Feb 27, 2023
@dahavey dahavey added this to the 2303 milestone Feb 27, 2023
@saxena-anurag
Contributor

@mtfriesen can you confirm whether step 2 in the repro steps above also unloaded eBPF program B before moving to step 3?
If the test unloads program B in step 2, can you try the following and see whether it still reproduces?

  1. Do not unload the program in step 2.
  2. In step 3, re-attach the same eBPF program B that was loaded in step 2.

@mtfriesen
Collaborator Author

mtfriesen commented Feb 27, 2023

Yup, I've updated the repro steps to include the bpf_object__close between original steps 2 and 3.

I confirmed there are no issues when issuing the two bpf_xdp_attach calls directly in sequence, i.e. keeping the BPF object created in step (2) open and reusing it for original step (3).

@Alan-Jowett Alan-Jowett modified the milestones: 2303, 2304 Mar 27, 2023
@dthaler dthaler modified the milestones: 2304, 2305 Apr 10, 2023
@dthaler dthaler modified the milestones: 2305, 2306 Jun 1, 2023
@dahavey dahavey modified the milestones: 2306, 2307 Jun 26, 2023
@dv-msft dv-msft modified the milestones: 2307, 2308 Jul 28, 2023
@dahavey dahavey modified the milestones: 2308, 2309 Aug 7, 2023
@dv-msft dv-msft modified the milestones: 2309, 2310 Sep 25, 2023
@dahavey dahavey modified the milestones: 2310, 2311 Oct 25, 2023
@mtfriesen
Collaborator Author

@shankarseal could this be prioritized? I now need to work around this eBPF bug in another project's test cases.

@dahavey dahavey modified the milestones: 2311, 2312 Nov 20, 2023
@Alan-Jowett Alan-Jowett added this to the 2402 milestone Jan 29, 2024
@Alan-Jowett Alan-Jowett modified the milestones: 2402, 2403 Feb 28, 2024
@Alan-Jowett Alan-Jowett added the P1 label Feb 29, 2024
@dahavey dahavey modified the milestones: 2403, 2404 Mar 27, 2024
@dahavey dahavey added blocked and removed P1 labels Apr 29, 2024
@dahavey dahavey modified the milestones: 2404, 2405 Apr 29, 2024
@dahavey dahavey added the P1 label Apr 29, 2024
@dahavey dahavey modified the milestones: 2405, 2406 May 28, 2024
@shankarseal shankarseal modified the milestones: 2406, 2407 Jun 29, 2024
@shankarseal
Collaborator

@Alan-Jowett -- can you make a short-term fix to return EBUSY for the scenario mentioned in this issue? I am moving this to 2408.
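
If the loader were changed to return EBUSY while the previous driver instance is still unloading, as requested here, a caller could retry only on that error and fail fast on everything else. The following is a sketch under that assumption. LoadWithBusyRetry and LoadOutcome are hypothetical names, and Load stands for any callable that returns 0 on success or a negative errno value, in the style of the -2 and -22 results in the log from the original report.

```cpp
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical result type for the sketch below.
enum class LoadOutcome {
    Loaded,     // Load returned 0.
    StillBusy,  // Every attempt returned -EBUSY.
    Failed      // Load returned some other error; not retried.
};

//
// Hypothetical helper: retries Load only while it reports -EBUSY, up to
// MaxAttempts calls, sleeping Delay between attempts. Assumes the proposed
// (not yet implemented) behavior of returning EBUSY while the previous
// native driver is still unloading.
//
inline
LoadOutcome
LoadWithBusyRetry(
    const std::function<int()> &Load,
    int MaxAttempts,
    std::chrono::milliseconds Delay
    )
{
    for (int Attempt = 0; Attempt < MaxAttempts; Attempt++) {
        const int Result = Load();

        if (Result == 0) {
            return LoadOutcome::Loaded;
        }

        if (Result != -EBUSY) {
            // Errors such as -ENOENT or -EINVAL are not transient here.
            return LoadOutcome::Failed;
        }

        // Only wait if another attempt remains.
        if (Attempt + 1 < MaxAttempts) {
            std::this_thread::sleep_for(Delay);
        }
    }

    return LoadOutcome::StillBusy;
}
```

A distinct busy error would let callers tell the unload race apart from a genuinely missing or invalid program, which the current -2 result does not allow.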

@shankarseal shankarseal modified the milestones: 2407, 2408 Jul 25, 2024
@Alan-Jowett
Member

Still blocked on multi-program fix.

@Alan-Jowett Alan-Jowett modified the milestones: 2408, 2409 Aug 26, 2024
@Alan-Jowett Alan-Jowett linked a pull request Aug 28, 2024 that will close this issue
@shankarseal shankarseal modified the milestones: 2409, 2410 Sep 30, 2024
@shankarseal shankarseal added P2 and removed P1 labels Oct 21, 2024
@shankarseal shankarseal modified the milestones: 2410, 2411 Oct 21, 2024
@shankarseal shankarseal modified the milestones: 2411, 2501 Nov 2, 2024
7 participants