Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(modern_ebpf): address verifier issues on kernel versions >=6.11.4 #2150

Merged
merged 9 commits into from
Nov 8, 2024

Conversation

Andreagit97
Copy link
Member

What type of PR is this?

/kind bug

Any specific area of the project related to this PR?

/area driver-modern-bpf

Does this PR require a change in the driver versions?

No

What this PR does / why we need it:

There are many changes but they are almost all renames, review it commit by commit.

This PR addresses a kernel-breaking change in the tail call ebpf API merged into the 6.11.4.
The reason behind the change is a CVE https://access.redhat.com/security/cve/cve-2024-50063
The commit that introduces the breaking change is https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5d5e3b4cbe8ee16b7bf96fd73a421c92a9da3ca1

The fix on our side is pretty simple in the end, we avoid tail calls were possible and if not possible we use a dedicated tail call. Before this patch, we had a unique tail table for all the progs since it was possible to have a single tail table even if the attached function prototype was different.

Let's go commit by commit

  • sched_proc_exec has its own tail table, but this is handled entirely bpf side (so quite handy)
  • sched_proc_fork has its own table as well
  • hotplug events will be sent only by the syscall exit dispatcher. This is more than enough and allows us to simplify the code.
  • the sampling logic is now only for syscalls, tracepoint events are always dropped in dropping mode (with the exception of sched_switch)
  • cleanup, set sched_switch to UF_ALWAYS_DROP
  • cleanup
  • cleanup (take a look if I missed something obvious)
  • cleanup

Which issue(s) this PR fixes:

Special notes for your reviewer:

The verifier error corresponding to this issue was

libbpf: prog 'pf_user': BPF program load failed: Invalid argument
libbpf: prog 'pf_user': -- BEGIN PROG LOAD LOG --
processed 282 insns (limit 1000000) max_states_per_insn 0 total_states 17 peak_states 17 mark_read 8
-- END PROG LOAD LOG --
libbpf: prog 'pf_user': failed to load: -22
libbpf: failed to load object 'bpf_probe'
libbpf: failed to load BPF skeleton 'bpf_probe': -22
libpman: failed to load BPF object (errno: 22 | message: Invalid argument)

Does this PR introduce a user-facing change?:

NONE

With this new approach the tail calls are only visible inside the
sched_proc_exec file.

Signed-off-by: Andrea Terzolo <[email protected]>
Signed-off-by: Andrea Terzolo <[email protected]>
Signed-off-by: Andrea Terzolo <[email protected]>
Copy link

github-actions bot commented Nov 6, 2024

Please double check driver/SCHEMA_VERSION file. See versioning.

/hold

@Andreagit97 Andreagit97 changed the title Fix modern fedora fix(modern_ebpf): address verifier issues on kernel versions >=6.11.4 Nov 6, 2024
@Andreagit97 Andreagit97 added this to the 0.19.0 milestone Nov 6, 2024
@Andreagit97 Andreagit97 modified the milestones: 0.19.0, next-driver Nov 6, 2024
@Andreagit97
Copy link
Member Author

@falcosecurity/libs-maintainers this should fix the second issue here falcosecurity/falco#3323

Copy link

github-actions bot commented Nov 6, 2024

Perf diff from master - unit tests

     9.52%     +0.95%  [.] sinsp_parser::reset
     1.93%     +0.91%  [.] scap_event_decode_params
     4.99%     -0.73%  [.] sinsp_parser::process_event
     5.55%     -0.71%  [.] sinsp_evt::get_type
     1.67%     +0.62%  [.] std::_Hashtable<long, std::pair<long const, std::shared_ptr<sinsp_threadinfo> >, std::allocator<std::pair<long const, std::shared_ptr<sinsp_threadinfo> > >, std::__detail::_Select1st, std::equal_to<long>, std::hash<long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node
     1.09%     -0.59%  [.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::find
     0.96%     -0.55%  [.] sinsp_evt::get_direction
     0.85%     -0.50%  [.] std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, libsinsp::state::dynamic_struct::field_info>, true> > >::_M_deallocate_nodes
     4.03%     +0.50%  [.] sinsp_evt::load_params
     2.43%     -0.43%  [.] sinsp_thread_manager::find_thread

Heap diff from master - unit tests

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            +0.0118         +0.0118           152           154           152           154
BM_sinsp_split_median                                          +0.0100         +0.0100           153           154           153           154
BM_sinsp_split_stddev                                          +0.2007         +0.2008             1             2             1             2
BM_sinsp_split_cv                                              +0.1867         +0.1868             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  -0.0325         -0.0325            62            60            62            60
BM_sinsp_concatenate_paths_relative_path_median                -0.0348         -0.0348            62            60            62            60
BM_sinsp_concatenate_paths_relative_path_stddev                -0.6051         -0.6050             1             0             1             0
BM_sinsp_concatenate_paths_relative_path_cv                    -0.5919         -0.5917             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     +0.0454         +0.0454            24            25            24            25
BM_sinsp_concatenate_paths_empty_path_median                   +0.0448         +0.0447            24            25            24            25
BM_sinsp_concatenate_paths_empty_path_stddev                   -0.5484         -0.5442             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_cv                       -0.5680         -0.5640             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  +0.0224         +0.0224            63            64            62            64
BM_sinsp_concatenate_paths_absolute_path_median                +0.0134         +0.0134            63            64            63            64
BM_sinsp_concatenate_paths_absolute_path_stddev                -0.8276         -0.8278             1             0             1             0
BM_sinsp_concatenate_paths_absolute_path_cv                    -0.8314         -0.8316             0             0             0             0
BM_sinsp_split_container_image_mean                            +0.0110         +0.0110           398           402           398           402
BM_sinsp_split_container_image_median                          +0.0119         +0.0119           397           402           397           402
BM_sinsp_split_container_image_stddev                          +0.1844         +0.1825             1             2             1             2
BM_sinsp_split_container_image_cv                              +0.1715         +0.1697             0             0             0             0

Copy link

codecov bot commented Nov 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.67%. Comparing base (f82c686) to head (77ea1ca).
Report is 17 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2150      +/-   ##
==========================================
- Coverage   74.69%   74.67%   -0.03%     
==========================================
  Files         254      254              
  Lines       33502    33513      +11     
  Branches     5746     5729      -17     
==========================================
+ Hits        25025    25026       +1     
+ Misses       8477     8456      -21     
- Partials        0       31      +31     
Flag Coverage Δ
libsinsp 74.67% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Nov 6, 2024

X64 kernel testing matrix

KERNEL CMAKE-CONFIGURE KMOD BUILD KMOD SCAP-OPEN BPF-PROBE BUILD BPF-PROBE SCAP-OPEN MODERN-BPF SCAP-OPEN
amazonlinux2-4.19 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2-5.10 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2023-6.1 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.0 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.7 🟢 🟢 🟢 🟢 🟢 🟢
centos-3.10 🟢 🟢 🟢 🟡 🟡 🟡
centos-4.18 🟢 🟢 🟢 🟢 🟢 🟢
centos-5.14 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.17 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.8 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-3.10 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-4.14 🟢 🟢 🟢 🟢 🟢 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-5.4 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-4.15 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-5.8 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

ARM64 kernel testing matrix

KERNEL CMAKE-CONFIGURE KMOD BUILD KMOD SCAP-OPEN BPF-PROBE BUILD BPF-PROBE SCAP-OPEN MODERN-BPF SCAP-OPEN
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-4.14 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

@Andreagit97
Copy link
Member Author

The unique real failure here is this:

[ RUN      ] Actions.sampling_ratio_check_DROP_E_DROP_X
/home/runner/work/libs/libs/test/drivers/test_suites/actions_suite/sampling_ratio.cpp:309: Failure
Failed
Found 'drop_e' = true, found 'drop_x' = false

[  FAILED  ] Actions.sampling_ratio_check_DROP_E_DROP_X (136650 ms)

But this is failing on the legacy ebpf probably as a consequence of putting SCHED_SWITCH to always drop, since this is the only change cross-driver, i will try to see if we can do something for it

@leogr
Copy link
Member

leogr commented Nov 7, 2024

Please double check driver/SCHEMA_VERSION file. See versioning.

/hold

Since extra_event_prog_code is kinda internal stuff for the modern ebpf and not actually part of the event schema (nor the kernelspace<->userspace API), we shouldn't bump the SCHEMA_VERSION. So
/unhold

@Andreagit97 do you agree?

@Andreagit97
Copy link
Member Author

Andreagit97 commented Nov 7, 2024

The unique real failure here is this:

[ RUN      ] Actions.sampling_ratio_check_DROP_E_DROP_X
/home/runner/work/libs/libs/test/drivers/test_suites/actions_suite/sampling_ratio.cpp:309: Failure
Failed
Found 'drop_e' = true, found 'drop_x' = false

[  FAILED  ] Actions.sampling_ratio_check_DROP_E_DROP_X (136650 ms)

But this is failing on the legacy ebpf probably as a consequence of putting SCHED_SWITCH to always drop, since this is the only change cross-driver, i will try to see if we can do something for it

Without sched_switch and a sampling ratio so high (128) it was possible that in 50000 events there was no a drop_x event. I reduced the sampling ratio to 2 and increased the number of events to process, this should cover the fact that now sched_switch are disabled

Copy link
Contributor

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested the fix in a affected system and the verifier is now happy again, thanks for putting this together @Andreagit97.

There is a segfault on one of the tests that worries me a bit, so I'm holding off on approving until that is addressed.

Copy link
Contributor

@jasondellaluce jasondellaluce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@poiana
Copy link
Contributor

poiana commented Nov 7, 2024

LGTM label has been added.

Git tree hash: e4f7b91cb93ebdb404a911da38adcb8a3480718d

@Andreagit97
Copy link
Member Author

@Molter73 if you mean this one

[ RUN      ] async_key_value_source_test.prune_old_metadata
pure virtual method called
terminate called without an active exception
Aborted (core dumped)
make[7]: *** [libsinsp/test/CMakeFiles/run-unit-test-libsinsp.dir/build.make:70: libsinsp/test/CMakeFiles/run-unit-test-libsinsp] Error 134
make[6]: *** [CMakeFiles/Makefile2:1484: libsinsp/test/CMakeFiles/run-unit-test-libsinsp.dir/all] Error 2
make[5]: *** [CMakeFiles/Makefile2:1491: libsinsp/test/CMakeFiles/run-unit-test-libsinsp.dir/rule] Error 2
make[4]: *** [Makefile:650: run-unit-test-libsinsp] Error 2
make[3]: *** [CMakeFiles/run-unit-tests.dir/build.make:70: CMakeFiles/run-unit-tests] Error 2
make[2]: *** [CMakeFiles/Makefile2:692: CMakeFiles/run-unit-tests.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:699: CMakeFiles/run-unit-tests.dir/rule] Error 2
make: *** [Makefile:299: run-unit-tests] Error 2

This is not related to this PR, all PRs on libs are failing on this since the switch to GitHub arm64 runners

Copy link
Contributor

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@poiana
Copy link
Contributor

poiana commented Nov 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, Molter73

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Andreagit97,Molter73]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Andreagit97
Copy link
Member Author

/merge

@poiana poiana merged commit 75133a6 into master Nov 8, 2024
57 of 62 checks passed
@poiana poiana deleted the fix_modern_fedora branch November 8, 2024 08:09
@FedeDP
Copy link
Contributor

FedeDP commented Nov 13, 2024

/milestone 0.18.2

@poiana poiana modified the milestones: next-driver, 0.18.2 Nov 13, 2024
@FedeDP FedeDP mentioned this pull request Nov 13, 2024
hhoffstaette added a commit to hhoffstaette/gentoo that referenced this pull request Nov 20, 2024
A recent BPF verifier vulnerability made some fixes to the modern
BPF probe necessary. See: falcosecurity/libs#2150

Signed-off-by: Holger Hoffstätte <[email protected]>
hhoffstaette added a commit to hhoffstaette/portage that referenced this pull request Nov 20, 2024
A recent BPF verifier vulnerability made some fixes to the modern
BPF probe necessary. See: falcosecurity/libs#2150
hhoffstaette added a commit to hhoffstaette/gentoo that referenced this pull request Nov 25, 2024
A recent BPF verifier vulnerability made some fixes to the modern
BPF probe necessary. See: falcosecurity/libs#2150

Signed-off-by: Holger Hoffstätte <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants