fix(libsinsp): enable metrics collector on all platforms #1870

Open
wants to merge 1 commit into
base: master

mrgian
Contributor

@mrgian mrgian commented May 16, 2024

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

fix(libsinsp): enable metrics collector on all platforms

@poiana poiana added the size/XL label May 16, 2024
@Andreagit97 Andreagit97 marked this pull request as ready for review May 16, 2024 11:06
@poiana poiana requested a review from Andreagit97 May 16, 2024 11:06
@Andreagit97 Andreagit97 marked this pull request as draft May 16, 2024 11:07
@mrgian mrgian force-pushed the plugin-api-metrics-win-test branch 3 times, most recently from 8b31d31 to 7b2e258 Compare May 16, 2024 12:36
@poiana poiana added size/M and removed size/XL labels May 16, 2024
Contributor

@FedeDP FedeDP left a comment

I was also wondering whether we should tie the available sinsp_stats_v2_collectors to build options such as MINIMAL_BUILD (for example, container-related ones will always be 0 on MINIMAL_BUILD builds).
This should be as simple as adding a compilation guard around the collector entries.
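
For illustration only, a compilation guard of the kind suggested here could look roughly like the sketch below. The `collector_entry` type, the `make_state_counter_entries` helper, and the entry names are hypothetical stand-ins, not the actual libsinsp definitions; only the `MINIMAL_BUILD` macro name comes from the comment above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a collector entry; not the real libsinsp type.
struct collector_entry {
    std::string name;
    std::uint64_t value;
};

// Builds the list of state-counter entries. The container-related entries
// are compiled out when MINIMAL_BUILD is defined, because they would
// always report 0 in such builds.
inline std::vector<collector_entry> make_state_counter_entries() {
    std::vector<collector_entry> entries = {
        {"n_threads", 0},
        {"n_fds", 0},
#ifndef MINIMAL_BUILD
        // Assumed names for container-related counters (illustrative only).
        {"n_containers", 0},
        {"n_missing_container_images", 0},
#endif
    };
    return entries;
}
```

With a guard like this, consumers iterating over the entries never see counters that cannot be populated in the current build, instead of seeing them reported as permanently zero.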

@@ -274,9 +272,11 @@ class libs_metrics_collector
uint32_t m_metrics_flags = METRICS_V2_KERNEL_COUNTERS | METRICS_V2_LIBBPF_STATS | METRICS_V2_RESOURCE_UTILIZATION | METRICS_V2_STATE_COUNTERS | METRICS_V2_PLUGINS;
std::vector<metrics_v2> m_metrics;

#ifdef __linux__
Contributor

I think we might want to move these into the scap_platform vtable, likely as a struct scap_metrics_vtable (embedded in each scap_foo_platform), so that we could get platform-dependent metrics from the scap handle. Again, this might be an idea for a future refactor.
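
As a rough sketch of this idea, and not the actual libscap layout, a per-platform metrics vtable could be embedded in a generic platform struct and dispatched through a handle. Every name below (`metric_sample`, `scap_platform_sketch`, `scap_linux_platform_sketch`, `linux_get_metrics`, `collect_platform_metrics`) is hypothetical, apart from `scap_metrics_vtable`, which is the name proposed in the comment.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical metric record; the real metrics_v2 struct is richer.
struct metric_sample {
    const char* name;
    std::uint64_t value;
};

struct scap_platform_sketch;  // forward declaration

// The vtable proposed in the comment: one callback that fills `out` with
// up to `cap` platform-dependent metrics and returns how many were written.
struct scap_metrics_vtable {
    std::size_t (*get_metrics)(scap_platform_sketch* platform,
                               metric_sample* out,
                               std::size_t cap);
};

// Generic platform part, embedded as the first member of each concrete
// platform so that a pointer to it can be converted back.
struct scap_platform_sketch {
    scap_metrics_vtable metrics;
};

// Assumed Linux platform with one piece of platform-specific state.
struct scap_linux_platform_sketch {
    scap_platform_sketch generic;  // must stay the first member
    std::uint64_t rss_kb;
};

inline std::size_t linux_get_metrics(scap_platform_sketch* platform,
                                     metric_sample* out,
                                     std::size_t cap) {
    if (out == nullptr || cap == 0) {
        return 0;
    }
    auto* linux_platform = reinterpret_cast<scap_linux_platform_sketch*>(platform);
    out[0] = metric_sample{"memory_rss_kb", linux_platform->rss_kb};
    return 1;
}

// What a caller holding only the generic handle would do. Platforms that
// provide no metrics leave the callback null and simply report nothing.
inline std::size_t collect_platform_metrics(scap_platform_sketch* platform,
                                            metric_sample* out,
                                            std::size_t cap) {
    if (platform == nullptr || platform->metrics.get_metrics == nullptr) {
        return 0;
    }
    return platform->metrics.get_metrics(platform, out, cap);
}
```

The point of this shape is that the collector no longer needs `#ifdef __linux__` blocks: it asks the handle for metrics, and each platform decides what, if anything, to report.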

Contributor

I haven't had time to check out this PR, but reading your comment, @FedeDP, I would like that. Especially because, since the scap refactor, the CPU usage calculation is broken when there is only a plugin source, even on Linux: in that case we do not instantiate the agent info, which the CPU usage calculation relies on.

@mrgian mrgian changed the title from "[WIP] fix(libsinsp): enable metrics collector on all platforms" to "fix(libsinsp): enable metrics collector on all platforms" May 16, 2024
@mrgian mrgian marked this pull request as ready for review May 16, 2024 12:55
@poiana poiana requested a review from incertum May 16, 2024 12:56
@FedeDP
Contributor

FedeDP commented May 16, 2024

Since we don't need this for the next release, I'd put this in the following milestone:
/milestone 0.18.0

@poiana poiana added this to the 0.18.0 milestone May 16, 2024
@mrgian
Contributor Author

mrgian commented May 16, 2024

I think we might want to move these into the scap_platform vtable, likely as a struct scap_metrics_vtable (embedded in each scap_foo_platform), so that we could get platform-dependent metrics from the scap handle. Again, this might be an idea for a future refactor.

Hey @FedeDP, makes sense!
Since you moved this to the next milestone and we are not in a hurry, I can take care of this :)

@incertum
Contributor

I think we might want to move these into the scap_platform vtable, likely as a struct scap_metrics_vtable (embedded in each scap_foo_platform), so that we could get platform-dependent metrics from the scap handle. Again, this might be an idea for a future refactor.

Hey @FedeDP, makes sense! Since you moved this to the next milestone and we are not in a hurry, I can take care of this :)

Added this as an item to falcosecurity/falco#3194 (comment).
Just to reiterate: if we could fix the agent info initialization on Linux for the plugin platform (see falcosecurity/falco#2821), it would be fantastic. For macOS and Windows, the CPU utilization and memory usage calculations would need new code; I'm not sure that is truly needed. WDYT?

@FedeDP
Contributor

FedeDP commented May 17, 2024

If we could fix the agent info initialization on Linux for the plugin platform (see falcosecurity/falco#2821), it would be fantastic.

Agreed!

For macOS and Windows, the CPU utilization and memory usage calculations would need new code; I'm not sure that is truly needed. WDYT?

I think it would be interesting to expose those metrics for macOS and Windows too, but yes, it's not high priority.

@incertum
Contributor

@mrgian hope all is well. I just wanted to kindly check in and ask what our current plan is for getting out of the regression in our scap platforms approach (falcosecurity/falco#2821). If we can have a proper refactor, amazing. Otherwise, I would also support an intermediate fix to ensure the next Falco release no longer has this regression. CC @FedeDP @leogr

Thanks in advance!

@mrgian mrgian marked this pull request as draft July 23, 2024 09:57
github-actions bot commented Jul 23, 2024

Perf diff from master - unit tests

    10.18%     -0.88%  [.] sinsp::next
     6.71%     +0.52%  [.] sinsp_evt::get_type
     2.80%     -0.43%  [.] is_conversion_needed
     3.35%     -0.40%  [.] sinsp_thread_manager::get_thread_ref
     9.67%     -0.38%  [.] sinsp_parser::reset
     1.11%     +0.38%  [.] sinsp_evt::get_ts
     5.83%     -0.32%  [.] next_event_from_file
     1.16%     +0.31%  [.] sinsp_parser::event_cleanup
     3.42%     -0.29%  [.] sinsp_thread_manager::find_thread
     0.74%     +0.27%  [.] libsinsp::events::is_unknown_event

Heap diff from master - unit tests

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            -0.0534         -0.0535           151           143           151           143
BM_sinsp_split_median                                          -0.0558         -0.0559           150           142           150           142
BM_sinsp_split_stddev                                          +0.3083         +0.3113             2             2             2             2
BM_sinsp_split_cv                                              +0.3821         +0.3854             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  -0.0599         -0.0600            61            57            61            57
BM_sinsp_concatenate_paths_relative_path_median                -0.0597         -0.0598            61            57            61            57
BM_sinsp_concatenate_paths_relative_path_stddev                -0.1440         -0.1455             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_cv                    -0.0894         -0.0910             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     +0.0481         +0.0480            24            25            24            25
BM_sinsp_concatenate_paths_empty_path_median                   +0.0476         +0.0475            24            25            24            25
BM_sinsp_concatenate_paths_empty_path_stddev                   +0.4570         +0.4547             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_cv                       +0.3902         +0.3881             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  -0.1516         -0.1516            67            56            67            56
BM_sinsp_concatenate_paths_absolute_path_median                -0.1654         -0.1654            67            56            67            56
BM_sinsp_concatenate_paths_absolute_path_stddev                -0.0706         -0.0704             1             1             1             1
BM_sinsp_concatenate_paths_absolute_path_cv                    +0.0954         +0.0957             0             0             0             0
BM_sinsp_split_container_image_mean                            +0.0178         +0.0177           385           392           385           392
BM_sinsp_split_container_image_median                          +0.0178         +0.0177           385           392           385           392
BM_sinsp_split_container_image_stddev                          -0.3534         -0.3539             2             2             2             2
BM_sinsp_split_container_image_cv                              -0.3647         -0.3651             0             0             0             0

codecov bot commented Jul 23, 2024

Codecov Report

Attention: Patch coverage is 93.75000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 75.19%. Comparing base (230ddfb) to head (ea6ddee).
Report is 2 commits behind head on master.

Files with missing lines                             Patch %    Lines
userspace/libsinsp/linux/resource_utilization.cpp    93.33%     7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1870      +/-   ##
==========================================
- Coverage   75.19%   75.19%   -0.01%     
==========================================
  Files         259      261       +2     
  Lines       33875    33875              
  Branches     5800     5801       +1     
==========================================
- Hits        25473    25472       -1     
- Misses       8402     8403       +1     
Flag Coverage Δ
libsinsp 75.19% <93.75%> (-0.01%) ⬇️

@incertum
Contributor

If I'm not wrong, currently libs_resource_utilization (https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/metrics_collector.h#L271-L300) is the only class with linux-only code.

Confirmed.

As you said, taking a decision on the directory naming will influence future components development, so I'll wait to know what the maintainers think.

I also don't have any preference. Maybe go with what @gnosek deems slightly better, because Grzeg has been around the block for some time and I get all the callouts. The ifdefs were a good solution to get these metrics going. Now we can finally get it right. By now 4+ folks have already refactored the libs metrics collector, so there is hope that we will stabilize that code at some point 🙃.

@FedeDP
Contributor

FedeDP commented Aug 27, 2024

Any news on this, @mrgian?

@mrgian
Contributor Author

mrgian commented Aug 27, 2024

Hey @FedeDP
Not yet!
We decided to refactor this again :( and I'm currently busy with other tasks.
So I don't think this will make it into the next release, but I will start working on it as soon as I can.

@FedeDP
Contributor

FedeDP commented Aug 27, 2024

Ok! Moving to next milestone then :)
/milestone 0.19.0

@poiana poiana modified the milestones: 0.18.0, 0.19.0 Aug 27, 2024
@mrgian mrgian force-pushed the plugin-api-metrics-win-test branch 4 times, most recently from b8b3ee8 to 9938e53 Compare October 8, 2024 15:29
@mrgian mrgian force-pushed the plugin-api-metrics-win-test branch from 9938e53 to d0c2588 Compare October 10, 2024 14:44
@mrgian
Contributor Author

mrgian commented Oct 10, 2024

Hey @incertum @gnosek
I moved all the Linux-specific code into linux/resource_utilization.cpp.
When compiled on a non-Linux platform, instead of linux_resource_utilization we use a generic libs_metrics, whose to_metrics() returns an empty metrics vector.
This allows us to use the metrics collector on all platforms.

WDYT?
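
A minimal sketch of the approach described in this comment, under stated assumptions: the class and function names below carry a `_sketch` suffix or are otherwise hypothetical, the metric names are placeholders, and the real code in linux/resource_utilization.cpp may be structured differently.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical metric record standing in for metrics_v2.
struct metric_sketch {
    std::string name;
    std::uint64_t value;
};

// Generic base: usable on every platform, reports no metrics by default.
class libs_metrics_sketch {
public:
    virtual ~libs_metrics_sketch() = default;
    virtual std::vector<metric_sketch> to_metrics() { return {}; }
};

#ifdef __linux__
// Linux-only implementation. In the PR this kind of code lives in a
// separate linux/ source file; the values here are placeholders rather
// than real readings from the system.
class linux_resource_utilization_sketch : public libs_metrics_sketch {
public:
    std::vector<metric_sketch> to_metrics() override {
        return {{"cpu_usage_perc", 0}, {"memory_rss_kb", 0}};
    }
};
#endif

// Single selection point: callers never need their own #ifdef.
inline std::unique_ptr<libs_metrics_sketch> make_resource_utilization() {
#ifdef __linux__
    return std::make_unique<linux_resource_utilization_sketch>();
#else
    return std::make_unique<libs_metrics_sketch>();
#endif
}
```

On a non-Linux build the factory returns the generic object, so the collector can call `to_metrics()` unconditionally and simply gets an empty vector back.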

@FedeDP
Contributor

FedeDP commented Nov 13, 2024

/cc @gnosek

/milestone 0.20.0

@poiana poiana requested a review from gnosek November 13, 2024 08:59
@poiana poiana modified the milestones: 0.19.0, 0.20.0 Nov 13, 2024
@mrgian mrgian force-pushed the plugin-api-metrics-win-test branch from d0c2588 to ea6ddee Compare December 9, 2024 16:12
@mrgian mrgian marked this pull request as ready for review December 9, 2024 16:22
Contributor

@FedeDP FedeDP left a comment

/approve

@poiana
Contributor

poiana commented Dec 11, 2024

LGTM label has been added.

Git tree hash: c149ce366ecba7cfa90f6c97d61d4be7ef2e6998

@poiana
Contributor

poiana commented Dec 11, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FedeDP, mrgian

5 participants