Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(libsinsp/container_info): change default / init lookup state to FAILED #1707

Merged
merged 3 commits into from
Apr 16, 2024

Conversation

incertum
Copy link
Contributor

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

While working on #1595 it became evident that the init state is set wrongly. Curious if something breaks, if it does we likely will have found more improvement areas in regards to the container engine in general.

Opened it again against the libs repo directly to allow for easier testing.

CC @therealbobo @leogr @deepskyblue86

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@leogr
Copy link
Member

leogr commented Feb 26, 2024

Opened it again against the libs repo directly to allow for easier testing.

It seems at least 3 tests are failing:

[  FAILED  ] sinsp_with_test_input.event_sources
[  FAILED  ] sinsp_with_test_input.container_parser_cri_containerd
[  FAILED  ] sinsp_with_test_input.container_parser_cri_crio

Still trying to figure out where the problem is: the change or the tests? 🤔

@Andreagit97
Copy link
Member

any news here?

@Andreagit97 Andreagit97 added this to the TBD milestone Mar 11, 2024
@incertum
Copy link
Contributor Author

We first wanted to get the other container PRs in. They are all merged now. Let me rebase today and check.

@incertum incertum force-pushed the refactor/container-engine-init-state branch from 91960ef to c799c23 Compare March 11, 2024 17:37
@poiana poiana added size/S and removed size/XS labels Mar 11, 2024
@@ -131,8 +131,13 @@ TEST_F(sinsp_with_test_input, event_sources)
ASSERT_FALSE(field_has_value(evt, "evt.asynctype"));

// metaevents have the "syscall" event source
evt = add_event_advance_ts(increasing_ts(), 1, PPME_CONTAINER_JSON_E, 1, "{\"value\": 1}");
ASSERT_EQ(evt->get_type(), PPME_CONTAINER_JSON_E);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it intentional to use the deprecated event type PPME_CONTAINER_JSON_E? Changed it to PPME_CONTAINER_JSON_2_E

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @FedeDP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't write those tests, i don't know why they used PPME_CONTAINER_JSON_E. Most probably it was just an oversight.

@incertum
Copy link
Contributor Author

Unit tests just needed to be updated w/ set_lookup_status(sinsp_container_lookup::state::SUCCESSFUL).

Copy link
Member

@Andreagit97 Andreagit97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, but I don't have so much context on this area can anyone double-check them?

@leogr leogr requested a review from LucaGuerra March 12, 2024 09:06
@LucaGuerra
Copy link
Contributor

Not an expert in this specific engine but changes LGTM

LucaGuerra
LucaGuerra previously approved these changes Mar 12, 2024
@poiana
Copy link
Contributor

poiana commented Mar 12, 2024

LGTM label has been added.

Git tree hash: 6aebd6a961b17e8166d2f9af082aa60b1030b7b9

@poiana poiana requested a review from LucaGuerra March 12, 2024 18:26
@@ -33,6 +33,7 @@ docker_base::resolve_impl(sinsp_threadinfo *tinfo, const docker_lookup_request&
auto container = sinsp_container_info();
container.m_type = request.container_type;
container.m_id = request.container_id;
container.set_lookup_status(sinsp_container_lookup::state::SUCCESSFUL);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRI and docker mostly handled the lookup state correctly, except here it was necessary to add.

Copy link
Contributor Author

@incertum incertum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't aware we were gonna move already forward with this PR. In turns out that all the non primary container engines never explicitly set the status (see last commit) ... probably why the default came from. I believe this is a great first step towards more transparent lookup status handling and please keep in mind that there is an open ticket for a more comprehensive future refactor #1708.

@leogr
Copy link
Member

leogr commented Mar 14, 2024

Wasn't aware we were gonna move already forward with this PR. In turns out that all the non primary container engines never explicitly set the status (see last commit) ... probably why the default came from. I believe this is a great first step towards more transparent lookup status handling and please keep in mind that there is an open ticket for a more comprehensive future refactor #1708.

This PR hadn't a milestone, but I guess it should be for the 0.16 (since it's too late for 0.15, further testing and another round of maintainers' opinion is likely needed cc @falcosecurity/libs-maintainers)

/milestone 0.16

Let me know if you disagree.

@poiana
Copy link
Contributor

poiana commented Mar 14, 2024

@leogr: The provided milestone is not valid for this repository. Milestones in this repository: [0.15.0, 0.16.0, TBD, next-driver]

Use /milestone clear to clear the milestone.

In response to this:

Wasn't aware we were gonna move already forward with this PR. In turns out that all the non primary container engines never explicitly set the status (see last commit) ... probably why the default came from. I believe this is a great first step towards more transparent lookup status handling and please keep in mind that there is an open ticket for a more comprehensive future refactor #1708.

This PR hadn't a milestone, but I guess it should be for the 0.16 (since it's too late for 0.15, further testing and another round of maintainers' opinion is likely needed cc @falcosecurity/libs-maintainers)

/milestone 0.16

Let me know if you disagree.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@leogr
Copy link
Member

leogr commented Mar 14, 2024

@leogr: The provided milestone is not valid for this repository. Milestones in this repository: [0.15.0, 0.16.0, TBD, next-driver]

Use /milestone clear to clear the milestone.

sorry

/milestone 0.16.0

@poiana poiana modified the milestones: TBD, 0.16.0 Mar 14, 2024
@incertum
Copy link
Contributor Author

Exactly @leogr there was never a milestone since it's not particularly urgent from a functionality point of view, but it's one step towards making sure we don't break the container engine every time we try to touch it. It also aligns with #1747 (better documentation, more transparency). Furthermore, it's the first step towards improving the container information lookup functionality, see #1708, more to follow.

I manually tested these changes for docker and CRI ...

@incertum incertum force-pushed the refactor/container-engine-init-state branch from 7395591 to 733bb0a Compare March 14, 2024 16:55
@incertum
Copy link
Contributor Author

Rebased so that the new container e2e tests can run.

@incertum incertum force-pushed the refactor/container-engine-init-state branch from 733bb0a to a68fce9 Compare March 25, 2024 23:23
@incertum
Copy link
Contributor Author

Updates on this PR? Thanks.

@@ -61,6 +61,7 @@ bool bpm::resolve(sinsp_threadinfo *tinfo, bool query_os_for_missing_info)
if(container_cache().should_lookup(container_info.m_id, CT_BPM))
{
container_info.m_name = container_info.m_id;
container_info.set_lookup_status(sinsp_container_lookup::state::SUCCESSFUL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering whether it would be better to set the SUCCESSFUL lookup status inside
container_cache().notify_new_container(): it requires less changes and it "feels" better, ie: since we are notifying the new container, we mark the container as successfully looked up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels less error prone too (like: a new engine gets introduced and we forgot to add the set_lookup_state call).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except some container engines (e.g. libvirt_lxc) add the container to the cache and already expect the lookup status to be SUCCESSFUL, so that notify_new_container() doesn't do anything.

As stated more at the beginning of the PR this change is just an initial improvement so that the lookup state is not SUCCESSFUL by default, which makes not a lot of sense and can backfire even more. We have this issue (#1708) we wanted to work on after Falco 0.38.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, i completely overlooked that. Then we are good to go!

Copy link
Contributor

@FedeDP FedeDP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@poiana
Copy link
Contributor

poiana commented Apr 16, 2024

LGTM label has been added.

Git tree hash: a8cb40c7bcdda56b62e57ec6c84ec07cc6b455d5

@poiana
Copy link
Contributor

poiana commented Apr 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FedeDP, incertum, LucaGuerra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [FedeDP,LucaGuerra,incertum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 6b1d8d6 into master Apr 16, 2024
43 checks passed
@poiana poiana deleted the refactor/container-engine-init-state branch April 16, 2024 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants