refactor(cri): fast-track add containers to cache (synchronous lookup case, Falco run w/ --disable-cri-async)
#1595
Conversation
* events and instead directly retrieve container details from the container cache. This behavior could potentially
* contribute to the issues noted by adopters, such as the absence of container images in syscall events.
*/
cache->add_container(std::make_shared<sinsp_container_info>(result), nullptr);
Maybe we should introduce a new function, as this one also loops over callbacks. In general, I dislike how certain functions are overloaded in the container engine.
CC @gnosek?
Using the existing replace_container instead. This should be safe and do what we want.
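For illustration, a minimal sketch of that change, assuming replace_container() accepts the same shared pointer as the add_container() call quoted above and simply overwrites the cache entry without walking the new-container callbacks (an assumption based on this thread, not a verified signature):
// Hypothetical fragment: update the cache entry for the freshly looked-up
// container without re-triggering the on_new_container callbacks.
cache->replace_container(std::make_shared<sinsp_container_info>(result));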
@mstemm would you have additional info on the callback or the general flow here? Eager to understand subtleties better. Thank you!
I'm not sure why we even have a synchronous lookup option. That would surely lead to event drops, wouldn't it? That's why we made the lookups asynchronous in the first place.
If you're asking how the asynchronous lookup flow works, you got the general steps right:
- The container engine does an asynchronous lookup of container metadata.
- The container metadata is packed into a CONTAINER_JSON event.
- The event is enqueued to the inspector by the container engine thread.
- On the next call to sinsp::next() by the event processing thread, the enqueued event is read and parsed. Parsing the event adds the container info to libsinsp.
I think step 2 is probably more confusing than it needs to be.
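To make those four steps concrete, here is a self-contained toy model of the flow (hypothetical names, not libsinsp code); the point is that the cache only sees the metadata once the event-processing thread gets around to parsing the enqueued event:
#include <iostream>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

struct container_event { std::string id; std::string json; };

std::queue<container_event> g_pending;       // stands in for the inspector's event queue
std::mutex g_mtx;
std::map<std::string, std::string> g_cache;  // stands in for the container cache

void async_lookup(std::string id)            // step 1: runs on the container engine thread
{
    std::string json = "{\"image\":\"nginx:latest\"}";   // pretend CRI response
    std::lock_guard<std::mutex> lk(g_mtx);
    g_pending.push({std::move(id), std::move(json)});    // steps 2-3: pack + enqueue
}

void process_next()                          // step 4: runs on the event-processing thread
{
    std::lock_guard<std::mutex> lk(g_mtx);
    if (g_pending.empty()) return;
    container_event evt = g_pending.front();
    g_pending.pop();
    g_cache[evt.id] = evt.json;              // only now does the "cache" see the metadata
}

int main()
{
    std::thread lookup(async_lookup, std::string("abc123"));
    lookup.join();
    process_next();                          // enrichment becomes visible from here on
    std::cout << g_cache["abc123"] << "\n";
    return 0;
}
Any syscall handled between the enqueue and the parse still sees an unenriched container, which is exactly the delay discussed below.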
The async lookup is not working well in production: you have many events with missing container enrichment. My follow-up question: isn't the synchronous path still not truly synchronous? We do the CRI API calls and parsing synchronously, but we don't add the container to the cache right away; we still do the container event round trip before the container lands in the container cache. WDYT?
I thought all the container engine code lookups were done in a separate thread, using the mechanisms in https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/container_engine/container_async_source.tpp. The reason for creating the CONTAINER_JSON event and enqueuing it to the inspector is because it's a separate thread and we can't directly access the libsinsp state, due to no locking/mutexes at all for those data structures.
There may be bugs in the async lookup path, and I agree that the current code is unnecessarily confusing, but you definitely don't want to go back to synchronous lookups as a solution for that. Some container engines had a delay of up to 500ms before container info was available for a new container, and stalling the event loop while the synchronous lookup was underway was definitely causing dropped syscalls.
If no one else can or wants to, sure, I can help. In this case, since we are talking about the container engine, "master of disaster" seems more accurate.
I think as a first step we should store the container before we convert it to JSON in any case. That way we don't break the trace files, delays don't matter for the native container events anyway, and storing the container as fast as possible matters a lot for the syscall event stream.
After that we can rework the more general flow.
Sounds reasonable?
Yeah, if the container cache is protected across threads it makes sense to add it immediately and then create the CONTAINER_JSON event afterward.
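In sketch form, the ordering being agreed on here (lookup_result and enqueue_container_json_event are hypothetical stand-ins; the add_container call mirrors the snippet quoted at the top of this thread):
// Proposed ordering (sketch, assumes the container cache is internally locked):
void on_sync_lookup_done(const lookup_result& result)
{
    // 1. fast path: make the metadata visible to syscall parsing immediately
    cache->add_container(std::make_shared<sinsp_container_info>(result), nullptr);
    // 2. compatibility path: still emit CONTAINER_JSON for trace files and
    //    native 'container' events
    enqueue_container_json_event(result);
}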
OK, one more question: what do the callbacks do, or what can they do? Was there a reason the callbacks should only be called from the enqueued container event?
I guess you mean the on_new_container callback? I believe it was a general mechanism for other parts of libsinsp to be notified when there is a new container. And for this, it's used to pass control back from the container engines to libsinsp (I think, I haven't looked at the code in a while).
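As a toy illustration of that kind of notification hook (hypothetical names, not the libsinsp API): interested subsystems register a callback and are invoked when a container is first added, regardless of whether the add came from a parsed event or a direct cache insertion.
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct container_info { std::string id; std::string image; };

class container_cache
{
public:
    using new_container_cb = std::function<void(const container_info&)>;

    void subscribe_on_new_container(new_container_cb cb) { m_callbacks.push_back(std::move(cb)); }

    void add_container(const container_info& info)
    {
        m_containers.push_back(info);
        for (const auto& cb : m_callbacks)
            cb(info);                        // notify subscribers right away
    }

private:
    std::vector<container_info> m_containers;
    std::vector<new_container_cb> m_callbacks;
};

int main()
{
    container_cache cache;
    cache.subscribe_on_new_container([](const container_info& c) {
        std::cout << "new container " << c.id << " image=" << c.image << "\n";
    });
    cache.add_container({"abc123", "nginx:latest"});
    return 0;
}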
OK, sounds like there was no particular reason why they were invoked from the enqueued event, right? Hence we could invoke them upfront as well; I think other container engines do that. I will look into it.
Maybe /milestone 0.15.0
Force-pushed from f685d0e to e6895e3
I need to do another re-deploy with some preview patches from https://github.com/falcosecurity/libs/pull/1600/files#diff-8c61b5c69dd08557db645a44eed01006cb658f4099c0339262fa7ed5ff567e35R732 to get to the bottom of what causes missing container images. In general, I will expand the scope of this PR a bit once I have better data on what the actual problem is; related to #291.
Force-pushed from e6895e3 to ef937ec
userspace/libsinsp/parsers.cpp
Outdated
@@ -5083,22 +5083,11 @@ void sinsp_parser::parse_container_json_evt(sinsp_evt *evt)
const Json::Value& lookup_state = container["lookup_state"];
if(check_json_val_is_convertible(lookup_state, Json::uintValue, "lookup_state"))
{
container_info->set_lookup_status(static_cast<sinsp_container_lookup::state>(lookup_state.asUInt()));
Not removed, just moved to the bottom with updated logic.
Force-pushed from ef937ec to cba3555
/assign
The changes look okay to me. To be clear, it seems that the big change in the PR is that when doing synchronous container lookups (I guess this might happen at startup or in other uncommon cases), the container info is added to the cache immediately instead of creating a fake event and then immediately parsing it.
LGTM label has been added. Git tree hash: a0f3fec8db9788ce489e0f703ec6c1068af11806
I took a look at the failing test: given that we changed the behaviour of the CRI runtime, we have to change the test. Now we should send a different JSON to sinsp (e.g.
userspace/libsinsp/parsers.cpp
Outdated
// note: The container image is the most crucial field from a security incident response perspective.
// We aim to raise the bar for successful container lookups. Conversely, pod sandboxes do not include
// a container image in the API response.
if (!container_info->m_image.empty() || container_info->is_pod_sandbox())
{
    // Marked as SUCCESSFUL if and only if the container image field is present or
    // if it's a pod sandbox container
    container_info->set_lookup_status(sinsp_container_lookup::state::SUCCESSFUL);
} else
{
    container_info->set_lookup_status(sinsp_container_lookup::state::FAILED);
}
Sorry, but I disagree. This is hardcoding your policy ("all containers must have an image name"), which looks sensible, but https://xkcd.com/1172/.
How is the FAILED state an improvement here? This is the metadata we have; if it contains an empty image name, so be it, no reason to pretend all the other metadata does not exist. Given how fluid the CRI spec seems to be, I can totally imagine some-new-fancy-wrapper-over-runc run sha256:abcdef not exposing the image name at all (since we don't have it), but still, e.g. the labels might be valuable.
Not to mention, this is generic code, for all container engines, some of which may not even have a notion of image names (I'm not sure e.g. LXC does). If I am in the minority and we decide to change the policy to require an image name, please at the very least place it directly in the container engine(s).
I understand you need the image name for anything interesting, but the real fix would be to make metadata queries fast and reliable (which we know is impossible in all cases but we can keep trying). This is just shooting the messenger: given how the docker/cri engines are written, if you get a success without an image name, I'd assume we have not received an image name from the API and there's nothing we can do about it. OK, maybe retrying the query a bit later might help (doubt it but stranger things have happened), but that's entirely the business of a particular engine.
Hey @gnosek, I respectfully disagree. Here's how I feel: I'm simply trying to ensure the tool functions effectively in real-life production environments. However, it often encounters issues with the container engine, which leads security analysts to conclude that the tool isn't working properly. It seems that this harsh reality isn't receiving much attention from anyone around here. Therefore, all I'm attempting to do is to make improvements to address these issues.
As of today, we still add it to the cache even if it failed. However, now it's marked as not successful, which lays the foundation for future improvements.
Are you making guesses about future changes? I am fighting today's fire-drills.
Still, how does this chunk help? I'm not opposing the patch in general. I like the idea, I just still need to look at the implementation. I just can't imagine this particular bit doing anything good. I'm not even entirely sure it has an observable effect inside Falco (the callbacks are a different story, they'd be sad).
Combined with the other suggestion (of dropping FAILED lookups from the cache), it would keep retrying the lookups until one did return an image name, burning a ton of CPU in the process (both in Falco and the container runtime).
In the harsh reality I am so carefully ignoring, have you observed an API query that:
- returned metadata other than image name initially (i.e. not just raised an error)
- returned the image name too later on?
Like I said in the other comment, I'm all for marking the metadata for a container as "yeah, we got something but not everything we'd like" and retrying later (for some definition of later, probably other than "the next syscall"), but that has to be done in an explicit manner.
Jumping in late in this discussion. From my high-level understanding, it seems we're deciding between:
- Discarding all partially incomplete metadata when the container image is not available and forcing a new fetch attempt later on, thus consuming extra CPU in the hope of getting more complete data
- Considering the fetch successful even when some data (such as the container image) is missing
Both come with evident pros and cons. If we struggle to find an agreement, to me this feels like the kind of behavior that should be configurable and not be enforced on any of the libs adopters. Of course I agree that security-related use cases have top priority overall for the Falco project, but I also see that the libs themselves are an arsenal of system inspection tools that can be the foundation of different projects and products as well. To me, this feels like something we could negotiate as a new "retry policy" configuration point. Just my two cents; I trust both of you with your own expertise and evaluations.
Not to mention, this is generic code, for all container engines, some of which may not even have a notion of image names (I'm not sure e.g. LXC does). If I am in the minority and we decide to change the policy to require an image name, please at the very least place it directly in the container engine(s).
This one I'm very onboard with -- I think this should not affect the behavior of other container runtimes.
retrying later (for some definition of later, probably other than "the next syscall"), but that has to be done in an explicit manner.
Agree also on this, this "retry policy" definitely has to happen in a controlled or throttled way.
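If that configuration point ever materializes, it could be as small as the following sketch (entirely hypothetical, not an existing libs or Falco option), where retries for partially enriched containers are capped and throttled rather than re-triggered by the next syscall:
#include <chrono>
#include <cstdint>
#include <iostream>

struct metadata_retry_policy
{
    bool retry_on_missing_image = false;                // adopter opt-in, off by default
    uint32_t max_attempts = 3;                          // hard cap on extra lookups
    std::chrono::milliseconds base_backoff{500};        // delay before the first retry
    double backoff_multiplier = 2.0;                    // exponential throttle
};

// Delay before retry attempt N (1-based); attempt 0 is the initial lookup.
std::chrono::milliseconds retry_delay(const metadata_retry_policy& p, uint32_t attempt)
{
    auto delay = p.base_backoff;
    for (uint32_t i = 1; i < attempt; ++i)
        delay = std::chrono::milliseconds(
            static_cast<int64_t>(delay.count() * p.backoff_multiplier));
    return delay;
}

int main()
{
    metadata_retry_policy policy;
    policy.retry_on_missing_image = true;
    for (uint32_t attempt = 1; attempt <= policy.max_attempts; ++attempt)
        std::cout << "retry " << attempt << " after " << retry_delay(policy, attempt).count() << " ms\n";
    return 0;
}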
Given that we have the new ticket #1708, let's hold off on any changes in this regard in the parser; someone can refactor it in the future. Please be aware of #1707. It's incorrect to initialize a lookup to SUCCESSFUL, by the way. I believe this is the strongest indication of a very suboptimal shared design across the container engines we support.
Force-pushed from f3ac0ac to ba164f1
Needed another rebase.
Hmmm, wanting to re-push the last commit, but GitHub is acting up. It needs to be
Force-pushed from a0ccfdc to 1439f01
… case) Signed-off-by: Melissa Kilby <[email protected]>
* Now more precise, logical and clean: set STARTED at the beginning and SUCCESSFUL at the end of `parse_container_json_evt` if and only if the container image was successfully retrieved and present before calling `add_container`, or if it's a pod sandbox container.
* In `lookup_sync` we previously set the lookup_status to SUCCESSFUL, which can be avoided to have a more consolidated and clean behavior in `parse_container_json_evt`.
Signed-off-by: Melissa Kilby <[email protected]>
…tainer sync lookups Co-authored-by: Roberto Scolaro <[email protected]> Signed-off-by: Melissa Kilby <[email protected]>
Force-pushed from ba164f1 to a0ccfdc
/unhold
@therealbobo mind checking the last commit to ensure it's the exact same old code flow? It seems that we need to change the lookup status again to
Force-pushed from ff4bc4a to 3f2dfaf
…ting state updates in lookup_sync are preserved Signed-off-by: Melissa Kilby <[email protected]>
Force-pushed from 3f2dfaf to 9646c87
As privately discussed, I'm ok with this change. Any other improvement may come in follow-up PRs.
LGTM label has been added. Git tree hash: d878d485880b017d5b47b87311c4765b6147fc3a
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: incertum, leogr, LucaGuerra. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thanks, and FYI I am tracking overall/general progress here: falcosecurity/falco#2708.
What type of PR is this?
/kind cleanup
Any specific area of the project related to this PR?
/area libsinsp
Does this PR require a change in the driver versions?
What this PR does / why we need it:
Fast-track the addition of enriched containers (fields successfully retrieved from the container runtime socket) to the container cache, bypassing the round-trip process.
Although we still re-add the container in parse_container_json_evt to also support native 'container' events, the round trip introduces an avoidable delay in the incoming syscall event stream. Syscall events do not explicitly require container events and instead directly retrieve container details from the container cache. This behavior could potentially contribute to the issues noted by adopters, such as the absence of container images in syscall events.
Part of #1589
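Roughly, the change for the synchronous lookup case looks like this (illustrative sketch; lookup_sync() and enqueue_container_json_event() are stand-ins for the real helpers, not verified signatures):
// Before (sketch): enriched metadata had to round-trip through a CONTAINER_JSON
// event before syscall parsing could see it in the container cache.
//   sinsp_container_info result = lookup_sync(container_id);
//   enqueue_container_json_event(result);   // cache populated only once parsed
//
// After (sketch): cache the enriched container immediately, then still emit the
// event so native 'container' events and trace files keep working.
sinsp_container_info result = lookup_sync(container_id);
cache->add_container(std::make_shared<sinsp_container_info>(result), nullptr);
enqueue_container_json_event(result);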
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: