refactor(cri): fast-track add containers to cache (synchronous lookup case, Falco run w/ --disable-cri-async)
#1595
Conversation
* events and instead directly retrieve container details from the container cache. This behavior could potentially
* contribute to the issues noted by adopters, such as the absence of container images in syscall events.
*/
cache->add_container(std::make_shared<sinsp_container_info>(result), nullptr);
Maybe we should introduce a new function, as this one also loops over callbacks. In general, I dislike how certain functions are overloaded in the container engine.
CC @gnosek?
Using the existing replace_container instead. This should be safe and do what we want.
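For illustration, a minimal sketch of that change, assuming replace_container() accepts the same shared pointer as the add_container() call quoted above and simply overwrites the cache entry without walking the new-container callbacks (an assumption based on this thread, not a verified signature):
// Hypothetical fragment: update the cache entry for the freshly looked-up
// container without re-triggering the on_new_container callbacks.
cache->replace_container(std::make_shared<sinsp_container_info>(result));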
@mstemm would you have additional info on the callback or the general flow here? Eager to understand subtleties better. Thank you!
I'm not sure why we even have a synchronous lookup option. That would surely lead to event drops, wouldn't it? That's why we made the lookups asynchronous in the first place.
If you're asking how the asynchronous lookup flow works, you got the general steps right:
- The container engine does an asynchronous lookup of container metadata.
- The container metadata is packed into a CONTAINER_JSON event.
- The event is enqueued to the inspector by the container engine thread.
- On the next call to sinsp::next() by the event processing thread, the enqueued event is read and parsed. Parsing the event adds the container info to libsinsp.
I think step 2 is probably more confusing than it needs to be.
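To make those four steps concrete, here is a self-contained toy model of the flow (hypothetical names, not libsinsp code); the point is that the cache only sees the metadata once the event-processing thread gets around to parsing the enqueued event:
#include <iostream>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

struct container_event { std::string id; std::string json; };

std::queue<container_event> g_pending;       // stands in for the inspector's event queue
std::mutex g_mtx;
std::map<std::string, std::string> g_cache;  // stands in for the container cache

void async_lookup(std::string id)            // step 1: runs on the container engine thread
{
    std::string json = "{\"image\":\"nginx:latest\"}";   // pretend CRI response
    std::lock_guard<std::mutex> lk(g_mtx);
    g_pending.push({std::move(id), std::move(json)});    // steps 2-3: pack + enqueue
}

void process_next()                          // step 4: runs on the event-processing thread
{
    std::lock_guard<std::mutex> lk(g_mtx);
    if (g_pending.empty()) return;
    container_event evt = g_pending.front();
    g_pending.pop();
    g_cache[evt.id] = evt.json;              // only now does the "cache" see the metadata
}

int main()
{
    std::thread lookup(async_lookup, std::string("abc123"));
    lookup.join();
    process_next();                          // enrichment becomes visible from here on
    std::cout << g_cache["abc123"] << "\n";
    return 0;
}
Any syscall handled between the enqueue and the parse still sees an unenriched container, which is exactly the delay discussed below.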
The async lookup is not working well in production: you have many events with missing container enrichment. My follow-up question: isn't the synchronous path still not truly synchronous? We do the CRI API calls and parsing synchronously, but we don't add the container to the cache right away; we still do the container event round trip before the container lands in the container cache. WDYT?
I thought all the container engine code lookups were done in a separate thread, using the mechanisms in https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/container_engine/container_async_source.tpp. The reason for creating the CONTAINER_JSON event and enqueuing it to the inspector is because it's a separate thread and we can't directly access the libsinsp state, due to no locking/mutexes at all for those data structures.
There may be bugs in the async lookup path, and I agree that the current code is unnecessarily confusing, but you definitely don't want to go back to synchronous lookups as a solution for that. Some container engines had a delay of up to 500ms before container info was available for a new container, and stalling the event loop while the synchronous lookup was underway was definitely causing dropped syscalls.
If no one else can or wants to, sure, I can help. In this case, since we are talking about the container engine, "master of disaster" seems more accurate.
I think as a first step we should store the container before we convert it to JSON in any case. That way we don't break the trace files, delays don't matter for the native container events anyway, and storing the container as fast as possible matters a lot for the syscall event stream.
After that we can rework the more general flow.
Sounds reasonable?
Yeah, if the container cache is protected across threads it makes sense to add it immediately and then create the CONTAINER_JSON event afterward.
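In sketch form, the ordering being agreed on here (lookup_result and enqueue_container_json_event are hypothetical stand-ins; the add_container call mirrors the snippet quoted at the top of this thread):
// Proposed ordering (sketch, assumes the container cache is internally locked):
void on_sync_lookup_done(const lookup_result& result)
{
    // 1. fast path: make the metadata visible to syscall parsing immediately
    cache->add_container(std::make_shared<sinsp_container_info>(result), nullptr);
    // 2. compatibility path: still emit CONTAINER_JSON for trace files and
    //    native 'container' events
    enqueue_container_json_event(result);
}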
OK, one more question: what do the callbacks do, or what can they do? Was there a reason the callbacks should only be called from the enqueued container event?
I guess you mean the on_new_container callback? I believe it was a general mechanism for other parts of libsinsp to be notified when there is a new container. And for this, it's used to pass control back from the container engines to libsinsp (I think, I haven't looked at the code in a while).
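As a toy illustration of that kind of notification hook (hypothetical names, not the libsinsp API): interested subsystems register a callback and are invoked when a container is first added, regardless of whether the add came from a parsed event or a direct cache insertion.
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct container_info { std::string id; std::string image; };

class container_cache
{
public:
    using new_container_cb = std::function<void(const container_info&)>;

    void subscribe_on_new_container(new_container_cb cb) { m_callbacks.push_back(std::move(cb)); }

    void add_container(const container_info& info)
    {
        m_containers.push_back(info);
        for (const auto& cb : m_callbacks)
            cb(info);                        // notify subscribers right away
    }

private:
    std::vector<container_info> m_containers;
    std::vector<new_container_cb> m_callbacks;
};

int main()
{
    container_cache cache;
    cache.subscribe_on_new_container([](const container_info& c) {
        std::cout << "new container " << c.id << " image=" << c.image << "\n";
    });
    cache.add_container({"abc123", "nginx:latest"});
    return 0;
}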
OK, sounds like there was no particular reason why they were invoked from the enqueued event, right? Hence we could invoke them upfront as well; I think other container engines do that. I will look into it.
Maybe /milestone 0.15.0
Force-pushed from f685d0e to e6895e3
I need to do another re-deploy with some preview patches from https://github.com/falcosecurity/libs/pull/1600/files#diff-8c61b5c69dd08557db645a44eed01006cb658f4099c0339262fa7ed5ff567e35R732 to get to the bottom of what causes missing container images. In general, I will expand the scope of this PR a bit once I have better data on what the actual problem is; related to #291.
Force-pushed from e6895e3 to ef937ec
userspace/libsinsp/parsers.cpp
Outdated
@@ -5083,22 +5083,11 @@ void sinsp_parser::parse_container_json_evt(sinsp_evt *evt)
const Json::Value& lookup_state = container["lookup_state"];
if(check_json_val_is_convertible(lookup_state, Json::uintValue, "lookup_state"))
{
container_info->set_lookup_status(static_cast<sinsp_container_lookup::state>(lookup_state.asUInt()));
Not removed, just moved to the bottom with updated logic.
Force-pushed from ef937ec to cba3555
/assign
The changes look okay to me. To be clear, it seems that the big change in the PR is that when doing synchronous container lookups (I guess this might happen at startup or in other uncommon cases), the container info is added to the cache immediately instead of creating a fake event and then immediately parsing it.
LGTM label has been added. Git tree hash: a0f3fec8db9788ce489e0f703ec6c1068af11806
I took a look at the failing test: given that we changed the behaviour of the CRI runtime, we have to change the test. Now we should send a different JSON to sinsp (e.g.
userspace/libsinsp/parsers.cpp
Outdated
// note: The container image is the most crucial field from a security incident response perspective.
// We aim to raise the bar for successful container lookups. Conversely, pod sandboxes do not include
// a container image in the API response.
if (!container_info->m_image.empty() || container_info->is_pod_sandbox())
{
    // Marked as SUCCESSFUL if and only if the container image field is present or
    // if it's a pod sandbox container
    container_info->set_lookup_status(sinsp_container_lookup::state::SUCCESSFUL);
} else
{
    container_info->set_lookup_status(sinsp_container_lookup::state::FAILED);
}
Sorry, but I disagree. This is hardcoding your policy ("all containers must have an image name"), which looks sensible, but https://xkcd.com/1172/.
How is the FAILED state an improvement here? This is the metadata we have; if it contains an empty image name, so be it, no reason to pretend all the other metadata does not exist. Given how fluid the CRI spec seems to be, I can totally imagine some-new-fancy-wrapper-over-runc run sha256:abcdef not exposing the image name at all (since we don't have it), but still, e.g. the labels might be valuable.
Not to mention, this is generic code, for all container engines, some of which may not even have a notion of image names (I'm not sure e.g. LXC does). If I am in the minority and we decide to change the policy to require an image name, please at the very least place it directly in the container engine(s).
I understand you need the image name for anything interesting, but the real fix would be to make metadata queries fast and reliable (which we know is impossible in all cases but we can keep trying). This is just shooting the messenger: given how the docker/cri engines are written, if you get a success without an image name, I'd assume we have not received an image name from the API and there's nothing we can do about it. OK, maybe retrying the query a bit later might help (doubt it but stranger things have happened), but that's entirely the business of a particular engine.
Hey @gnosek, I respectfully disagree. Here's how I feel: I'm simply trying to ensure the tool functions effectively in real-life production environments. However, it often encounters issues with the container engine, which leads security analysts to conclude that the tool isn't working properly. It seems that this harsh reality isn't receiving much attention from anyone around here. Therefore, all I'm attempting to do is to make improvements to address these issues.
As of today, we still add it to the cache even if it failed. However, now it's marked as not successful, which lays the foundation for future improvements.
Are you making guesses about future changes? I am fighting today's fire-drills.
Still, how does this chunk help? I'm not opposing the patch in general. I like the idea, I just still need to look at the implementation. I just can't imagine this particular bit doing anything good. I'm not even entirely sure it has an observable effect inside Falco (the callbacks are a different story, they'd be sad).
Combined with the other suggestion (of dropping FAILED lookups from the cache), it would keep retrying the lookups until one did return an image name, burning a ton of CPU in the process (both in Falco and the container runtime).
In the harsh reality I am so carefully ignoring, have you observed an API query that:
- returned metadata other than image name initially (i.e. not just raised an error)
- returned the image name too later on?
Like I said in the other comment, I'm all for marking the metadata for a container as "yeah, we got something but not everything we'd like" and retrying later (for some definition of later, probably other than "the next syscall"), but that has to be done in an explicit manner.
Jumping in late in this discussion. From my high-level understanding, it seems we're deciding between:
- Discarding all partially incomplete metadata when the container image is not available and forcing a new fetch attempt later on, thus consuming extra CPU in the hope of getting more complete data
- Considering the fetch successful even when some data (such as the container image) is missing
Both come with evident pros and cons. If we struggle to find an agreement, to me this feels like the kind of behavior that should be configurable and not be enforced on any of the libs adopters. Of course I agree that security-related use cases have top priority overall for the Falco project, but I also see that the libs themselves are an arsenal of system inspection tools that can be the foundation of different projects and products as well. To me, this feels like something we could negotiate as a new "retry policy" configuration point. Just my two cents; I trust both of you with your own expertise and evaluations.
Not to mention, this is generic code, for all container engines, some of which may not even have a notion of image names (I'm not sure e.g. LXC does). If I am in the minority and we decide to change the policy to require an image name, please at the very least place it directly in the container engine(s).
This one I'm very onboard with -- I think this should not affect the behavior of other container runtimes.
retrying later (for some definition of later, probably other than "the next syscall"), but that has to be done in an explicit manner.
Agree also on this, this "retry policy" definitely has to happen in a controlled or throttled way.
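If that configuration point ever materializes, it could be as small as the following sketch (entirely hypothetical, not an existing libs or Falco option), where retries for partially enriched containers are capped and throttled rather than re-triggered by the next syscall:
#include <chrono>
#include <cstdint>
#include <iostream>

struct metadata_retry_policy
{
    bool retry_on_missing_image = false;                // adopter opt-in, off by default
    uint32_t max_attempts = 3;                          // hard cap on extra lookups
    std::chrono::milliseconds base_backoff{500};        // delay before the first retry
    double backoff_multiplier = 2.0;                    // exponential throttle
};

// Delay before retry attempt N (1-based); attempt 0 is the initial lookup.
std::chrono::milliseconds retry_delay(const metadata_retry_policy& p, uint32_t attempt)
{
    auto delay = p.base_backoff;
    for (uint32_t i = 1; i < attempt; ++i)
        delay = std::chrono::milliseconds(
            static_cast<int64_t>(delay.count() * p.backoff_multiplier));
    return delay;
}

int main()
{
    metadata_retry_policy policy;
    policy.retry_on_missing_image = true;
    for (uint32_t attempt = 1; attempt <= policy.max_attempts; ++attempt)
        std::cout << "retry " << attempt << " after " << retry_delay(policy, attempt).count() << " ms\n";
    return 0;
}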
Given that we have the new ticket #1708, let's hold off on any changes in this regard in the parser; someone can refactor it in the future. Please be aware of #1707. It's incorrect to initialize a lookup to SUCCESSFUL, by the way. I believe this is the strongest indication of a very suboptimal shared design across the container engines we support.
Force-pushed from f3ac0ac to ba164f1
Needed another rebase.
Hmmm, wanting to re-push the last commit, but GitHub is acting up. It needs to be
Force-pushed from a0ccfdc to 1439f01
… case) Signed-off-by: Melissa Kilby <[email protected]>
* Now more precise, logical and clean: set STARTED at the beginning and SUCCESSFUL at the end of `parse_container_json_evt` if and only if the container image was successfully retrieved and present before calling `add_container`, or if it's a pod sandbox container.
* In `lookup_sync` we previously set the lookup_status to SUCCESSFUL, which can be avoided to have a more consolidated and clean behavior in `parse_container_json_evt`.
Signed-off-by: Melissa Kilby <[email protected]>
…tainer sync lookups Co-authored-by: Roberto Scolaro <[email protected]> Signed-off-by: Melissa Kilby <[email protected]>
Force-pushed from ba164f1 to a0ccfdc
/unhold
@therealbobo mind checking the last commit to ensure it's the exact same old code flow? It seems that we need to change the lookup status again to
Force-pushed from ff4bc4a to 3f2dfaf
…ting state updates in lookup_sync are preserved Signed-off-by: Melissa Kilby <[email protected]>
Force-pushed from 3f2dfaf to 9646c87
As privately discussed, I'm ok with this change. Any other improvement may come in follow-up PRs.
LGTM label has been added. Git tree hash: d878d485880b017d5b47b87311c4765b6147fc3a
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: incertum, leogr, LucaGuerra. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thanks, and FYI I am tracking overall/general progress here: falcosecurity/falco#2708.
What type of PR is this?
/kind cleanup
Any specific area of the project related to this PR?
/area libsinsp
Does this PR require a change in the driver versions?
What this PR does / why we need it:
Fast-track the addition of enriched containers (fields successfully retrieved from the container runtime socket) to the container cache, bypassing the round-trip process.
Although we still re-add the container in parse_container_json_evt to also support native 'container' events, the round trip introduces an avoidable delay in the incoming syscall event stream. Syscall events do not explicitly require container events and instead directly retrieve container details from the container cache. This behavior could potentially contribute to the issues noted by adopters, such as the absence of container images in syscall events.
Part of #1589
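Roughly, the change for the synchronous lookup case looks like this (illustrative sketch; lookup_sync() and enqueue_container_json_event() are stand-ins for the real helpers, not verified signatures):
// Before (sketch): enriched metadata had to round-trip through a CONTAINER_JSON
// event before syscall parsing could see it in the container cache.
//   sinsp_container_info result = lookup_sync(container_id);
//   enqueue_container_json_event(result);   // cache populated only once parsed
//
// After (sketch): cache the enriched container immediately, then still emit the
// event so native 'container' events and trace files keep working.
sinsp_container_info result = lookup_sync(container_id);
cache->add_container(std::make_shared<sinsp_container_info>(result), nullptr);
enqueue_container_json_event(result);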
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: