Falco 0.36.1 and 0.36.0 segfault sometimes with syscall (evt null) #2878
Comments
Hi! Thanks for the detailed research!
Could you check whether the suggested change fixes the issue?
@FedeDP I've replaced https://github.com/falcosecurity/libs/blob/5b4e28d44f9c87aa23f24c53c6bfb696b2b47635/userspace/libsinsp/sinsp.cpp#L1436-L1447 with the suggested lines and compiled the debug build. I've got it running now; I can just wait and see whether it crashes or not.
@max-frank you are the best tester out there! Thank you very much! Fingers 🤞
@max-frank any new crash after applying the patch? :)
@FedeDP OK, it's officially been around 24h now and I can report no crashes yet.
/milestone 0.36.2
We are arranging a 0.36.2 patch release for Falco to fix this :)
Very nice, much appreciated!
Hi! I just released Falco 0.36.2-rc1 that should solve this issue; care to try it?
Just deployed it in our local test cluster. No crashes for syscall Falco yet with the release candidate. We are still experiencing issues with gVisor crashing on startup in our GKE 1.26 configuration, but that is a pre-existing problem I have not had time to debug yet, so I don't know whether it's a configuration issue on our end or an issue in Falco. (I will open a separate issue for it once I have time to figure it out.)
Thanks for your feedback, it is really valuable! And thanks for helping us spot this issue!
This is fixed in Falco 0.36.2, just released!
@FedeDP: Closing this issue.
Describe the bug
After the upgrade to 0.36.0 (and subsequently 0.36.1), we noticed that our Falco daemonsets were restarting multiple times overnight. Note that this is a low-traffic cluster used only for testing the Falco deployment, so this should be unrelated to the K8s metadata performance issues.
Pods seem to exit with code 139 (SIGSEGV) seemingly at random. To investigate, we built Falco as a debug release in the original pod environment and let it run until it crashed. See the stack trace and logs from one such crash below.
As you can see in the stack trace, the cause of the segfault appears to be that the evt pointer is null during the get_source_idx call in the main inspect loop. Looking at the code, the only way this can happen is if the sinsp::next call at the beginning of the loop sets the pointer to null. If that is not expected behaviour, then this might be a bug in the libs rather than a bug in Falco caused by a missing null check.
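For illustration only, here is a minimal C++ sketch of the kind of defensive guard being discussed. The loop shape and the names inspect_loop and handle_event are assumptions made for the example, not Falco's actual implementation, and the suggested libs patch itself is not quoted in this thread:

```cpp
// Hypothetical, simplified sketch of a defensively-guarded event loop.
// Names (inspect_loop, handle_event) are illustrative, not Falco's code.
#include <libsinsp/sinsp.h> // libsinsp inspector API (include path may vary by version)

void inspect_loop(sinsp& inspector)
{
    while (true)
    {
        sinsp_evt* evt = nullptr;
        int32_t res = inspector.next(&evt);

        // On SCAP_TIMEOUT, evt may legitimately come back null: skip the
        // iteration instead of dereferencing it (the suspected missing
        // guard before the get_source_idx lookup).
        if (res == SCAP_TIMEOUT || evt == nullptr)
        {
            continue;
        }

        if (res != SCAP_SUCCESS)
        {
            break; // a real loop would log and report the error
        }

        // ... resolve the event's source index and evaluate rules ...
        // handle_event(evt); // hypothetical handler
    }
}
```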
How to reproduce it
Start Falco with the syscall source and wait. The pod will eventually crash.
Expected behaviour
No crashes happen.
Environment
GKE 1.26.6-gke.1700 (COS)
Falco version (debug build):
Installation method:
Custom K8S manifest using standard container image
For debugging, built Falco from source as a DEBUG release (with assertions stripped)
Additional context
Debug run output
Stack trace
Too long to include inline; see the pastebin here.