Add more checks for buffer corruption on startup #3970
Comments
Thanks for summarizing this issue!
I think the following log should be at the info level, since these chunks could be a problem after abnormal system termination, such as a machine power failure.
We have the check in
In
I am examining this issue. The most important thing is to detect file corruption caused by abnormal system termination, such as a machine power failure.
This log level should be
Without this log, even if we notice that the destination server has received corrupted data, we cannot tell which chunks may have been corrupted. If possible, we should also check the chunk's body for corruption when loading existing chunks. I am currently working on this change.
I have created some PRs for this issue.
Regarding additional checks for buffer corruption, I am considering the following:
All PRs are merged, thanks for the reviews!
I want to work on other issues now, so I won't be able to work on this for a while.
Added documentation. The following feature would be helpful, but I will not be able to work on it for a while.
I'm not sure it's possible, but if we could add checksums for the buffer contents, it would help verify the correctness of the buffers. This is already implemented in chunkio, which is used in Fluent Bit's filesystem buffering mechanism. The main issue with the current implementation is that there is no mechanism to detect buffer corruption.
I agree. When I previously made some improvements for this issue, I did not consider such a new mechanism because it would be expensive to implement and would have a large impact on the existing logic.
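To make the checksum idea above concrete, here is a minimal sketch, not Fluentd's or chunkio's actual implementation. It assumes a CRC32 trailer appended to the chunk payload; the module name `ChunkChecksum` and its methods `seal` and `verify` are hypothetical.

```ruby
require "zlib"

# Hypothetical helpers for illustration only; not part of Fluentd or chunkio.
module ChunkChecksum
  TRAILER_SIZE = 4 # CRC32 stored as a 32-bit big-endian integer

  # Return the payload bytes with their CRC32 appended as a trailer.
  def self.seal(payload)
    payload = payload.b
    payload + [Zlib.crc32(payload)].pack("N")
  end

  # Return the payload if the trailer matches its CRC32, otherwise nil.
  def self.verify(sealed)
    sealed = sealed.b
    return nil if sealed.bytesize < TRAILER_SIZE
    payload = sealed.byteslice(0, sealed.bytesize - TRAILER_SIZE)
    expected = sealed.byteslice(-TRAILER_SIZE, TRAILER_SIZE).unpack1("N")
    Zlib.crc32(payload) == expected ? payload : nil
  end
end

sealed = ChunkChecksum.seal("event data")
puts ChunkChecksum.verify(sealed).inspect
```

A trailer like this changes the on-disk chunk format, so existing chunks would need a migration path or a version marker. That is probably part of the cost to the existing logic mentioned above.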
Is your feature request related to a problem? Please describe.
Currently, when Fluentd outputs start up, we only check whether each buffer chunk
is non-empty; if it has some bytes, we assume it contains valid data.
It turned out that this operation model has a few issues:
many kinds of errors in various parts of the pipeline.
This is important because users probably want to recover the lost data.
We should perform more rigorous buffer checks on startup,
so that Fluentd can handle corrupted chunks gracefully.
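As a rough illustration of a stricter startup check, the sketch below classifies each chunk file as valid, empty, or corrupt, and moves corrupt files aside instead of deleting them so the data stays recoverable. It is only a sketch under assumptions: the function `scan_chunks`, the `*.chunk` file pattern, and the quarantine directory are hypothetical and do not reflect Fluentd's real buffer layout or API. The validity check itself is supplied by the caller as a block.

```ruby
require "fileutils"

# Hypothetical startup scan; names and file layout are assumptions.
# The block receives a chunk's raw bytes and returns true if they look valid.
def scan_chunks(buffer_dir, quarantine_dir)
  result = { ok: [], empty: [], corrupt: [] }
  Dir.glob(File.join(buffer_dir, "*.chunk")).sort.each do |path|
    data = File.binread(path)
    if data.empty?
      result[:empty] << path
    elsif yield(data)
      result[:ok] << path
    else
      # Move rather than delete, so the data can still be inspected or recovered.
      FileUtils.mkdir_p(quarantine_dir)
      dest = File.join(quarantine_dir, File.basename(path))
      FileUtils.mv(path, dest)
      result[:corrupt] << dest
    end
  end
  result
end
```

A caller could pass any validator as the block, for example a format parser or a checksum comparison, and log the returned `:corrupt` paths so users know which chunks were affected.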
Describe the solution you'd like
Describe alternatives you've considered
N/A
Additional context
No response