In tail plugin, position data in pos_file often was deleted? disappear? #4103

Open
SML0127 opened this issue Mar 18, 2023 · 6 comments

SML0127 commented Mar 18, 2023

Describe the bug

The position information for each file in the pos_file disappears,
and only an empty pos_file remains.

After that, several symptoms appear:
the Fluentd log shows warnings such as "Unparsable line in pos_file: 000000000796950c",
or position entries are added again for files that have already been tailed, and those files are tailed again from the beginning.

To Reproduce

fluentd-conf.yaml

/fluentd/source-data/ is a directory on an NFS-mounted server

<source>
  @type tail
  path "/fluentd/source-data/#{hostname}__*.log"
  pos_file "/fluentd/source-data/pos_files/#{hostname}.pos"
  refresh_interval 5s
  follow_inodes true
  skip_refresh_on_startup true
  read_from_head true
  read_lines_limit 10000
  tag tag
  pos_file_compaction_interval 1s
  <parse>
    @type json
  </parse>
</source>

Expected behavior

The pos_file should be compacted (that is, position entries should be deleted) only when a tracked file has been deleted or when a duplicate position entry for a file has been appended.

Your Environment

- Fluentd version: fluentd:v1.15.3-debian-1.1
- Operating system: PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
- Kernel version: Linux 4.15.0-207-generic

Your Error Log

2023-03-18 15:11:06 +0900 [warn]: #0 Unparsable line in pos_file: 000000000761339b
2023-03-18 15:11:08 +0900 [info]: #0 Clean up the pos file
2023-03-18 15:11:08 +0900 [warn]: #0 Unparsable line in pos_file: 0000000006ed6a27
2023-03-18 15:11:09 +0900 [info]: #0 Clean up the pos file
2023-03-18 15:11:09 +0900 [warn]: #0 Unparsable line in pos_file: 000000000726ca85
2023-03-18 15:11:09 +0900 [info]: #0 Clean up the pos file
2023-03-18 15:11:11 +0900 [info]: #0 Clean up the pos file
2023-03-18 15:11:11 +0900 [warn]: #0 Unparsable line in pos_file: 000000000796950c
2023-03-18 15:11:11 +0900 [info]: #0 Clean up the pos file
2023-03-18 15:11:12 +0900 [info]: #0 Clean up the pos file
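
Each fragment reported in the warnings above is 16 hexadecimal digits with no path in front of it. As a rough aid for inspecting a pos_file by hand, here is a minimal Python sketch that flags lines not matching an assumed entry layout of `path<TAB>16-hex offset<TAB>16-hex inode`. That layout is an assumption for illustration, and the helper name `find_unparsable_lines` and the sample path are hypothetical; Fluentd's own parser may accept or reject lines differently.

```python
import re

# Assumed entry layout: "<path>\t<16 hex digits: offset>\t<16 hex digits: inode>".
# This is an assumption for illustration; check it against your Fluentd version.
ENTRY_RE = re.compile(r"^[^\t]+\t[0-9a-fA-F]{16}\t[0-9a-fA-F]{16}$")


def find_unparsable_lines(text):
    """Return (line_number, line) pairs that do not match the assumed layout."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line and not ENTRY_RE.match(line):
            bad.append((number, line))
    return bad


# Hypothetical pos_file content: one well-formed entry followed by a bare
# fragment like the ones quoted in the warnings.
sample = "\n".join([
    f"/fluentd/source-data/host__a.log\t{1024:016x}\t{662316:016x}",
    "000000000796950c",
    "",
])

for number, line in find_unparsable_lines(sample):
    print(f"line {number}: {line!r}")
```

To check a real file, pass its text instead of `sample`, for example `find_unparsable_lines(open(pos_path).read())`, where `pos_path` is whatever pos_file path you configured.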

Additional context

No response

Author

SML0127 commented Mar 18, 2023

Even after I moved the pos_file to local storage (not NFS), the same phenomenon occurred 🥲

Contributor

daipom commented Mar 28, 2023

Somehow the pos_file seems to be broken.

Please tell me how to reproduce it in local storage in more detail.
Does this happen suddenly while Fluentd is running?
Can you check the content of the pos_file just before it is cleaned up and tell me?
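
One way to capture the pos_file content just before a cleanup is to poll the file and keep a copy every time it changes. The following is only a sketch under assumptions: the function names, the snapshot naming scheme, and the polling interval are all made up for illustration, and polling can still miss states that exist for less than one interval.

```python
import hashlib
import time
from pathlib import Path


def snapshot_if_changed(pos_file, out_dir, last_digest):
    """Save a copy of pos_file into out_dir when its content has changed.

    Returns the digest of the current content, or last_digest if the file
    does not exist at the moment of the check.
    """
    try:
        data = Path(pos_file).read_bytes()
    except FileNotFoundError:
        return last_digest
    digest = hashlib.sha256(data).hexdigest()
    if digest != last_digest:
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        # The nanosecond timestamp in the name records when the copy was taken.
        (out / f"pos-{time.time_ns()}.snapshot").write_bytes(data)
    return digest


def watch(pos_file, out_dir, interval=0.2):
    """Poll pos_file forever and snapshot each change; stop with Ctrl-C."""
    digest = None
    while True:
        digest = snapshot_if_changed(pos_file, out_dir, digest)
        time.sleep(interval)
```

Running `watch("/fluentd/source-data/pos_files/myhost.pos", "/tmp/pos-snapshots")` next to Fluentd (both paths are placeholders) would leave a series of snapshots that can be compared with the timestamps of the "Clean up the pos file" log lines.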

daipom added the "waiting-for-user" label (similar to "moreinfo", but especially needs feedback from the user) on Mar 28, 2023
Author

SML0127 commented Mar 31, 2023

@daipom

Please tell me how to reproduce it in local storage in more detail.

Two conditions are necessary to reproduce it.

The first is that path must be a folder on an NFS-mounted server,
and the second is that pos_file_compaction_interval must be set to a small value.
(When I set a large value such as 12h for pos_file_compaction_interval, other errors occurred.)

I think the key is NFS: there seems to be an issue when Fluentd checks the structure and files inside the folder (path) on the NFS-mounted server.

Does this happen suddenly while Fluentd is running?

Yes.

Can you check the content of the pos_file just before it is cleaned up and tell me?

When I checked the content of the pos_file before it was cleaned up, there were no strangely formatted lines.

daipom removed the "waiting-for-user" label (similar to "moreinfo", but especially needs feedback from the user) on Apr 1, 2023
Contributor

daipom commented Apr 1, 2023

@SML0127
Thanks for the very helpful information!
I suspect some race conditions occurred in updating the pos_file.
I will check for possible conflicting processes in in_tail.
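
To make the kind of race suspected here concrete, below is a purely illustrative Python simulation. It is not in_tail's code and does not claim to be the cause of this issue. It assumes a hypothetical design in which a writer remembers the byte offset of its own position field and overwrites it in place, while a compaction step rewrites the whole file. The paths, numbers, entry layout, and function names are all invented for the example.

```python
import io
import re

# Assumed entry layout for this sketch: path, 16-hex offset, 16-hex inode.
ENTRY_RE = re.compile(r"^[^\t]+\t[0-9a-fA-F]{16}\t[0-9a-fA-F]{16}$")


def entry(path, pos, inode):
    """Encode one position entry in the assumed layout."""
    return f"{path}\t{pos:016x}\t{inode:016x}\n".encode()


def unparsable(data):
    """Return the lines of data that do not match the assumed layout."""
    return [line for line in data.decode().splitlines() if not ENTRY_RE.match(line)]


def simulate_stale_update():
    """Return (file bytes before the race, file bytes after it)."""
    f = io.BytesIO()
    f.write(entry("/logs/a.log", 10, 1))
    # The writer for b.log remembers where its 16-digit offset field starts.
    b_field = f.tell() + len("/logs/b.log\t")
    f.write(entry("/logs/b.log", 20, 2))
    f.write(entry("/logs/ccc.log", 30, 3))
    before = f.getvalue()

    # Compaction drops a.log and rewrites the file, which shifts every entry.
    f.seek(0)
    f.truncate()
    f.write(entry("/logs/b.log", 20, 2))
    f.write(entry("/logs/ccc.log", 30, 3))

    # The writer for b.log was not told about the rewrite and reuses its old
    # offset, so its update lands inside a different entry.
    f.seek(b_field)
    f.write(f"{40:016x}".encode())
    return before, f.getvalue()


before, after = simulate_stale_update()
print("unparsable before:", unparsable(before))
print("unparsable after: ", unparsable(after))
```

In this toy model the stale write damages a neighbouring entry, which is one way a file that looked well-formed a moment earlier could end up with a line that no longer parses. Whether anything like this happens inside in_tail would need to be confirmed in its source.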

I have heard before that we should not put pos_file on NFS.
I don't know the detailed reason for this, but I hope this information will help improve it!
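
For anyone who wants to check whether a given pos_file path actually sits on NFS, here is a small Python sketch that looks up the filesystem type from mount-table text in the Linux `/proc/mounts` format (device, mount point, type, options). The helper names and the sample mount table, including the server name `nfs-server:/export`, are hypothetical, and mount points containing escaped characters are not handled.

```python
import posixpath


def filesystem_type(path, mounts_text):
    """Return the fstype of the longest mount point that contains path."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with a trailing slash so "/data" does not match "/database".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True when the containing mount reports a type starting with 'nfs'."""
    fs_type = filesystem_type(path, mounts_text)
    return fs_type is not None and fs_type.startswith("nfs")


# Hypothetical mount table for illustration only.
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "nfs-server:/export /fluentd/source-data nfs4 rw,relatime 0 0\n"
)

print(is_on_nfs("/fluentd/source-data/pos_files/host.pos", SAMPLE_MOUNTS))
print(is_on_nfs("/var/log/fluentd/host.pos", SAMPLE_MOUNTS))
```

On a Linux host, the real table can be read with `open("/proc/mounts").read()` and passed in place of `SAMPLE_MOUNTS`.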

Even after I moved the pos_file to local storage (not NFS), the same phenomenon occurred

I was wondering if this phenomenon happens on local storage as well.

The first is that path must be a folder on an NFS-mounted server,

So, in the end, does this happen only on NFS?

If so, it could be due to differences in disk write speed, file system flush timing, and so on.

Until the cause of this problem is found and fixed, is it possible to work around the issue by setting a longer pos_file_compaction_interval or by putting the pos_file on local storage?

daipom self-assigned this Apr 1, 2023
Author

SML0127 commented Apr 1, 2023

@daipom
Thank you for your kind reply!

Since I'm not sure of the root cause of this issue, or whether it works properly on a local system,
I'm thinking of not using the tail plugin.

I have one question: is there any guideline in the Fluentd documentation against using an NFS path with the tail plugin?

Contributor

daipom commented Apr 6, 2023

@SML0127
Environment-specific problems have been reported for in_tail for a while. I'm planning to make major improvements so that it can operate stably in a variety of environments.

I have one question: is there any guideline in the Fluentd documentation against using an NFS path with the tail plugin?

I don't think there is any such guideline.
I guess that's because we don't know the environment or settings that won't work for sure.
At least it seems to me that pos_file is not designed to be placed on NFS.
So it is certainly better to have such a guideline.
