PoC for log tailing shipment via log gateway #769

Closed
a-thaler opened this issue Feb 5, 2024 · 1 comment
Labels
area/logs (LogPipeline), kind/decision (Marks a decision document)

a-thaler commented Feb 5, 2024

Description
As outlined in #556, it would be best to have a central log gateway. An optional additional component should be the log agent, which tails the application logs and forwards them to the gateway. Separating the log tailing setup into an agent part and a gateway part has the following advantages:

  • the components have well-defined responsibilities (log enrichment happens only in the gateway)
  • the setup is consistent with the metrics scenario
  • only one component deals with backend connectivity (secret handling)

However, it might

  • have a resource-consumption impact, as two components need to process log data instead of one, as in the current Fluent Bit based setup
  • have a scaling implication, as the backend no longer receives data from many pods (the agent instances) but only from a few (the gateway instances)
  • have a durability impact, as there are more places where logs could be dropped
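
The durability concern can be sketched with a toy model of the two hops. This is a minimal illustration only, not otel-collector or Fluent Bit behaviour: the `Stage` class, the `simulate` function, the buffer capacities, and the assumption that a rejected send leaves the record queued for retry are all hypothetical. It shows how bounded buffers in the agent and the gateway fill up during a backend outage and where records are lost if tailing is not paused.

```python
from collections import deque


class Stage:
    """One pipeline hop with a bounded in-memory buffer (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()

    def offer(self, record):
        """Queue a record; return False if the buffer is full."""
        if len(self.buffer) >= self.capacity:
            return False
        self.buffer.append(record)
        return True

    def flush(self, send):
        """Forward records in order; a failed send keeps the record for retry."""
        while self.buffer:
            if not send(self.buffer[0]):
                break
            self.buffer.popleft()


def simulate(records, agent_capacity, gateway_capacity, backend_up):
    """Push records through agent -> gateway -> backend and count the outcome."""
    agent = Stage(agent_capacity)
    gateway = Stage(gateway_capacity)
    delivered = []
    agent_dropped = 0

    def to_backend(record):
        if not backend_up:
            return False
        delivered.append(record)
        return True

    for record in records:
        # Assumption: tailing is not paused, so a full agent buffer drops the record.
        if not agent.offer(record):
            agent_dropped += 1
        agent.flush(gateway.offer)   # agent retries until the gateway has room
        gateway.flush(to_backend)    # gateway retries until the backend is up

    return {
        "delivered": len(delivered),
        "agent_dropped": agent_dropped,
        "agent_buffered": len(agent.buffer),
        "gateway_buffered": len(gateway.buffer),
    }


if __name__ == "__main__":
    print(simulate(range(10), agent_capacity=3, gateway_capacity=2, backend_up=False))
```

In this model, an outage first fills the gateway buffer, then the agent buffer, and only then causes drops at the agent. Whether the real components retry, pause tailing, or drop at each hop is exactly what the PoC needs to establish.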

Goal:
Build up knowledge about retry handling and buffering, and run experiments to decide whether the downsides are acceptable or can be mitigated, so that we can move forward with the target design.

Note:

  • With a batch processor in the agent, tailing is not paused; we may need to remove the batching there
  • Re-enqueueing does not seem to happen the way it does in Fluent Bit

Criteria

  • Document the most relevant findings regarding otel-collector behaviour
  • Create a concept for the new architecture, including the buffering/persistence setup
  • Answer the following questions:
    • Will the agent ship via the gateway or not?
    • Where do we have persistent buffers (agent and/or gateway)?
    • Can we keep our current guarantees (see limitation section), or do we need adjustments?


a-thaler added the area/logs (LogPipeline) and kind/decision (Marks a decision document) labels Feb 5, 2024
a-thaler mentioned this issue Feb 5, 2024
23 tasks

chrkl commented Mar 4, 2024

PoC results have been documented with #843.

chrkl closed this as completed Mar 4, 2024
a-thaler added this to the 1.11.0 milestone Mar 4, 2024