
[Helm] Chart does not mount /var/log/containers/ on Elastic-Agent pod, thus no logs are ingested #6204

Closed · belimawr opened this issue Dec 3, 2024 · 7 comments · Fixed by #6345
Labels: bug (Something isn't working) · Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team)

@belimawr (Contributor) commented Dec 3, 2024

  • Version: 8.16.0, main
  • Operating System: Kubernetes

The Helm chart does not mount /var/log/containers/ into the Elastic-Agent container, therefore it cannot read logs from any container in the cluster.

Steps to reproduce

  1. Add the Kubernetes integration, with logs enabled, to a new Elastic Agent policy.
  2. Follow the steps in our documentation to install the Helm chart using the policy created in step 1 (a sketch of such an install follows this list).
  3. No logs will be ingested from the Kubernetes cluster. You can confirm this by filtering logs with event.dataset: kubernetes.container_logs.
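For reference, here is a minimal sketch of what a Fleet-managed install could look like. The value names (`agent.fleet.enabled`, `agent.fleet.url`, `agent.fleet.token`) and the chart location are assumptions for illustration, not the chart's authoritative interface; check the documentation for the exact keys.

```yaml
# values.yaml — illustrative sketch of a Fleet-managed install; key names
# are assumptions, not the chart's confirmed interface.
# Install with something like:
#   helm install elastic-agent <elastic-agent-chart> -f values.yaml
agent:
  fleet:
    enabled: true                  # hand control of the policy over to Fleet
    url: "<fleet-server-url>"      # placeholder: your Fleet Server URL
    token: "<enrollment-token>"    # placeholder: token for the policy from step 1
```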

Workaround

Use the manifest provided in Kibana when selecting "Add agent" -> "Kubernetes"
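
For context, the Kibana-provided manifest works because it mounts the host's /var/log (which contains /var/log/containers) into the agent container. An illustrative excerpt, paraphrased from the reference DaemonSet rather than quoted verbatim:

```yaml
# Paraphrased excerpt of the managed DaemonSet manifest Kibana generates.
spec:
  containers:
    - name: elastic-agent
      volumeMounts:
        - name: varlog
          mountPath: /var/log    # includes /var/log/containers and /var/log/pods
          readOnly: true
  volumes:
    - name: varlog
      hostPath:
        path: /var/log
```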

@belimawr added the bug (Something isn't working) and Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team) labels on Dec 3, 2024
@elasticmachine (Contributor)

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@pkoutsovasilis (Contributor) commented Dec 3, 2024

Thanks for the issue @belimawr. What happens here is that, in Fleet mode, the chart renders nothing that derives from an integration config, since that is controlled by Fleet. As a result, volume mounts and other bits are, as you observed, missing from the Kubernetes manifest of Elastic Agent; specifically, the container-logs volume mount, which is injected when a user enables the Kubernetes integration, doesn't apply here. Most probably we want to alter this behaviour and allow a user to enable integrations even when deploying a Fleet-managed Elastic Agent, so that the chart keeps maintaining these bits while Fleet keeps owning the elastic-agent config.

cc @ycombinator this probably needs some prio 🙂

@belimawr (Contributor, Author) commented Dec 4, 2024

Because the default configuration of the Kubernetes integration is to collect pod logs, I believe we should have the mount enabled by default in the Helm chart.

@pkoutsovasilis (Contributor)

> Because the default configuration of the Kubernetes integration is to collect pod logs, I believe we should have the mount enabled by default in the Helm chart.

I hear you, and you already captured something that is not enabled by default: the Kubernetes integration 🙂 What I mean is that a user installing a Fleet-managed agent is, in practice, not tightly coupled to the Kubernetes integration, and the volume mount is only needed for the latter. So having the mount on by default in the Kubernetes integration (which the user can enable with --set kubernetes.enabled=true) makes total sense, but having it on by default just because a user installs a Fleet-managed agent seems a little too aggressive.
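
For illustration, that flag from the comment above, alongside the equivalent values.yaml form (only `kubernetes.enabled` is taken from the comment; everything else shown is a placeholder):

```yaml
# Enable the chart's Kubernetes integration preset, e.g.:
#   helm install elastic-agent <chart> --set kubernetes.enabled=true
# or equivalently in a values file:
kubernetes:
  enabled: true    # injects the container-logs volume mount, per the discussion above
```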

@belimawr (Contributor, Author) commented Dec 6, 2024

I see your point, it makes sense.

@swiatekm (Contributor) commented Dec 6, 2024

In general, a user can add integrations in Fleet that their agent deployment in Kubernetes is unable to run due to missing mounts or permissions. Container logs are just the most obvious example; the same applies to system metrics (which require a /proc mount from the node) or cluster metrics (which require specific RBAC). In the absence of an actual operator that Fleet could talk to, the best we can do is either:

  1. Install agent with a superset of all necessary mounts and permissions, give instructions on how to cut them down if the user wants to.
  2. Install agent with only the necessary mounts and permissions, give instructions on how to redeploy the agent if the user adds integrations which require more.

Right now, I think we're going with 2? But it's a valid discussion to have.

@cmacknz (Member) commented Dec 6, 2024

Option 1 is closer to the current strategy in the reference configurations: https://github.com/elastic/elastic-agent/blob/main/deploy/kubernetes/elastic-agent-managed/elastic-agent-managed-daemonset.yaml. This makes sure everything works when getting started and experimenting. The drawback is that we get regular requests to explain the mounts and privilege level, or to reduce them to the minimum required for a use case, and these requests often block deployment or adoption of the agent until they are resolved.

My current view on this would be that:

  1. The Kubernetes integration should be enabled by default, and we should set up whatever we need to read container logs and contact the k8s apiserver by default. There should be options to disable particular capabilities (e.g. removing the container-log directory mount if container logs aren't desired) and to easily disable Kubernetes monitoring entirely. Disabling k8s monitoring is another very common request from people who want to collect data while running the agent on k8s but are monitoring k8s another way.
  2. The system integration should be disabled by default, but when enabled it should have the capabilities and mounts to get the most common metrics. It is usually enabled to monitor the node (CPU, memory, IO, disk, network, etc.), and that should work out of the box, even though monitoring the node requires doing more to break the container-host boundary.

This would establish that on a native system where agent runs as a service, system is the default integration. When run on Kubernetes, the Kubernetes integration is the default integration because the cluster is the system in this world.
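
A hypothetical values shape for those defaults might look like the sketch below; apart from kubernetes.enabled, none of these keys are confirmed chart options, they only illustrate the proposal.

```yaml
# Hypothetical values illustrating the proposed defaults; key names other
# than kubernetes.enabled are invented for illustration.
kubernetes:
  enabled: true           # Kubernetes integration on by default on k8s
  containerLogs:
    enabled: true         # set false to drop the /var/log/containers mount
system:
  enabled: false          # node metrics are opt-in; enabling adds /proc and cgroup mounts
```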

CC @mlunadia in case you have opinions here.
