Add agent.ip #1027

Closed
runesl opened this issue Oct 14, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@runesl

runesl commented Oct 14, 2020

Summary
Add the agent.ip field to store the IP address of the agent.

Motivation:
For many use cases, such as audit logs, message authenticity is paramount. If you can't see where a message came from, and don't use complex security such as 2-way TLS, anyone with network access can pretend to be someone else and send false messages (spoofing).

A simple way to determine the origin of a message is to store the IP address of the agent it was received from. Many tools, including Logstash's beats input, can reliably store the IP address of the remote client that sent a message. But ECS currently has nowhere to put this information.

Please don't confuse agent with host: they are not the same, and we need a place to store the IP of both. Host is where the event was initially created. Later the event can pass through an agent (Beat) on another machine with a different IP before reaching Logstash, on its way to Elasticsearch.

Also, this is not covered reliably by observer.ip. The observer is defined to be outside the host. This means we cannot put the IP address of the agent in the observer field for events that are born inside the same host as the agent.

Since there is currently no consistent place to store the agent's IP in ECS, please add the agent.ip field.

@ebeahan
Member

ebeahan commented Oct 15, 2020

Thanks for opening the issue @runesl.

First, I'll mention that ECS always welcomes the use of custom fields to capture any fields needed for your use cases but not yet defined by ECS. By following that guidance, you run very little risk of having a future conflict.

As you stated, agent shouldn't be confused with host. However, ECS defines an agent as a software component, and a software component doesn't itself have an IP address. The IP address(es) come from the host hardware, VM, container, etc. where the process is running. Agents like Packetbeat and Metricbeat populate agent.* fields but use the host.* fields to capture details of the underlying host system, such as the IP address.
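To make that split concrete, here is a minimal Python sketch of an event document, not an official ECS example. It keeps software details under agent.*, host addresses under host.ip, and puts the sender's address in a custom field. The `build_event` helper, the `pipeline_custom.agent_ip` field name, and all sample values are hypothetical and chosen only for illustration.

```python
import json


def build_event(message, agent_type, agent_version, host_name, host_ips, sender_ip=None):
    """Build a nested event dict using agent.* and host.* style fields."""
    event = {
        "message": message,
        # agent.* describes the software component only.
        "agent": {"type": agent_type, "version": agent_version},
        # host.* describes the underlying system, including all of its IPs.
        "host": {"name": host_name, "ip": list(host_ips)},
    }
    if sender_ip is not None:
        # Hypothetical custom field (NOT defined by ECS) holding the
        # address the agent actually sent from.
        event["pipeline_custom"] = {"agent_ip": sender_ip}
    return event


# Made-up sample values.
event = build_event(
    "user login",
    agent_type="filebeat",
    agent_version="7.9.2",
    host_name="web-01",
    host_ips=["10.0.0.5", "192.168.1.5"],
    sender_ip="10.0.0.5",
)
print(json.dumps(event, indent=2))
```

Keeping the sender address in a separate custom namespace leaves agent.* untouched, so nothing here depends on whether an agent.ip field is ever added.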

Looking at agent through that lens, I'm not sure if agent.ip is the best field to capture this. But you have hit on a commonly revisited ECS discussion topic: better support and guidance for capturing data ingestion pipeline details in ECS. I think the need you describe here absolutely fits into that larger discussion around how to best capture the details of different stages of an ingestion pipeline.

We've recently consolidated a few other related discussions on this topic into a single issue, #940, and I've linked this issue there as well. If you have any thoughts or ideas to share in the meta issue discussion, feel welcome to add them.

@runesl
Author

runesl commented Oct 20, 2020

Thanks for the quick reply, ebeahan!
I sense you are arguing that agent is only meant to contain static software component configuration, and not details about the running agent process, such as PID, memory/disk/CPU usage and limits, and open sockets including IPs? Where could those go then?

The IP address that a message is sent from is not necessarily a static detail of the underlying system, which often has many IPs, but something that is assigned to the agent process when it makes the OS bind call to claim a socket (IP + port) for sending the message. With serverless agents, things become even more dynamic, as "the host" can have a subnet of IP addresses (think AWS Lambda) and assign them to agent processes dynamically.
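The bind step described above can be sketched with Python's standard socket module. This is only an illustration over the loopback interface: the `send_from` helper is a hypothetical name, and a real agent would bind to one of the machine's routable addresses rather than 127.0.0.1.

```python
import socket


def send_from(local_ip, dest, payload):
    """Send one UDP datagram from a socket explicitly bound to local_ip.

    Returns the (ip, port) pair the OS assigned to the sending socket.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Port 0 asks the OS to pick a free ephemeral port.
        sock.bind((local_ip, 0))
        source = sock.getsockname()
        sock.sendto(payload, dest)
        return source


# Loopback-only demo: a receiver standing in for the next pipeline stage.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)

source = send_from("127.0.0.1", receiver.getsockname(), b"hello")
data, peer = receiver.recvfrom(1024)
receiver.close()

print("sender bound to:", source)
print("receiver saw:   ", peer)
```

The source address belongs to the socket rather than to the machine as a whole. On a host with several addresses, the value passed to `bind` decides which one the receiver sees. If the socket is not bound explicitly, the OS picks the source address when the datagram is sent.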

Thanks for the link to the very relevant issue on pipeline modeling. Having a generic model for the whole pipeline would be great. I hope it will eventually be able to capture the distinction between the internal IP, as claimed by the agent itself, and the remote IP, as observed by the next receiver in the chain, which can differ due to spoofing or NAT.
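As a rough illustration of that distinction, the following Python sketch records both addresses side by side and flags whether they agree. The `describe_sender` helper and its output keys are hypothetical, not part of ECS or any Elastic tool, and the sample addresses are made up.

```python
import ipaddress


def describe_sender(claimed_ip, observed_ip):
    """Compare the IP an agent claims with the IP the receiver observed."""
    # ip_address() raises ValueError on malformed input, so bad values
    # are rejected instead of being stored.
    claimed = ipaddress.ip_address(claimed_ip)
    observed = ipaddress.ip_address(observed_ip)
    return {
        "claimed_ip": str(claimed),
        "observed_ip": str(observed),
        "match": claimed == observed,
    }


# Direct connection: both views agree.
direct = describe_sender("10.0.0.5", "10.0.0.5")

# Behind NAT (or spoofed): the receiver sees a different address.
translated = describe_sender("10.0.0.5", "203.0.113.7")

print(direct)
print(translated)
```

A mismatch alone cannot tell NAT from spoofing. It only shows that the two views differ, which is why storing both values is more useful than storing either one.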

@ebeahan
Member

ebeahan commented Dec 1, 2020

Closing in favor of the pipeline details meta issue, #940.

@ebeahan ebeahan closed this as completed Dec 1, 2020