Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arp: optimize interface name resolution #3133

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dswarbrick
Copy link
Contributor

Go net package's net.InterfaceByIndex fetches the entire interface table behind the scenes, despite us specifying the interface index that we are interested in. This results in an exponential amount of rtnetlink traffic (and thus recvfrom syscalls) as the number of interfaces in a host increases.

Fixes: #3075

Copy link
Member

@discordianfish discordianfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find! Thanks!

github.com/jsimonetti/rtnetlink provides a high level rtnl wrapper
around the lower level rtnetlink functions, which essentially does all
that we need. The rtnl.Conn.Neighbors uses an internal cache for
resolving interface indexes to names, so it makes at most one rtnetlink
call per interface to resolve the name.

Using this high level wrapper hugely simplifies our code and makes it
easier to understand and maintain.

Fixes: prometheus#3075

Signed-off-by: Daniel Swarbrick <[email protected]>
@maxime1907
Copy link

Any news regarding this PR?

@juliantaylor
Copy link

juliantaylor commented Nov 27, 2024

the issue being fixed is pretty problematic in e.g. calico based kubernetes clusters which have a lot of devices:

ts=2024-11-27T13:10:40.581Z caller=collector.go:173 level=debug msg="collector succeeded" name=arp duration_seconds=14.24509357

workaround until this is merged and released is to switch back to gathering via procfs via

--no-collector.arp.netlink

(limiting devices via --collector.arp.device-exclude does not help)

@dswarbrick
Copy link
Contributor Author

@SuperQ Can we get this merged before the next release?

@discordianfish
Copy link
Member

@SuperQ Yeah PTAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: collector.arp.netlink is super slow to retrieve metrics
5 participants