Feature request: Sharding metrics #71

Open
penguinlav opened this issue Jan 8, 2023 · 3 comments

@penguinlav

Hi! Our bioyino cluster consumes a lot of resources on each host, and the cluster nodes will probably run out of resources soon. Are there any plans to shard the collected metrics?

penguinlav changed the title from "Sharding metrics" to "Feature request: Sharding metrics" on Jan 8, 2023
@Albibek
Collaborator

Albibek commented Jan 8, 2023

Hi. Are you using the agent-based approach so far? You could probably shard your metrics per agent first.

UPD:
I thought I had documentation for the agent-based approach published, but there isn't any. We do, however, mention it in this article.

The idea is to have separate, non-clustered bioyino instances listening on UDP that pre-aggregate metrics and send them to the cluster. This offloads part of the cluster's load to the agents. It can also help with manual sharding: if you have, for example, groups of servers that are easy to separate from each other metric-wise, you can point each group to different agents and point those agents to different clusters. If your case doesn't fit what I've described above, it would be great if you could share more details or a concrete usage example, and I could propose a solution for you.
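As a rough illustration of the manual-sharding idea (the prefixes and agent addresses below are hypothetical, not bioyino configuration; in practice the split is done by pointing each server group's statsd clients at a different agent instance):

```rust
use std::net::SocketAddr;

/// Pick the UDP address of the agent that should receive a metric,
/// based on its name prefix. Prefixes and addresses are made up for
/// illustration only.
fn agent_for(metric_name: &str) -> SocketAddr {
    let addr = if metric_name.starts_with("web.") {
        "10.0.0.10:8125" // agent in front of cluster A
    } else if metric_name.starts_with("db.") {
        "10.0.0.20:8125" // agent in front of cluster B
    } else {
        "10.0.0.30:8125" // catch-all agent
    };
    addr.parse().expect("valid socket address")
}

fn main() {
    println!("{}", agent_for("web.frontend.requests"));
}
```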

I was also thinking about "pure" sharding based on hash rings or something similar, but could not find a good case for it. What we usually wanted in practice was routing of incoming metrics based on prefix or source rather than on a hash. All these ways of distributing metrics among different destinations seemed to me worth an entire separate product, something similar to carbon-c-relay. Adding all the possible routing algorithms to bioyino would bloat the codebase too much.
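For comparison, a minimal sketch of what such "pure" hash-based sharding would look like (destination names are hypothetical, and a real hash ring would also use virtual nodes to smooth rebalancing):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// "Pure" hash-based sharding: map a metric name onto one of N destinations.
/// Only a sketch of the idea discussed above, not bioyino code.
fn shard_for(metric_name: &str, destinations: &[&str]) -> usize {
    let mut hasher = DefaultHasher::new();
    metric_name.hash(&mut hasher);
    (hasher.finish() % destinations.len() as u64) as usize
}

fn main() {
    let cluster = ["node-1:8126", "node-2:8126", "node-3:8126"];
    let idx = shard_for("web.frontend.requests.count", &cluster);
    println!("route to {}", cluster[idx]);
}
```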

@penguinlav
Author

penguinlav commented Jan 11, 2023

Yes, we already use this approach with local aggregating nodes, and we are close to the resource limit on the master nodes.

We can increase the period between sending snapshots (currently 2 s). As far as I can see, this will help reduce CPU consumption for snapshot merging by reducing the number of snapshots per interval (the interval at which metrics are sent to carbon).

But what if there were a sharding feature? That would give a practically unlimited ability to scale the number of metrics collected by bioyino :)

@Albibek
Collaborator

Albibek commented Feb 18, 2023

To be clear, the snapshot interval is regulated by snapshot-interval, not by interval. But you are right: it may reduce some load thanks to lower TCP connection overhead and fewer total TCP connections.

Sharding is not so easy when it comes to failing nodes. Let's say you have a 3-node cluster and 10 agents. How would you distribute data among them, and how would you expect to configure it? What behavior do you expect when 1 or 2 cluster nodes fail?
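For illustration, one common way to answer both questions is rendezvous (highest-random-weight) hashing: each metric is scored against every live cluster node and sent to the highest-scoring one, so when a node fails only its share of metrics moves to the survivors. A hypothetical sketch of that behavior, not a statement about bioyino's implementation:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Rendezvous hashing: score every live node for a given metric and pick
/// the highest score. When a node is removed from `nodes`, only the metrics
/// that scored highest on that node change owner.
fn owner<'a>(metric_name: &str, nodes: &[&'a str]) -> Option<&'a str> {
    nodes
        .iter()
        .max_by_key(|node| {
            let mut h = DefaultHasher::new();
            (metric_name, *node).hash(&mut h);
            h.finish()
        })
        .copied()
}

fn main() {
    let all = ["node-1", "node-2", "node-3"];
    let after_failure = ["node-1", "node-3"]; // node-2 is down
    println!("{:?}", owner("web.frontend.requests", &all));
    println!("{:?}", owner("web.frontend.requests", &after_failure));
}
```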
