Make initial delay selection more sane #84

Closed
astei wants to merge 2 commits

Conversation

astei commented Apr 19, 2021

See PaperMC/Velocity#476 for justification.

This should be less problematic: it picks a random delay (of up to five minutes) before sending the first data.
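
For reference, a minimal sketch of the proposed selection, assuming a plain ScheduledExecutorService and that the regular 30-minute interval is kept (the actual patch may wire this up differently):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class RandomInitialDelaySketch {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        // Random initial delay of up to five minutes, then the regular 30-minute interval.
        long initialDelay = ThreadLocalRandom.current().nextLong(TimeUnit.MINUTES.toMillis(5));
        scheduler.scheduleAtFixedRate(
                () -> { /* submit data to the bStats backend */ },
                initialDelay,
                TimeUnit.MINUTES.toMillis(30),
                TimeUnit.MILLISECONDS);
    }
}
```
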
astei changed the title from "Ensure the second delay is strictly greater than the first delay" to "Make initial delay selection more sane" on Apr 19, 2021

Bastian commented Apr 19, 2021

Thank you for the PR!

The current design with two random delays was fully intentional, and hitting ratelimits more often is an accepted compromise.

Let me start by explaining why the random delay even exists. The initial logic was very simple: wait 5 minutes after server startup and then send the data every 30 minutes. What I didn't consider back then was that many servers tend to restart at a fixed time, usually on a full hour mark (xx:00). This caused a spike in requests every 30 minutes, at xx:05 and xx:35, which resulted in high CPU usage at these times:
[Image: CPU usage graph showing request spikes at xx:05 and xx:35]

The first idea to solve this issue was very similar to your PR: just make the initial delay a bit random (e.g., MiniDigger suggested 5 min + a random delay between 0 and 15 min). This, however, comes with a new set of issues:

  • If the random delay is too large (like the suggested 15 min), it might take a very long time for a server to send its data initially, and we could miss some of it.
  • The load would still be unevenly distributed, just more spread out. This becomes especially relevant for smaller random delays like the 5 minutes in this PR. Instead of having about 100% more requests for ~1 minute at xx:05 and xx:35, these extra requests would now be distributed over 5 minutes. This would definitely be better, but it would still cause roughly 20% more requests during that period.

This means that it is necessary to find a good balance and choose a random delay that is neither too short nor too long, or alternatively to find another approach.

The current solution is a hybrid that tries to get the best of both worlds: it still uses a (short) initial random delay (between 3 and 6 minutes), but instead of just sending the data every 30 minutes afterwards, it introduces a second random delay between 0 and 30 minutes and only then starts the regular 30-minute interval. This makes it very likely to hit a ratelimit once initially (in fact, in nearly 50% of all cases). Since ratelimits are extremely (CPU-)inexpensive for the backend to handle, this is no issue at all. At the same time, it makes sure that requests are evenly distributed and thus solves the issue quite nicely.
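
For illustration, here is a minimal sketch of that hybrid scheduling, again assuming a plain ScheduledExecutorService; the actual Metrics class may differ in the details:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class HybridDelaySketch {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        Runnable submitData = () -> { /* send the data to the backend */ };

        // First delay: 3 minutes plus a random 0-3 minutes (i.e. 3-6 minutes after startup).
        long firstDelay = TimeUnit.MINUTES.toMillis(3)
                + ThreadLocalRandom.current().nextLong(TimeUnit.MINUTES.toMillis(3));
        // Second delay: a random 0-30 minutes that decides where in the 30-minute cycle
        // this server will sit from then on.
        long secondDelay = ThreadLocalRandom.current().nextLong(TimeUnit.MINUTES.toMillis(30));

        // One initial submission after the first delay...
        scheduler.schedule(submitData, firstDelay, TimeUnit.MILLISECONDS);
        // ...then the regular 30-minute interval, offset by the second delay.
        scheduler.scheduleAtFixedRate(submitData,
                firstDelay + secondDelay,
                TimeUnit.MINUTES.toMillis(30),
                TimeUnit.MILLISECONDS);
    }
}
```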

That being said, I agree that there is still space for improvements:

  1. It happens quite often that people report bugs because they received a 429 response from the backend. While you have to enable the logFailedRequests config option to see it (which is only intended for debugging), it does look scary with its huge stack trace and WARN log level. It might be a good idea to handle it differently and log a less verbose message without a stack trace, something along these lines: "Received 429 ratelimit from bStats. This is expected and not causing issues" (see the sketch after this list).
  2. The fixed initial delay of 3 minutes (3 min + random between 0 and 3 min) is a remnant of the old bStats Metrics classes, which sent the data for all plugins in a single request and thus needed to give the other plugins some time to start up (otherwise the first loaded plugin would initially only send its own data). This is not really necessary anymore and can probably be removed.
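
Roughly what the quieter 429 handling from point 1 could look like; the class, method, and parameter names below are hypothetical, not the existing Metrics API:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class RateLimitLogSketch {
    // Hypothetical response handler; `logFailedRequests` mirrors the config option
    // mentioned above, everything else is illustrative.
    static void handleResponse(int statusCode, boolean logFailedRequests, Logger logger) {
        if (!logFailedRequests) {
            return; // stay silent unless the debugging option is enabled
        }
        if (statusCode == 429) {
            // Calm one-liner instead of a WARN with a full stack trace.
            logger.info("Received 429 ratelimit from bStats. This is expected and not causing issues.");
        } else if (statusCode / 100 != 2) {
            logger.log(Level.WARNING, "Could not submit bStats data, backend responded with HTTP " + statusCode);
        }
    }
}
```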


astei commented Apr 20, 2021

> Let me start by explaining why the random delay even exists.

I know why it exists: having made the Buycraft plugin, we ran into this issue too (and we had the advantage of near-infinite scaling on AWS!), and I implemented randomization to spread out the load from servers sending requests to pull purchases from Buycraft.

I'll probably implement your ideas here though, just to make the issue much less scary-looking.

astei closed this May 5, 2021

astei commented May 5, 2021

Decided not to pursue this anymore; there are more pressing things to do.


Bastian commented May 5, 2021

Thanks anyway 🙂

I've created a new issue so it doesn't get forgotten: #86
