Server congestion #2109
Comments
Pretty sure that's PagerDuty (#2072).
I've marked this DevX ★.
"Site unavailable" notifications with PagerDuty are simple, since we're converting UptimeRobot alert emails into PagerDuty notifications. We just need to work with @whit537 and @zwn to get them set up properly and get on-call schedules spread fairly (i.e. #2072). More advanced alerting on certain conditions or log messages requires new tooling (graylog2? some Heroku add-on for log monitoring? NewRelic? etc.). We would still route any other services' monitoring through the consistent PagerDuty alerting system. Let's make this issue about that. I've added a Todo section to the OP.
Not writing to the database on each and every request for signed-in users (even for static files, even for 304) could help us move further away from the congestion issue (see #2041). The query to update
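To make the idea concrete, here is a minimal sketch of a guard that decides whether a request should trigger the per-user write at all. Everything in it is an assumption for illustration, not the site's actual code: the function name `should_record_activity`, the `/assets/` prefix for static files, and the five-minute interval are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: write to the database at most once per interval.
MIN_UPDATE_INTERVAL = timedelta(minutes=5)


def should_record_activity(path, status, last_recorded, now,
                           static_prefix="/assets/",
                           min_interval=MIN_UPDATE_INTERVAL):
    """Return True only when a per-request database write seems worthwhile.

    path          -- request path, e.g. "/about/"
    status        -- HTTP status code of the response
    last_recorded -- datetime of the previous write for this user, or None
    now           -- current datetime
    """
    # Static files say nothing new about the user, so skip them.
    if path.startswith(static_prefix):
        return False
    # A 304 means the client already had the content; skip it too.
    if status == 304:
        return False
    # Skip if we already wrote for this user recently.
    if last_recorded is not None and now - last_recorded < min_interval:
        return False
    return True
```

With a guard like this, a signed-in user loading a page plus a dozen static files would cause at most one write per interval instead of one per request.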
@zwn I assume you queried the production database to find the most common queries? Could you list them out somewhere?
I strongly recommend NewRelic for diagnosing these issues. The free NewRelic Standard you get with Heroku gives you detailed error reports, alerting, event reports (downtime, high error rate, etc.), transaction tracing, and more.
Closing. The specific issue of becoming completely unresponsive when busy threads are maxed out has been fixed by #2384.
Lately we've been experiencing quite a few incidents where the server gets so congested that it becomes completely unresponsive.
Today at 1 PM Central Time: (IRC)
Today at 5 PM Central Time: (IRC)
In both cases we had to restart the server for it to get back to normal. We need to get to the bottom of this. What's causing the slowdown? Is there any insight to be gained from the log of pages accessed?
We should have a system to automatically alert us when the site is having such troubles.
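As a rough sketch of what such an automatic check could look like, the snippet below probes a URL, classifies each probe as ok, slow, or down, and only signals an alert after several consecutive bad probes. The thresholds, the function names, and the choice to alert after three bad probes are assumptions for illustration; how the alert is delivered is left out.

```python
import time
import urllib.error
import urllib.request


def classify_probe(status, elapsed_seconds, slow_after=5.0):
    """Classify one probe as 'ok', 'slow', or 'down'.

    status is the HTTP status code, or None if no response arrived.
    """
    if status is None or status >= 500:
        return "down"
    if elapsed_seconds > slow_after:
        return "slow"
    return "ok"


def should_alert(recent_results, failures_needed=3):
    """Alert only when the last `failures_needed` probes were all not 'ok'."""
    if len(recent_results) < failures_needed:
        return False
    return all(r != "ok" for r in recent_results[-failures_needed:])


def probe(url, timeout=10.0):
    """Fetch `url` once and return its classification."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        status = exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: no usable response.
        status = None
    return classify_probe(status, time.monotonic() - start)
```

Requiring several consecutive bad probes avoids paging someone over a single slow request, at the cost of a slightly later alert.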