Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible connection re-use issue in 0.0.24 #248

Closed
dhermes opened this issue Mar 2, 2024 · 3 comments · Fixed by #246
Closed

Possible connection re-use issue in 0.0.24 #248

dhermes opened this issue Mar 2, 2024 · 3 comments · Fixed by #246

Comments

@dhermes
Copy link
Contributor

dhermes commented Mar 2, 2024

We just put 0.0.24 into production and it's starting to get into a state where river is logging hundreds of times a second with

{
    "level": "error",
    "time": "2024-03-01T23:56:08.679305457Z",
    "notifier": {
        "err": {
            "error": "connection already established",
            "kind": "*errors.errorString",
            "stack": null
        }
    },
    "subsystem": "river",
    "message": "error establishing connection from pool"
}

via:

(My apologies for the lack of a stack trace)

This is thrashing CPU significantly (the small bumps are the deployments that haven't rolled out 0.0.24 yet):

Screenshot 2024-03-01 at 5 15 37 PM

I managed to capture the lead up to ONE of these bursts and it looks like a TCP error on the PostgreSQL connection started the death spiral

{
    "level": "info",
    "time": "2024-03-01T23:56:07.097802512Z",
    "num_completed_jobs": 40,
    "num_jobs_running": 0,
    "queue": "default",
    "subsystem": "river",
    "message": "producer: Heartbeat"
}

{
    "level": "error",
    "time": "2024-03-01T23:56:08.679126682Z",
    "notifier": {
        "err": {
            "error": "tls: failed to send closeNotify alert (but connection was closed anyway): write tcp 10.122.48.181:44034->10.122.30.240:5432: i/o timeout",
            "kind": "*fmt.wrapError",
            "stack": null
        }
    },
    "subsystem": "river",
    "message": "error closing listener"
}

{
    "level": "error",
    "time": "2024-03-01T23:56:08.679305457Z",
    "notifier": {
        "err": {
            "error": "connection already established",
            "kind": "*errors.errorString",
            "stack": null
        }
    },
    "subsystem": "river",
    "message": "error establishing connection from pool"
}

I'm sorry I don't have anything more conclusive right now but wanted to at least get this in your hands in case something rings a bell.

@bgentry
Copy link
Contributor

bgentry commented Mar 2, 2024

@dhermes thank you, I believe this is the same issue reported in #239 (comment) and it should be fixed by #246. Reviewing that now!

@brandur
Copy link
Contributor

brandur commented Mar 2, 2024

@dhermes Hey, sorry about the trouble here. We just cut v0.0.25, which should resolve the problem.

@dhermes
Copy link
Contributor Author

dhermes commented Mar 2, 2024

Thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants