Possible connection re-use issue in `0.0.24` #248

dhermes · 2024-03-02T01:20:41Z

We just put 0.0.24 into production and it's starting to get into a state where river is logging hundreds of times a second with

{
    "level": "error",
    "time": "2024-03-01T23:56:08.679305457Z",
    "notifier": {
        "err": {
            "error": "connection already established",
            "kind": "*errors.errorString",
            "stack": null
        }
    },
    "subsystem": "river",
    "message": "error establishing connection from pool"
}

via:

river/internal/notifier/notifier.go

Line 130 in 035ba59

n.logger.Error("error establishing connection from pool", "err", err)
river/riverdriver/riverpgxv5/river_pgx_v5_driver.go

Line 491 in 035ba59

return errors.New("connection already established")

(My apologies for the lack of a stack trace)

This is thrashing CPU significantly (the small bumps are the deployments that haven't rolled out 0.0.24 yet):

I managed to capture the lead up to ONE of these bursts and it looks like a TCP error on the PostgreSQL connection started the death spiral

{
    "level": "info",
    "time": "2024-03-01T23:56:07.097802512Z",
    "num_completed_jobs": 40,
    "num_jobs_running": 0,
    "queue": "default",
    "subsystem": "river",
    "message": "producer: Heartbeat"
}

{
    "level": "error",
    "time": "2024-03-01T23:56:08.679126682Z",
    "notifier": {
        "err": {
            "error": "tls: failed to send closeNotify alert (but connection was closed anyway): write tcp 10.122.48.181:44034->10.122.30.240:5432: i/o timeout",
            "kind": "*fmt.wrapError",
            "stack": null
        }
    },
    "subsystem": "river",
    "message": "error closing listener"
}

{
    "level": "error",
    "time": "2024-03-01T23:56:08.679305457Z",
    "notifier": {
        "err": {
            "error": "connection already established",
            "kind": "*errors.errorString",
            "stack": null
        }
    },
    "subsystem": "river",
    "message": "error establishing connection from pool"
}

I'm sorry I don't have anything more conclusive right now but wanted to at least get this in your hands in case something rings a bell.

The text was updated successfully, but these errors were encountered:

bgentry · 2024-03-02T02:03:24Z

@dhermes thank you, I believe this is the same issue reported in #239 (comment) and it should be fixed by #246. Reviewing that now!

brandur · 2024-03-02T04:29:53Z

@dhermes Hey, sorry about the trouble here. We just cut v0.0.25, which should resolve the problem.

dhermes · 2024-03-02T15:53:27Z

Thanks all!

bgentry mentioned this issue Mar 2, 2024

More robust listener close in riverpgxv5 #246

Merged

brandur closed this as completed in #246 Mar 2, 2024

upalsaha mentioned this issue Mar 8, 2024

Intermittent listener close issue popping up again #256

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible connection re-use issue in `0.0.24` #248

Possible connection re-use issue in `0.0.24` #248

dhermes commented Mar 2, 2024 •

edited

Loading

bgentry commented Mar 2, 2024

brandur commented Mar 2, 2024

dhermes commented Mar 2, 2024

Possible connection re-use issue in 0.0.24 #248

Possible connection re-use issue in 0.0.24 #248

Comments

dhermes commented Mar 2, 2024 • edited Loading

bgentry commented Mar 2, 2024

brandur commented Mar 2, 2024

dhermes commented Mar 2, 2024

Possible connection re-use issue in `0.0.24` #248

Possible connection re-use issue in `0.0.24` #248

dhermes commented Mar 2, 2024 •

edited

Loading