When Postgres is shutting down gracefully, it tries to ensure the replication client is up-to-date but Electric misbehaves #1632

alco · 2024-09-05T01:16:30Z

Somewhat of a bug but more like a nuisance because it only an issue when Postgres is shutting down.

When running Electric locally, stop the Postgres container and observe a seemingly endless stream of Electric logs that look similar to this:

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

Postgres container logs look similarly redundant:

2024-09-05 01:09:09.129 GMT [256] FATAL:  terminating connection due to administrator command
2024-09-05 01:09:09.129 GMT [253] FATAL:  terminating connection due to administrator command
2024-09-05 01:09:09.131 GMT [245] FATAL:  terminating connection due to administrator command
2024-09-05 01:09:09.132 GMT [1] LOG:  background worker "logical replication launcher" (PID 63) exited with exit code 1
2024-09-05 01:09:09.139 GMT [58] LOG:  shutting down
2024-09-05 01:09:11.104 GMT [296] LOG:  PID 264 in cancel request did not match any process
2024-09-05 01:09:11.105 GMT [297] LOG:  PID 263 in cancel request did not match any process
2024-09-05 01:09:11.106 GMT [298] LOG:  PID 283 in cancel request did not match any process
... a whole lot more of the same ...
2024-09-05 01:09:11.133 GMT [347] FATAL:  the database system is shutting down
2024-09-05 01:09:11.134 GMT [348] FATAL:  the database system is shutting down
2024-09-05 01:09:11.133 GMT [346] FATAL:  the database system is shutting down
... a whole lot more of the same ...

Eventually, the default timeout of docker stop kills the Postgres container and Electric finally sees the broken database connection:

04:09:21.183 [info] Starting shape replication pipeline

04:09:21.184 [debug] Starting Elixir.Electric.Shapes.ConsumerSupervisor

04:09:21.185 [debug] 1 consumers of replication stream

04:09:21.188 [warning] Database connection in replication mode failed: tcp connect (localhost:54321): connection refused - :econnrefused

04:09:21.188 [warning] Reconnecting in 4000ms

The text was updated successfully, but these errors were encountered:

alco · 2024-11-29T15:46:14Z

I've taken another look at this. Can't say with certainty whether we can fix this but it looks like a benign issue where Postgres keeps sending keepalive messages to the replica until its internal shutdown timeout expires.

Closing this as a non-issue.

alco added the bug label Sep 5, 2024

alco self-assigned this Sep 5, 2024

alco closed this as not planned Won't fix, can't repro, duplicate, stale Nov 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When Postgres is shutting down gracefully, it tries to ensure the replication client is up-to-date but Electric misbehaves #1632

When Postgres is shutting down gracefully, it tries to ensure the replication client is up-to-date but Electric misbehaves #1632

alco commented Sep 5, 2024

alco commented Nov 29, 2024

When Postgres is shutting down gracefully, it tries to ensure the replication client is up-to-date but Electric misbehaves #1632

When Postgres is shutting down gracefully, it tries to ensure the replication client is up-to-date but Electric misbehaves #1632

Comments

alco commented Sep 5, 2024

alco commented Nov 29, 2024