Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When Postgres is shutting down gracefully, it tries to ensure the replication client is up-to-date but Electric misbehaves #1632

Closed
alco opened this issue Sep 5, 2024 · 1 comment
Assignees
Labels

Comments

@alco
Copy link
Member

alco commented Sep 5, 2024

Somewhat of a bug but more like a nuisance because it only an issue when Postgres is shutting down.


When running Electric locally, stop the Postgres container and observe a seemingly endless stream of Electric logs that look similar to this:

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

04:09:17.252 [debug] Primary Keepalive: wal_end=88964152 (0/54D7C38) reply=1

Postgres container logs look similarly redundant:

2024-09-05 01:09:09.129 GMT [256] FATAL:  terminating connection due to administrator command
2024-09-05 01:09:09.129 GMT [253] FATAL:  terminating connection due to administrator command
2024-09-05 01:09:09.131 GMT [245] FATAL:  terminating connection due to administrator command
2024-09-05 01:09:09.132 GMT [1] LOG:  background worker "logical replication launcher" (PID 63) exited with exit code 1
2024-09-05 01:09:09.139 GMT [58] LOG:  shutting down
2024-09-05 01:09:11.104 GMT [296] LOG:  PID 264 in cancel request did not match any process
2024-09-05 01:09:11.105 GMT [297] LOG:  PID 263 in cancel request did not match any process
2024-09-05 01:09:11.106 GMT [298] LOG:  PID 283 in cancel request did not match any process
... a whole lot more of the same ...
2024-09-05 01:09:11.133 GMT [347] FATAL:  the database system is shutting down
2024-09-05 01:09:11.134 GMT [348] FATAL:  the database system is shutting down
2024-09-05 01:09:11.133 GMT [346] FATAL:  the database system is shutting down
... a whole lot more of the same ...

Eventually, the default timeout of docker stop kills the Postgres container and Electric finally sees the broken database connection:

04:09:21.183 [info] Starting shape replication pipeline

04:09:21.184 [debug] Starting Elixir.Electric.Shapes.ConsumerSupervisor

04:09:21.185 [debug] 1 consumers of replication stream

04:09:21.188 [warning] Database connection in replication mode failed: tcp connect (localhost:54321): connection refused - :econnrefused

04:09:21.188 [warning] Reconnecting in 4000ms
@alco alco added the bug label Sep 5, 2024
@alco alco self-assigned this Sep 5, 2024
@alco
Copy link
Member Author

alco commented Nov 29, 2024

I've taken another look at this. Can't say with certainty whether we can fix this but it looks like a benign issue where Postgres keeps sending keepalive messages to the replica until its internal shutdown timeout expires.

Closing this as a non-issue.

@alco alco closed this as not planned Won't fix, can't repro, duplicate, stale Nov 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant