Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logical replication slot becomes inactive just after initial sync starts #491

Open
pensnarik opened this issue Dec 15, 2024 · 1 comment

Comments

@pensnarik
Copy link
Contributor

pensnarik commented Dec 15, 2024

I have source and destination databases having versions 9.4 and 15.10 correspondingly. The source has pglogical 2.4.1 installed, the destination has pglogical 2.4.5. The source database is about 2Tb and has about 300 tables. When I create a subscription in the destination database pglogical creates a logical replication in the source databases successfully and the slot is active. But just when data synchronization is started, the slot becomes inactive which (I suppose) leads to a situation when no data changes are copied from source to destination when initial sync is done. I end up with the destination database with data actual on sync start and the source database which still has inactive replication slot (the full sync takes about 3 days). Despite the fact the replication slot is inactive data is still copied which looks strange to me.

I enabled log level = debug1 both on source and destination and I see nothing suspicious.

The source db logs:

Dec 15 15:13:30 dbs01 postgres[7987]: [66-1] < 2024-12-15 15:13:30.246 GMT > < 102/0 >DEBUG:  received replication command: CREATE_REPLICATION_SLOT "pgl_DBNAME_dbs01_prod_DBNAME_prod" LOGICAL pglogical_output
Dec 15 15:13:30 dbs01 postgres[7987]: [67-1] < 2024-12-15 15:13:30.266 GMT > < 102/0 >DEBUG:  searching for logical decoding starting point, starting at 3DA0/632286E0
Dec 15 15:13:30 dbs01 postgres[7987]: [68-1] < 2024-12-15 15:13:30.270 GMT > < 102/0 >LOG:  logical decoding found initial starting point at 3DA0/632286E0
Dec 15 15:13:30 dbs01 postgres[7987]: [68-2] < 2024-12-15 15:13:30.270 GMT > < 102/0 >DETAIL:  Waiting for transactions (approximately 1) older than 965107985 to end.
Dec 15 15:15:26 dbs01 postgres[7987]: [69-1] < 2024-12-15 15:15:26.591 GMT > < 102/0 >LOG:  logical decoding found initial consistent point at 3DA0/63E56E08
Dec 15 15:15:26 dbs01 postgres[7987]: [69-2] < 2024-12-15 15:15:26.591 GMT > < 102/0 >DETAIL:  Waiting for transactions (approximately 2) older than 965108322 to end.
Dec 15 15:15:52 dbs01 postgres[7987]: [70-1] < 2024-12-15 15:15:52.031 GMT > < 102/0 >LOG:  logical decoding found consistent point at 3DA0/64007DD0
Dec 15 15:15:52 dbs01 postgres[7987]: [70-2] < 2024-12-15 15:15:52.031 GMT > < 102/0 >DETAIL:  There are no old transactions anymore.
Dec 15 15:15:52 dbs01 postgres[7987]: [71-1] < 2024-12-15 15:15:52.032 GMT > < 102/23793332 >LOG:  exported logical decoding snapshot: "398662A6-1" with 2 transaction IDs

The destination db logs:

2024-12-15 18:26:09 [2958022] => [2-1] client=[local],db=DBNAME,user=pglogical_replica_pg15 LOG:  connection authorized: user=pglogical_replica_pg15 database=DBNAME application_name=DBNAME_prod_create
2024-12-15 18:26:09 [2897011] => [10927-1] client=Y.Y.Y.Y,db=DBNAME,user=replsetup LOG:  duration: 4096.422 ms  statement: select pglogical.create_subscription(
        subscription_name := 'DBNAME_prod',
        provider_dsn := 'host=* port=5432 dbname=DBNAME user=pglogical_replica_pg15 password=*',
        replication_sets := array['DBNAME_pg15']
    );
2024-12-15 18:26:09 [469090] => [100-1] client=,db=,user= DEBUG:  registering background worker "pglogical manager 5"
2024-12-15 18:26:09 [469090] => [101-1] client=,db=,user= DEBUG:  registering background worker "pglogical apply 16688:750086471"
2024-12-15 18:26:09 [469090] => [102-1] client=,db=,user= DEBUG:  starting background worker process "pglogical apply 16688:750086471"
2024-12-15 18:26:09 [469090] => [103-1] client=,db=,user= DEBUG:  starting background worker process "pglogical manager 5"
2024-12-15 18:26:09 [2958023] => [1-1] client=,db=DBNAME,user=[unknown] LOG:  starting apply for subscription DBNAME_prod
2024-12-15 18:26:09 [2958024] => [1-1] client=,db=postgres,user=[unknown] LOG:  manager worker [2958024] at slot 1 generation 58 detaching cleanly
2024-12-15 18:26:09 [2958023] => [2-1] client=,db=DBNAME,user=[unknown] INFO:  initializing subscriber DBNAME_prod
2024-12-15 18:26:09 [469090] => [104-1] client=,db=,user= DEBUG:  unregistering background worker "pglogical manager 5"
2024-12-15 18:26:09 [469090] => [105-1] client=,db=,user= DEBUG:  background worker "pglogical manager 5" (PID 2958024) exited with exit code 0
2024-12-15 18:26:09 [469090] => [106-1] client=,db=,user= DEBUG:  registering background worker "pglogical manager 1"
2024-12-15 18:26:09 [469090] => [107-1] client=,db=,user= DEBUG:  starting background worker process "pglogical manager 1"
2024-12-15 18:26:09 [2958025] => [1-1] client=,db=template1,user=[unknown] LOG:  manager worker [2958025] at slot 1 generation 59 detaching cleanly
2024-12-15 18:26:09 [469090] => [108-1] client=,db=,user= DEBUG:  unregistering background worker "pglogical manager 1"
2024-12-15 18:26:09 [469090] => [109-1] client=,db=,user= DEBUG:  background worker "pglogical manager 1" (PID 2958025) exited with exit code 0
2024-12-15 18:26:09 [469090] => [110-1] client=,db=,user= DEBUG:  registering background worker "pglogical manager 16483"
2024-12-15 18:26:09 [469090] => [111-1] client=,db=,user= DEBUG:  starting background worker process "pglogical manager 16483"
2024-12-15 18:26:09 [2958026] => [1-1] client=,db=bo_audit_log,user=[unknown] LOG:  starting pglogical database manager for database bo_audit_log
2024-12-15 18:26:09 [2958026] => [2-1] client=,db=bo_audit_log,user=[unknown] LOG:  manager worker [2958026] at slot 1 generation 60 detaching cleanly
2024-12-15 18:26:09 [469090] => [112-1] client=,db=,user= DEBUG:  unregistering background worker "pglogical manager 16483"
2024-12-15 18:26:09 [469090] => [113-1] client=,db=,user= DEBUG:  background worker "pglogical manager 16483" (PID 2958026) exited with exit code 0
2024-12-15 18:26:09 [469090] => [114-1] client=,db=,user= DEBUG:  registering background worker "pglogical manager 20888"
2024-12-15 18:26:09 [469090] => [115-1] client=,db=,user= DEBUG:  starting background worker process "pglogical manager 20888"
2024-12-15 18:26:09 [2958027] => [1-1] client=,db=another_db_on_this_cluster,user=[unknown] LOG:  starting pglogical database manager for database another_db_on_this_cluster
2024-12-15 18:26:09 [2958027] => [2-1] client=,db=another_db_on_this_cluster,user=[unknown] LOG:  manager worker [2958027] at slot 1 generation 61 detaching cleanly
2024-12-15 18:26:09 [469090] => [116-1] client=,db=,user= DEBUG:  unregistering background worker "pglogical manager 20888"
2024-12-15 18:26:09 [469090] => [117-1] client=,db=,user= DEBUG:  background worker "pglogical manager 20888" (PID 2958027) exited with exit code 0
2024-12-15 18:26:13 [474563] => [26-1] client=X.X.X.X,db=[unknown],user=replication_patroni_X DEBUG:  xmin required by slots: data 1447914, catalog 0
2024-12-15 18:26:13 [2958023] => [3-1] client=,db=DBNAME,user=[unknown] INFO:  synchronizing data
2024-12-15 18:26:15 [2958068] => [1-1] client=[local],db=[unknown],user=[unknown] LOG:  connection received: host=[local]
2024-12-15 18:26:15 [2958068] => [2-1] client=[local],db=DBNAME,user=pglogical_replica_pg15 LOG:  connection authorized: user=pglogical_replica_pg15 database=DBNAME application_name=DBNAME_prod_copy
2024-12-15 18:26:17 [474520] => [103-1] client=,db=,user= DEBUG:  archived write-ahead log file "00000004000007C6000000EF"
2024-12-15 18:26:19 [474520] => [104-1] client=,db=,user= DEBUG:  archived write-ahead log file "00000004000007C6000000F0"
2024-12-15 18:26:19 [2958120] => [1-1] client=,db=,user= DEBUG:  autovacuum: processing database "DBNAME"

Please, don't care the timestamps. I copied the result from two different attempts because I increased log level in the destination database later then in the source one.

Info about the replication slot:

ru_loans=# select * from pg_replication_slots where slot_name = '<slot_name>';
-[ RECORD 1 ]+-------------------------------------------
slot_name    | <slot_name>
plugin       | pglogical_output
slot_type    | logical
datoid       | 2480954
database     | DBNAME
active       | f
xmin         | 
catalog_xmin | 965109778
restart_lsn  | 3DA0/67867B18

pg_stat_replication:

select * from pg_stat_replication where usename = 'pglogical_replica_pg15';
-[ RECORD 1 ]----+-----------------------------
pid              | 9192
usesysid         | 221595586
usename          | pglogical_replica_pg15
application_name | DBNAME_prod_snap
client_addr      | X.X.X.X
client_hostname  | 
client_port      | 38018
backend_start    | 2024-12-15 15:26:11.39687+00
backend_xmin     | 965109886
state            | startup
sent_location    | 0/0
write_location   | 
flush_location   | 
replay_location  | 
sync_priority    | 0
sync_state       | async

The important detail is that I used the same method to copy the database of exact same size on another cluster and everything was fine. Small databases in the cluster in question are also copied well.

I was thinking that was a network issue so tuned some parameters related to TCP in sysctl and used IP address instead of hostname in create_subscription function but it didn't help.

I appreciate any help. Thank you!

@pensnarik
Copy link
Contributor Author

It turned out that it's a normal situation. The replication slot is inactive during initial data sync. It becomes active as soon as pglogical competes copying data and starts reading changes from the slot. It looks like there was a network issue during my first attempt so it didn't update data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant