
The charm allows a 2-node cluster but it's not functional after a failover #570

Open
nobuto-m opened this issue Aug 5, 2024 · 4 comments
Labels: bug

Comments


nobuto-m commented Aug 5, 2024

Steps to reproduce

  1. Prepare a MAAS provider
  2. Deploy the charm with 2 units by following https://charmhub.io/postgresql/docs/h-scale:
    juju deploy postgresql --base ubuntu@22.04 --channel 14/stable -n 2
  3. Take down the primary unit (see the sketch below)
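
For step 3, a minimal sketch of one way to take the primary down, assuming the primary is postgresql/0 (as in the status output below); powering the machine off is just one way to simulate losing the node:

$ juju run postgresql/leader get-primary    # confirm which unit is currently the primary
$ juju ssh postgresql/0 -- sudo poweroff    # take that unit's machine down abruptly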

Expected behavior

It's either:

  • remain functional after taking down one of the two units
  • or prevent a two-node cluster from being deployed, by setting juju status to blocked and suggesting 3 units instead (a three-node example follows)
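
For comparison, a three-node deployment keeps Raft quorum after losing one node; assuming the same base and channel, it only differs in the unit count:

$ juju deploy postgresql --base ubuntu@22.04 --channel 14/stable -n 3
$ juju add-unit postgresql    # or add a third unit to an existing two-unit application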

Actual behavior

Similar topic to #566.

Juju status looks okay at a glance. However, the surviving unit doesn't report which unit is the primary at the moment.

$ juju status
Model     Controller            Cloud/Region       Version  SLA          Timestamp
postgres  mysunbeam-controller  mysunbeam/default  3.5.3    unsupported  12:17:40Z

App         Version  Status  Scale  Charm       Channel    Rev  Exposed  Message
postgresql  14.11    active    1/2  postgresql  14/stable  429  no       

Unit           Workload  Agent  Machine  Public address   Ports     Message
postgresql/0   unknown   lost   0        192.168.151.115  5432/tcp  agent lost, see 'juju show-status-log postgresql/0'
postgresql/1*  active    idle   1        192.168.151.116  5432/tcp  

Machine  State    Address          Inst id    Base          AZ       Message
0        down     192.168.151.115  machine-7  ubuntu@22.04  default  Deployed
1        started  192.168.151.116  machine-8  ubuntu@22.04  default  Deployed

Also, the get-primary action reports the dead unit as the primary, which shouldn't be the case.

$ juju run postgresql/leader get-primary
Running operation 3 with 1 task
  - task 4 on unit-postgresql-1

Waiting for task 4...
primary: postgresql/0

Patroni's member list cannot be fetched since the Raft quorum was lost.

$ juju ssh postgresql/1 -- sudo -u snap_daemon env PATRONI_LOG_LEVEL=DEBUG patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml list
2024-08-05 12:20:16,176 - DEBUG - Loading configuration from file /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml
2024-08-05 12:20:21,243 - INFO - waiting on raft
2024-08-05 12:20:26,243 - INFO - waiting on raft
2024-08-05 12:20:31,244 - INFO - waiting on raft
2024-08-05 12:20:36,244 - INFO - waiting on raft
2024-08-05 12:20:41,245 - INFO - waiting on raft
2024-08-05 12:20:46,245 - INFO - waiting on raft
2024-08-05 12:20:51,246 - INFO - waiting on raft
2024-08-05 12:20:56,247 - INFO - waiting on raft
^C
Aborted!
Connection to 192.168.151.116 closed.
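
This is expected once quorum is gone: Raft needs a majority, i.e. floor(n/2) + 1 nodes, so a two-node cluster has a quorum of 2 and cannot survive losing either node, while a three-node cluster also has a quorum of 2 and tolerates one failure. As a rough check of what the surviving node still reports about itself, one could query Patroni's REST API directly instead of patronictl (assuming the default port 8008 and plain HTTP, which may differ in the charm's setup):

$ juju ssh postgresql/1 -- curl -s http://localhost:8008/patroni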

On a side note, Raft support is deprecated in upstream Patroni as of 3.0.0.
https://patroni.readthedocs.io/en/latest/releases.html#version-3-0-0

Versions

Operating system: jammy

Juju CLI: 3.5.3

Juju agent: 3.5.3

Charm revision: 14/stable 429

LXD: N/A

Log output

Juju debug log:
model_debug.log

Additional context

nobuto-m added the bug label on Aug 5, 2024

delgod commented Aug 6, 2024

On a side note, Raft support is deprecated in upstream Patroni as of 3.0.0.

Yes, Raft is not supported upstream, but it is supported and maintained by our team for all our users (until some point in time).

nobuto-m (Author) commented

Looks like upstream assumes two PostgreSQL nodes plus one witness node, so my understanding is that running the cluster with only two nodes is not supported.

https://patroni.readthedocs.io/en/latest/yaml_configuration.html#raft-deprecated

Q: It is possible to run Patroni and PostgreSQL only on two nodes?

A: Yes, on the third node you can run patroni_raft_controller (without Patroni and PostgreSQL). In such a setup, one can temporarily lose one node without affecting the primary.
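
For illustration only, a minimal sketch of what such a witness node could look like with upstream Patroni rather than the charm; the file name, data directory, addresses, and port below are placeholders, and the positional config-file invocation of patroni_raft_controller is an assumption:

$ cat > raft-witness.yml <<'EOF'
# Hypothetical witness-only config: no postgresql/bootstrap sections, just the
# raft keys documented upstream (data_dir, self_addr, partner_addrs).
raft:
  data_dir: /var/lib/raft
  self_addr: 192.168.151.117:2222     # the witness node itself (placeholder address/port)
  partner_addrs:
    - 192.168.151.115:2222            # postgresql/0
    - 192.168.151.116:2222            # postgresql/1
EOF
$ patroni_raft_controller raft-witness.yml    # runs only the Raft member, no Patroni/PostgreSQL here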

taurus-forever (Contributor) commented

@7annaba3l we have one more reason to include this in the 25.04 roadmap.
