Merge pull request juju#17979 from manadart/dqlite-ha-doc

juju#17979 This venerable document, out of date even for Mongo, now reflects HA in the Dqlite world.

## QA steps

None required.

## Documentation changes

This is one.

## Links

**Jira card:** [JUJU-4997](https://warthogs.atlassian.net/browse/JUJU-4997)

[JUJU-4997]: https://warthogs.atlassian.net/browse/JUJU-4997?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Showing 1 changed file with 83 additions and 110 deletions.

@@ -1,110 +1,83 @@

High Availability (HA)
======================

High Availability in general terms means that we have 3 or more (up to 7)
State Machines, each one of which can be used as the master.

This is an overview of how it works:

### Mongo

_Mongo_ is always started in [replicaset mode](http://docs.mongodb.org/manual/replication/).

If not in HA, this will behave as if it were a single mongodb and, in practical
terms, there is no difference from a regular setup.

### Voting

A voting member of the replicaset is one that has a say in which member is master.

A non-voting member is just a storage backup.

Currently we don't support non-voting members; instead, when a member is non-voting it
means that said controller is going to be removed entirely.

### Ensure availability

There is an `ensure-availability` command for juju. It takes `-n` (minimum number
of state machines) as an optional parameter; if it's not provided, it
defaults to 3.
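
For example, to request five state machines (an illustrative invocation; the
machines brought up depend on the provider):

```sh
juju ensure-availability -n 5
```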

This needs to be an odd number in order to prevent ties during voting.

The number cannot be larger than seven (making the current possibilities 3,
5 and 7) due to a limitation of mongodb, which cannot have more than 7
replica set voting members.

Currently the number can be increased but not decreased (decreasing is planned).
In the first case Juju will bring up as many machines as necessary to meet the
requirement; in the second, nothing will happen, since the rule tries to have
_"at least that many"_.

At present there is no way to reduce the number of machines. You can kill
enough machines by hand to reduce to the number you need, but this is risky and
**not recommended**. If you kill fewer than half of the machines (half + 1
remaining), running `enable-ha` again will add more machines to
replace the dead ones. If you kill more, there is no way to recover, as there
are not enough voting machines.

The EnableHA API call will report the changes that it
made to the model, which will shortly be reflected in reality.

### The API

There is an API server running on all State Machines. These talk to all
the peers, but queries and updates are addressed to the mongo master instance.

Unit and machine agents connect to any of the API servers by trying to connect
to all the addresses concurrently, but not simultaneously: each address is
tried in turn after a short delay. After a successful connection, the
connected address will be stored; it will be tried first when next connecting.

### The peergrouper worker

It looks at the current state and decides what the peergroup members should
look like, and continually tries to maintain those members.

The reason for its existence is that it can often take a while for mongo to
allow a peer group change, so we can't change it directly in the
EnableHA API call.

Its worker loop continually watches:

1. The current set of controllers
2. The addresses of the current controllers
3. The status of the current mongo peergroup

It feeds all that information into `desiredPeerGroup`, which provides the peer
group that we want to be, and continually tries to set that peer group in mongo
until it succeeds.

**NOTE:** There is one situation which currently doesn't work: if you've
only got one controller, you can't switch to another one.

### The Singleton Workers

**Note:** This section reflects the current behavior of these workers but
should by no means be taken as an example to follow, since most (if not all)
should run concurrently and are going to change in the near future.

The following workers require only a single instance to be running
at any one moment:

* The environment provisioner
* The firewaller
* The charm revision updater
* The state cleaner
* The transaction resumer
* The minunits worker

When a machine agent connects to the state, it decides whether
it is on the same instance as the mongo master instance, and
if so, it runs the singleton workers; otherwise it doesn't run them.

Because we are using `mgo.Strong` consistency semantics,
it's guaranteed that our mongo connection will be dropped
when the master changes, which means that when the
master changes, the machine agent will reconnect to the
state and choose whether to run the singleton workers again.

It also means that we can never accidentally have two
singleton workers performing operations at the same time.

# Controller high availability (HA)

See first: [Juju user docs | How to make a controller highly available]

This document details controller and agent behaviour when running controllers
in HA mode.
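
HA is enabled from the Juju client. An illustrative invocation (the controller
count shown is the default):

```sh
juju enable-ha -n 3
```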

## Dqlite

Each controller is a [Dqlite] node. The `dbaccessor` worker on each controller is
responsible for maintaining the Dqlite cluster. When entering HA mode, the
`dbaccessor` worker will configure the local Dqlite node as a member of the
cluster.

When starting Dqlite, the worker must bind it to an IP address. The address is
read from the controller configuration file populated by the controller charm.
If there is no address to use for binding, the worker will wait for one to be
written to the file before attempting to join the cluster.
See _Controller charm_ below.

Each Dqlite node has a role within the cluster. Juju does not manage node
roles; this is handled within Dqlite itself. A cluster is constituted by
(see the sketch after this list):
- one _leader_ to which all database reads and writes are redirected,
- up to two other _voters_ that participate in leader elections,
- _stand-bys_; and
- _spares_.
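
The roles can be observed from any node using the go-dqlite client library.
This is a minimal sketch rather than Juju code; the import path, node address
and port are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/canonical/go-dqlite/client"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Connect to a reachable cluster node. The address is illustrative.
	cli, err := client.New(ctx, "10.0.0.5:17666")
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Cluster lists every member with its current role
	// (voter, stand-by or spare); the leader is one of the voters.
	nodes, err := cli.Cluster(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes {
		fmt.Printf("%d\t%s\t%s\n", n.ID, n.Address, n.Role)
	}
}
```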

If the number of controller instances is reduced to one, the `dbaccessor`
worker detects this scenario and reconfigures the cluster with the local node
as the only member.

## Controller charm

The controller charm propagates bind addresses to the `dbaccessor` worker by
writing them to the controller configuration file. Each controller unit shares
its resolved bind address with the other units via the `db-cluster` peer
relation. The charm must be able to determine a unique address in the
local-cloud scope before it is shared with other units and written to the
configuration file. If no unique address can be determined, the user must supply
an endpoint binding for the relation using a space that ensures a unique IP
address.
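
As an illustration only, such a binding can be expressed with `juju bind`; the
space name below is hypothetical and must already exist in the controller
model:

```sh
# Bind the controller application's db-cluster endpoint to a space
# that yields a unique address on each controller machine.
juju bind -m controller controller db-cluster=internal-space
```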

## API addresses for agents

When machines in the control plane change, the `api-address-updater` worker
for each agent rewrites the agent's configuration file with usable API
addresses from all controllers. Agents will try these addresses in random order
until they establish a successful controller connection.

The list of addresses supplied to agent configuration can be influenced by the
`juju-mgmt-space` controller configuration value. This is supplied with a space
name so that agent-controller communication can be isolated to specific
networks.
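
For example, the value can be supplied at bootstrap time; the cloud and space
name here are illustrative:

```sh
juju bootstrap aws --config juju-mgmt-space=mgmt-space
```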

## API addresses for clients

Each time the Juju client establishes a connection to the Juju controller, the
controller sends the current list of API addresses, and the client updates these
in its local store. The client's first connection attempt is always to the last
address that it used successfully. Others are tried subsequently if required.

Addresses used by clients are not influenced by the `juju-mgmt-space`
configuration.
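
For illustration, the client's cached endpoints can be inspected with
`juju show-controller`, which reports them under `api-endpoints` (output
abridged; addresses invented):

```sh
juju show-controller
#   details:
#     ...
#     api-endpoints: ['10.0.0.5:17070', '10.0.0.6:17070', '10.0.0.7:17070']
```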

## Single instance workers

Many workers, such as the `dbaccessor` worker, run on all controller instances,
but there are some workers that must run on exactly one controller instance.
An obvious example of this is a model's compute provisioner: we would never
want more than one actor attempting to start a cloud instance for a new
machine.

Single instance workers are those declared in the model manifolds configuration
that use the `isResponsible` decorator. This in turn is based on a flag set by the
`singular` worker.

The `singular` worker only sets the flag if it is the current lease holder for
the `singular-controller` namespace. See the appropriate documentation for more
information on leases.
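
The gating can be pictured with a simplified sketch. This is not the actual
Juju manifold wiring; the names below are illustrative stand-ins for the flag
raised by the `singular` worker and the worker it guards.

```go
package main

import "fmt"

// responsibleFlag stands in for the flag set by the singular worker: it is
// true only while this controller holds the lease for the
// singular-controller namespace.
type responsibleFlag struct {
	holdsLease bool
}

func (f responsibleFlag) Check() bool { return f.holdsLease }

// runIfResponsible mirrors the idea behind the isResponsible decorator:
// the guarded worker only starts when the flag is set.
func runIfResponsible(flag responsibleFlag, name string, start func()) {
	if !flag.Check() {
		fmt.Printf("%s: not the lease holder; worker not started\n", name)
		return
	}
	start()
}

func main() {
	flag := responsibleFlag{holdsLease: true}
	runIfResponsible(flag, "compute-provisioner", func() {
		fmt.Println("compute-provisioner: running on this controller only")
	})
}
```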

[Juju user docs | How to make a controller highly available]: https://juju.is/docs/juju/manage-controllers#heading--make-a-controller-highly-available
[Dqlite]: https://dqlite.io/