Skip to content
derekcollison edited this page Jul 18, 2011 · 6 revisions

#Cluster Design Overview

Currently NATS is a single server design, which limits scalability, performance, and HA.

NATS will be extended to have multiple servers participate in a cluster. This will allow client failover, client sharding, and higher system performance with high availability guarantees.

#Design

The current system will be extended to allow routed connections between servers. These connections will direct the interest graph and data flow in an optimized fashion between servers.

The interest graph will model layer 2 semantics, and will forward all client subscription state from a given server. Depending on subject/topic semantics, mostly around wildcard matches, this can be optimized in the future. For distributed queues, the layer 2 semantics will help, and a double selection will occur for the given queue consumer on the originating server, and the routed server (if the consumer is remote).

The data graph will be pruned at the originating server to ensure that messages will only traverse to a routed server once, regardless of interest. The servers will still maintain a single subject list, but subscriptions and connections will be typed, similar to distributed queue subscriptions, to allow set operations before sending a message over a given routed connection. Client connections will receive multiple messages that match the subscription closures within the client.

Cluster topologies will be controlled by the route type: Full Mesh One-Hop, or Directed Acyclic Graph. Topologies should avoid cycles with the use of the multi-hop DAG route type. The first pass will have One-Hop semantics, which avoids any cycles by design and simplifies processing of received messages from routed servers. DAG will be added a future date, but is included here for completeness of the design.

The configuration of a given server will allow for active and passive connections to other servers for both security concerns and network issues. Each listed route will be actively solicited by the server, meaning the server will try to form a connection. Each server can also listen on a specified port for incoming connections from other routed servers. All connections however, once established, are full-duplex, and allow messages to traverse in both directions.

##Sample Config

In the following sample config, the NATS server will listen on port 4244 for incoming route connections. It will require authorization similar to client based authorization. It will also actively try to reach out and connect to two other servers listed in the routes section.

cluster:
  port: 4244

  authorization:
    user: route_user
    password: cafebabe
    token: deadbeef
    timeout: 1

  # These are actively connected from this server. Other servers
  # can connect to us if they supply the correct credentials from
  # above.

  routes:
    nats-route://foo:[email protected]:4220
    nats-route://foo:[email protected]:4221

#Security

Security for the routes will be similar to the client based authorization and timeouts. No secure information will be shared with connected clients, clients will be responsible for authorization for both explicit and implicit servers within the pool.

#Clients

Clients will now maintain a pool of potential servers. The servers can be explicitly listed and presented to the client in the initializer, or implicit, and will be shared from the connected server's knowledge of the cluster. The servers will add additional information to the INFO protocol message describing other active servers that clients will add to the potential server pool. Credential information will not be part of the information exchange between clients and servers, clients will be expected to have the correct credentials for all servers.

Older clients will continue to work as they do today, and will not process the additional information.

New clients will be able to specify multiple URIs for servers they wish to connect to. No preference will be computed apriori, and the new client will accept the old :uri option as it does today. The new option will be an array of URIs, designated by :uris.

NATS.start(:uris => [nats://user1:p1@localhost:4241, nats://user2:p2@localhost:4242]) 

When a new client detects a connection drop from the existing server, it can optionally try to connect to a new server that was presented to it under the INFO protocol, or explicitly presented in the initializer. INFO protocol messages can be sent by the server to all connected clients at any time, so the list of available implicit servers can change dynamically, which will effect the potential server pools maintained by the clients.

When a connection is lost, and can no longer be established, the client will remember the state and mark the connection internally to de-prioritize it.

Client reconnect logic will remain in effect, and is configurable, for established connections. However, reconnections will now traverse the potential server pool instead of simply trying to reconnect to the same URI.

#Rolling Authentication

Rolling authentication can be accomplished with the above scheme with 3 steps.

  1. Add additional servers into the cluster that are configured with the new client credentials that will need to be presented. These implicit servers will be added to the existing clients, but since no authorization information is passed to the clients, if they attempt to connect to the new servers they will fail, and those servers will be de-prioritized.

  2. Roll clients with the new credentials configured. This will refresh the explicit pool, and have the clients connect to the new servers. If a client attempts to connect to an older server, it will fail and the server will be de-priotized.

  3. Shutdown the servers that presented the original credentials.

#Future Work

In addition to DAG semantics for the cluster topologies, one could envision servers sending information to connected clients to properly balance the client load between all servers. This information would be sent in the INFO protocol as well, and essentially give a weighting to each server that the clients can use to re-order. There could be the introduction of additional protocols or information that would also allow a server to signal a disconnect request, due to internal metrics of the connected server.

Clone this wiki locally