Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-voters and automatic promotion #375

Merged
merged 8 commits into from
Sep 25, 2023
Merged

Conversation

illotum
Copy link
Contributor

@illotum illotum commented May 15, 2023

This is a draft.

My take on #44, mostly a continuation of design discussed there. Some remaining (low-stakes) questions are left in italics, but they will not change much.

Notably, a departure from previous designs, the feature is completely opt-in. Both server start and member addition need be extended according to this example:

UId = ra:new_uid(ClusterName),
New = #{id => Id, membership => promotable, uid => UId},
ok = ra:start_server(default, ClusterName, New, add_machine(), [ServerRef]),
{ok, _, _} = ra:add_member(ServerRef, New),

Voter status

Voter status is stored in the cluster map of the server state and is part of every $ra_cluster_update. Additionally, nodes store their own status at the top level for ease of matching.

Nodes also store their own satus in ra_state ETS table (breaking change), and present in overview.

Effect on Raft

Voters behave as before.

Non-voters ignore election timeouts and do not attempt to vote for others. Consensus calculation takes this into account, and non-voters have no effect on quorum size.

Transitions

ra:start_cluster is unchanged, for new cluster of nonvoters will never elect a leader.

ra:add_member and ra:start_server now accept a map of server id and desired voter status. Legacy calls default to adding new server as a voter.

On every #append_entries_reply leader may choose to promote non-voter by issuing new $ra_join with desired voter status. Currently, only one promotion condition is implemented: non-voter will be promoted when it reaches the leaders log index at the time of joining.

@kjnilsson
Copy link
Contributor

This looks very promising. Initial thought is that we shouldn't need a new command for this and that this should be the behaviour of all ra_join commands. New commands, although ok, can't be understood by old members if changes are made during a live upgrade so it is preferable to avoid adding new API unless absolutely necessary.

@illotum
Copy link
Contributor Author

illotum commented May 18, 2023

Update after talking to @kjnilsson elsewhere. Removes voter promotion from the log, instead it is based on a single static target the nonvoter has to reach. Non-voters behave like regular followers, but are treated differently:

  1. Leader doesn't count non voters when it computes required majority to raise commit index. This also means the newly-added members do not block cluster from progressing as is visible in certain tests I had to edit.
  2. Pre vote requests from non-voters are ignored.
  3. Pre vote and election responses from non voters are allowed (there's no way to tell who the ack is from). For this reason they also count towards required majority.

ra:member_overview reports a list of "voters" if requested from the leader. I kept ra_voters() type to allow for future extension. The tuple key, {matching, Target} is subject to discussion ofc.

@luos
Copy link

luos commented May 26, 2023

Thank you @illotum for working on this. Seems we will be able to add our stuff on top of it when time comes. :)

One thing not clear for me, is why is there a need for automatic promotion? For a RabbitMQ use case, I would imagine the benefits to be not that big as the cluster does not change very often?

@illotum
Copy link
Contributor Author

illotum commented May 26, 2023

@luos, at large enough scale nodes are lost regularly. I want to change the "quorum critical" report to acknowledge that the newly joined nodes are not ready, and restarting another member (out of three) will block quorum progress.


After talking to Karl I am reverting the "minimalistic" version of the feature. It is still incomplete, but will be continued with the original design of tracking voter status in the log.

@illotum
Copy link
Contributor Author

illotum commented Jun 14, 2023

Taking the WIP off. It is unlikely polished, but deserves a review.

Some notes and concerns:

Voter state is part of the Raft log, within the cluster membership record. It is also tracked at a higher level purely for convenience in handling RPCs. This is a fairly superficial improvement compared to the effort and risk of confusion. Totally willing to forgo it.

Non-voting followers do not attempt to get elected, and ignore calls for election. All done within ra_server limits. Certain election timeouts are still kicked off in ra_server_proc, which is a waste.

Their match_index is not included in commit computation, yet non-voters are listed in cluster members. This may be confusing to consumers, e.g. when newly added server doesn't block progress at first.

Voter status is reported in overview. My biggest concern, as RabbitMQ instead directly reads ra_state table when it reports quorum critical. Issuing overview RPCs feels prohibitively expensive in comparison.

Overall, most of the issues above would be addressed if I move this change into ra_server_proc and add new state(s). This comes with its own tradeoffs, e.g. code duplication with the follower state. I am prototyping this on a separate branch.


Updated transition diagram:

      ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 
─────▷  voter => {no, Condition} │
      └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 
                    │                                                  
                    │                                                  
                    ▽                    ▷ $ra_join (now can be used to update voter status)
      ┌──────────────────────────┐       ▶ initial_cluster_members                   
─────▶│       voter => yes       │
      └──────────────────────────┘                                     

@illotum illotum changed the title WIP Non-voters and automatic promotion Non-voters and automatic promotion Jun 14, 2023
Copy link
Contributor

@kjnilsson kjnilsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some formatting and nitpicks only.

src/ra.hrl Outdated Show resolved Hide resolved
src/ra.hrl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
@illotum illotum force-pushed the nonvoters branch 2 times, most recently from 4fc73bd to 484c209 Compare June 30, 2023 06:08
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
test/ra_SUITE.erl Outdated Show resolved Hide resolved
test/ra_SUITE.erl Outdated Show resolved Hide resolved
test/ra_SUITE.erl Outdated Show resolved Hide resolved
@illotum illotum force-pushed the nonvoters branch 5 times, most recently from 1073dd4 to 556f009 Compare July 13, 2023 18:49
@illotum illotum force-pushed the nonvoters branch 3 times, most recently from 250e700 to 3bede59 Compare August 21, 2023 22:45
@illotum illotum requested a review from kjnilsson August 22, 2023 05:04
@illotum illotum force-pushed the nonvoters branch 3 times, most recently from cae0448 to 33bdbe0 Compare August 23, 2023 17:48
@kjnilsson kjnilsson added this to the 2.7.0 milestone Sep 5, 2023
@illotum illotum force-pushed the nonvoters branch 3 times, most recently from e6c474f to c681c6b Compare September 12, 2023 00:31
@kjnilsson kjnilsson modified the milestones: 2.7.0, 2.8.0 Sep 12, 2023
Copy link
Contributor

@kjnilsson kjnilsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. I have a few smaller suggestions.

I will start testing this PR with RabbitMQ

src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra.erl Outdated Show resolved Hide resolved
src/ra_server_proc.erl Outdated Show resolved Hide resolved
@kjnilsson
Copy link
Contributor

@illotum hi - I think there are both test and dialyzer failures but the checks pass as the tests aren't actually run atm. Would you mind rebasing from main which should pick up the relevant config to allow the PR ci suite to actually run.

@michaelklishin
Copy link
Member

michaelklishin commented Sep 19, 2023

@illotum @kjnilsson and I seem to agree that using the "regular" (existing) Ra node identity for non-voting nodes should be fine. We can reuse it or even consider dropping the extra/new record field.

Alex Valiushko added 3 commits September 19, 2023 09:38
…p with the log

An opt-in ability of a cluster to ignore newly joined member until it catches
up with the log:

    New = #{id => Id, ini_non_voter => ra:new_nvid()},
    {ok, _, _} = ra:add_member(ServerRef, New),
    ok = ra:start_server(default, ClusterName, New, add_machine(), [ServerRef]),

Voter status is stored in the cluster map of the server state and is part of
every $ra_cluster_update. Additionally, nodes store their own status at the top
level for ease of matching. Nodes also store their own satus in ra_state ETS
table (breaking change), and present in overview.

On every #append_entries_reply leader may choose to promote non-voter by
issuing new `$ra_join` with desired voter status. Currently, only one promotion
condition is implemented `{nonvoter, #{target := ra_index()}`. Non-voter will
be promoted when it reaches the leaders log index at the time of joining.
@illotum
Copy link
Contributor Author

illotum commented Sep 20, 2023

Rebased and pushed the uid change. It is one of those small things that cascade widely, but a sum PR diff became saner IMO. Notably, local state and ETS now only hold non_voter :: boolean(), and that's what various reporting tools expose.

It is briefly tested with ra-2.7.0 branch of rabbit. Will deal with dialyzer tomorrow.

Copy link
Contributor

@kjnilsson kjnilsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done some testing in RabbitMQ with this Pr and so far it all looks good now.

Some questions and suggestions included which we need to address.

src/ra.erl Outdated Show resolved Hide resolved
src/ra.erl Outdated Show resolved Hide resolved
src/ra.hrl Outdated Show resolved Hide resolved
src/ra.hrl Outdated Show resolved Hide resolved
src/ra.hrl Outdated Show resolved Hide resolved
src/ra.hrl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
@kjnilsson kjnilsson merged commit 2863dbb into rabbitmq:main Sep 25, 2023
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants