How to improve high availability when waiting in waitTillLeaderIsReadyOrStepDown #73

Open

jackjoesh opened this issue Apr 1, 2022 · 10 comments
@jackjoesh

For the leader to be truly ready, it needs to successfully apply all business data after the election.
During this period the leader cannot actually serve requests; in Gringofts, the method waitTillLeaderIsReadyOrStepDown keeps waiting.
If a lot of data needs to be applied, the wait can be long.
Does Gringofts have any high-availability mechanisms that minimize the time the service is unavailable?
Thank you.

@huyumars
Contributor

huyumars commented Apr 1, 2022

Hi, thanks for your interest.
This step waits for two things: first, the leader coming on stage, and second, applying the latest log.

  • The first one is very quick if the Raft cluster is healthy.
  • For the second one,
    • if the instance has just started up, it might take time to recover state from RocksDB and apply the remaining logs to reach the latest state.
    • if it is going from follower to leader, we have a little trick. We run two single-threaded loops: the CPL, for command processing, decodes commands into events and then commits those events to the Raft layer; the EAL, for event applying, applies committed events from the Raft log entries to maintain the latest state. The leader runs both; followers run only the EAL. When a follower becomes the new leader, it can directly use the state machine the EAL has been building as its latest state, so the switch is also fast (see the sketch below).
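
To make the two-loop design concrete, here is a minimal, hypothetical C++ sketch; the names (FakeRaftLog, eal, cpl) are illustrative stand-ins, not Gringofts' actual classes. The EAL thread runs on every node and applies committed events to the state, while the CPL exists only on the leader and commits events to the raft layer:

```cpp
// Minimal sketch of the two single-threaded loops described above
// (illustrative names, not Gringofts' actual classes).
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <unordered_map>

struct Event { std::string key; int delta; };

// Stand-in for the raft layer: events count as "committed" once enqueued.
class FakeRaftLog {
 public:
  void commit(const Event& e) {
    std::lock_guard<std::mutex> lk(mu_);
    committed_.push(e);
    cv_.notify_one();
  }
  bool nextCommitted(Event* out, std::atomic<bool>& stop) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait_for(lk, std::chrono::milliseconds(50),
                 [&] { return !committed_.empty() || stop.load(); });
    if (committed_.empty()) return false;
    *out = committed_.front();
    committed_.pop();
    return true;
  }
 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<Event> committed_;
};

int main() {
  FakeRaftLog raft;
  std::unordered_map<std::string, int> state;  // the EAL-maintained state
  std::atomic<bool> stop{false};

  // EAL: runs on leader AND followers, so the state is always warm.
  std::thread eal([&] {
    Event e;
    while (!stop) {
      if (raft.nextCommitted(&e, stop)) state[e.key] += e.delta;
    }
  });

  // CPL: leader-only; decodes commands into events and commits them to raft.
  std::thread cpl([&] {
    for (int i = 0; i < 3; ++i) raft.commit(Event{"balance", 10});
  });

  cpl.join();
  std::this_thread::sleep_for(std::chrono::milliseconds(200));
  stop = true;
  eal.join();
  std::cout << "balance = " << state["balance"] << "\n";  // prints 30
  return 0;
}
```

Because the follower's EAL thread has been running all along, the state it maintains is already current the moment that follower is elected leader.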

@jackjoesh
Author


Thank you for your quick reply. Yes, I'm talking about the second one.
Suppose we restart a follower instance for a release, and at that moment the leader happens to fail, so the freshly restarted follower wins the election and becomes leader. This new leader may take a long time to load state from the RocksDB snapshot and apply the remaining logs to reach the latest state.
What should we do in this situation?

@huyumars
Contributor

huyumars commented Apr 2, 2022

Our state machine takes advantage of RocksDB. The EAL only writes to RocksDB, and the CPL reads from an in-memory cache plus the same shared RocksDB.
You can think of the RocksDB state as the latest committed state, which is also the EAL's state; the CPL's state is the RocksDB state plus the uncommitted command state held in memory.
So when a follower becomes the new leader, the swap of EAL and CPL roles simply means the CPL starts writing data in memory on top of the latest committed state in RocksDB, while the EAL keeps writing to the same RocksDB. Both are therefore up to date, and recovery is very fast (a sketch of this read path is shown below).
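
A hedged sketch of that read path: it uses the real RocksDB C++ API, but the CplView class and its methods are illustrative assumptions of mine, not Gringofts' actual cache. The EAL persists committed values into RocksDB, and the CPL reads through an in-memory overlay of uncommitted results layered on top of the same DB:

```cpp
// Sketch: CPL state = in-memory uncommitted overlay + shared committed RocksDB.
// CplView is a hypothetical name, not Gringofts' actual class.
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>
#include <rocksdb/db.h>

class CplView {
 public:
  explicit CplView(rocksdb::DB* db) : db_(db) {}

  // Results of in-flight (uncommitted) commands live only in memory.
  void stageUncommitted(const std::string& k, const std::string& v) {
    uncommitted_[k] = v;
  }

  // On commit, the EAL persists the value; the overlay entry can be dropped.
  void onCommitted(const std::string& k, const std::string& v) {
    db_->Put(rocksdb::WriteOptions(), k, v);  // the EAL's write
    uncommitted_.erase(k);
  }

  // CPL read: check the in-memory overlay first, then the committed RocksDB.
  std::optional<std::string> read(const std::string& k) {
    auto it = uncommitted_.find(k);
    if (it != uncommitted_.end()) return it->second;
    std::string v;
    auto s = db_->Get(rocksdb::ReadOptions(), k, &v);
    if (s.ok()) return v;
    return std::nullopt;  // not found in either layer
  }

 private:
  rocksdb::DB* db_;
  std::unordered_map<std::string, std::string> uncommitted_;
};

int main() {
  rocksdb::Options opts;
  opts.create_if_missing = true;
  rocksdb::DB* db = nullptr;
  auto s = rocksdb::DB::Open(opts, "/tmp/cpl_view_demo", &db);
  if (!s.ok()) return 1;

  CplView view(db);
  view.onCommitted("acct", "100");          // committed base state
  view.stageUncommitted("acct", "90");      // in-flight command result
  std::cout << *view.read("acct") << "\n";  // prints 90 (overlay wins)

  view.onCommitted("acct", "90");           // the command commits
  std::cout << *view.read("acct") << "\n";  // prints 90 (now from RocksDB)
  delete db;
  return 0;
}
```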

@jackjoesh
Author

jackjoesh commented Apr 5, 2022


Thank you, I get your design point.
Because we want to store idempotency records and the original request data in RocksDB (one update may store N fund items), the data volume is very large. I think such large data will lower the snapshot frequency, which means a very large number of events will need to be applied; so when a follower becomes leader, the EAL's apply will also be very slow.
Does your actual business store idempotency and original request data in the EAL's RocksDB? How do you solve this problem? Thank you!

@huyumars
Contributor

huyumars commented Apr 7, 2022

In practice, there is no extra backlog of EAL events to apply, since the EAL should keep up with the committed index (see the sketch below).
Yes, we have production systems running Gringofts; that's why we open-sourced it and maintain it. This design has already been proven in our systems.
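
In loop form, the invariant described here might look like the following illustrative sketch (not Gringofts' actual code): the EAL keeps its last-applied index pinned to the raft commit index, so a leadership change finds no backlog to drain.

```cpp
// Illustrative EAL catch-up loop: apply every committed-but-unapplied entry.
#include <cstdint>
#include <iostream>
#include <vector>

struct Entry { int delta; };  // a decoded, committed event

struct StateMachine {
  std::uint64_t lastApplied = 0;  // index of the last applied log entry
  long value = 0;
  void apply(const Entry& e) { value += e.delta; ++lastApplied; }
};

// Drain everything up to commitIndex; the log is 1-indexed, as in raft.
void catchUp(StateMachine& sm, const std::vector<Entry>& log,
             std::uint64_t commitIndex) {
  while (sm.lastApplied < commitIndex) {
    sm.apply(log[sm.lastApplied]);  // entry at index lastApplied + 1
  }
}

int main() {
  std::vector<Entry> log{{1}, {2}, {3}};
  StateMachine sm;
  catchUp(sm, log, 3);            // the EAL is now at the committed index
  std::cout << sm.value << "\n";  // prints 6
  return 0;
}
```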

@jackjoesh
Author

Thank you, I get it.

@jackjoesh
Author


One last question: what is the maximum TPS a single Gringofts group can support (just a single group, not multiple groups)? Thank you.

@jackyjia
Contributor

> I think such large data will lower the snapshot frequency, which means a very large number of events will need to be applied; so when a follower becomes leader, the EAL's apply will also be very slow. How do you solve this problem?

The apply function should execute quickly, and it can, since most of the heavy work has already been handled in the process function; apply just installs the processed result (roughly as in the sketch below).
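
A small illustrative sketch of that split, with hypothetical command/event types rather than Gringofts' actual API: process() carries the heavy validation and computation and emits events holding the precomputed result, while apply() only installs it, so replaying a long log stays cheap.

```cpp
// Sketch of the process/apply split (illustrative types, not Gringofts' API).
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

struct TransferCommand { std::string from, to; long amount; };
struct TransferEvent   { std::string from, to; long amount; };  // precomputed

using Balances = std::unordered_map<std::string, long>;

// Heavy path: validation, business rules, computation of the result.
std::vector<TransferEvent> process(const Balances& state,
                                   const TransferCommand& cmd) {
  auto it = state.find(cmd.from);
  if (it == state.end() || it->second < cmd.amount)
    throw std::runtime_error("insufficient funds");
  return {TransferEvent{cmd.from, cmd.to, cmd.amount}};
}

// Cheap path: apply only installs the already-validated result, so catching
// up after a leader switch (or replaying the log) remains fast.
void apply(Balances& state, const TransferEvent& ev) {
  state[ev.from] -= ev.amount;
  state[ev.to]   += ev.amount;
}

int main() {
  Balances state{{"alice", 100}};
  for (const auto& ev : process(state, {"alice", "bob", 40})) apply(state, ev);
  return state["bob"] == 40 ? 0 : 1;
}
```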

@jackyjia
Contributor

jackyjia commented Apr 15, 2022


Just curious, which industry are you in and what's your use case?

@jackyjia
Contributor

> What is the maximum TPS a single Gringofts group can support (just a single group, not multiple groups)?

It depends on several factors:

  1. how complicated the process logic is
  2. the request payload size
  3. network bandwidth
  4. cluster setup: usually, the more instances in the cluster, the lower the throughput

In our production, TPS is around ~8K.
