raft test framework for homeobject #218
Conversation
Codecov Report

Attention: Patch coverage is …

```diff
@@            Coverage Diff             @@
##             main     #218      +/-   ##
==========================================
- Coverage   68.69%   63.95%    -4.74%
==========================================
  Files          30       32        +2
  Lines        1581     1784      +203
  Branches      163      193       +30
==========================================
+ Hits         1086     1141       +55
- Misses        408      546      +138
- Partials       87       97       +10
```
```diff
@@ -37,6 +37,19 @@ class BitsGenerator {
    }

    static void gen_random_bits(sisl::blob& b) { gen_random_bits(b.size(), b.bytes()); }
```
Why don't we add blob_id to the parameters? If no value is provided (the default), use a random device; otherwise use the blob_id to initialize.
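A minimal sketch of what this suggestion could look like, assuming a hypothetical free-function signature; the optional blob_id parameter and the seeding policy are the reviewer's proposal, not the PR's actual code:

```cpp
#include <cstdint>
#include <optional>
#include <random>

// Hypothetical variant of gen_random_bits: seed from blob_id when one is
// provided, otherwise fall back to a random device, so the same blob_id
// always reproduces the same bytes.
static void gen_random_bits(uint32_t size, uint8_t* buf,
                            std::optional< uint64_t > blob_id = std::nullopt) {
    std::mt19937_64 rng{blob_id ? *blob_id : std::random_device{}()};
    // uint8_t is not a valid distribution type, so generate uint16_t and cast
    std::uniform_int_distribution< uint16_t > byte_dist{0, 255};
    for (uint32_t i = 0; i < size; ++i) {
        buf[i] = static_cast< uint8_t >(byte_dist(rng));
    }
}
```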
```cpp
    sync_for(cleanup_start_count_, repl_test_phase_t::CLEANUP, num_members);
}

void sync_for_uint64_id(uint32_t max_count = 0) {
```
This is very bad naming.
This is used to sync a shard_id or blob_id among the different replicas. Actually, I wrote two functions, one for shard_id and one for blob_id, but they were almost the same; the only difference was shard_id_t vs blob_id_t, so I merged them into this one.

Do you have any suggestion for a better name? ^v^
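One hypothetical alternative to collapsing both IDs into a uint64_t-named helper: keep the single implementation but let the call site name the ID type. The sync_for_id name, the aliases, and the stub body below are illustrative, not the PR's code:

```cpp
#include <cstdint>

// Assumed aliases mirroring the PR's ID types.
using shard_id_t = uint64_t;
using blob_id_t  = uint64_t;

// One implementation, parameterized on the ID type so the intent stays
// readable at the call site.
template < typename IdT >
void sync_for_id(IdT id, uint32_t max_count = 0) {
    // ...the shared synchronization body from the PR would go here...
    (void)id;
    (void)max_count;
}

int main() {
    sync_for_id< shard_id_t >(42);  // reads as "sync on a shard id"
    sync_for_id< blob_id_t >(7);    // reads as "sync on a blob id"
}
```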
removed
```cpp
}

ShardInfo seal_shard(shard_id_t shard_id) {
    // before sealing a shard, we need to wait for all the members to complete shard state verification
    g_helper->sync_for_verify_start();
```
This is very weird. Why can we only do seal_shard in the verify stage?
We cannot use sync_for_verify_start or sync_for_test_start twice in a row without a different sync type between them. sync_for_verify_start() can be replaced by sync_for_test_start(); what I want is just a sync point.

I will try to create another PR to change this to a more unified sync point.
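For illustration only, a minimal in-process analogy of such a unified sync point; the SyncPoint class and thread-based waiting are assumptions (the real helper synchronizes processes, not threads). The point is just that a generation-counted barrier can be crossed back-to-back any number of times, unlike two named phase barriers that must alternate:

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

class SyncPoint {
    std::mutex m_;
    std::condition_variable cv_;
    uint32_t const num_members_;
    uint32_t arrived_{0};
    uint64_t generation_{0};

public:
    explicit SyncPoint(uint32_t n) : num_members_{n} {}

    // Every member blocks until all num_members_ have arrived; the barrier
    // then resets itself, so sync() can be called again immediately.
    void sync() {
        std::unique_lock lk{m_};
        auto const gen = generation_;
        if (++arrived_ == num_members_) {
            arrived_ = 0;
            ++generation_;  // releases everyone waiting on this generation
            cv_.notify_all();
        } else {
            cv_.wait(lk, [&] { return generation_ != gen; });
        }
    }
};
```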
remove this
Force-pushed from 9ef1875 to 6656058.
There are some limitations in the current implementation:

- We cannot do multi-threaded put_blob, because the implementation is based on blob_id prediction: it generates blob_data from the predicted_blob_id, and if the blob gets assigned a different blob_id it will fail either in the write phase or in the verify phase.
- The sync_for_sth() calls are not friendly and get put into common places like seal_shard(). The fundamental problem is that we need to sync before/after each write operation, but we don't implement that well.

Jie and I had an offline discussion about whether we can use IPC for data sharing across replicas, i.e. the leader puts an array of {blobid, digest} into IPC and the followers consume it (see the sketch below). I think that solution simplifies the implementation, makes the code cleaner, and also supports concurrent writes. But there are surely other solutions that address the problem in other ways. This PR moves all the existing UTs to a multi-process version, which is a nice step forward, so I don't want to block it over personal preference.

Would like to hear more from the users of this UT framework, @sanebay and @Hoolu, on whether the UTs in your cases can be well supported.
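A rough sketch of that IPC idea, assuming POSIX shared memory; the record layout and every name here are hypothetical, not something this PR implements:

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// One {blob_id, digest} record the leader publishes after each put_blob.
struct BlobRecord {
    uint64_t blob_id;
    uint8_t digest[32];  // e.g. SHA-256 of the blob payload
};

// Fixed-size shared log; count is published with release semantics so a
// follower that observes count == n can safely read records[0..n).
// Zero-filled shared memory is treated as count == 0, which is fine for
// lock-free atomics in practice.
struct SharedLog {
    std::atomic< uint32_t > count{0};
    BlobRecord records[1024];
};

SharedLog* open_shared_log(const char* name, bool create) {
    int fd = shm_open(name, create ? (O_CREAT | O_RDWR) : O_RDWR, 0600);
    if (fd < 0) return nullptr;
    if (create && ftruncate(fd, sizeof(SharedLog)) != 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, sizeof(SharedLog), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return (p == MAP_FAILED) ? nullptr : static_cast< SharedLog* >(p);
}

// Leader side: append the blob_id actually assigned (no prediction needed)
// plus its digest. Bounds checks omitted for brevity.
void leader_publish(SharedLog* log, uint64_t blob_id, const uint8_t digest[32]) {
    uint32_t const idx = log->count.load(std::memory_order_relaxed);
    log->records[idx].blob_id = blob_id;
    std::memcpy(log->records[idx].digest, digest, sizeof(log->records[idx].digest));
    log->count.store(idx + 1, std::memory_order_release);  // publish
}

// Follower side: verify every record published so far.
uint32_t follower_consume(SharedLog* log, void (*verify)(const BlobRecord&)) {
    uint32_t const n = log->count.load(std::memory_order_acquire);
    for (uint32_t i = 0; i < n; ++i) verify(log->records[i]);
    return n;
}
```

Because the leader records the blob_id it was actually assigned, followers verify against real IDs, which is what would remove the blob_id-prediction constraint and allow concurrent put_blob.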
Force-pushed from 3cf48da to 6656058.
LGTM
```cpp
}

void TearDown() override { app->clean(); }

// schedule create_pg to replica_num
void create_pg(pg_id_t pg_id, uint32_t replica_num = 0) {
```
Why do we need create_pg with replica_num? Isn't it executed only on the leader, with the other replicas getting it via raft?
HSHomeObject::_create_pg will first create a repl_dev, then replicate a create_pg message across the raft group (the repl_dev). Only once we have a repl_dev can we have a leader.

So, at the homeobject level, before we create a pg (before HSHomeObject::_create_pg is called), we do not have a repl_dev for this pg and thus we do not have a leader. Here, replica_num means which replica will be the leader when creating the repl_dev of this pg.
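A minimal sketch of the dispatch described above; the helpers and the alias below are illustrative stand-ins, not the PR's actual code:

```cpp
#include <cstdint>

using pg_id_t = uint16_t;  // assumed alias

// Hypothetical stand-ins for the test fixture's real members.
uint32_t my_replica_num() { return 0; }   // which replica index this process is
void do_create_pg(pg_id_t) { /* the fixture's real pg-creation call */ }

// Only the process whose replica index matches replica_num drives pg
// creation; the other processes return and receive the pg through raft
// once the repl_dev (and hence a leader) exists.
void create_pg(pg_id_t pg_id, uint32_t replica_num = 0) {
    if (my_replica_num() != replica_num) {
        return;  // not the chosen replica: the pg arrives via replication
    }
    do_create_pg(pg_id);  // the chosen replica creates the repl_dev and leads
}
```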
Can you add another PR to add support for spare replicas? This is needed for PG move.
OK, will do.
Please do a squash merge with a commit message along these lines:
1. Add a raft test framework for homeobject, which enables 3-replica raft-based tests.
2. Move all the current UTs from single-replica to 3-replica raft-based.
3. Find and fix a blob sequence number bug surfaced by the new test framework.