support spare replicas in raft test framework #226

Merged
merged 5 commits into from
Nov 15, 2024
8 changes: 8 additions & 0 deletions src/lib/homestore_backend/tests/homeobj_fixture.hpp
@@ -373,6 +373,14 @@ class HomeObjectFixture : public ::testing::Test {
         return false;
     }
 
+    // wait for the last blob to be created locally, which means all the blob before this blob are created
+    void wait_for_all(shard_id_t shard_id, blob_id_t blob_id) {
+        while (true) {
+            if (blob_exist(shard_id, blob_id)) return;
+            std::this_thread::sleep_for(1s);
+        }
+    }
+
 private:
     bool pg_exist(pg_id_t pg_id) {
         std::vector< pg_id_t > pg_ids;
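
Note on the hunk above: `wait_for_all` loops with no upper bound, so a blob that never arrives would hang the test instead of failing it. A minimal sketch of a bounded variant follows. The helper name `wait_until`, its parameters, and the default polling interval are hypothetical and not part of this PR.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of the PR: poll `pred` until it returns true
// or `timeout` elapses. Returns true if the predicate was satisfied in time.
inline bool wait_until(const std::function< bool() >& pred, std::chrono::milliseconds timeout,
                       std::chrono::milliseconds interval = std::chrono::milliseconds(1000)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (pred()) return true;                                          // condition met
        if (std::chrono::steady_clock::now() >= deadline) return false;   // gave up
        std::this_thread::sleep_for(interval);
    }
}
```

Inside the fixture this could be used as `ASSERT_TRUE(wait_until([&] { return blob_exist(shard_id, blob_id); }, std::chrono::seconds(60)));`, where the 60 second limit is an arbitrary illustrative value.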
@@ -46,7 +46,7 @@ TEST_F(HomeObjectFixture, ReplaceMember) {
     std::map< pg_id_t, blob_id_t > pg_blob_id;
     for (shard_id_t shard_id = 1; shard_id <= num_shards_per_pg; shard_id++) {
         auto derived_shard_id = make_new_shard_id(pg_id, shard_id);
-        pg_shard_id_vec[1].emplace_back(derived_shard_id);
+        pg_shard_id_vec[pg_id].emplace_back(derived_shard_id);
     }
 
     // TODO:: if we add delete blobs case in baseline resync, we need also derive the last blob_id in this pg for spare
@@ -73,12 +73,15 @@ TEST_F(HomeObjectFixture, ReplaceMember) {
         ASSERT_TRUE(r);
     });
 
-    // the new member should wait until it joins the pg
+    // the new member should wait until it joins the pg and all the blobs are replicated to it
     if (in_member_id == g_helper->my_replica_id()) {
         while (!am_i_in_pg(pg_id)) {
             std::this_thread::sleep_for(std::chrono::milliseconds(1000));
             LOGINFO("new member is waiting to become a member of pg {}", pg_id);
         }
+
+        wait_for_all(pg_shard_id_vec[pg_id].back() /*the last shard id in this pg*/,
Collaborator

As discussed, this is not guaranteed to be correct once baseline resync is on, because baseline resync does not sync in LSN order.

Collaborator Author

Yes, but for now we cannot do this: if a member is not the leader, it cannot get its own commit LSN.
see

// replication_status can be empty in follower

and
https://github.com/eBay/nuraft_mesg/blob/a277e8acae561a4b3873337bb179e32817f10b85/src/lib/repl_service_ctx.cpp#L92

I will implement a new wait_for_all as soon as we can get commit_lsn on a follower.
cc @sanebay

Collaborator

OK, it makes sense to add the member's own LSN to repl_service_ctx::get_raft_status(). When I implemented this function I didn't realize we had this requirement.

+                     num_shards_per_pg * num_blobs_per_shard - 1 /*the last blob id in this pg*/);
     }
 
     // step 4: after completing replace member, verify the blob on all the members of this pg including the newly added