
raft test framework for homeobject #218

Merged
merged 3 commits into eBay:main on Nov 1, 2024

Conversation

JacksonYao287
Collaborator

@JacksonYao287 JacksonYao287 commented Oct 23, 2024

1. Add a raft test framework for homeobject, which will enable 3-replica raft-based tests.
2. Move all the current UTs from single-replica to 3-replica raft-based.
3. Find and fix a blob sequence number bug surfaced after introducing the new test framework.

@codecov-commenter

codecov-commenter commented Oct 23, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 63.95%. Comparing base (acb04e8) to head (2490a3c).
Report is 24 commits behind head on main.

Files with missing lines Patch % Lines
src/lib/homestore_backend/hs_blob_manager.cpp 50.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #218      +/-   ##
==========================================
- Coverage   68.69%   63.95%   -4.74%     
==========================================
  Files          30       32       +2     
  Lines        1581     1784     +203     
  Branches      163      193      +30     
==========================================
+ Hits         1086     1141      +55     
- Misses        408      546     +138     
- Partials       87       97      +10     

☔ View full report in Codecov by Sentry.

src/lib/homestore_backend/hs_http_manager.cpp (outdated, resolved)
@@ -37,6 +37,19 @@ class BitsGenerator {
}

static void gen_random_bits(sisl::blob& b) { gen_random_bits(b.size(), b.bytes()); }

Collaborator

why don't we add blob_id into the parameters? if the default value is used (not provided), use a random device; otherwise use the blob_id to initialise.

src/lib/homestore_backend/tests/homeobj_fixture.hpp (outdated, resolved)
sync_for(cleanup_start_count_, repl_test_phase_t::CLEANUP, num_members);
}

void sync_for_uint64_id(uint32_t max_count = 0) {
Collaborator

this is very bad naming

Collaborator Author

this is used to sync the shard_id or blob_id among different replicas. actually, I wrote two functions for shard_id and blob_id, but they were almost the same; the only difference was shard_id_t vs blob_id_t, so I merged them into this one.

do you have any suggestion for a better name ^v^

Collaborator Author

removed

}
ShardInfo seal_shard(shard_id_t shard_id) {
// before sealing the shard, we need to wait for all the members to complete shard state verification
g_helper->sync_for_verify_start();
Collaborator

this is very weird, why can we only do seal_shard in the verify stage?

Collaborator Author

we cannot use sync_for_verify_start or sync_for_test_start twice without a different sync type between them. sync_for_verify_start() can be replaced by sync_for_test_start(); what I want is just a sync point.

will try to create another PR to change this to a more unified sync point

Collaborator Author

remove this

src/lib/homestore_backend/tests/homeobj_fixture.hpp (outdated, resolved)
Collaborator

@xiaoxichen xiaoxichen left a comment

There are some limitations in the current implementation:

  1. we cannot do multi-threaded put_blob, as the implementation is based on blob_id prediction, i.e. it generates blob_data based on the predicted_blob_id; if the blob gets assigned a different blob_id it will fail either in the write phase or in the verify phase.

  2. the sync_for_sth() calls are not friendly and should be put into some common place like seal_shard(). The fundamental problem is that we need to sync before/after each write operation, but we don't implement that well.

Jie and I had an offline discussion about whether we can use IPC for data sharing across replicas, i.e. the leader can put an array of {blobid, digest} into IPC and the followers consume it. I think it is a solution that simplifies the implementation and makes the code cleaner, as well as supporting concurrent writes. But there are surely other solutions that solve the problem in another way. This PR moves all existing UTs to a multi-process version, which is a nice step forward, so I don't want to block it over personal preference.

Would like to hear more from the users of this UT framework, @sanebay and @Hoolu, on whether the UTs in your case can be well supported.

@JacksonYao287 JacksonYao287 force-pushed the test-framework branch 2 times, most recently from 3cf48da to 6656058 on October 26, 2024 05:43
Contributor

@sanebay sanebay left a comment

LGTM

src/lib/homestore_backend/hs_http_manager.cpp (outdated, resolved)
}

void TearDown() override { app->clean(); }
// schedule create_pg to replica_num
void create_pg(pg_id_t pg_id, uint32_t replica_num = 0) {
Contributor

why do we need create_pg with replica_num? isn't it executed only on the leader, with the other replicas getting it via raft?

Collaborator Author

@JacksonYao287 JacksonYao287 Oct 31, 2024

HSHomeObject::_create_pg will first create a repl_dev, then replicate a create_pg message across the raft group (repl_dev). only once we have a repl_dev can we have a leader.

so, at the homeobject level, before we create a pg (before HSHomeObject::_create_pg is called), we do not have a repl_dev for this pg and thus we do not have a leader. here, replica_num means which replica will become the leader when creating the repl_dev of this pg.

@JacksonYao287 JacksonYao287 requested a review from sanebay October 31, 2024 13:09
@sanebay
Contributor

sanebay commented Oct 31, 2024

Can you add another PR to add support for spare replicas. This is needed for PG move.
https://github.com/eBay/HomeStore/blob/master/src/tests/test_common/hs_repl_test_common.hpp#L41

@JacksonYao287
Collaborator Author

Can you add another PR to add support for spare replicas. This is needed for PG move.

ok, will do

@xiaoxichen
Collaborator

pls do a squash merge with some commit msg

@JacksonYao287 JacksonYao287 merged commit a90a20f into eBay:main Nov 1, 2024
25 checks passed
@JacksonYao287 JacksonYao287 deleted the test-framework branch November 1, 2024 00:58