SDSTOR-11601 add create_shard implementation #34

zichanglai · 2023-09-01T10:18:02Z

This is the draft version of the create_shard implementation in HomeObject. It cannot build successfully until ReplDev in HomeStore is finalized.

src/lib/homestore/homeobject.cpp

JacksonYao287

Are on_precommit and async_alloc_write still missing for now, to be implemented later?

src/lib/homeobject_impl.hpp

src/lib/homestore/homeobject.cpp

src/lib/homestore/shard_manager.cpp

JacksonYao287 · 2023-09-08T02:51:55Z

src/lib/homestore/shard_manager.cpp

+    pg_iter->second.shards.push_back(shard_info);
+
+    //following part is must for follower members or when member is restarted;
+    auto sequence_num = get_sequence_num_from_shard_id(shard_info->id);


If the follower applies all the logs correctly, will this happen? Or is it just a safety measure here?

Another question: should we flush pg_sequence_number to the metadata service (or somewhere else) to make sure we get the latest sequence_number when restarting? Or should we just take the largest sequence_num (obtained from the shard id) among all recovered shard ids as the latest sequence_number of this pg?

1. Yes, it happens on followers. The shard is generated on the leader side and the sequence number is increased only on the leader, so this gives the follower a chance to catch up its sequence number with the leader's.
2. It is better to flush pg_sequence_number to disk, but that would cause extra IO. I am trying to avoid it by recovering the latest sequence number from meta blks, since every committed shard creation is already persisted in meta blks. There are two other options from my side: use the raft LSN, or combine (raft term, local sequence num) into the new proposed shard id.

I prefer recovering the latest sequence num from meta blks; that is the source of truth.

That is the current implementation's behavior.
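A minimal sketch of the agreed approach: no separate flush of the per-PG counter; instead the counter is advanced to the maximum sequence number seen, both when a follower applies a committed shard creation and when shards are replayed from meta blks on restart. It assumes, hypothetically, that a shard id packs the pg id into the upper 16 bits and the sequence number into the lower 48 bits; `make_shard_id`, `PgState`, `next_shard_id`, `on_shard_committed`, and the body of `get_sequence_num_from_shard_id` are illustrative, not the PR's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout (assumption): [ pg_id : 16 bits | sequence_num : 48 bits ].
constexpr uint64_t kSeqBits = 48;
constexpr uint64_t kSeqMask = (uint64_t{1} << kSeqBits) - 1;

using shard_id_t = uint64_t;
using pg_id_t = uint16_t;

shard_id_t make_shard_id(pg_id_t pg, uint64_t seq) {
    return (static_cast<uint64_t>(pg) << kSeqBits) | (seq & kSeqMask);
}

uint64_t get_sequence_num_from_shard_id(shard_id_t id) { return id & kSeqMask; }

struct PgState {
    pg_id_t id{0};
    uint64_t shard_sequence_num{0};  // highest sequence number allocated or observed
    std::vector<shard_id_t> shards;
};

// Leader side: allocate the id for a new create_shard proposal.
shard_id_t next_shard_id(PgState& pg) {
    return make_shard_id(pg.id, ++pg.shard_sequence_num);
}

// Called on commit (every member) and when replaying shards from meta blks.
// The id may have been generated by another member, so the local counter
// only ever moves forward to catch up; on the leader this is a no-op.
void on_shard_committed(PgState& pg, shard_id_t id) {
    pg.shards.push_back(id);
    pg.shard_sequence_num = std::max(pg.shard_sequence_num, get_sequence_num_from_shard_id(id));
}
```

Because the update is a `max`, replaying recovered shards in any order yields the same counter, and the next proposal continues after the largest recovered sequence number.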

src/lib/homestore/shard_manager.cpp

zichanglai · 2023-09-14T12:23:50Z

This PR still needs to take the create_shard tests from the MemoryShardManager tests so we can get a code-coverage check. I think the few tests there should cover most of the paths we care about as is.

Yes, a unit test for the HomeStore version of ShardManager is added in the latest commit.

src/lib/homestore/shard_manager.cpp

JacksonYao287 · 2023-09-14T15:46:20Z

src/lib/homestore/shard_manager.cpp

+    auto iter = _flying_shards.find(lsn);
+    if (iter != _flying_shards.end()) {
+        _flying_shards.erase(iter);
+    }


Here, should we delete the block written in on_precommit?

I don't think so. As discussed in the previous meeting, the written shard header will be used for recovery in case the meta blks are lost.

If we roll back the pre_commit, this is a canceled create_shard. Why would we recover it?

Related to this, I have another question. When rolling back a pre_commit, it seems NuRaft does not append a log entry for the rollback; the state machine just calls the rollback function. (I have only read the NuRaft docs and have not dug into the code yet, so please correct me if this is wrong.) So if the process crashes after pre_commit is called but before statemachine#rollback is called or finishes, who will finish the rollback job after restart?

Yes, it would certainly be cleaner to truncate or overwrite the shard header already written in on_pre_commit(), but the HomeStore DataService is an append-only model, and I think overwriting conflicts with that model. Even if this shard header is leaked, I think it will eventually be recycled by chunk compaction, right? Because this log entry is rolled back and will never be committed, HomeObject can no longer see this shard, either now or after recovery.

I think the case you mentioned can happen: a restart after on_pre_commit() but before on_rollback(). In that case the shard header is leaked, but as with question 1, it will eventually be recycled.
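A minimal sketch of the in-flight bookkeeping discussed in this thread, assuming a map from LSN to shard info like the `_flying_shards` excerpt above. `FlyingShardTracker`, `ShardInfo`, and `lsn_t` are hypothetical stand-ins, not HomeObject or HomeStore types. Rollback only drops the in-memory entry; the header already appended to the data service is left for compaction. After a crash the map starts empty, so an uncommitted header is simply never referenced by any meta blk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Illustrative stand-ins (assumptions).
using lsn_t = int64_t;
struct ShardInfo {
    uint64_t id;
    std::string state;  // e.g. "OPEN"
};

class FlyingShardTracker {
public:
    // on_pre_commit: the shard header has been written; track the shard
    // until its log entry is either committed or rolled back.
    void on_pre_commit(lsn_t lsn, ShardInfo info) {
        std::lock_guard lk(mtx_);
        flying_shards_.emplace(lsn, std::move(info));
    }

    // on_commit: hand the shard to the caller, which would persist it to a
    // meta blk and add it to its PG. Returns nullopt if it was never tracked.
    std::optional<ShardInfo> on_commit(lsn_t lsn) {
        std::lock_guard lk(mtx_);
        auto it = flying_shards_.find(lsn);
        if (it == flying_shards_.end()) return std::nullopt;
        ShardInfo info = std::move(it->second);
        flying_shards_.erase(it);
        return info;
    }

    // on_rollback: the proposal is canceled. Only the in-memory entry is
    // dropped; the on-disk header is not overwritten (append-only model).
    void on_rollback(lsn_t lsn) {
        std::lock_guard lk(mtx_);
        flying_shards_.erase(lsn);
    }

    std::size_t in_flight() const {
        std::lock_guard lk(mtx_);
        return flying_shards_.size();
    }

private:
    mutable std::mutex mtx_;
    std::map<lsn_t, ShardInfo> flying_shards_;
};
```

Visibility is decided solely by commit plus meta blk, which is why a leaked header does not need an explicit rollback record to stay invisible.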

src/lib/homestore/shard_manager.cpp

zichanglai · 2023-09-15T00:22:48Z

Can you merge main? I think that may be the reason we're not seeing coverage reports...

Also force-pushing makes reviewing incredibly difficult. If we can just merge that'd be best; we can always squash at the end.

Sure, I didn't realize that before and will take care of it from now on.

codecov-commenter · 2023-09-15T13:24:50Z

Codecov Report

Patch coverage: 82.69% and project coverage change: +0.40% 🎉

Comparison is base (291b070) 81.04% compared to head (3b9a0a6) 81.44%.


Additional details and impacted files

@@            Coverage Diff             @@
##             main      #34      +/-   ##
==========================================
+ Coverage   81.04%   81.44%   +0.40%     
==========================================
  Files          16       18       +2     
  Lines         248      469     +221     
  Branches       26       48      +22     
==========================================
+ Hits          201      382     +181     
- Misses         31       57      +26     
- Partials       16       30      +14
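For reference, the headline numbers can be reproduced from the diff table if coverage is taken as hits over all tracked lines (hits + misses + partials), cut down to two decimals. The formula and the round-down behavior are assumptions about Codecov's defaults, but they match every figure in the report:

```latex
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} \\
\text{base (291b070)} &= \frac{201}{201 + 31 + 16} = \frac{201}{248} \approx 81.048\% \;\to\; 81.04\% \\
\text{head (3b9a0a6)} &= \frac{382}{382 + 57 + 30} = \frac{382}{469} \approx 81.4499\% \;\to\; 81.44\% \\
\Delta &= 81.44\% - 81.04\% = +0.40\%
\end{aligned}
```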

Files Changed                                      Coverage Δ
src/include/homeobject/shard_manager.hpp           0.00% <ø> (ø)
src/lib/homestore/replication_state_machine.cpp    72.00% <72.00%> (ø)
src/lib/blob_manager.cpp                           83.33% <75.00%> (ø)
src/lib/homestore/homeobject.cpp                   85.00% <80.00%> (-3.47%) ⬇️
src/lib/homestore/shard_manager.cpp                81.11% <81.81%> (+21.11%) ⬆️
src/lib/shard_manager.cpp                          87.50% <88.23%> (+0.40%) ⬆️
src/lib/homeobject_impl.hpp                        100.00% <100.00%> (ø)
src/lib/homestore/replication_state_machine.hpp    100.00% <100.00%> (ø)
src/lib/memory/shard_manager.cpp                   81.81% <100.00%> (+1.81%) ⬆️
src/lib/pg_manager.cpp                             76.78% <100.00%> (+2.78%) ⬆️


szmyd · 2023-09-15T14:34:43Z

Nice; I think once we get the CodeCoverage up above 80% we can consider merging this. Let me know if any help is needed to get those unit tests made.

zichanglai · 2023-09-16T14:26:25Z


Thanks, Brian. I have added more unit tests in the latest few commits, improving code coverage to 81.99%.

szmyd

Let's merge this and make fixes/API changes later. 👍

szmyd · 2023-09-19T03:25:16Z

src/lib/homestore/shard_manager.cpp

+    if (!replica_set->is_leader()) {
+        LOGWARN("pg [{}] replica set is not a leader, please retry other pg members", pg_owner);
+        return folly::makeUnexpected(ShardError::NOT_LEADER);
+    }


Oh; shard creation should be able to happen everywhere. I think we can relax this in M3, when we have real replicas. Ideally, yes, it happens on the leader, but it's "best-effort" on a follower (the shard id may be invalid).

Just to follow up: this has to be the case, since by the time we actually call append_entries in HomeStore, leadership may have changed since it was checked. We can never pre-check leadership...

Maybe let HS return an error in this case? The same situation also applies to write_blob requests, right?

Yes; if we want to prevent this, we can just turn forwarding off in the raft_server; it will then reject append_entries if it's not currently the leader. But I'm not sure why we care about leadership here... by forwarding, we're basically just doing what the DM would retry anyway.
We stick who the leader was in the response in case the DM wants to adjust.
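A toy model of where this discussion lands (commit 3b9a0a6, "no need to check pg leader when create_shard"): create_shard does no leadership pre-check, the replication layer either forwards the append or rejects it, and the response records who the leader was so the DM can adjust. Everything here (`ReplicaState`, `append_entry`, `auto_forward`, `leader_hint`) is hypothetical and only models the decision; it is not the NuRaft or HomeStore API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only.
enum class ShardError { OK, NOT_LEADER };

struct ReplicaState {
    std::string self;
    std::string leader;        // current leader as known by the raft layer
    bool auto_forward{true};   // does the raft server forward appends to the leader?
};

struct AppendResult {
    ShardError err;
    std::string leader_hint;   // returned to the caller (e.g. DM) so it can adjust
};

// The raft layer decides at append time, which is the only point where the
// answer is authoritative; leadership may change right after any pre-check.
AppendResult append_entry(const ReplicaState& rs) {
    if (rs.self == rs.leader || rs.auto_forward) return {ShardError::OK, rs.leader};
    return {ShardError::NOT_LEADER, rs.leader};
}

AppendResult create_shard(const ReplicaState& rs) {
    // No is_leader() check here: build the create_shard message and
    // hand it straight to replication.
    return append_entry(rs);
}
```

With forwarding on, a follower's request succeeds just as a DM retry against the leader would; with forwarding off, the rejection comes from the replication layer itself rather than from a stale check in create_shard.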

Yes, if the raft server on the follower side supports message forwarding, this check is not necessary. I did not know before that this feature is enabled in NuRaft.

OK yes, only for the optimize_path. DM and GW need to adjust as well.

Actually, I was never quite sure why the OM/CM team needs to be so strict about sending to the leader:


zichanglai closed this Sep 4, 2023

zichanglai force-pushed the create_shard branch from ea7f37d to 1272ed1 Compare September 4, 2023 13:03

zichanglai reopened this Sep 4, 2023

zichanglai force-pushed the create_shard branch 3 times, most recently from 9945e64 to d933d65 Compare September 5, 2023 05:21

zichanglai requested review from xiaoxichen, szmyd, hkadayam and JacksonYao287 September 5, 2023 07:45

zichanglai force-pushed the create_shard branch 3 times, most recently from 9d064e8 to b17a28d Compare September 6, 2023 13:07


zichanglai force-pushed the create_shard branch from b17a28d to 60c9f69 Compare September 7, 2023 08:40

zichanglai added the MileStone2 label Sep 7, 2023

zichanglai force-pushed the create_shard branch from 60c9f69 to 7b17b72 Compare September 7, 2023 08:49

zichanglai requested a review from szmyd September 7, 2023 08:50


szmyd reviewed Sep 7, 2023

src/lib/homestore/homeobject.cpp (outdated)

JacksonYao287 reviewed Sep 8, 2023


zichanglai added 2 commits September 11, 2023 03:03

SDSTOR-11601 add create_shard implementation

8d103f8

add metablk recovery support for create_shard

0af7f2e

zichanglai force-pushed the create_shard branch from 7b17b72 to 0af7f2e Compare September 11, 2023 10:12


zichanglai force-pushed the create_shard branch from ce58fe9 to 61ec156 Compare September 13, 2023 07:45

fix some review comments

f87d68f

zichanglai force-pushed the create_shard branch from 61ec156 to f87d68f Compare September 13, 2023 10:33

szmyd reviewed Sep 13, 2023

src/lib/homestore/shard_manager.cpp (outdated)

szmyd reviewed Sep 13, 2023

src/lib/homestore/shard_manager.cpp (outdated)

add unit test for create_shard

0f29aee

JacksonYao287 reviewed Sep 14, 2023

src/lib/homestore/shard_manager.cpp (outdated)

JacksonYao287 reviewed Sep 14, 2023


zichanglai requested a review from yamingk September 15, 2023 00:19

zichanglai added 2 commits September 15, 2023 06:11

fix some review comments in create_shard

f995734

Merge branch 'origin/main'

ad3860c

zichanglai added 3 commits September 15, 2023 08:45

improve create_shard code coverage

320ea2b

continue to improve create_shard code coverage

806e15d

Merge origin/main branch

13a0d45

zichanglai requested review from JacksonYao287 and szmyd September 19, 2023 01:50

szmyd previously approved these changes Sep 19, 2023


szmyd added this to the 2 milestone Sep 19, 2023

Merge 'origin/main' into create_shard

c909e05

zichanglai dismissed szmyd’s stale review via c909e05 September 19, 2023 03:12

szmyd reviewed Sep 19, 2023


no need to check pg leader when create_shard

3b9a0a6

xiaoxichen approved these changes Sep 19, 2023


zichanglai merged commit 1cbf09d into eBay:main Sep 19, 2023
24 checks passed
