Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add index CR UT for basic merge #556

Merged
merged 1 commit into from
Oct 29, 2024
Merged

Add index CR UT for basic merge #556

merged 1 commit into from
Oct 29, 2024

Conversation

koujl
Copy link
Contributor

@koujl koujl commented Sep 19, 2024

https://docs.google.com/presentation/d/1NUvu1ZlDKqJWfBayg2lSWHSB6TEaZ4Y3VbnGc996drM/edit#slide=id.g2f8c68a8252_1_10


Add basic index CR logic & UT.

Major changes including:

  • Reconstruct our handling strategy in wb_cache::recover() for recovered new/freed buffers
  • Add protection for several corner cases in index_table::repair_link()

As well as some bug fixes:

  • Fix several down_buffer counter leaks
  • Fix a bug in wb_cache async_flush that may cause one buffer get processed multiple times
  • Fix some bugs in racing conditions during homestore restart

There are still some known issues in some corner cases, such as incorrect next_node pointer. However, since solving these may introduce complicated modification to the current logic, let's leave them to future PRs and discuss later.

@koujl koujl force-pushed the merge-ut-1 branch 4 times, most recently from 8c86f50 to 90dbfd9 Compare September 20, 2024 08:20
@codecov-commenter
Copy link

codecov-commenter commented Sep 20, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 61.48148% with 52 lines in your changes missing coverage. Please review.

Project coverage is 67.22%. Comparing base (1a0cef8) to head (4f67d65).
Report is 80 commits behind head on master.

Files with missing lines Patch % Lines
src/include/homestore/index/index_table.hpp 41.17% 18 Missing and 12 partials ⚠️
src/lib/index/wb_cache.cpp 77.61% 8 Missing and 7 partials ⚠️
src/include/homestore/btree/detail/btree_node.hpp 42.85% 2 Missing and 2 partials ⚠️
src/lib/index/index_cp.cpp 77.77% 1 Missing and 1 partial ⚠️
src/lib/device/virtual_dev.cpp 0.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #556       +/-   ##
===========================================
+ Coverage   56.51%   67.22%   +10.71%     
===========================================
  Files         108      109        +1     
  Lines       10300    10661      +361     
  Branches     1402     1454       +52     
===========================================
+ Hits         5821     7167     +1346     
+ Misses       3894     2795     -1099     
- Partials      585      699      +114     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@koujl koujl force-pushed the merge-ut-1 branch 5 times, most recently from 953b8b5 to 05b8744 Compare September 30, 2024 10:48
@shosseinimotlagh shosseinimotlagh requested a review from szmyd October 3, 2024 00:08
Copy link
Collaborator

@szmyd szmyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we intending to do more work on this PR? It does not meet the 80% coverage requirement and will lower the current overall coverage once merged which we committed to avoiding.

@koujl koujl force-pushed the merge-ut-1 branch 2 times, most recently from a5152a2 to a2c1986 Compare October 8, 2024 08:01
@@ -247,7 +258,15 @@ class IndexTable : public IndexTableBase, public Btree< K, V > {
parent_node->node_id());
return btree_status_t::not_found;
}
BT_LOG(INFO, "Repairing node={} with last_parent_key={}", parent_node->to_string(),

// Get all current child ids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please explain the algorithm here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more comments.

The original child IDs serve as a supportive condition for deciding when to exit the loop. If a lower-level merge is committed while the parent-level merge is not, we may see children with keys beyond the original last key of the parent. In such case, we need to check if it's in the original children list to determine its parent.

@koujl
Copy link
Contributor Author

koujl commented Oct 9, 2024

@szmyd Is there any coverage report I can refer to for this specific commit? I can only find this but seems its comparison base is still not the latest:
https://app.codecov.io/gh/eBay/HomeStore/commit/0c2481714651b1ed5881fa37a9b8c979e0af7178

@koujl
Copy link
Contributor Author

koujl commented Oct 22, 2024

Rebased onto HEAD of master, removed freed bufs from common DAG (IndexWBCache::link_buf) and fixed comments.

auto ret = iomgr_flip::instance()->get_test_flip< uint64_t >("btree_merge_node_pct");
if (ret && occupied_size() < (ret.get() * node_data_size() / 100)) { return true; }
#endif
if (min_keys_in_node()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we need min_keys_in_node is 0? should we set it as 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since here's returning total_entries() < min_keys_in_node(), I don't think setting it to 0 can be meaningful. Or do you mean using min_keys_in_node=0 to disable merge?

freed_bufs.push_back(static_cast<IndexBtreeNode *>(freed_root.get())->m_idx_buf);
}
// Meta is similar to a leftmost child here - it should always be the up buffer for (both) root(s)
wb_cache().transact_bufs(ordinal(), nullptr, m_sb_buffer, {root_buf}, freed_bufs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would be the problem of meta to be parent? would you please elaborate here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this change since it may not be that necessary and I cannot recall the detailed case it was for.
We can go back to this later when we find more corner cases regarding the root node.

for (uint32_t i = 0; i < parent_node->total_entries(); ++i) {
BtreeLinkInfo link_info;
parent_node->get_nth_value(i, &link_info, true);
orig_child_ids.insert(link_info.bnode_id());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reason of having orig_child_ids?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to https://docs.google.com/presentation/d/1NUvu1ZlDKqJWfBayg2lSWHSB6TEaZ4Y3VbnGc996drM/edit#slide=id.g3068b64e06d_0_0

Corner case 2 & 3, since the first/last key can both change and exceed the original parent last key, we need this table to see if they should belong to the parent.

@@ -424,6 +424,8 @@ std::error_code VirtualDev::sync_write(const char* buf, uint32_t size, BlkId con

Chunk* chunk;
uint64_t const dev_offset = to_dev_offset(bid, &chunk);
HS_LOG(TRACE, device, "Writing sync in device: {}, offset = {}", chunk->physical_dev_mutable()->pdev_id(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these changes in this file your change or has been rebased from the latest master? If it is your change please elaborate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need some sync write traces as well - but if it can be flooding in some cases it's ok to remove them.

src/lib/index/wb_cache.cpp Show resolved Hide resolved
down_buf->m_up_buffer->m_wait_for_down_buffers.decrement();
#ifndef NDEBUG
bool found{false};
for (auto it = down_buf->m_up_buffer->m_down_buffers.begin(); it != down_buf->m_up_buffer->m_down_buffers.end();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO : later change m_down_buffers to set (not in this pr).

}

//////////////////// Recovery Related section /////////////////////////////////
void IndexWBCache::load_buf(IndexBufferPtr const& buf) {
void IndexWBCache::load_buf(IndexBufferPtr const &buf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the clang for you and mine is different.

@@ -764,7 +806,7 @@ void IndexWBCache::get_next_bufs_internal(IndexCPContext* cp_ctx, uint32_t max_c
std::optional< IndexBufferPtr > buf = cp_ctx->next_dirty();
if (!buf) { break; } // End of list

if ((*buf)->m_wait_for_down_buffers.testz()) {
if ((*buf)->state() == index_buf_state_t::DIRTY && (*buf)->m_wait_for_down_buffers.testz()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to check the state? is it possible state is clean or flushing and there is wait for down buffers? I believe it has been fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I met a case where the fiber in async_cp_flush was iterating over dirty buffers, while process_write_completion tried to track potential dirty up_buffer - these two processes can reach the same buf multiple times.

// this->remove_one(k.first);
// }
// this->visualize_keys(fmt::format("tree_before_first_crash.{}_{}.dot", s_idx, f_idx));
// this->remove_flip(flip_point);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need it?

Copy link
Contributor

@shosseinimotlagh shosseinimotlagh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose there is two consecutive merges:

     A 
 B     C   D

1- first merge: free D , DAG: A-> C
2- second merge: Free C, DAG: A->B->C(deleted is true)
Where does your change handle this situation in recover?

BtreeLinkInfo{child_node->node_id(), child_node->link_version()});
if (!child_node->is_node_deleted()) {
cur_parent->insert(cur_parent->total_entries(),
child_node->total_entries() > 0 ? child_last_key : last_parent_key,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b) leave the empty children but assign unused parent last keys to the empty children

@koujl koujl force-pushed the merge-ut-1 branch 3 times, most recently from a448cc4 to 4f67d65 Compare October 24, 2024 11:23
@koujl
Copy link
Contributor Author

koujl commented Oct 29, 2024

Rebased onto HEAD of master.

@koujl
Copy link
Contributor Author

koujl commented Oct 29, 2024

@JacksonYao287 Can you help with an approval for merging? Just lost Mehdi's previous approval due to rebase.

@koujl koujl merged commit 60eea4a into eBay:master Oct 29, 2024
21 checks passed
@koujl koujl deleted the merge-ut-1 branch October 29, 2024 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants