-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add index CR UT for basic merge #556
Conversation
8c86f50
to
90dbfd9
Compare
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #556 +/- ##
===========================================
+ Coverage 56.51% 67.22% +10.71%
===========================================
Files 108 109 +1
Lines 10300 10661 +361
Branches 1402 1454 +52
===========================================
+ Hits 5821 7167 +1346
+ Misses 3894 2795 -1099
- Partials 585 699 +114 ☔ View full report in Codecov by Sentry. |
953b8b5
to
05b8744
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we intending to do more work on this PR? It does not meet the 80% coverage requirement and will lower the current overall coverage once merged which we committed to avoiding.
a5152a2
to
a2c1986
Compare
@@ -247,7 +258,15 @@ class IndexTable : public IndexTableBase, public Btree< K, V > { | |||
parent_node->node_id()); | |||
return btree_status_t::not_found; | |||
} | |||
BT_LOG(INFO, "Repairing node={} with last_parent_key={}", parent_node->to_string(), | |||
|
|||
// Get all current child ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please explain the algorithm here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more comments.
The original child IDs serve as a supportive condition for deciding when to exit the loop. If a lower-level merge is committed while the parent-level merge is not, we may see children with keys beyond the original last key of the parent. In such case, we need to check if it's in the original children list to determine its parent.
@szmyd Is there any coverage report I can refer to for this specific commit? I can only find this but seems its comparison base is still not the latest: |
Rebased onto HEAD of master, removed freed bufs from common DAG ( |
auto ret = iomgr_flip::instance()->get_test_flip< uint64_t >("btree_merge_node_pct"); | ||
if (ret && occupied_size() < (ret.get() * node_data_size() / 100)) { return true; } | ||
#endif | ||
if (min_keys_in_node()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we need min_keys_in_node is 0? should we set it as 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since here's returning total_entries() < min_keys_in_node()
, I don't think setting it to 0 can be meaningful. Or do you mean using min_keys_in_node=0
to disable merge?
freed_bufs.push_back(static_cast<IndexBtreeNode *>(freed_root.get())->m_idx_buf); | ||
} | ||
// Meta is similar to a leftmost child here - it should always be the up buffer for (both) root(s) | ||
wb_cache().transact_bufs(ordinal(), nullptr, m_sb_buffer, {root_buf}, freed_bufs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would be the problem of meta to be parent? would you please elaborate here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this change since it may not be that necessary and I cannot recall the detailed case it was for.
We can go back to this later when we find more corner cases regarding the root node.
for (uint32_t i = 0; i < parent_node->total_entries(); ++i) { | ||
BtreeLinkInfo link_info; | ||
parent_node->get_nth_value(i, &link_info, true); | ||
orig_child_ids.insert(link_info.bnode_id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the reason of having orig_child_ids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corner case 2 & 3, since the first/last key can both change and exceed the original parent last key, we need this table to see if they should belong to the parent.
@@ -424,6 +424,8 @@ std::error_code VirtualDev::sync_write(const char* buf, uint32_t size, BlkId con | |||
|
|||
Chunk* chunk; | |||
uint64_t const dev_offset = to_dev_offset(bid, &chunk); | |||
HS_LOG(TRACE, device, "Writing sync in device: {}, offset = {}", chunk->physical_dev_mutable()->pdev_id(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these changes in this file your change or has been rebased from the latest master? If it is your change please elaborate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might need some sync write traces as well - but if it can be flooding in some cases it's ok to remove them.
down_buf->m_up_buffer->m_wait_for_down_buffers.decrement(); | ||
#ifndef NDEBUG | ||
bool found{false}; | ||
for (auto it = down_buf->m_up_buffer->m_down_buffers.begin(); it != down_buf->m_up_buffer->m_down_buffers.end(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO : later change m_down_buffers to set (not in this pr).
} | ||
|
||
//////////////////// Recovery Related section ///////////////////////////////// | ||
void IndexWBCache::load_buf(IndexBufferPtr const& buf) { | ||
void IndexWBCache::load_buf(IndexBufferPtr const &buf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the clang for you and mine is different.
@@ -764,7 +806,7 @@ void IndexWBCache::get_next_bufs_internal(IndexCPContext* cp_ctx, uint32_t max_c | |||
std::optional< IndexBufferPtr > buf = cp_ctx->next_dirty(); | |||
if (!buf) { break; } // End of list | |||
|
|||
if ((*buf)->m_wait_for_down_buffers.testz()) { | |||
if ((*buf)->state() == index_buf_state_t::DIRTY && (*buf)->m_wait_for_down_buffers.testz()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to check the state? is it possible state is clean or flushing and there is wait for down buffers? I believe it has been fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I met a case where the fiber in async_cp_flush
was iterating over dirty buffers, while process_write_completion
tried to track potential dirty up_buffer - these two processes can reach the same buf multiple times.
// this->remove_one(k.first); | ||
// } | ||
// this->visualize_keys(fmt::format("tree_before_first_crash.{}_{}.dot", s_idx, f_idx)); | ||
// this->remove_flip(flip_point); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we still need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose there is two consecutive merges:
A
B C D
1- first merge: free D , DAG: A-> C
2- second merge: Free C, DAG: A->B->C(deleted is true)
Where does your change handle this situation in recover?
BtreeLinkInfo{child_node->node_id(), child_node->link_version()}); | ||
if (!child_node->is_node_deleted()) { | ||
cur_parent->insert(cur_parent->total_entries(), | ||
child_node->total_entries() > 0 ? child_last_key : last_parent_key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b) leave the empty children but assign unused parent last keys to the empty children
a448cc4
to
4f67d65
Compare
Signed-off-by: Jilong Kou <[email protected]>
Rebased onto HEAD of master. |
@JacksonYao287 Can you help with an approval for merging? Just lost Mehdi's previous approval due to rebase. |
https://docs.google.com/presentation/d/1NUvu1ZlDKqJWfBayg2lSWHSB6TEaZ4Y3VbnGc996drM/edit#slide=id.g2f8c68a8252_1_10
Add basic index CR logic & UT.
Major changes including:
wb_cache::recover()
for recovered new/freed buffersindex_table::repair_link()
As well as some bug fixes:
There are still some known issues in some corner cases, such as incorrect next_node pointer. However, since solving these may introduce complicated modification to the current logic, let's leave them to future PRs and discuss later.