-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix split, merge, varlen remove, (parent size estimation), simple remove #181
Conversation
Codecov ReportAttention:
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #181 +/- ##
==========================================
- Coverage 69.16% 68.84% -0.33%
==========================================
Files 93 93
Lines 7577 7558 -19
Branches 972 964 -8
==========================================
- Hits 5241 5203 -38
- Misses 1876 1903 +27
+ Partials 460 452 -8
☔ View full report in Codecov by Sentry. |
@@ -207,7 +207,7 @@ retry: | |||
m_btree_lock.unlock_shared(); | |||
|
|||
ret = check_collapse_root(req); | |||
if (ret != btree_status_t::success) { | |||
if (ret != btree_status_t::success && ret != btree_status_t::merge_not_required) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for not raising an error when collapsing root does not perform
due to triggering disable_merge
in config.
@@ -552,6 +552,7 @@ btree_status_t Btree< K, V >::split_node(const BtreeNodePtr& parent_node, const | |||
template < typename K, typename V > | |||
template < typename ReqT > | |||
bool Btree< K, V >::is_split_needed(const BtreeNodePtr& node, const BtreeConfig& cfg, ReqT& req) const { | |||
if (node->total_entries() <= 1) { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be possible that in some key/value size setting, the total used size of a node is more than ideal size even with one entity ( sizeof(key[0]) + sizeof (value[0]) + record_size > ideal_size
). In this scenario, the split cannot happen. So a check is needed that if the total entities is 1, split can be performed under no circumstance.
If the force_split_node is enabled in one thread and the do_put returns retry, it is possible that in another thread get the lock and remove all entities in this node. So a check is needed that if the node is empty no split is needed
@@ -288,6 +288,7 @@ bool Btree< K, V >::remove_extents_in_leaf(const BtreeNodePtr& node, BtreeRangeR | |||
template < typename K, typename V > | |||
template < typename ReqT > | |||
btree_status_t Btree< K, V >::check_collapse_root(ReqT& req) { | |||
if (!m_bt_cfg.m_merge_turned_on) { return btree_status_t::merge_not_required; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for not collapsing root when merge_disable
is triggered
@@ -147,8 +147,7 @@ class SimpleNode : public BtreeNode { | |||
} | |||
|
|||
uint32_t num_entries_by_size(uint32_t start_idx, uint32_t size) const override { | |||
uint32_t possible_entries = (size == 0) ? 0 : (size - 1) / get_nth_obj_size(0) + 1; | |||
return std::min(possible_entries, this->total_entries() - start_idx); | |||
return std::min(size / get_nth_obj_size(0), this->total_entries() - start_idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for simple node type, when the size was not divisible, the round up value is returned which can lead to out of space happens in merge.
For example:
size = 100
get_nth_obj_size = 30
The correct return value must be 100/30 = 3
and not 4.
@@ -166,7 +166,7 @@ class VariableNode : public BtreeNode { | |||
get_nth_value(ind_s - 1, &last_1_val, false); | |||
this->set_edge_value(last_1_val); | |||
|
|||
for (uint32_t i = ind_s; i < total_entries; i++) { | |||
for (uint32_t i = ind_s - 1; i < total_entries; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong calculation of available size.
When removing the edge entity happens, the entity[ind_s-1]
is set as a new edge and removed as the normal entity. So its size must be released because it is set as a new edge.
if ((kb.size + vb.size + this->get_record_size()) > size_to_move) { | ||
// We reached threshold of how much we could move | ||
break; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current implementation of insert, when there is no enough space for accommodating the new entity, an assert is triggered. To avoid hitting it, the check for available size needs to be performed in advance.
@@ -49,7 +49,8 @@ static std::string gen_random_string(size_t len, uint32_t preamble = std::numeri | |||
static thread_local std::random_device rd{}; | |||
static thread_local std::default_random_engine re{rd()}; | |||
std::uniform_int_distribution< size_t > rand_char{0, alphanum.size() - 1}; | |||
for (size_t i{0}; i < len; ++i) { | |||
if (len < str.size()) { len = str.size(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len + to_string(preamble).size()
must be always less than g_max_keysize
str += alphanum[rand_char(re)]; | ||
} | ||
str += '\0'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed
src/tests/test_mem_btree.cpp
Outdated
@@ -439,32 +439,6 @@ TYPED_TEST(BtreeTest, RangeUpdate) { | |||
this->query_validate(0, num_entries - 1, 75); | |||
} | |||
|
|||
TYPED_TEST(BtreeTest, SimpleRemoveRange) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a very primitive UT (not needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest not to remove this UT. I made changes in the btree logic and this test is the only one which could find me the bug, rest passed, so we don't need to throw away this tests.
if (old_node->total_entries()) { | ||
post_merge_size += old_node->get_nth_obj_size( | ||
std::min(leftmost_src.last_node_upto, old_node->total_entries() - 1)); // New leftmost entry | ||
auto excess = old_nodes.size() - new_nodes.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is important to note that the available size for the parent node may appear to be adequate at first, but can potentially lead to space shortages during the merging process itself. To prevent such shortages, it is necessary to estimate the available space in advance, just as it would occur during the actual merge. This would enable the calculation and avoidance of any potential violations that may arise during the merging process, such as those caused by insufficient space.
the step is a following:
1- release space in the parent due to excess nodes
2- remove old last key and insert new last key (and check the available space for each iteration)
3- update the left most key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to document the suggestion we discussed: While logic here might be correct, we probably can avoid this by simply checking for criteria for merge to be 2 * max_key_size(). That ensures that we always have room for writing 2 keys? right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be more precise, it is the updated keys in the parent * max_key_size(). The number of updated keys are either 1 or new_nodes.size()
. There is a possibility that the last key of all new nodes (expect the very last one) and also the last key of the leftmost node can be updated. It is worth noting that there might be no new nodes and all of old new nodes are consolidated into leftmost node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto excess_releasing_nodes = | ||
old_nodes.size() - new_nodes.size() - parent_node->total_entries() == end_idx ? 1 : 0; | ||
auto minimum_releasing_excess_size = | ||
excess_releasing_nodes * (BtreeLinkInfo::get_fixed_size() + parent_node->get_record_size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be not be true, in case of different node types, we can't always assume node releases the entire record (say if the node has format where node are ref counted). My PR removes this assumption and ask the node to calculate that for us than btree apis to assume that. I think its ok to do it and modify as part of my PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, except the condition I mentioned in merge logic. As long as we keep that in my mind to do following my PR, I am ok with these changes.
The objective of this pull request is to address specific edge cases in btree implementations that are more likely to occur with lower node sizes (such as 512).
The proposed fixes are:
For split nodes: a) Add a capacity check before inserting entities into the new child. b) Modify configuration settings so that variable sized key/value pairs never exceed the maximum size and provide sufficient capacity to accommodate at least two entities in the worst-case scenarios.
For merge nodes: a) Ensure that the parent node always has enough space to accommodate updated keys before performing the merge for variable-sized node types. b) Correct the number of entries that can be accommodated to the leftmost node for simple node types. c) Address the calculation of available space when removals happen in variable-length node types d) unlocking/free up the nodes must in reverse order.