Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix split, merge, varlen remove, (parent size estimation), simple remove #181

Merged
merged 1 commit into from
Oct 23, 2023
Merged

Fix split, merge, varlen remove, (parent size estimation), simple remove #181

merged 1 commit into from
Oct 23, 2023

Conversation

shosseinimotlagh
Copy link
Contributor

@shosseinimotlagh shosseinimotlagh commented Sep 22, 2023

The objective of this pull request is to address specific edge cases in btree implementations that are more likely to occur with lower node sizes (such as 512).

The proposed fixes are:

For split nodes: a) Add a capacity check before inserting entities into the new child. b) Modify configuration settings so that variable sized key/value pairs never exceed the maximum size and provide sufficient capacity to accommodate at least two entities in the worst-case scenarios.

For merge nodes: a) Ensure that the parent node always has enough space to accommodate updated keys before performing the merge for variable-sized node types. b) Correct the number of entries that can be accommodated to the leftmost node for simple node types. c) Address the calculation of available space when removals happen in variable-length node types d) unlocking/free up the nodes must in reverse order.

@codecov-commenter
Copy link

codecov-commenter commented Sep 23, 2023

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (4fb8fed) 69.16% compared to head (7eefccc) 68.84%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #181      +/-   ##
==========================================
- Coverage   69.16%   68.84%   -0.33%     
==========================================
  Files          93       93              
  Lines        7577     7558      -19     
  Branches      972      964       -8     
==========================================
- Hits         5241     5203      -38     
- Misses       1876     1903      +27     
+ Partials      460      452       -8     
Files Coverage Δ
src/include/homestore/btree/detail/simple_node.hpp 74.17% <100.00%> (-3.46%) ⬇️
src/include/homestore/btree/btree.ipp 70.46% <0.00%> (-0.96%) ⬇️
src/include/homestore/btree/detail/varlen_node.hpp 71.84% <80.00%> (-3.23%) ⬇️
...clude/homestore/btree/detail/btree_remove_impl.ipp 56.14% <40.00%> (+1.24%) ⬆️

... and 6 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shosseinimotlagh shosseinimotlagh changed the title Fix btree merge, varlen remove (parent size estimation), simple remove Fix split, merge, varlen remove, (parent size estimation), simple remove Sep 27, 2023
@shosseinimotlagh shosseinimotlagh added this to the MileStone2 milestone Sep 27, 2023
@@ -207,7 +207,7 @@ retry:
m_btree_lock.unlock_shared();

ret = check_collapse_root(req);
if (ret != btree_status_t::success) {
if (ret != btree_status_t::success && ret != btree_status_t::merge_not_required) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for not raising an error when collapsing root does not perform
due to triggering disable_merge in config.

@@ -552,6 +552,7 @@ btree_status_t Btree< K, V >::split_node(const BtreeNodePtr& parent_node, const
template < typename K, typename V >
template < typename ReqT >
bool Btree< K, V >::is_split_needed(const BtreeNodePtr& node, const BtreeConfig& cfg, ReqT& req) const {
if (node->total_entries() <= 1) { return false; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be possible that in some key/value size setting, the total used size of a node is more than ideal size even with one entity ( sizeof(key[0]) + sizeof (value[0]) + record_size > ideal_size). In this scenario, the split cannot happen. So a check is needed that if the total entities is 1, split can be performed under no circumstance.

If the force_split_node is enabled in one thread and the do_put returns retry, it is possible that in another thread get the lock and remove all entities in this node. So a check is needed that if the node is empty no split is needed

@@ -288,6 +288,7 @@ bool Btree< K, V >::remove_extents_in_leaf(const BtreeNodePtr& node, BtreeRangeR
template < typename K, typename V >
template < typename ReqT >
btree_status_t Btree< K, V >::check_collapse_root(ReqT& req) {
if (!m_bt_cfg.m_merge_turned_on) { return btree_status_t::merge_not_required; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for not collapsing root when merge_disable is triggered

@@ -147,8 +147,7 @@ class SimpleNode : public BtreeNode {
}

uint32_t num_entries_by_size(uint32_t start_idx, uint32_t size) const override {
uint32_t possible_entries = (size == 0) ? 0 : (size - 1) / get_nth_obj_size(0) + 1;
return std::min(possible_entries, this->total_entries() - start_idx);
return std::min(size / get_nth_obj_size(0), this->total_entries() - start_idx);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for simple node type, when the size was not divisible, the round up value is returned which can lead to out of space happens in merge.

For example:

size = 100
get_nth_obj_size = 30

The correct return value must be 100/30 = 3 and not 4.

@@ -166,7 +166,7 @@ class VariableNode : public BtreeNode {
get_nth_value(ind_s - 1, &last_1_val, false);
this->set_edge_value(last_1_val);

for (uint32_t i = ind_s; i < total_entries; i++) {
for (uint32_t i = ind_s - 1; i < total_entries; i++) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong calculation of available size.
When removing the edge entity happens, the entity[ind_s-1] is set as a new edge and removed as the normal entity. So its size must be released because it is set as a new edge.

if ((kb.size + vb.size + this->get_record_size()) > size_to_move) {
// We reached threshold of how much we could move
break;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implementation of insert, when there is no enough space for accommodating the new entity, an assert is triggered. To avoid hitting it, the check for available size needs to be performed in advance.

@@ -49,7 +49,8 @@ static std::string gen_random_string(size_t len, uint32_t preamble = std::numeri
static thread_local std::random_device rd{};
static thread_local std::default_random_engine re{rd()};
std::uniform_int_distribution< size_t > rand_char{0, alphanum.size() - 1};
for (size_t i{0}; i < len; ++i) {
if (len < str.size()) { len = str.size(); }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len + to_string(preamble).size() must be always less than g_max_keysize

str += alphanum[rand_char(re)];
}
str += '\0';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed

@@ -439,32 +439,6 @@ TYPED_TEST(BtreeTest, RangeUpdate) {
this->query_validate(0, num_entries - 1, 75);
}

TYPED_TEST(BtreeTest, SimpleRemoveRange) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a very primitive UT (not needed)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest not to remove this UT. I made changes in the btree logic and this test is the only one which could find me the bug, rest passed, so we don't need to throw away this tests.

if (old_node->total_entries()) {
post_merge_size += old_node->get_nth_obj_size(
std::min(leftmost_src.last_node_upto, old_node->total_entries() - 1)); // New leftmost entry
auto excess = old_nodes.size() - new_nodes.size();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important to note that the available size for the parent node may appear to be adequate at first, but can potentially lead to space shortages during the merging process itself. To prevent such shortages, it is necessary to estimate the available space in advance, just as it would occur during the actual merge. This would enable the calculation and avoidance of any potential violations that may arise during the merging process, such as those caused by insufficient space.
the step is a following:

1- release space in the parent due to excess nodes
2- remove old last key and insert new last key (and check the available space for each iteration)
3- update the left most key

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to document the suggestion we discussed: While logic here might be correct, we probably can avoid this by simply checking for criteria for merge to be 2 * max_key_size(). That ensures that we always have room for writing 2 keys? right?

Copy link
Contributor Author

@shosseinimotlagh shosseinimotlagh Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more precise, it is the updated keys in the parent * max_key_size(). The number of updated keys are either 1 or new_nodes.size(). There is a possibility that the last key of all new nodes (expect the very last one) and also the last key of the leftmost node can be updated. It is worth noting that there might be no new nodes and all of old new nodes are consolidated into leftmost node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2023-10-10 at 10 57 12 AM

auto excess_releasing_nodes =
old_nodes.size() - new_nodes.size() - parent_node->total_entries() == end_idx ? 1 : 0;
auto minimum_releasing_excess_size =
excess_releasing_nodes * (BtreeLinkInfo::get_fixed_size() + parent_node->get_record_size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be not be true, in case of different node types, we can't always assume node releases the entire record (say if the node has format where node are ref counted). My PR removes this assumption and ask the node to calculate that for us than btree apis to assume that. I think its ok to do it and modify as part of my PR.

hkadayam
hkadayam previously approved these changes Oct 23, 2023
Copy link
Contributor

@hkadayam hkadayam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, except the condition I mentioned in merge logic. As long as we keep that in my mind to do following my PR, I am ok with these changes.

@shosseinimotlagh shosseinimotlagh merged commit 5ee291c into eBay:master Oct 23, 2023
16 checks passed
@shosseinimotlagh shosseinimotlagh deleted the fix_remove_512 branch October 23, 2023 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix the btree split node in variable length node - membtree (512 node size only)
4 participants