Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to support PrefixNode and consolidation of tree #211

Merged
merged 8 commits into from
Nov 10, 2023

Conversation

hkadayam
Copy link
Contributor

Following are the major changes in btree library:

  1. A new BtreeKey and BtreeValue type called IntervalKey and IntervalValue which allows a particular key to be part of an larger interval (say 1-3 will result in key1, key2, key3 and each key is an BtreeIntervalKey).

  2. Implemented a new tree node called PrefixNode which can store interval key and value effectively stores them with prefix and suffix to store the intervals compactly.

  3. Removed the Extent code which was not used and replaced it with the IntervalKey/Value as mentioned above.

  4. Added filtering for read/write/remove operations, so that consumer can change the way which key/value to be custom modify. This change effectively provides compare_and_set or compare_and_remove operations atomically. This change also removed the previous on_read_cb etc, which was passed on every callback

  5. Removed the requirement for BtreeKey and Value to implement some static methods like get_key_size() etc, by creating a dummyKey, dummyValue. Now the requirement of the implementor of btree key and value only needs to ensure it is default constructible (which is already the case today)

  6. Created new internal btree node APIs such as multi_put, multi_get instead of each btree operations repeat similar code to read or remove operations in leaf node. As a result, a method called match_range() is introduced in the btree node, which consolidates and searches for all range operation requirements. Moved most of the common operations into variant_node class and now variant_node is derived from BtreeNode. PrefixNode overrides variant_node and implements its own version of multi_put etc. range operations.

  7. Removed obsolete btree return status.

  8. Consolidated the mem_btree test, mem_btree concurrent test and index_btree test into one btree_test_helper library. Modified the way range_scheduler picks the existing and working keys from boost icl to sis::Bitset as it simplifies them and also provides some essential feature which is always pick some keys to schedule instead of failing.

}

uint32_t const search_start = m_rand_start_key_generator(m_rd);
auto bb = m_existing_keys.get_next_contiguous_n_reset_bits(search_start, max_keys);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function searches for the first occurance of exactly max_keys keys from search_start.
If the purpose is to return something even smaller than max_keys, this need to change to
get_next_contiguous_n_reset_bits(const uint64_t start_bit, const std::optional< uint64_t > end_bit, const uint32_t min_needed, const uint32_t max_needed).
If this request couldn't be found, the caller function for instance, pick_random_non_working_keys keeps in infinite loop and blocks other threads.

#include "btree_test_kvs.hpp"

template < typename K, typename V >
class ShadowMap {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a lock

@@ -63,10 +71,6 @@ class VariableNode : public BtreeNode {

virtual ~VariableNode() = default;

uint32_t occupied_size(const BtreeConfig& cfg) const override {
return (cfg.node_data_size() - sizeof(var_node_header) - available_size(cfg));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is overriden here because in merge_nodes(), for varlen node type available_size -= old_nodes[i]->occupied_size(); results in miscalculation. It can happen frequently with reduced node size (for instance 512).
In merge if the total entries of a node is needed to move to left, only the size of actual data matters not var_node_header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am probably missing something, but doing that overridden would defeat the generic purpose right? I am not clear why available_size() doesn't take discount the var_node_header area?

@codecov-commenter
Copy link

codecov-commenter commented Nov 7, 2023

Codecov Report

Attention: 237 lines in your changes are missing coverage. Please review.

Comparison is base (8556e48) 69.87% compared to head (a5b1e8f) 69.78%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #211      +/-   ##
==========================================
- Coverage   69.87%   69.78%   -0.09%     
==========================================
  Files          95       97       +2     
  Lines        7515     7898     +383     
  Branches      955     1014      +59     
==========================================
+ Hits         5251     5512     +261     
- Misses       1821     1897      +76     
- Partials      443      489      +46     
Files Coverage Δ
src/include/homestore/btree/btree.hpp 100.00% <ø> (ø)
src/include/homestore/btree/btree_req.hpp 97.77% <100.00%> (+0.27%) ⬆️
...rc/include/homestore/btree/detail/btree_common.ipp 78.37% <ø> (+8.51%) ⬆️
.../include/homestore/btree/detail/btree_get_impl.ipp 80.95% <100.00%> (+14.28%) ⬆️
src/include/homestore/btree/detail/simple_node.hpp 76.71% <100.00%> (+1.87%) ⬆️
src/lib/blkalloc/bitmap_blk_allocator.cpp 65.95% <ø> (-1.07%) ⬇️
.../include/homestore/btree/detail/btree_internal.hpp 95.65% <0.00%> (-0.19%) ⬇️
src/include/homestore/index/index_table.hpp 87.27% <75.00%> (-3.39%) ⬇️
src/include/homestore/btree/btree.ipp 70.39% <75.00%> (-0.08%) ⬇️
src/include/homestore/btree/detail/btree_node.hpp 77.97% <87.50%> (+13.74%) ⬆️
... and 8 more

... and 6 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

hkadayam and others added 8 commits November 8, 2023 20:09
Following are the major changes in btree library:

1. A new BtreeKey and BtreeValue type called IntervalKey and IntervalValue which allows a particular key to be part of an larger interval (say 1-3 will result in key1, key2, key3 and each key is an BtreeIntervalKey).

2. Implemented a new tree node called PrefixNode which can store interval key and value effectively stores them with prefix and suffix to store the intervals compactly.

3. Removed the Extent code which was not used and replaced it with the IntervalKey/Value as mentioned above.

4. Added filtering for read/write/remove operations, so that consumer can change the way which key/value to be custom modify. This change effectively provides compare_and_set or compare_and_remove operations atomically. This change also removed the previous on_read_cb etc, which was passed on every callback

5. Removed the requirement for BtreeKey and Value to implement some static methods like get_key_size() etc, by creating a dummyKey, dummyValue. Now the requirement of the implementor of btree key and value only needs to ensure it is default constructible (which is already the case today)

6. Created new internal btree node APIs such as multi_put, multi_get instead of each btree operations repeat similar code to read or remove operations in leaf node. As a result, a method called match_range() is introduced in the btree node, which consolidates and searches for all range operation requirements. Moved most of the common operations into variant_node class and now variant_node is derived from BtreeNode. PrefixNode overrides variant_node and implements its own version of multi_put etc. range operations.

7. Removed obsolete btree return status.

8. Consolidated the mem_btree test, mem_btree concurrent test and index_btree test into one btree_test_helper library. Modified the way range_scheduler picks the existing and working keys from boost icl to sis::Bitset as it simplifies them and also provides some essential feature which is always pick some keys to schedule instead of failing.
Without this commit, homestore 4.x doesn't persist the bitmap of allocated blks during CP.
This commit handles that gap. On top of that, the free blk id collection is moved to VirtualDev
layer, so that both index and data service has common blk id collection. This blkid collection
is essential so that all free blk ids are moved to the next CP after the current CP is flushed.

Changed it to C++-20 standard which resulted in few compilation errors, fixed that.
if (!node->is_leaf()) { // if internal node, size is atmost one additional entry, size of K/V
size_needed = K::get_estimate_max_size() + BtreeLinkInfo::get_fixed_size() + node->get_record_size();
return !node->has_room_for_put(btree_put_type::UPSERT, K::get_max_size(), BtreeLinkInfo::get_fixed_size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it upsert / insert of new child node ?

if (count == 0) {
BT_NODE_LOG_ASSERT(false, my_node, "get_all returns 0 entries for interior node is not valid pattern");
ret = btree_status_t::retry;
const auto matched = my_node->match_range(req.working_range(), start_idx, end_idx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment about match_range in btree_node.hpp will be helpful ?

uint32_t start_ind{0};
uint32_t end_ind{0};
auto cur_count = to_variant_node(my_node)->multi_get(qreq.working_range(),
qreq.batch_size() - uint32_cast(out_values.size()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the use of out_value.size() in batch_count ? out_values is used as argument later.

// An extension of BtreeKey where each key is part of an interval range. Keys are not neccessarily only needs to be
// integers, but it needs to be able to get next or prev key from a given key in the key range
class BtreeIntervalKey : public BtreeKey {
public:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments about the functions will be helpful.

@hkadayam hkadayam merged commit c979057 into eBay:master Nov 10, 2023
16 checks passed
@hkadayam hkadayam deleted the btree_prefix_node branch November 10, 2023 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Btree filtered read/write, along with compare_and_set for mutate/remove operations
5 participants