Index write back cache fixes. #207
Conversation
Force-pushed from a308448 to 6971f4c
Codecov Report

Attention: Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff             @@
##           master     #207      +/-   ##
==========================================
- Coverage   73.52%   69.75%    -3.77%
==========================================
  Files         120       97      -23
  Lines       11926     7941    -3985
  Branches     1442     1022     -420
==========================================
- Hits         8768     5539    -3229
+ Misses       2519     1899     -620
+ Partials      639      503     -136

View full report in Codecov by Sentry.
Please add git commit messages and explain the issue and the fix in the description.
Force-pushed from 9a9af75 to 8d92ff4
@@ -307,7 +307,7 @@ BtreeNode* Btree< K, V >::init_node(uint8_t* node_buf, uint32_t node_ctx_size, b
 /* Note:- This function assumes that access of this node is thread safe. */
 template < typename K, typename V >
 void Btree< K, V >::free_node(const BtreeNodePtr& node, locktype_t cur_lock, void* context) {
-    BT_NODE_LOG(DEBUG, node, "Freeing node");
+    BT_NODE_LOG(TRACE, node, "Freeing node");
There was too much output in the logs during shutdown, so this is lowered from DEBUG to TRACE.
@@ -687,7 +687,7 @@ class FixedPrefixNode : public VariantNode< K, V > {
     --phdr->used_slots;
     prefix_bitset_.reset_bit(slot_num);
     if ((slot_num != 0) && (slot_num == phdr->tail_slot)) {
-        uint16_t prev_slot = prefix_bitset_.get_prev_set_bit(slot_num);
+        uint16_t prev_slot = prefix_bitset_.get_next_set_bit(slot_num);
Will remove this once rebased to the latest.
bool is_clean() const {
    RELEASE_ASSERT(m_node_buf, "Node buffer null");
    return (m_node_buf->m_state.load() == index_buf_state_t::CLEAN);
}
Added some wrapper functions and made to_string cleaner.
str = fmt::format("cp {} left {} parent {}", cp_ctx->id(), left_child_buf->to_string(),
                  parent_buf->to_string());
for (const auto& right_child_node : new_nodes) {
    auto right_child = IndexBtreeNode::convert(right_child_node.get());
Add these trace logs only if tracing is enabled; trace_index_bufs should hopefully only be called when trace-level logging is on.
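A minimal sketch of the guard being asked for here; the flag and function bodies are hypothetical stand-ins, and only the pattern of skipping the expensive buffer walk unless trace logging is enabled comes from the comment above.

#include <atomic>
#include <iostream>
#include <string>

// Hypothetical stand-ins for the real logger machinery.
std::atomic<bool> g_trace_enabled{false};

std::string trace_index_bufs() {
    // In the real code this walks every dirty buffer and formats it,
    // which is why it should run only when trace logging is on.
    return "buf dump ...";
}

void log_flush_state() {
    if (g_trace_enabled.load(std::memory_order_relaxed)) {
        std::cout << trace_index_bufs() << '\n'; // pay the cost only when tracing
    }
}

The point is that trace_index_bufs does real work before any log call, so the level check has to happen before the call, not inside the logger.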
 // buffer prior to this copy, would have been written and already added into the dirty buffer list.
-idx_node->m_idx_buf = wb_cache().copy_buffer(idx_node->m_idx_buf);
+idx_node->m_idx_buf = wb_cache().copy_buffer(idx_node->m_idx_buf, cp_ctx);
Creating a per-CP IndexBuffer for writes.
src/lib/index/index_cp.hpp
Outdated
@@ -47,11 +47,6 @@ struct IndexCPContext : public CPContext {
     CPContext(cp_id), m_dirty_buf_list{dirty_list}, m_free_node_blkid_list{free_blkid_list} {}

-    virtual ~IndexCPContext() {
-        auto it = m_dirty_buf_list->begin(true /* latest */);
-        IndexBufferPtr* tmp = nullptr;
-        while ((tmp = m_dirty_buf_list->next(it)) != nullptr) {
m_dirty_buf_list->clear() will clean this up.
src/lib/index/index_cp.hpp
Outdated
// Mapping from a node to all its parents in the graph.
// Display all buffers and their dependencies and state.
std::unordered_map< IndexBuffer*, std::vector< IndexBuffer* > > parents;
IndexBufferPtr* buf;
Shows a buffer and its dependencies, e.g.:
IndexBuffer 0x60b000055a80 blkid=562954248388608 state=1 node_buf=0x6210000a3e00 next_buffer=0x60b000055870 wait_for=2
Depends: 0x60b00004bdc0(1) 0x60b000055b30(1)
src/lib/index/index_cp.hpp
Outdated
void check_cycle() {
    // Use DFS to check whether the graph has a cycle.
    IndexBufferPtr* buf;
    auto it = m_dirty_buf_list->begin(true /* latest */);
Utility functions to find cycles and validate the leader count.
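A minimal sketch of DFS-based cycle detection over a buffer dependency graph, assuming a simplified graph type; the repo's check_cycle walks m_dirty_buf_list instead.

#include <unordered_map>
#include <unordered_set>
#include <vector>

struct Buf {}; // stand-in for IndexBuffer

using Graph = std::unordered_map<const Buf*, std::vector<const Buf*>>;

// Classic three-color DFS: revisiting a node that is still on the DFS
// stack (gray) means a back edge, i.e. the dependency graph has a cycle.
bool has_cycle(const Graph& g, const Buf* n,
               std::unordered_set<const Buf*>& gray,
               std::unordered_set<const Buf*>& black) {
    if (black.count(n)) { return false; }        // already fully explored
    if (!gray.insert(n).second) { return true; } // back edge found
    if (auto it = g.find(n); it != g.end()) {
        for (const Buf* dep : it->second) {
            if (has_cycle(g, dep, gray, black)) { return true; }
        }
    }
    gray.erase(n);
    black.insert(n);
    return false;
}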
// When we copy the buffer, we check whether the node buffer is clean. If it's clean
// we can reuse it; otherwise we create a copy.
if (cur_buf->is_clean()) {
Comments added.
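A condensed sketch of the reuse-or-copy decision that comment describes, with simplified stand-ins for IndexBuffer and NodeBuffer (the real types carry more state):

#include <atomic>
#include <cstdint>
#include <cstring>
#include <memory>

enum class index_buf_state_t : uint8_t { CLEAN, DIRTY, FLUSHING };

// Simplified stand-ins for the PR's types.
struct NodeBuffer {
    std::atomic<index_buf_state_t> m_state{index_buf_state_t::CLEAN};
    std::unique_ptr<uint8_t[]> m_bytes;
};
using NodeBufferPtr = std::shared_ptr<NodeBuffer>;

struct IndexBuffer {
    NodeBufferPtr m_node_buf;
    bool is_clean() const {
        return m_node_buf->m_state.load() == index_buf_state_t::CLEAN;
    }
};
using IndexBufferPtr = std::shared_ptr<IndexBuffer>;

// If the underlying node buffer is clean, the new CP's IndexBuffer can
// alias the same bytes; otherwise it needs its own copy so the previous
// CP's in-flight version is not disturbed.
IndexBufferPtr copy_for_cp(const IndexBufferPtr& cur_buf, std::size_t node_size) {
    auto new_buf = std::make_shared<IndexBuffer>();
    if (cur_buf->is_clean()) {
        new_buf->m_node_buf = cur_buf->m_node_buf; // reuse
    } else {
        auto nb = std::make_shared<NodeBuffer>();
        nb->m_state.store(index_buf_state_t::DIRTY);
        nb->m_bytes = std::make_unique<uint8_t[]>(node_size);
        std::memcpy(nb->m_bytes.get(), cur_buf->m_node_buf->m_bytes.get(), node_size);
        new_buf->m_node_buf = std::move(nb);
    }
    return new_buf;
}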
third->m_wait_for_leaders.increment(1);
if (chain != second) {
    // We want related buffers to be appended to the end of the chain.
    // If we split a node multiple times in the same or different CPs, each dirty buffer will be
Details are in the comments.
// TODO: the index buffers are added to the end of the chain; instead, add them to the dependency.
auto& last_in_chain = r_cast< IndexCPContext* >(cp_ctx)->m_last_in_chain;
if (last_in_chain) {
    // Add this to the end of the chain.
This last_in_chain was never actually used, so it has been removed.
-    buf->m_buf_state = index_buf_state_t::FLUSHING;
 void IndexWBCache::do_flush_one_buf(IndexCPContext* cp_ctx, IndexBufferPtr buf, bool part_of_batch) {
     LOGTRACEMOD(wbcache, "cp {} buf {}", cp_ctx->id(), buf->to_string());
+    { buf->set_state(index_buf_state_t::FLUSHING); }
     m_vdev->async_write(r_cast< const char* >(buf->raw_buffer()), m_node_size, buf->m_blkid, part_of_batch)
Instead of using IndexBuffer*, use IndexBufferPtr for safety and consistency.
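A sketch of why passing the shared IndexBufferPtr into the async path matters; async_write here is a stand-in, not the repo's vdev API:

#include <functional>
#include <memory>
#include <thread>

struct IndexBuffer { /* raw bytes, blkid, state, ... */ };
using IndexBufferPtr = std::shared_ptr<IndexBuffer>;

// Stand-in for an async device write that completes on another thread.
void async_write(IndexBufferPtr buf, std::function<void(IndexBufferPtr)> on_done) {
    std::thread([buf = std::move(buf), on_done = std::move(on_done)] {
        // ... write buf's bytes to the device ...
        on_done(buf); // the captured shared_ptr kept the buffer alive
    }).detach();
}

Capturing the shared_ptr in the completion context keeps the buffer alive until the write finishes, which a raw IndexBuffer* would not guarantee.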
@@ -70,6 +70,7 @@ struct BtreeTestHelper : public testing::Test {
 protected:
     std::shared_ptr< typename T::BtreeType > m_bt;
     ShadowMap< K, V > m_shadow_map;
+    std::mutex m_shadow_map_mutex;
A lock is needed for the map because the APIs below are called from multiple threads.
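A sketch of the locking pattern, with a hypothetical wrapper in place of the test's actual ShadowMap API:

#include <map>
#include <mutex>

// Guard every access to the shadow map with a mutex, since the test's
// insert and remove threads mutate it concurrently.
template <typename K, typename V>
class LockedShadowMap {
    std::map<K, V> m_map;
    std::mutex m_mutex;

public:
    void put(const K& k, const V& v) {
        std::lock_guard<std::mutex> lg(m_mutex);
        m_map[k] = v;
    }
    void erase(const K& k) {
        std::lock_guard<std::mutex> lg(m_mutex);
        m_map.erase(k);
    }
};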
std::ifstream b(before, std::ifstream::ate);
std::ifstream a(after, std::ifstream::ate);
if (a.fail() || b.fail()) {
    LOGINFO("Failed to open file");
Comparing large files in full used too much memory and overhead earlier, so now the files are read and compared in chunks, with a size comparison first as an optimization.
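A sketch of that approach, assuming a std::ifstream-based helper (names hypothetical):

#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Compare two files without loading either fully into memory: check sizes
// first (cheap), then compare fixed-size chunks.
bool files_equal(const std::string& before, const std::string& after,
                 std::size_t chunk_size = 1024 * 1024) {
    std::ifstream b(before, std::ifstream::ate | std::ifstream::binary);
    std::ifstream a(after, std::ifstream::ate | std::ifstream::binary);
    if (a.fail() || b.fail()) { return false; }   // could not open a file
    if (a.tellg() != b.tellg()) { return false; } // sizes differ, done
    a.seekg(0);
    b.seekg(0);
    std::vector<char> buf_a(chunk_size), buf_b(chunk_size);
    while (a && b) {
        a.read(buf_a.data(), chunk_size);
        b.read(buf_b.data(), chunk_size);
        if (a.gcount() != b.gcount()) { return false; }
        if (std::memcmp(buf_a.data(), buf_b.data(),
                        static_cast<std::size_t>(a.gcount())) != 0) { return false; }
    }
    return true;
}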
while (!stop_cp_flush) {
    LOGINFO("Trigger checkpoint flush wait=false.");
    test_common::HSTestHelper::trigger_cp(false /* wait */);
auto remove_io_thread = std::thread([this, &stop, num_entries, &last_index] {
Run the insert, remove, and CP flush threads in parallel.
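A rough sketch of that thread layout, with the bodies stubbed out and trigger_cp standing in for test_common::HSTestHelper::trigger_cp:

#include <atomic>
#include <chrono>
#include <thread>

// Stub standing in for the test helper that triggers a checkpoint flush.
void trigger_cp(bool /* wait */) { /* would kick off a CP flush */ }

// Inserts, removes, and periodic non-blocking CP flushes all run
// concurrently until the test asks them to stop.
void run_concurrent_test(std::atomic<bool>& stop) {
    std::thread insert_thread([&] { while (!stop) { /* put random key */ } });
    std::thread remove_thread([&] { while (!stop) { /* remove random key */ } });
    std::thread cp_thread([&] {
        while (!stop) {
            trigger_cp(false /* wait */);
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
        }
    });
    insert_thread.join();
    remove_thread.join();
    cp_thread.join();
}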
@@ -64,29 +64,70 @@ enum class index_buf_state_t : uint8_t {
 };

 ///////////////////////// Btree Node and Buffer Portion //////////////////////////
General comment: it would be very useful to put a README under the wb_cache directory to illustrate the hierarchy of btree_node -> IndexBuffer, the NodeBuffer relationship, and its clean/dirty copy behavior.
And it would be super helpful to explain what first, second, and third mean while creating the chain, with an example, so that after several months we can read the code more easily. :)
Force-pushed from 8d92ff4 to 5b5e1a0
Force-pushed from e2a29a6 to 4036ffb
Force-pushed from 4036ffb to f26dfb2
Looks good to me
LGTM
Make the IndexBuffer state atomic to avoid concurrency issues between the CP flush and insert threads. Create a per-CP IndexBuffer. Create a NodeBuffer which points to the actual data buffer; several IndexBuffers can point to the same NodeBuffer. Add locks in the index btree test's shadow map, as there are concurrent requests. Use shared_ptrs in the wb cache.
5ebfae2
Force-pushed from f26dfb2 to 5ebfae2
Add locks for IndexBuffer to avoid concurrency issues between the CP flush and insert threads (see the sketch below).
Create the buffer dependency graph.
Add locks in the index btree test's shadow map, as there are concurrent requests.
Use shared_ptrs in the wb cache.
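A condensed sketch of the atomic-state idea from the description above; the compare-exchange claim is one plausible way to serialize flushers, not asserted to be the PR's exact code:

#include <atomic>
#include <cstdint>

enum class index_buf_state_t : uint8_t { CLEAN, DIRTY, FLUSHING };

// The CP flush thread moves a buffer DIRTY -> FLUSHING while insert threads
// may concurrently read the state (e.g. via is_clean()), hence the atomic.
struct NodeBuffer {
    std::atomic<index_buf_state_t> m_state{index_buf_state_t::CLEAN};

    // Assumed pattern: only one flusher wins the DIRTY -> FLUSHING transition.
    bool try_claim_for_flush() {
        auto expected = index_buf_state_t::DIRTY;
        return m_state.compare_exchange_strong(expected, index_buf_state_t::FLUSHING);
    }
    void mark_clean() { m_state.store(index_buf_state_t::CLEAN); }
    bool is_clean() const { return m_state.load() == index_buf_state_t::CLEAN; }
};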
Testing: ran index_btree continuously, back to back, for several hours:
./../failuntil.sh ./bin/test_index_btree --gtest_filter=BtreeTest/0.* --num_entries 100000