github issue #159: Setup CP periodic timer and dirty buf exceed callback #182

Merged
merged 2 commits into from
Sep 27, 2023

Conversation

yamingk
Contributor

@yamingk yamingk commented Sep 23, 2023

  1. Set up a periodic timer for CP flush.
  2. Set up the dirty buffer exceed callback.
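
For readers skimming the thread, the two trigger points can be illustrated with a minimal sketch. This is a toy model, not the homestore code: `ToyResourceMgr`, `ToyCPTrigger`, `inc_dirty_buf_size`, `dec_dirty_buf_size`, and `demo_cp_requests` are hypothetical names, the callback signature of `register_dirty_buf_exceed_cb` is an assumption, and the timer is modeled as a plain function call rather than a real iomgr timer. The toy also assumes the callback fires on every increment while the total is above the limit, which may not match the real resource manager.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for the resource manager: tracks dirty-buffer bytes
// and invokes a registered callback whenever the total is above the limit.
class ToyResourceMgr {
public:
    using exceed_cb_t = std::function< void() >;

    explicit ToyResourceMgr(uint64_t limit_bytes) : m_limit{limit_bytes} {}

    // Name taken from the PR; the signature is an assumption.
    void register_dirty_buf_exceed_cb(exceed_cb_t cb) { m_cb = std::move(cb); }

    // Assumption: the callback fires on every increment above the limit.
    void inc_dirty_buf_size(uint64_t bytes) {
        m_dirty += bytes;
        if ((m_dirty > m_limit) && m_cb) { m_cb(); }
    }

    // Clamp at zero so the unsigned counter never wraps.
    void dec_dirty_buf_size(uint64_t bytes) { m_dirty -= (bytes > m_dirty) ? m_dirty : bytes; }

private:
    uint64_t m_limit;
    uint64_t m_dirty{0};
    exceed_cb_t m_cb;
};

// Hypothetical CP side: both trigger points funnel into the same request counter.
struct ToyCPTrigger {
    int cp_requests{0};
    void on_timer_tick() { ++cp_requests; }       // periodic timer fired
    void on_dirty_buf_exceed() { ++cp_requests; } // dirty buffer limit exceeded
};

int demo_cp_requests() {
    ToyResourceMgr rm{100};
    ToyCPTrigger cp;
    rm.register_dirty_buf_exceed_cb([&cp]() { cp.on_dirty_buf_exceed(); });

    cp.on_timer_tick();         // timer path: one request
    rm.inc_dirty_buf_size(60);  // total 60, under the limit: no request
    rm.inc_dirty_buf_size(60);  // total 120, over the limit: request
    rm.inc_dirty_buf_size(10);  // total 130, still over: another request
    rm.dec_dirty_buf_size(130); // flush done, total back to 0
    rm.inc_dirty_buf_size(10);  // total 10, under the limit: no request
    assert(cp.cp_requests == 3);
    return cp.cp_requests;
}
```

In this toy every request is counted; deciding whether a request actually starts a new CP is left to the CP manager.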

Test:

  1. Manually tested with a smaller timer interval; the CP was triggered and completed with no problems.
Thread 28 "iomgr_thread_0" hit Breakpoint 1, homestore::CPManager::trigger_cp_flush (this=0x55555a670210, force=false) at /tmp/source/git/yk_cp_trigger/src/lib/checkpoint/cp_mgr.cpp:134
134     folly::Future< bool > CPManager::trigger_cp_flush(bool force) {
(gdb) bt
#0  homestore::CPManager::trigger_cp_flush (this=0x55555a670210, force=false) at /tmp/source/git/yk_cp_trigger/src/lib/checkpoint/cp_mgr.cpp:134
#1  0x0000555556e1dbdc in operator() (__closure=0x555558fb8748) at /tmp/source/git/yk_cp_trigger/src/lib/checkpoint/cp_mgr.cpp:57
#2  0x0000555556e286f6 in std::__invoke_impl<void, homestore::CPManager::start(bool)::<lambda(void*)>&, void*>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#3  0x0000555556e27c15 in std::__invoke_r<void, homestore::CPManager::start(bool)::<lambda(void*)>&, void*>(struct {...} &) (__fn=...) at /usr/include/c++/11/bits/invoke.h:111
#4  0x0000555556e26d66 in std::_Function_handler<void(void*), homestore::CPManager::start(bool)::<lambda(void*)> >::_M_invoke(const std::_Any_data &, void *&&) (__functor=..., __args#0=@0x7fffdcfe9150: 0x0)
    at /usr/include/c++/11/bits/std_function.h:290
#5  0x000055555726a653 in std::function<void (void*)>::operator()(void*) const (this=0x555558fb8748, __args#0=0x0) at /usr/include/c++/11/bits/std_function.h:590
#6  0x0000555557262e75 in iomgr::timer_epoll::on_timer_armed (this=0x555558e85180, iodev=0x55555b070420)
    at /root/.conan/data/iomgr/10.0.1-81/oss/master/build/05982b0b42374ef49c40634cd591c536b25c5a05/src/lib/iomgr_timer.cpp:170
#7  0x0000555557262c71 in iomgr::timer_epoll::on_timer_fd_notification (iodev=0x55555b070420)

@@ -46,6 +50,11 @@ void CPManager::start(bool first_time_boot) {
create_first_cp();
m_sb.write();
}

LOGINFO("cp timer is set to {} usec", HS_DYNAMIC_CONFIG(generic.cp_timer_us));
Contributor Author

This is placed here rather than in the constructor to keep some of the recovery tests happy.

@yamingk yamingk linked an issue Sep 23, 2023 that may be closed by this pull request
@codecov-commenter

codecov-commenter commented Sep 23, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (db23fc8) 48.48% compared to head (b9a77f2) 48.51%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #182      +/-   ##
==========================================
+ Coverage   48.48%   48.51%   +0.02%     
==========================================
  Files          94       94              
  Lines        7452     7462      +10     
  Branches      956      958       +2     
==========================================
+ Hits         3613     3620       +7     
- Misses       3463     3469       +6     
+ Partials      376      373       -3     
Files Coverage Δ
src/include/homestore/checkpoint/cp_mgr.hpp 66.66% <ø> (ø)
src/lib/common/resource_mgr.hpp 100.00% <ø> (ø)
src/lib/checkpoint/cp_mgr.cpp 65.25% <83.33%> (+0.52%) ⬆️

... and 6 files with indirect coverage changes


sanebay
sanebay previously approved these changes Sep 27, 2023
@@ -35,6 +36,9 @@ CPManager::CPManager() :
[this](meta_blk* mblk, sisl::byte_view buf, size_t size) { on_meta_blk_found(std::move(buf), (void*)mblk); },
nullptr);

resource_mgr().register_dirty_buf_exceed_cb(
Contributor

Should we wait for the CP flush to complete when the dirty buffer size is exceeded?

Contributor Author

The CP manager controls this, right? If a CP is already running, a back-to-back CP will be created.

Collaborator

Q: Will dirty_buf_exceed_cb be invoked every time a new dirty_buf is added after the limit is exceeded?
I am trying to avoid the case where we exceed the dirty limit for a moment and a lot of unnecessary CPs get triggered.

Contributor Author

@yamingk yamingk Sep 27, 2023

The CP manager guarantees that only one CP can be running at any time, e.g. if a CP is already running, the trigger returns immediately. If the consumer asks for a forced trigger_cp, a back-to-back CP will be created.
If a CP is slow or stuck (due to bugs or disk issues), it pushes back pressure onto consumers, e.g. no more dirty buffers can be allocated and I/O failures are pushed back.
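
The single-CP guarantee described above can be sketched as a small state machine. This is a toy model under stated assumptions, not the homestore `CPManager`: `ToyCPManager`, `on_cp_flush_done`, `cps_started`, and `demo_burst` are hypothetical names, the toy `trigger_cp_flush` returns a plain `bool` where the real one returns a `folly::Future< bool >`, and the rule that repeated forced triggers collapse into a single pending back-to-back CP is an assumption.

```cpp
#include <cassert>

// Hypothetical model: at most one CP runs at a time. A non-forced trigger is
// dropped while a CP is running; a forced trigger is remembered as one
// pending back-to-back CP.
class ToyCPManager {
public:
    // Returns true if this call started a CP, false if one was already running.
    bool trigger_cp_flush(bool force) {
        if (m_running) {
            if (force) { m_back_to_back = true; } // remember one follow-up CP
            return false;
        }
        start_cp();
        return true;
    }

    // Called when the running CP finishes flushing.
    void on_cp_flush_done() {
        m_running = false;
        if (m_back_to_back) {
            m_back_to_back = false;
            start_cp(); // the pending back-to-back CP starts right away
        }
    }

    int cps_started() const { return m_started; }
    bool is_running() const { return m_running; }

private:
    void start_cp() {
        m_running = true;
        ++m_started;
    }

    bool m_running{false};
    bool m_back_to_back{false};
    int m_started{0};
};

int demo_burst() {
    ToyCPManager mgr;
    mgr.trigger_cp_flush(false);                                   // starts the first CP
    for (int i = 0; i < 100; ++i) { mgr.trigger_cp_flush(false); } // all dropped
    mgr.trigger_cp_flush(true);                                    // queues one back-to-back CP
    mgr.trigger_cp_flush(true);                                    // still only one queued
    mgr.on_cp_flush_done();                                        // first CP done, second starts
    assert(mgr.is_running());
    mgr.on_cp_flush_done();                                        // second CP done, nothing pending
    assert(!mgr.is_running());
    return mgr.cps_started();
}
```

Under these assumptions, a burst of triggers while the dirty limit is exceeded starts at most one extra CP rather than one per trigger.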

Collaborator

@xiaoxichen xiaoxichen left a comment

LGTM

@yamingk yamingk merged commit bd80010 into eBay:master Sep 27, 2023
16 checks passed
@yamingk yamingk deleted the yk_cp_trigger branch September 27, 2023 18:32
Development

Successfully merging this pull request may close these issues.

Checkpointing trigger point setup
4 participants