Possible fix for refresh node lock problem #330
Codecov Report
@@            Coverage Diff             @@
##           master     #330      +/-   ##
==========================================
- Coverage   49.87%   47.62%   -2.25%
==========================================
  Files         134      107      -27
  Lines       14275     9342    -4933
  Branches     1724     1213     -511
==========================================
- Hits         7119     4449    -2670
+ Misses       6513     4416    -2097
+ Partials      643      477     -166
auto cpg = hs()->cp_mgr().cp_guard();
put_req.m_op_context = (void*)cpg.context(cp_consumer_t::INDEX_SVC);
ret = Btree< K, V >::put(put_req);
} while (ret == btree_status_t::cp_mismatch);
Please check for the retry error too. Should we wait a few microseconds before the next attempt? It's rare.
@sanebay I don't think we need to wait before retrying on cp_mismatch. It just means that the CP being entered is higher than the one that was modified. I think we should retry right away.
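For illustration, the immediate-retry behaviour discussed in this thread can be sketched in isolation. This is only a minimal sketch: `status_t`, `FakeIndex`, `FakeCpManager` and `put_with_retry` are hypothetical stand-ins invented for the example, not HomeStore types, and the "stale guard" counter is just an assumed way of simulating a slow fiber that still sees the previous CP.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not the real HomeStore API.
enum class status_t { success, cp_mismatch };

struct FakeIndex {
    uint64_t last_mod_cp_id; // CP id in which the node was last modified
    int puts_applied;        // number of puts that actually went through

    // Mirrors the check quoted in the PR description:
    // reject the op if the node was modified in a newer CP than the caller's.
    status_t put(uint64_t cp_id) {
        if (last_mod_cp_id > cp_id) { return status_t::cp_mismatch; }
        last_mod_cp_id = cp_id;
        ++puts_applied;
        return status_t::success;
    }
};

struct FakeCpManager {
    uint64_t current_cp_id;
    int stale_guards; // assumed: how many upcoming guards still see the previous CP

    uint64_t enter_cp() {
        if (stale_guards > 0) {
            --stale_guards;
            return current_cp_id - 1;
        }
        return current_cp_id;
    }
};

// Retries right away, taking a fresh CP id on every attempt.
// Returns the number of attempts that were needed.
int put_with_retry(FakeIndex& idx, FakeCpManager& cp_mgr) {
    int attempts = 0;
    status_t ret;
    do {
        uint64_t cp_id = cp_mgr.enter_cp(); // re-enter the CP for each attempt
        ret = idx.put(cp_id);
        ++attempts;
    } while (ret == status_t::cp_mismatch);
    assert(ret == status_t::success);
    return attempts;
}
```

In this sketch no sleep is needed between attempts because each iteration takes a new CP id; whether that also holds for the real `cp_guard()` is the assumption being discussed above.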
}
});
auto fiber_id = i;
iomanager.run_on_forget(
Could you please move or copy this progress printing into a function, if possible, and use it in the concurrent test ops too? Today we don't have any visibility into whether a test is stuck or still making progress.
Both of them have progress printing, but the output differs slightly because preload doesn't have a deadline (a time to stop).
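One way the two printing paths could share a helper is to make the deadline optional. The following is only a sketch under that assumption: `format_progress`, its parameters and its output format are hypothetical and are not taken from the test code in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: builds one progress line for either phase.
// secs_left < 0 means "no deadline" (the preload case).
std::string format_progress(const std::string& phase, uint64_t done, uint64_t total,
                            int64_t secs_left = -1) {
    assert(done <= total); // assumed precondition for this sketch
    uint64_t pct = (total == 0) ? 100 : (done * 100) / total;
    std::string line = phase + ": " + std::to_string(pct) + "% (" +
                       std::to_string(done) + "/" + std::to_string(total) + ")";
    if (secs_left >= 0) {
        // Only the deadline-driven phase reports the remaining time.
        line += ", " + std::to_string(secs_left) + "s left";
    }
    return line;
}
```

Preload would call it without the last argument, while the concurrent ops would pass the seconds remaining until their deadline.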
The test failure happens when a fiber is slow. This triggers idx_node->m_last_mod_cp_id > cp_ctx->id(), so put/remove returns btree_status_t::cp_mismatch. One possible solution is for the caller to retry. Accordingly, in the UT, if cp_mismatch happens, the caller retries until it no longer happens.
Testing
To hit the problem, the number of fibers was increased to 100 per thread with 10 threads (1000 fibers in total):
bin/test_index_btree --gtest_filter=*/0.ConcurrentAllOps --preload_size=400000 --num_entries=800000 --num_iters=50000 --gtest_break_on_failure --num_fibers=100 --num_threads=10