Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible fix for refresh node lock problem #330

Merged
merged 1 commit into from
Feb 28, 2024
Merged

Possible fix for refresh node lock problem #330

merged 1 commit into from
Feb 28, 2024

Conversation

shosseinimotlagh
Copy link
Contributor

@shosseinimotlagh shosseinimotlagh commented Feb 20, 2024

Test failure happens when a fiber is slow. This triggers idx_node->m_last_mod_cp_id > cp_ctx->id(), hence returns put/remove returns btree_status_t::cp_mismatch. One possible solution is that the caller performs retries. Having said that, in the UT, if it cp_mismatch happens, the caller retries until it happens.

Testing

In order to hit the problem, the number of fibers increases to 100 per thread and 10 threads (1000 fibers in total)
bin/test_index_btree --gtest_filter=*/0.ConcurrentAllOps --preload_size=400000 --num_entries=800000 --num_iters=50000 --gtest_break_on_failure --num_fibers=100 --num_threads=10

@shosseinimotlagh shosseinimotlagh linked an issue Feb 20, 2024 that may be closed by this pull request
@codecov-commenter
Copy link

codecov-commenter commented Feb 20, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 47.62%. Comparing base (13e235d) to head (6ad7447).

❗ Current head 6ad7447 differs from pull request most recent head e1bf315. Consider uploading reports for the commit e1bf315 to get more accurate results

Files Patch % Lines
src/include/homestore/index/index_table.hpp 75.00% 2 Missing and 2 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #330      +/-   ##
==========================================
- Coverage   49.87%   47.62%   -2.25%     
==========================================
  Files         134      107      -27     
  Lines       14275     9342    -4933     
  Branches     1724     1213     -511     
==========================================
- Hits         7119     4449    -2670     
+ Misses       6513     4416    -2097     
+ Partials      643      477     -166     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

auto cpg = hs()->cp_mgr().cp_guard();
put_req.m_op_context = (void*)cpg.context(cp_consumer_t::INDEX_SVC);
ret = Btree< K, V >::put(put_req);
} while (ret == btree_status_t::cp_mismatch);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check for retry error too. should we wait for few us before trying the next attempt. its rare.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanebay I don't think we need to wait for timeout before trying cp_mismatch. It is just that entering CP is higher than whats modified. I think we should retry right away.

}
});
auto fiber_id = i;
iomanager.run_on_forget(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move/copy this printing the progress to a function if possible and use it on concurrent test ops too. Today we dont have any visibility of whether its stuck or in progress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both of them have progress printing but they are slightly different (in terms of printing) because preload doesn't have deadline (time to stop).

sanebay
sanebay previously approved these changes Feb 27, 2024
@shosseinimotlagh shosseinimotlagh merged commit e0cd9a8 into eBay:master Feb 28, 2024
20 checks passed
@shosseinimotlagh shosseinimotlagh deleted the issue312_lock_refresh branch February 28, 2024 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lock refresh problem in btree
4 participants