Possible fix for refresh node lock problem #330

shosseinimotlagh · 2024-02-20T23:28:55Z

Test failure happens when a fiber is slow. This triggers idx_node->m_last_mod_cp_id > cp_ctx->id(), hence returns put/remove returns btree_status_t::cp_mismatch. One possible solution is that the caller performs retries. Having said that, in the UT, if it cp_mismatch happens, the caller retries until it happens.

Testing

In order to hit the problem, the number of fibers increases to 100 per thread and 10 threads (1000 fibers in total)
bin/test_index_btree --gtest_filter=*/0.ConcurrentAllOps --preload_size=400000 --num_entries=800000 --num_iters=50000 --gtest_break_on_failure --num_fibers=100 --num_threads=10

codecov-commenter · 2024-02-20T23:44:57Z

Codecov Report

Attention: Patch coverage is 75.00000% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 47.62%. Comparing base (13e235d) to head (6ad7447).

❗ Current head 6ad7447 differs from pull request most recent head e1bf315. Consider uploading reports for the commit e1bf315 to get more accurate results

Files	Patch %	Lines
src/include/homestore/index/index_table.hpp	75.00%	2 Missing and 2 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #330      +/-   ##
==========================================
- Coverage   49.87%   47.62%   -2.25%     
==========================================
  Files         134      107      -27     
  Lines       14275     9342    -4933     
  Branches     1724     1213     -511     
==========================================
- Hits         7119     4449    -2670     
+ Misses       6513     4416    -2097     
+ Partials      643      477     -166

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

sanebay · 2024-02-23T18:54:50Z

src/include/homestore/index/index_table.hpp

+            auto cpg = hs()->cp_mgr().cp_guard();
+            put_req.m_op_context = (void*)cpg.context(cp_consumer_t::INDEX_SVC);
+            ret = Btree< K, V >::put(put_req);
+        } while (ret == btree_status_t::cp_mismatch);


Please check for retry error too. should we wait for few us before trying the next attempt. its rare.

@sanebay I don't think we need to wait for timeout before trying cp_mismatch. It is just that entering CP is higher than whats modified. I think we should retry right away.

sanebay · 2024-02-23T19:03:24Z

src/tests/btree_helpers/btree_test_helper.hpp

-                    }
-                });
+                auto fiber_id = i;
+                iomanager.run_on_forget(


Could you please move/copy this printing the progress to a function if possible and use it on concurrent test ops too. Today we dont have any visibility of whether its stuck or in progress.

both of them have progress printing but they are slightly different (in terms of printing) because preload doesn't have deadline (time to stop).

shosseinimotlagh linked an issue Feb 20, 2024 that may be closed by this pull request

lock refresh problem in btree #312

Closed

shosseinimotlagh requested review from hkadayam and sanebay February 20, 2024 23:47

sanebay reviewed Feb 23, 2024

View reviewed changes

shosseinimotlagh requested a review from sanebay February 23, 2024 23:37

sanebay previously approved these changes Feb 27, 2024

View reviewed changes

Possible fix for refresh node lock problem

e1bf315

shosseinimotlagh dismissed sanebay’s stale review via e1bf315 February 28, 2024 04:05

shosseinimotlagh requested a review from sanebay February 28, 2024 09:04

sanebay approved these changes Feb 28, 2024

View reviewed changes

shosseinimotlagh merged commit e0cd9a8 into eBay:master Feb 28, 2024
20 checks passed

shosseinimotlagh deleted the issue312_lock_refresh branch February 28, 2024 18:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible fix for refresh node lock problem #330

Possible fix for refresh node lock problem #330

shosseinimotlagh commented Feb 20, 2024 •

edited

Loading

codecov-commenter commented Feb 20, 2024 •

edited

Loading

sanebay Feb 23, 2024

hkadayam Feb 24, 2024

sanebay Feb 23, 2024

shosseinimotlagh Feb 27, 2024

Possible fix for refresh node lock problem #330

Possible fix for refresh node lock problem #330

Conversation

shosseinimotlagh commented Feb 20, 2024 • edited Loading

Testing

codecov-commenter commented Feb 20, 2024 • edited Loading

Codecov Report

sanebay Feb 23, 2024

Choose a reason for hiding this comment

hkadayam Feb 24, 2024

Choose a reason for hiding this comment

sanebay Feb 23, 2024

Choose a reason for hiding this comment

shosseinimotlagh Feb 27, 2024

Choose a reason for hiding this comment

shosseinimotlagh commented Feb 20, 2024 •

edited

Loading

codecov-commenter commented Feb 20, 2024 •

edited

Loading