-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sync_flush should wait is_completed not is_active. #461
Conversation
I dont know how much removing async will remove this part of code @JacksonYao287 |
src/lib/logstore/log_store.cpp
Outdated
@@ -405,7 +408,7 @@ void HomeLogStore::flush_sync(logstore_seq_num_t upto_seq_num) { | |||
if (upto_seq_num == invalid_lsn()) { upto_seq_num = m_records.active_upto(); } | |||
|
|||
// if we have flushed already, we are done | |||
if (!m_records.status(upto_seq_num).is_active) { return; } | |||
if (m_records.status(upto_seq_num).is_completed) { return; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This status is a tri-state check.
- Inactive: We have not received this lsn at all
- Active: Received LSN, yet to flush
- Completed: Completed flush.
So we only need to flush if it is not active. This change will flush when it is inactive as well. So I don't think we should make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the point, we need to flush only active, it make sense.
However I am not getting how is_active == false
implies log getting flushed? are we reset bit in m_active_slot_bits
when doing update? I didnt find this part of code in SISL.
src/lib/logstore/log_store.cpp
Outdated
@@ -416,13 +419,13 @@ void HomeLogStore::flush_sync(logstore_seq_num_t upto_seq_num) { | |||
|
|||
// Step 2: After marking this lsn, we again do a check, to avoid a race where completion checked for no lsn | |||
// and the lsn is stored in step 1 above. | |||
if (!m_records.status(upto_seq_num).is_active) { return; } | |||
if (m_records.status(upto_seq_num).is_completed) { return; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
src/lib/logstore/log_store.cpp
Outdated
|
||
// Step 3: Force a flush (with least threshold) | ||
m_logdev->flush_if_needed(1); | ||
|
||
// Step 4: Wait for completion | ||
m_sync_flush_cv.wait(lk, [this, upto_seq_num] { return !m_records.status(upto_seq_num).is_active; }); | ||
m_sync_flush_cv.wait_for(lk, std::chrono::milliseconds(10), [this, upto_seq_num] { return m_records.status(upto_seq_num).is_completed; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming it is temporary, because caller expect data at the end of it. Either we return an error or we assert? In any case, after new flush changes, we probably don't need this cv so at that point it is moot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this is temporary for solving the racing we had seen. It is not right , but generally ensure we dont stuck on this CV infinitely for concurrent issue.
542fb2b
to
cab0a0f
Compare
Signed-off-by: Xiaoxi Chen <[email protected]>
Signed-off-by: Xiaoxi Chen <[email protected]>
Signed-off-by: Xiaoxi Chen <[email protected]>
Previous code will add `ndevices` simulated drive into device list which make each replica go with one real drive and ndevices simulated drive. Those simulated drives are identified as FAST, meta/log were on them. Due to the very limited size of simulated drive, we can hit size limit in long running test. Fixing by honor input from hs_repl_test_common. After this fix, if only one drive passed in for a replica of test_raft_repl_dev, that drive will be used as FAST. All services will be started on that real drive. Signed-off-by: Xiaoxi Chen <[email protected]>
Log entry added into StreamTracker in write_async , through m_records.create(). Once create the status is active.
Later on in HomeLogStore::on_write_completion it update the tracker and in the
m_records.update
, is_completed set to true.We should wait for is_completed == true to indicate a log entry has been persisted.