-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Index crash recovery test cases and fixes #458
Conversation
@@ -109,7 +109,7 @@ class AppendBlkAllocator : public BlkAllocator { | |||
std::string to_string() const override; | |||
|
|||
void cp_flush(CP* cp) override; | |||
|
|||
void recovery_completed() override {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed to be explicit notification that recovery is completed, so that it can rebuild the caches (for certain blk allocators)
return BlkAllocStatus::SUCCESS; | ||
} | ||
|
||
void FixedBlkAllocator::recovery_completed() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the recovery processing from first alloc to the explicit call recovery_completed()
~CrashSimulator() = default; | ||
|
||
void crash() { | ||
if (m_restart_cb) { | ||
m_crashed.update([](auto* s) { *s = true; }); | ||
|
||
// We can restart on a new thread to allow other operations to continue | ||
std::thread t([this]() { m_restart_cb(); }); | ||
std::thread t([cb = std::move(m_restart_cb)]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not access this pointer (instead move the cb out of this variable)
@@ -149,6 +149,7 @@ class DeviceManager { | |||
|
|||
bool is_first_time_boot() const { return m_first_time_boot; } | |||
void format_devices(); | |||
void commit_formatting(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yamingk The files device.h, device_manager.cpp, hs_super_blk.h and homestore.cpp are relevant changes regarding taking cp on first_time_boot related changes. Please review these.
} | ||
|
||
first_block* fblk = r_cast< first_block* >(buf); | ||
fblk->formatting_done = 0x1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of there are three pdevs, and 1 got persisted with formatting_done set to 0x1, before 2 other pdevs persist their superblock, we crash.
On reboot, what we want is we should return need_format true in homestore::start().
However, on reboot in above case, in DeviceManager::DeviceManager
, when we detect 1st pdev is_valid() is true (its formatting_done already set to 0x1), we will set m_first_time_boot
as false, which will return need_format
as false, which is not expected, right?
We want to change DeviceManager::DeviceManager
to only return m_first_time_boot as true when all pdevs is_valid() return true, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be fixed globally and decide strategically, not just for formatting, but in general its a valid device or not. Currently we need superblk from only one device to be valid and if they are, we proceed. We can choose to do it only if all (but we need to worry about new device insertion). I would just use is_valid() and let the DeviceManager::DeviceManager make a decision.
* Fixed the case where if cp is not flushed and homestore crashed, restart would not init certain key parameters. This is fixed by taking cp everytime we do formatting and then we persist this information to homestore first_blk. * Fixed the crash simulator issue where restart doesn't initialize existing HSTestHelper instances * Several fixes in IndexTable to make sure we can restart and reapply the logs
Add visualize change diff and some print
2afe7f8
to
12d9398
Compare
12d9398
to
ed6bd0d
Compare
dd9c2da
to
e255e13
Compare
* Fixed the case where if cp is not flushed and homestore crashed, restart would not init certain key parameters. This is fixed by taking cp everytime we do formatting and then we persist this information to homestore first_blk. * Fixed the crash simulator issue where restart doesn't initialize existing HSTestHelper instances * Several fixes in IndexTable to make sure we can restart and reapply the logs
|
||
if (buf->m_up_buffer != real_up_buf) { | ||
real_up_buf->m_wait_for_down_buffers.increment(1); | ||
buf->m_up_buffer = real_up_buf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly
fmt::format_to(std::back_inserter(str), " [EMPTY] "); | ||
return str; | ||
// return ""; | ||
// #if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please return it backif 0
to not break homeobject . I will fix it soon to work for it too, soon
Fixed the case where if cp is not flushed and homestore crashed, restart would not init certain key parameters. This is fixed by taking cp everytime we do formatting and then we persist this information to homestore first_blk.
Fixed the crash simulator issue where restart doesn't initialize existing HSTestHelper instances
Several fixes in IndexTable to make sure we can restart and reapply the logs