storage 'erase_from' does not work #16
Comments
Hey, @ultrabug! The Raft algorithm does not provide this kind of corruption-fixing mechanism. In your example:
However, if you remove some log entries from the top or add a couple of wrong ones, it should fix it. You can check out page 4 of the Raft paper (https://raft.github.io/raft.pdf). Let me know if I'm wrong :)
Thanks for your quick reply, @zhebrak. I know that corruption fixing is up to the implementation. However, I think we have a race condition in raftos: if the log is small, the bad log on the follower is correctly reset, but when the log is large it is not.
Are you sure it's not the last entry or entries that you remove when it's fixing itself correctly?
I'm sorry, I don't understand what you mean?
A follower only fixes log entries when its last one does not match the leader's index and term. So it's not possible to erase anything when the current entry checks out. If the log is small, the probability of removing exactly the last log entry (with a random choice) is higher; that's why I asked whether that's the case.
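To make the point above concrete, here is a minimal sketch of the previous-entry check described in the Raft paper. It is not the raftos code: the entry shape (`term`, `value`), the helper names, and the assumption that every entry was written in the same term are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Entry = Dict[str, int]  # hypothetical entry shape: {"term": ..., "value": ...}


def prev_entry_matches(log: List[Entry], prev_log_index: int, prev_log_term: int) -> bool:
    """Raft-style consistency check. Indices are 1-based; 0 means 'before the first entry'."""
    if prev_log_index == 0:
        return True
    if prev_log_index > len(log):
        return False  # the follower has no entry at that index
    return log[prev_log_index - 1]["term"] == prev_log_term


def find_agreement_index(leader_log: List[Entry], follower_log: List[Entry]) -> int:
    """Walk prev_log_index back from the leader's tail until the follower's check passes."""
    prev_log_index = len(leader_log)
    while prev_log_index > 0:
        prev_log_term = leader_log[prev_log_index - 1]["term"]
        if prev_entry_matches(follower_log, prev_log_index, prev_log_term):
            break
        prev_log_index -= 1
    return prev_log_index


# Assumption: all ten entries were written in term 1.
leader_log = [{"term": 1, "value": i} for i in range(1, 11)]

# Case 1: the follower lost its last three entries.
tail_truncated = leader_log[:7]
tail_agreement = find_agreement_index(leader_log, tail_truncated)

# Case 2: the follower lost entry 5 from the middle, so later entries shift down by one.
middle_removed = leader_log[:4] + leader_log[5:]
middle_agreement = find_agreement_index(leader_log, middle_removed)
```

Under these assumptions the tail-truncated follower agrees at index 7 and simply receives the missing entries, while the follower with a hole in the middle agrees at index 9 even though its entry at index 5 no longer holds the leader's value. The check only compares index and term, so a shifted entry with the same term goes unnoticed.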
This is not what I understand from the code, actually:
    # If an existing entry conflicts with a new one (same index but different terms),
    # delete the existing entry and all that follow it
    new_index = data['prev_log_index'] + 1
    try:
        if self.log[new_index]['term'] != data['term'] or (
            self.log.last_log_index != prev_log_index
        ):
            self.log.erase_from(new_index)
    except IndexError:
        pass
It looks to me like this can be triggered at any line of the log when the follower comes back up and is fast-forwarding from the leader. If for some reason line 500 of a 2000-line log matches this condition, we would remove the last 1501 lines from the log and should start back from 499. I guess I'll have to provide a gist to replicate the issue so you can see it happening for yourself.
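The arithmetic in that scenario can be checked with a small stand-in. The `erase_from` below is a hypothetical list-based helper with 1-based indices, not the raftos storage method of the same name; it only illustrates "delete the entry at this index and all that follow it".

```python
from typing import List


def erase_from(log: List[int], index: int) -> List[int]:
    """Return the log without the entry at 1-based `index` and everything after it."""
    return log[: index - 1]


# A 2000-entry log where each entry is just its own 1-based index.
log = list(range(1, 2001))

trimmed = erase_from(log, 500)
removed = len(log) - len(trimmed)
```

Here `removed` is 1501 (entries 500 through 2000) and the last surviving entry is 499, which is the index the follower would have to resume from.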
Yes, as I said before, it fixes all inconsistencies until the follower's last index and term match the ones coming from the leader. It removes everything after the matching index-term pair.
OK, I have created a reproducible scenario here: http://ultrabug.fr/github/raftos_gist.tar.gz
To use it after unpacking the archive:
In another terminal you can check the logs of worker id 3:
You can also reproduce this by:
Hi
I have repeatedly tried "corrupting" a log file by removing a random line from it, to test the log trimming functionality, and the follower fails to catch up to the leader's last log entry.
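The corruption step described above could be scripted roughly as follows. This is only a sketch: it assumes the log is a plain text file with one entry per line and that the node owning the file is stopped while it is edited. The helper name and any file path passed to it are hypothetical.

```python
import random
from pathlib import Path


def remove_random_line(path: Path, rng: random.Random) -> int:
    """Delete one randomly chosen line from the file and return its 1-based line number."""
    lines = path.read_text().splitlines(keepends=True)
    if not lines:
        raise ValueError("log file is empty")
    victim = rng.randrange(len(lines))  # 0-based position of the line to drop
    del lines[victim]
    path.write_text("".join(lines))
    return victim + 1
```

Passing a seeded `random.Random` makes a failing run repeatable, and the returned line number tells you whether the removed entry was the last one or somewhere in the middle of the log.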
https://github.com/zhebrak/raftos/blob/master/raftos/state.py#L413
From my debugging, it seems related to the last_log_index calculation or some sort of mismatch between the leader and follower indexes, but I haven't found the fix yet.
Would you be so kind to have a look and tell me if you can reproduce this or if you have an idea on the fix?
Thanks!