This repository has been archived by the owner on Feb 1, 2024. It is now read-only.
I've been noticing an issue when using PBFT consensus and hoped I could find some help here.
When two different nodes attempt to write different blocks at nearly the same time, one node beats the other, and the losing node fails to write its block. That is expected, but the node that failed to write its block then reports this "Failed to cancel block" error:
INFO | pbft_engine::node:47 | Failed to cancel block when becoming secondary: InvalidState("Cannot cancel block in current state")
This leads to the PBFT engine behaving irregularly and eventually crashing.
When the PBFT engine crashes, it reports a ZMQ error stating "socket dropped", with little other context.
Before the crash, the engine is unable to properly participate in consensus.
Other nodes may also crash when this failure state is reached, even though they did not fail to write a block.
Stopping and rebuilding the Docker containers tends to fix this issue, but it is concerning that it occurs at all, and that nodes which never entered this error state also fail.
I realized that my nodes are running PBFT engine version 1.0.2, and I am planning to upgrade to the latest version. However, I'm uncertain whether that will prevent this issue from happening again. So far I have not been able to reproduce the issue consistently, since it is a timing error that occurs only when two nodes attempt to write a block at nearly the same time. Still, I have seen it occur multiple times and am concerned about PBFT's stability.