Description:
I encountered an issue when using etcd as the service discovery mechanism for a ProtoActor cluster. When a node loses its connection to etcd (for example, due to network fluctuation or while paused at a breakpoint during debugging) and the network is later restored, a goroutine calls startKeepAlive to re-register the lease. The lease is renewed successfully, but the current node does not appear to be re-added to the members list in etcd.Provider. As a result, the ActorSystem stays active but the node is no longer part of the cluster.
Steps to Reproduce:
Start a node and use etcd for cluster service discovery. By default, keepAliveTTL=3s and retryInterval=1s.
Disconnect the node from etcd (for example, through network fluctuation or by pausing at a breakpoint while debugging).
After the network recovers, let the scheduled goroutine call startKeepAlive and renew the lease.
Observe that the members list in etcd.Provider does not include the current node, so the node's ActorSystem does not rejoin the cluster.
Expected Behavior:
After network recovery, the node should successfully re-register itself via startKeepAlive and be added back to the etcd.Provider members list. The node's ActorSystem should then rejoin the cluster and function normally.
Current Behavior:
The node fails to rejoin the cluster. Even though the lease is renewed, the members list in etcd.Provider is not updated to include the node, which causes the node's ActorSystem to no longer participate in the cluster.
Environment:
ProtoActor-Go version: v0.0.0-20240822202345-3c0e61ca19c9
etcd version: v3
Additional Information:
Can the keepAliveTTL configuration be customized?
I would appreciate assistance on how to ensure the node can properly rejoin the cluster after network recovery.