
Is there any way to reset the cluster manually? #5776

Closed
zcz3313 opened this issue May 18, 2021 · 8 comments
Labels
kind/question Category issues related to questions or problems

Comments

@zcz3313

zcz3313 commented May 18, 2021

Is there any way to reset the cluster manually?

@KomachiSion
Collaborator

Deregister it and register it again.

@KomachiSion KomachiSion added the kind/question Category issues related to questions or problems label May 18, 2021
@zcz3313
Author

zcz3313 commented May 19, 2021

Deregister it and register it again.

Sorry, I'll go into more detail.
I use cluster mode in a k8s env.
Right now, whenever the cluster gets into a bad state, I delete the ~/nacos/data/protocol dir and restart the pod.
Is this right? Or is there a better way to recover the cluster in a prod env?
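For concreteness, a minimal sketch of what I do today (the pod name is illustrative; adjust it and the path to your own deployment):

```
# Wipe the JRaft protocol state on the broken node, then restart its pod.
# "nacos-0" is an illustrative StatefulSet pod name.
kubectl exec nacos-0 -- sh -c 'rm -rf ~/nacos/data/protocol'
kubectl delete pod nacos-0   # the StatefulSet recreates the pod
```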

@weissxu

weissxu commented May 20, 2021

I have experienced a similar problem; maybe Nacos could consider adding a feature for this, for example a property to 'delete the local data and reload it from the other Nacos instances.'

@plusmancn
Contributor

@zcz3313 @weissxu
I am a bit confused about why we need to delete a node's local data for service recovery.
In my mind, JRaft, the consensus protocol that Nacos depends on, guarantees that the data eventually becomes consistent across nodes.

Maybe you can tell me what your original requirement really is.

@zcz3313
Author

zcz3313 commented May 21, 2021

@zcz3313 @weissxu
I am a bit confused about why we need to delete a node's local data for service recovery.
In my mind, JRaft, the consensus protocol that Nacos depends on, guarantees that the data eventually becomes consistent across nodes.

Maybe you can tell me what your original requirement really is.

After some tests, I found that Nacos is unstable in cluster mode.
E.g., if more than half of the nodes go down at the same time, the Raft protocol can't elect a leader, and the cluster can't recover even after those nodes restart later.
So I need to find a way to recover a Nacos cluster in a prod env.
For now, in my test env, deleting the protocol dir seems to work.

@plusmancn
Contributor

E.g., if more than half of the nodes go down at the same time, the Raft protocol can't elect a leader, and the cluster can't recover even after those nodes restart later.

According to the standard Raft protocol, there is no way to recover a cluster when more than half of the nodes can't respond to the leader, unless you are willing to break the CAP guarantees. For example, a 3-node cluster needs a majority of 2 for leader election, so with 2 nodes down the remaining node can never win an election.

If something else is causing the cluster's instability, you should try to find the specific problem and fix it.

@zcz3313
Author

zcz3313 commented May 21, 2021

E.g., if more than half of the nodes go down at the same time, the Raft protocol can't elect a leader, and the cluster can't recover even after those nodes restart later.

According to the standard Raft protocol, there is no way to recover a cluster when more than half of the nodes can't respond to the leader, unless you are willing to break the CAP guarantees.

I know that.
I mean we could do something manually to recover the cluster, rather than just leaving it as it is.
E.g., restart the nodes after deleting some cached data?

@KomachiSion
Collaborator

If more than half of the nodes go down, Raft will have no leader. But it can recover when your nodes restart; there is no need to delete the dir.

If you use a k8s env, please use hostnames or domains to set up the cluster. If you use IPs, the Raft metadata might end up containing removed IPs, and then the cluster can't recover because it considers more than half of its nodes down.
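A minimal sketch of what that looks like (illustrative names, assuming a 3-node StatefulSet called nacos with a headless service nacos-headless in the default namespace): the stable DNS names survive pod rescheduling, so the Raft metadata keeps pointing at reachable members.

```
# conf/cluster.conf — list members by stable hostname, one host:port per line,
# instead of pod IPs that change on restart.
nacos-0.nacos-headless.default.svc.cluster.local:8848
nacos-1.nacos-headless.default.svc.cluster.local:8848
nacos-2.nacos-headless.default.svc.cluster.local:8848
```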
