Rolling update strategy with graceful shutdown feature #127

Closed
amotl opened this issue Nov 25, 2020 · 7 comments
Labels
question Further information is requested

Comments

@amotl
Member

amotl commented Nov 25, 2020

Dear @MarkusH, @lukasbals and @chaudum,

at orchestracities/ngsi-timeseries-api#384 (comment) ff., we and others are looking at the preferred way to upgrade a running CrateDB cluster via the rolling upgrade method [1], and are also reflecting on the details of doing so on Kubernetes.

In that specific case, @c0c0n3 outlined on orchestracities/charts#81 how the upgrade procedure usually takes place when using the k8s RollingUpdate update strategy.

In order to start the graceful stop procedure [2] on a single node, one would have to invoke the ALTER CLUSTER DECOMMISSION <nodeId | nodeName> SQL command [3]. In order to orchestrate this across the whole cluster, things will get more complicated, right?
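For illustration, orchestrating this across the whole cluster could look roughly like the following minimal Python sketch. It only builds the `ALTER CLUSTER DECOMMISSION` statement from [3] and issues it node by node. The callbacks `run_sql` and `wait_until_rejoined` are hypothetical placeholders that the caller would have to provide; they are not part of CrateDB or the crate-operator.

```python
from typing import Callable, Iterable, List


def decommission_statement(node: str) -> str:
    """Build the DECOMMISSION statement for one node id or node name."""
    # Escape single quotes so the value is a valid SQL string literal.
    escaped = node.replace("'", "''")
    return f"ALTER CLUSTER DECOMMISSION '{escaped}'"


def rolling_decommission(
    nodes: Iterable[str],
    run_sql: Callable[[str], None],
    wait_until_rejoined: Callable[[str], None],
) -> List[str]:
    """Decommission nodes strictly one after another.

    `run_sql` and `wait_until_rejoined` are hypothetical callbacks
    (assumptions of this sketch), e.g. a database client call and a
    check that the restarted node is back in the cluster.
    """
    done: List[str] = []
    for node in nodes:
        run_sql(decommission_statement(node))
        # Do not touch the next node before this one is back.
        wait_until_rejoined(node)
        done.append(node)
    return done
```

Processing one node at a time is the design choice that matters here: the next node is only decommissioned after the previous one has rejoined, which is where the orchestration gets complicated in practice.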

So, I believe that might be a valid scenario to cover through the crate-operator. Would the architecture behind this yield the possibility to do that?

With kind regards,
Andreas.

cc @mfussenegger, @seut

[1] https://crate.io/docs/crate/howtos/en/latest/admin/rolling-upgrade.html
[2] https://crate.io/docs/crate/reference/en/4.3/config/cluster.html#graceful-stop
[3] https://crate.io/docs/crate/reference/en/4.3/sql/statements/alter-cluster.html#decommission-nodeid-nodename

@MarkusH
Contributor

MarkusH commented Nov 25, 2020

The operator already implements a rolling upgrade. The restart behavior is documented in https://crate-operator.readthedocs.io/en/latest/concepts.html#cluster-restart . At this point, however, it's not deallocating data from nodes before restarting them. Once #115 has been implemented, that's a rather easy addition.

ALTER CLUSTER DECOMMISSION IMO isn't particularly viable on Kubernetes, because K8s is going to restart pods when containers fail. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/ It may be useful in the case of upgrades; during scaling operations, however, it won't work.

I'd suggest that, instead of implementing two ways for deallocation in the operator (one using ALTER CLUSTER DECOMMISSION and one the manual way as outlined in https://crate-operator.readthedocs.io/en/latest/concepts.html#cluster-scaling), the operator should handle this in one way only. Since the former option isn't viable in all situations, the approach would need to be the latter.
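As a rough illustration of what such a manual deallocation could amount to, here is a minimal Python sketch. It is an assumption of this sketch, not a statement about the operator's code, that the node is excluded via the `cluster.routing.allocation.exclude._name` cluster setting and that progress is observed by counting rows in `sys.shards`; the linked operator docs remain authoritative. `run_sql` is a hypothetical callback that executes a statement with parameters and returns the result rows.

```python
import time
from typing import Any, Callable, List, Sequence

# Assumption: shards still located on a node can be counted like this.
SHARDS_ON_NODE = "SELECT count(*) FROM sys.shards WHERE node['name'] = ?"


def exclude_statement(node_names: Sequence[str]) -> str:
    """Build a statement that excludes the given nodes from shard allocation."""
    value = ",".join(node_names).replace("'", "''")
    return (
        'SET GLOBAL TRANSIENT "cluster.routing.allocation.exclude._name" = '
        + "'" + value + "'"
    )


def deallocate(
    node_name: str,
    run_sql: Callable[[str, Sequence[Any]], List[List[Any]]],
    poll_interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Exclude one node and poll until it holds no shards.

    Returns True once the node is empty, False if `max_polls` is exhausted.
    `run_sql` is a hypothetical callback supplied by the caller.
    """
    run_sql(exclude_statement([node_name]), ())
    for _ in range(max_polls):
        rows = run_sql(SHARDS_ON_NODE, (node_name,))
        if rows[0][0] == 0:
            return True
        sleep(poll_interval)
    return False
```

Unlike a decommission, this approach does not stop the CrateDB process itself, so Kubernetes has no failed container to restart. It only moves data away, and the pod can then be removed or restarted through the usual Kubernetes mechanisms.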

@amotl
Member Author

amotl commented Nov 25, 2020

Dear Markus,

thanks for your quick answer and for providing such valuable insights behind the scenes of crate-operator. So, feel free to close this issue right away or eventually keep it open in order to track if the possibility to invoke an ALTER CLUSTER DECOMMISSION command on top of #115 might be implemented in one way or another.

With kind regards,
Andreas.

@c0c0n3

c0c0n3 commented Nov 25, 2020

@amotl, @MarkusH

> The operator already implements a rolling upgrade

Would the procedure implemented in the Crate Operator be safe for major version upgrades? My understanding of the Crate docs is that in this case you have to shut down every node first, then restart the cluster with the new software version. Do you recommend using the Crate Operator instead of plain K8s rolling upgrades in the case of a Crate major version upgrade?

Many thanks!!

@MarkusH
Contributor

MarkusH commented Nov 25, 2020

Hi @c0c0n3. No, at this point it won't be sufficient. There have been discussions about adding full cluster restarts. But since the operator was implemented by us for CrateDB 4.x and 5.x isn't out there yet, we didn't spend much time looking into that.

I'll open a ticket to track the full cluster restart feature.
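The distinction discussed above can be sketched as a small Python helper. It follows the understanding voiced in this thread, namely that a major version upgrade needs a full cluster restart while upgrades within the same major version can be rolled. This is a deliberate simplification: the function names are hypothetical, and the rolling upgrade docs [1] define the actual compatibility rules.

```python
def major(version: str) -> int:
    """Return the major component of a version string like '4.3.1'."""
    return int(version.split(".")[0])


def upgrade_strategy(current: str, target: str) -> str:
    """Pick an upgrade strategy based on the major version only.

    Simplified assumption of this sketch: crossing a major version
    boundary requires a full cluster restart, anything else can be rolled.
    """
    if major(target) < major(current):
        raise ValueError("downgrades across major versions are not covered here")
    if major(target) != major(current):
        return "full-restart"
    return "rolling"
```

A caller could use the returned string to decide whether a rolling restart is enough or whether all nodes have to be shut down before starting the new version.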

MarkusH added the question label on Nov 25, 2020
@c0c0n3

c0c0n3 commented Nov 25, 2020

Hi @MarkusH, thanks so much for getting back to me, much appreciated!

> I'll open a ticket to track the full cluster restart feature.

That's great news for us, and for all the others out there who are deploying Crate on K8s, I suppose. Could you please tag me on that issue when you open it, so I can follow it? Thank you sooo much!!

@MarkusH
Contributor

MarkusH commented Nov 25, 2020

@amotl and @c0c0n3 I opened #128 to implement the full cluster restart.

@amotl
Member Author

amotl commented Nov 25, 2020

Thanks a bunch for this nice conversation. I am closing this now in favor of #128.

amotl closed this as completed on Nov 25, 2020
3 participants