Rolling update strategy with graceful shutdown feature #127

amotl · 2020-11-25T11:26:43Z

At orchestracities/ngsi-timeseries-api#384 (comment) ff., we and others are looking at the preferred way to upgrade a running CrateDB cluster via the rolling upgrade method [1], and are also reflecting on the details of doing so on Kubernetes.

In that specific case, @c0c0n3 outlined on orchestracities/charts#81 how the upgrade procedure usually takes place when using the k8s RollingUpdate update strategy.

In order to start the graceful stop procedure [2] on a single node, one would have to invoke the ALTER CLUSTER DECOMMISSION <nodeId | nodeName> SQL command [3]. In order to orchestrate this across the whole cluster, things will get more complicated, right?
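To make the orchestration question a bit more concrete, here is a minimal sketch of the SQL an orchestrator would have to issue, assuming the graceful stop settings from [2] and the decommission command from [3]. The function names (`quote_literal`, `rolling_decommission_plan`, `sql_payload`) and the node names are hypothetical, and the JSON body assumes CrateDB's HTTP `_sql` endpoint. The sketch only builds the statements; it does not talk to a cluster.

```python
import json


def quote_literal(value: str) -> str:
    """Quote a value as an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def rolling_decommission_plan(node_names, min_availability="primaries", timeout="2h"):
    """Return the ordered SQL statements for gracefully stopping nodes one by one.

    Assumption: the graceful stop settings are cluster-wide, so they are set once,
    followed by one ALTER CLUSTER DECOMMISSION statement per node.
    """
    if min_availability not in ("full", "primaries", "none"):
        raise ValueError(f"unsupported min_availability: {min_availability}")
    statements = [
        'SET GLOBAL TRANSIENT "cluster.graceful_stop.min_availability" = '
        + quote_literal(min_availability),
        'SET GLOBAL TRANSIENT "cluster.graceful_stop.timeout" = '
        + quote_literal(timeout),
    ]
    # A real orchestrator has to wait after each decommission until the node
    # has been restarted and has rejoined the cluster before moving on.
    for name in node_names:
        statements.append("ALTER CLUSTER DECOMMISSION " + quote_literal(name))
    return statements


def sql_payload(stmt: str) -> str:
    """Wrap a statement in the JSON body expected by the HTTP `_sql` endpoint."""
    return json.dumps({"stmt": stmt})


if __name__ == "__main__":
    for stmt in rolling_decommission_plan(["crate-0", "crate-1"]):
        print(sql_payload(stmt))
```

The hard part is not in the sketch: waiting for each node to come back and for the cluster to recover before decommissioning the next one is what makes this complicated across a whole cluster.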

So, I believe that might be a valid scenario to cover through the crate-operator. Would the architecture behind this yield the possibility to do that?

With kind regards,
Andreas.

cc @mfussenegger, @seut

[1] https://crate.io/docs/crate/howtos/en/latest/admin/rolling-upgrade.html
[2] https://crate.io/docs/crate/reference/en/4.3/config/cluster.html#graceful-stop
[3] https://crate.io/docs/crate/reference/en/4.3/sql/statements/alter-cluster.html#decommission-nodeid-nodename


MarkusH · 2020-11-25T11:41:01Z

The operator already implements a rolling upgrade. The restart behavior is documented in https://crate-operator.readthedocs.io/en/latest/concepts.html#cluster-restart . At this point, however, it does not deallocate data from nodes before restarting them. Once #115 has been implemented, that is a rather easy addition.

ALTER CLUSTER DECOMMISSION IMO isn't particularly viable on Kubernetes, because K8s is going to restart pods when containers fail (see https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/). It may be useful for upgrades, but it won't work during scaling operations.

I'd suggest that, instead of implementing two ways of deallocation in the operator (one using ALTER CLUSTER DECOMMISSION and one the manual way as outlined in https://crate-operator.readthedocs.io/en/latest/concepts.html#cluster-scaling), the operator should handle deallocation in a single way. Since the former option isn't viable in all situations, the approach would need to be the latter.
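For illustration, a rough sketch of what the manual way could look like in SQL terms. It assumes that deallocation means excluding the nodes from shard allocation via the `cluster.routing.allocation.exclude._name` setting and then polling `sys.shards` until no shards are left on them. That is an assumption made for this sketch, not a description of the operator's actual code, and all function names are hypothetical.

```python
import json


def exclude_nodes_statement(node_names):
    """Build the statement that tells the cluster to move shards off the given nodes."""
    for name in node_names:
        # Keep the sketch simple: refuse names that would break the literal.
        if "'" in name or "," in name:
            raise ValueError(f"unsupported node name: {name!r}")
    joined = ",".join(node_names)
    return (
        'SET GLOBAL TRANSIENT "cluster.routing.allocation.exclude._name" = '
        f"'{joined}'"
    )


def remaining_shards_payload(node_names):
    """Build the JSON body that counts the shards still located on the given nodes."""
    return json.dumps(
        {
            "stmt": "SELECT count(*) FROM sys.shards WHERE node['name'] = ANY(?)",
            "args": [list(node_names)],
        }
    )


def is_drained(shard_count: int) -> bool:
    """A node set is safe to stop once no shards are left on it."""
    return shard_count == 0


if __name__ == "__main__":
    nodes = ["crate-2"]
    print(exclude_nodes_statement(nodes))
    print(remaining_shards_payload(nodes))
```

Unlike ALTER CLUSTER DECOMMISSION, this variant never stops the CrateDB process itself, so it does not depend on how K8s reacts to a container exiting. The same statements would serve for both restarts and scale-down.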

amotl · 2020-11-25T11:47:38Z

Dear Markus,

thanks for your quick answer and for providing such valuable insights behind the scenes of crate-operator. So, feel free to close this issue right away, or keep it open in order to track whether invoking an ALTER CLUSTER DECOMMISSION command on top of #115 might be implemented in one way or another.

With kind regards,
Andreas.

c0c0n3 · 2020-11-25T12:15:35Z

@amotl, @MarkusH

> The operator already implements a rolling upgrade

Would the procedure implemented in the Crate Operator be safe for major version upgrades? My understanding of the Crate docs:

https://crate.io/docs/crate/howtos/en/latest/admin/full-restart-upgrade.html

is that in this case you have to shut down every node first, then restart the cluster with the new software version. Do you recommend using the Crate Operator instead of plain K8s rolling upgrades in the case of a Crate major version upgrade?
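To spell out the distinction drawn here, a tiny sketch that picks an upgrade procedure from two version strings. It follows the understanding stated above (a major version change needs a full restart, anything else can be rolled) and is not an authoritative rule. The function names and the `"rolling"` / `"full-restart"` labels are made up for this example.

```python
def parse_version(version: str):
    """Turn a dotted version string such as '4.3.1' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def upgrade_strategy(current: str, target: str) -> str:
    """Pick the upgrade procedure based on whether the major version changes."""
    current_major = parse_version(current)[0]
    target_major = parse_version(target)[0]
    if target_major != current_major:
        # Assumption from the discussion: shut down every node first,
        # then restart the cluster with the new software version.
        return "full-restart"
    return "rolling"


if __name__ == "__main__":
    print(upgrade_strategy("4.3.1", "4.3.2"))
    print(upgrade_strategy("4.3.1", "5.0.0"))
```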

Many thanks!!

MarkusH · 2020-11-25T12:21:04Z

Hi @c0c0n3. No, at this point it won't be sufficient. There have been discussions about adding full cluster restarts. But since we implemented the operator for CrateDB 4.x and 5.x isn't out yet, we didn't spend much time looking into that.

I'll open a ticket to track the full cluster restart feature.

c0c0n3 · 2020-11-25T12:30:04Z

Hi @MarkusH, thanks so much for getting back to me, much appreciated!

> I'll open a ticket to track the full cluster restart feature.

That's great news for us, and for all the others out there who are deploying Crate on K8s, I suppose. Could you please tag me on that issue when you open it so I can follow it? Thank you sooo much!!

MarkusH · 2020-11-25T12:30:25Z

@amotl and @c0c0n3 I opened #128 to implement the full cluster restart.

amotl · 2020-11-25T12:42:00Z

Thanks a bunch for this nice conversation. I am closing this now in favor of #128.

amotl mentioned this issue Nov 25, 2020

Mention rolling update strategy is not appropriate for Crate major version upgrade orchestracities/charts#81

Open

MarkusH added the question label Nov 25, 2020

amotl closed this as completed Nov 25, 2020