Manually scale down Confluence/Jira statefulsets #435
Conversation
```bash
if echo "$PRODUCTS" | grep -qE 'jira|confluence'; then
  SNAPSHOTS_JSON_FILE_PATH=$(get_variable 'snapshots_json_file_path' "${CONFIG_ABS_PATH}")
  if [ "${SNAPSHOTS_JSON_FILE_PATH}" ]; then
    local EKS_PREFIX="atlas-"
```
Are these fixed constants? Any chance we can refer to them rather than a magic string?
The cluster name is built on top of environment_name; I have reused a chunk of existing code from install.sh :)
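For context, a sketch of what that reused chunk from install.sh might look like (EKS_SUFFIX and the exact composition are assumptions; only EKS_PREFIX is visible in this diff):

```bash
# Sketch only: the cluster name is derived from environment_name (inside a
# function, hence `local`). EKS_SUFFIX and the exact composition are
# assumptions, not confirmed by the diff.
local EKS_PREFIX="atlas-"
local EKS_SUFFIX="-cluster"
local ENVIRONMENT_NAME
ENVIRONMENT_NAME=$(get_variable 'environment_name' "${CONFIG_ABS_PATH}")
local EKS_CLUSTER_NAME="${EKS_PREFIX}${ENVIRONMENT_NAME}${EKS_SUFFIX}"
```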
```diff
@@ -393,6 +393,32 @@ set_current_context_k8s() {
    fi
}

scale_down() {
```
This is not the easiest read; it would be useful to have some comments describing at least the flow.
Thanks, added a comment.
This wasn't happening before? Or was it just ignored?
@nanux the problem popped up when DCAPT started taking snapshots of local-home volumes (in addition to shared-home) to speed up the cold start of Jira and Confluence. We create the EBS volume, PV and PVC before the Helm chart is deployed, and we create as many of them as there are replicas in tfvars. This works well when deploying, scaling up, and deleting the environment. However, if you deploy Jira with 4 nodes and then decrease jira_replica_count to 2, Terraform will first try to delete the PVC, PV and EBS volume for pods jira-3 and jira-2. At that point the pods are still running, so the PVC deletion will time out (it gets stuck in Terminating because of the pvc-protection finalizer). I experimented with dependencies in Terraform but failed to achieve the desired result with a purely Terraform approach. So this new script just checks whether your terraform apply operation is a scale-down event; if it is, it scales the StatefulSet down to the desired replica count. When Terraform kicks in, it is able to delete the PVC, PV and EBS volume for the pods that are already gone, and then update the Helm release (which will really find no changes).
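A minimal sketch of that flow, assuming hypothetical function, StatefulSet and namespace names (the actual script in this PR may differ):

```bash
# Hypothetical sketch of the scale-down check that runs before `terraform apply`.
# `get_variable`, the StatefulSet name and the namespace are assumptions.
scale_down_if_needed() {
  local product="$1"      # e.g. "jira" or "confluence"
  local namespace="$2"    # e.g. "atlassian"
  local desired current i

  # Desired replica count from tfvars
  desired=$(get_variable "${product}_replica_count" "${CONFIG_ABS_PATH}")
  # Current replica count of the running StatefulSet (skip if it doesn't exist yet)
  current=$(kubectl get statefulset "${product}" -n "${namespace}" \
    -o jsonpath='{.spec.replicas}' 2>/dev/null) || return 0

  if [ -n "${current}" ] && [ "${desired}" -lt "${current}" ]; then
    echo "Scale-down detected (${current} -> ${desired}); scaling StatefulSet first"
    kubectl scale statefulset "${product}" -n "${namespace}" --replicas="${desired}"
    # Wait for the surplus pods (e.g. jira-3, jira-2) to terminate, so the
    # pvc-protection finalizer no longer blocks PVC deletion
    for i in $(seq "${desired}" $((current - 1))); do
      kubectl wait --for=delete "pod/${product}-${i}" -n "${namespace}" --timeout=300s
    done
  fi
}
```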
Problem Statement
When scaling down Jira/Confluence with pre-created local-home PVs, Terraform destroys the PVC/PV/EBS first and only then updates the Helm release. This happens because the Helm release depends on the local-home volumes, which need to be created before the chart is deployed. A PVC gets stuck in the Terminating state if it is deleted while its pod is still running.
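For illustration, a PVC stuck this way keeps the protection finalizer until its pod is gone (the PVC name below is hypothetical; the finalizer itself is standard Kubernetes behaviour):

```bash
# Hypothetical PVC name; deletionTimestamp is set but the finalizer blocks removal.
kubectl get pvc local-home-jira-3 \
  -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
# Example output while the pod is still running:
#   2024-01-01T00:00:00Z
#   ["kubernetes.io/pvc-protection"]
```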
This PR adds a script that identifies whether a terraform apply is a scale-down event and, if so, scales down the StatefulSet (waiting until the pods are gone). Terraform then proceeds with deleting the local-home PVC, PV and EBS volume, which happens almost instantly.