
Manually scale down Confluence/Jira statefulsets #435

Merged: 3 commits merged into main from CLIP-1918-scale-down on Oct 17, 2024

Conversation

bianchi2 (Collaborator) commented Oct 15, 2024

Problem Statement

When scaling down Jira/Confluence with pre-created local-home PVs, Terraform destroys the PVC/PV/EBS first and only then updates the Helm release. This happens because the local-home volumes depend on the Helm chart and need to be created first. A PVC that is deleted while its pod is still running gets stuck in the Terminating state.

This PR adds a script that identifies whether a terraform apply is a scale-down event and, if so, scales down the StatefulSet (waiting until the pods are gone). Terraform can then delete the local-home PVC, PV and EBS volume, which happens almost instantly.
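For illustration only, a minimal sketch of what such a check could look like; the function name, StatefulSet names and timeout below are assumptions, not the actual scale_down() added by this PR:

```bash
# Hypothetical sketch, not the code added by this PR.
scale_down_if_needed() {
  local product=$1            # e.g. jira or confluence
  local desired_replicas=$2   # replica count taken from the .tfvars file
  local namespace=$3

  # Current replica count of the product StatefulSet
  local current_replicas
  current_replicas=$(kubectl get statefulset "${product}" -n "${namespace}" \
    -o jsonpath='{.spec.replicas}')

  # Only act when this terraform apply is a scale-down event
  if [ "${desired_replicas}" -lt "${current_replicas}" ]; then
    kubectl scale statefulset "${product}" -n "${namespace}" --replicas="${desired_replicas}"

    # Wait until the surplus pods are gone so PVC/PV/EBS deletion is not blocked
    for i in $(seq "${desired_replicas}" "$((current_replicas - 1))"); do
      kubectl wait --for=delete pod "${product}-${i}" -n "${namespace}" --timeout=300s || true
    done
  fi
}
```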

Checklist

  • I have run successful end-to-end tests (with & without domain)
  • I have added unit tests (if applicable)
  • I have added user documentation (if applicable)

if echo "$PRODUCTS" | grep -qE 'jira|confluence'; then
  SNAPSHOTS_JSON_FILE_PATH=$(get_variable 'snapshots_json_file_path' "${CONFIG_ABS_PATH}")
  if [ "${SNAPSHOTS_JSON_FILE_PATH}" ]; then
    local EKS_PREFIX="atlas-"
Collaborator: Are these fixed constants? Any chance we can refer to them rather than magic strings?

bianchi2 (Author): Cluster name is built on top of environment_name; I have reused a chunk of existing code from install.sh :)
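For readers, the reused pattern amounts to roughly the following; the variable names and the "-cluster" suffix here are assumptions on the editor's part, the authoritative logic lives in install.sh:

```bash
# Sketch only: the cluster name is derived from environment_name rather than
# hard-coded, so "atlas-" is a shared prefix and not an arbitrary magic string.
EKS_PREFIX="atlas-"
CLUSTER_NAME="${EKS_PREFIX}${ENVIRONMENT_NAME}-cluster"
```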

@@ -393,6 +393,32 @@ set_current_context_k8s() {
fi
}

scale_down() {
Collaborator: This is not the easiest read; it would be useful to have some comments describing at least the flow.

bianchi2 (Author): Thanks, added a comment.

nanux (Collaborator) commented Oct 16, 2024

This wasn't happening before? Or was it just ignored?

bianchi2 (Author): @nanux the problem popped up when DCAPT started taking snapshots of local-home volumes (in addition to shared-home) to speed up cold starts of Jira and Confluence. We create the EBS volume, PV and PVC before the Helm chart is deployed, and we create as many of them as there are replicas in tfvars. This works well when deploying, scaling up, and deleting an environment. However, if you deploy Jira with 4 nodes and then decrease jira_replica_count to 2, Terraform will first try to delete the PVC, PV and EBS for pods jira-3 and jira-2. At that point the pods are still running, so PVC deletion times out (the PVC gets stuck in Terminating because of the PVC protection finalizer).
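To make the failure mode concrete, the stuck state looks roughly like this (the PVC name is only an example):

```bash
# Example only: a PVC deleted while its pod is still running stays in Terminating,
# held back by the kubernetes.io/pvc-protection finalizer.
kubectl get pvc local-home-jira-3 -o jsonpath='{.metadata.finalizers}'
# ["kubernetes.io/pvc-protection"]
kubectl get pvc local-home-jira-3
# NAME                STATUS        VOLUME       ...
# local-home-jira-3   Terminating   pvc-<uuid>   ...
```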

I have experimented with dependencies in Terraform but failed to achieve the desired result with a purely Terraform approach. So this new script just checks whether the terraform apply operation is a scale-down event; if it is, it scales the StatefulSet down to the desired replica count. When Terraform kicks in, it can delete the PVC, PV and EBS volume for the pods that are already gone, and then update the Helm release (which will really find no changes).
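With the script in place, the effective ordering of a scale-down apply is roughly the following (an illustrative flow, not literal commands from this PR):

```bash
# Illustrative flow only:
# 1. the new script notices that jira_replica_count dropped (e.g. 4 -> 2) and
#    scales the StatefulSet down, waiting for jira-3 and jira-2 to terminate
# 2. Terraform can then delete their PVC/PV/EBS without hitting the
#    pvc-protection finalizer
# 3. the Helm release update at the end finds nothing left to change
terraform apply -var-file=config.tfvars
```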

bianchi2 merged commit 4f7a1f4 into main on Oct 17, 2024
2 checks passed
bianchi2 deleted the CLIP-1918-scale-down branch on Oct 17, 2024 at 03:15