Manually scale down Confluence/Jira statefulsets #435
Conversation
```bash
if echo "$PRODUCTS" | grep -qE 'jira|confluence'; then
  SNAPSHOTS_JSON_FILE_PATH=$(get_variable 'snapshots_json_file_path' "${CONFIG_ABS_PATH}")
  if [ "${SNAPSHOTS_JSON_FILE_PATH}" ]; then
    local EKS_PREFIX="atlas-"
```
Are these fixed constants? Any chance we can refer to them rather than a magic string?
The cluster name is built on top of environment_name; I have reused a chunk of existing code from install.sh :)
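For context, a sketch of what that reused chunk from install.sh might look like (EKS_SUFFIX and the exact composition are assumptions; only EKS_PREFIX is visible in this diff):

```bash
# Sketch only: the cluster name is derived from environment_name (inside a
# function, hence `local`). EKS_SUFFIX and the exact composition are
# assumptions, not confirmed by the diff.
local EKS_PREFIX="atlas-"
local EKS_SUFFIX="-cluster"
local ENVIRONMENT_NAME
ENVIRONMENT_NAME=$(get_variable 'environment_name' "${CONFIG_ABS_PATH}")
local EKS_CLUSTER_NAME="${EKS_PREFIX}${ENVIRONMENT_NAME}${EKS_SUFFIX}"
```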
```diff
@@ -393,6 +393,32 @@ set_current_context_k8s() {
    fi
}

scale_down() {
```
This is not the easiest read; it would be useful to have some comments describing at least the flow.
Thanks, added a comment.
This wasn't happening before? Or was it just ignored?
@nanux the problem popped up when DCAPT started taking snapshots of local-home volumes (in addition to shared-home) to speed up the cold start of Jira and Confluence. We create the EBS volume, PV and PVC before the Helm chart is deployed, and we create as many of them as there are replicas in tfvars. This works well when deploying, scaling up, and deleting the environment. However, if you deploy Jira with 4 nodes and then decrease jira_replica_count to 2, Terraform will first try to delete the PVC, PV and EBS volume for pods jira-3 and jira-2. At that point the pods are still running, so the PVC deletion will time out (it gets stuck in Terminating because of the pvc-protection finalizer). I experimented with dependencies in Terraform but failed to achieve the desired result with a purely Terraform approach. So this new script just checks whether your terraform apply operation is a scale-down event; if it is, it scales the StatefulSet down to the desired replica count. When Terraform kicks in, it is able to delete the PVC, PV and EBS volume for the pods that are already gone, and then update the Helm release (which will really find no changes).
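A minimal sketch of that flow, assuming hypothetical function, StatefulSet and namespace names (the actual script in this PR may differ):

```bash
# Hypothetical sketch of the scale-down check that runs before `terraform apply`.
# `get_variable`, the StatefulSet name and the namespace are assumptions.
scale_down_if_needed() {
  local product="$1"      # e.g. "jira" or "confluence"
  local namespace="$2"    # e.g. "atlassian"
  local desired current i

  # Desired replica count from tfvars
  desired=$(get_variable "${product}_replica_count" "${CONFIG_ABS_PATH}")
  # Current replica count of the running StatefulSet (skip if it doesn't exist yet)
  current=$(kubectl get statefulset "${product}" -n "${namespace}" \
    -o jsonpath='{.spec.replicas}' 2>/dev/null) || return 0

  if [ -n "${current}" ] && [ "${desired}" -lt "${current}" ]; then
    echo "Scale-down detected (${current} -> ${desired}); scaling StatefulSet first"
    kubectl scale statefulset "${product}" -n "${namespace}" --replicas="${desired}"
    # Wait for the surplus pods (e.g. jira-3, jira-2) to terminate, so the
    # pvc-protection finalizer no longer blocks PVC deletion
    for i in $(seq "${desired}" $((current - 1))); do
      kubectl wait --for=delete "pod/${product}-${i}" -n "${namespace}" --timeout=300s
    done
  fi
}
```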
Problem Statement
When scaling down Jira/Confluence with pre-created local-home PVs, Terraform destroys the PVC/PV/EBS first and only then updates the Helm release. This happens because the Helm release depends on the local-home volumes, which need to be created before the chart is deployed. A PVC gets stuck in the Terminating state if it is deleted while its pod is still running.
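For illustration, a PVC stuck this way keeps the protection finalizer until its pod is gone (the PVC name below is hypothetical; the finalizer itself is standard Kubernetes behaviour):

```bash
# Hypothetical PVC name; deletionTimestamp is set but the finalizer blocks removal.
kubectl get pvc local-home-jira-3 \
  -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
# Example output while the pod is still running:
#   2024-01-01T00:00:00Z
#   ["kubernetes.io/pvc-protection"]
```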
This PR adds a script that identifies whether a terraform apply is a scale-down event and, if so, scales down the StatefulSet (waiting until the pods are gone). Terraform then proceeds with deleting the local-home PVC, PV and EBS volume, which happens almost instantly.