feat(datahub-gms): Enable autoscaling via HPA #517

7onn · 2024-11-03T14:33:29Z

Summary

datahub-project/datahub#11761
At the company where I work, datahub-gms started crashing during Snowflake ingestion, and I thought it would be handy to have an autoscaler for this workload. Hence this pull request.

How to test it?

Edit the values.yaml of both the datahub chart and the datahub-gms subchart. Make sure to enable datahub-gms.hpa.enabled and global.datahub_standalone_consumers_enabled. Then navigate to the subchart folder and run:

cd charts/datahub/subcharts/datahub-gms
helm template test . --values values.yaml --values ../../values.yaml | yq '. | select(.kind == "HorizontalPodAutoscaler")'

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable)

7onn · 2024-11-03T14:56:41Z

charts/datahub/subcharts/datahub-gms/templates/_helpers.tpl

+{{/*
+Create image registry, name and tag for a datahub component
+*/}}
+{{- define "datahub.image" -}}
+{{- $registry := .image.registry | default .imageRegistry -}}
+{{ $registry }}/{{ .image.repository }}:{{ required "Global or specific tag is required" (.image.tag | default .version) -}}
+{{- end -}}


The root chart has a similar template. I had to copy it here so that I could run helm template from within the subchart folder.
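To see what the copied helper produces, here is a minimal sketch that renders it with Go's text/template, the engine Helm builds on. Helm's `default` (from Sprig) and `required` are not part of text/template, so they are re-implemented here as simplified stand-ins (assumption: only nil and "" count as empty, whereas Sprig's emptiness check is broader), and `missingkey=zero` is set so absent keys evaluate to nil. The names `render`, `dflt` and `isEmpty` and all values in `main` (registry, repository, version) are hypothetical and for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
	"text/template"
)

// helpers is the datahub.image helper copied verbatim from _helpers.tpl.
const helpers = `{{- define "datahub.image" -}}
{{- $registry := .image.registry | default .imageRegistry -}}
{{ $registry }}/{{ .image.repository }}:{{ required "Global or specific tag is required" (.image.tag | default .version) -}}
{{- end -}}`

// isEmpty is a simplified stand-in for Sprig's emptiness check:
// only nil and "" are treated as empty here.
func isEmpty(v interface{}) bool {
	if v == nil {
		return true
	}
	s, ok := v.(string)
	return ok && s == ""
}

// dflt approximates Sprig's `default`: return given unless it is empty.
func dflt(d, given interface{}) interface{} {
	if isEmpty(given) {
		return d
	}
	return given
}

// required approximates Helm's `required`: abort rendering on an empty value.
func required(msg string, v interface{}) (interface{}, error) {
	if isEmpty(v) {
		return nil, errors.New(msg)
	}
	return v, nil
}

// render executes the helper against a values map. missingkey=zero makes
// absent keys evaluate to nil, so they fall through to the defaults.
func render(values map[string]interface{}) (string, error) {
	tpl, err := template.New("helpers").
		Option("missingkey=zero").
		Funcs(template.FuncMap{"default": dflt, "required": required}).
		Parse(helpers + `{{ template "datahub.image" . }}`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tpl.Execute(&sb, values); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Hypothetical values: no per-image registry or tag, so both fall back.
	out, err := render(map[string]interface{}{
		"imageRegistry": "docker.io",
		"version":       "v1.2.3",
		"image":         map[string]interface{}{"repository": "acryldata/datahub-gms"},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out) // docker.io/acryldata/datahub-gms:v1.2.3
}
```

A per-image `image.registry` or `image.tag` takes precedence. Otherwise the helper falls back to `imageRegistry` and `version`, and rendering fails with "Global or specific tag is required" when neither tag source is set.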

charts/datahub/values.yaml

.github/workflows/lint-test.yaml

david-leifker · 2024-11-29T15:09:04Z

This looks good to me. I would be interested to know how this works in production: can GMS scale quickly enough to help with load spikes? I can see this being helpful for long-running ingestion runs for sure.

Thank you!

7onn · 2024-12-03T16:42:07Z


Hi David, first of all, thanks a lot for taking the time to review my contribution. I really appreciate it.

Regarding production, I suspect the default setting, targetCPUUtilizationPercentage: 100, isn't the most helpful: it would require the app to be under a sustained spike before scaling out, and GMS might even get CPU-throttled for a while, which could cause the health check to fail and the pod to restart. What I had planned was to release this with targetCPUUtilizationPercentage: 60, so that it scales out early, before Kubernetes starts throttling GMS.

7onn added 4 commits November 3, 2024 15:32

feat(datahub-gms): Enable autoscaling via HPA (58773fa)
trigger chart lint (4d88b8f)
trigger workflow (1715d2b)
Add missing datahub.image template for datahub-gms (0ba87cc)

7onn commented Nov 3, 2024


Apply suggestions from code review (779b0a9)

7onn marked this pull request as ready for review November 3, 2024 15:06

david-leifker self-assigned this Nov 22, 2024

Merge branch 'master' into 7onn/autoscale-gms (46681b4)

david-leifker approved these changes Nov 29, 2024


david-leifker merged commit 2a0dc8c into acryldata:master Nov 29, 2024
1 check passed
