Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(datahub-gms): Enable autoscaling via HPA #517

Merged
merged 6 commits into from
Nov 29, 2024

Conversation

7onn
Copy link
Contributor

@7onn 7onn commented Nov 3, 2024

Summary

datahub-project/datahub#11761
The company where I work for, started crashing datahub-gms during Snowflake ingestion, and I thought it would be handy to have an autoscaler for this workload. Hence, this pull request.

How to test it?

Edit the values.yaml of both datahub chart, and datahub-gms subchart. Make sure to enable datahub-gms.hpa.enabled and global.datahub_standalone_consumers_enabled. Then, navige to the subchart folder and:

cd charts/datahub/subcharts/datahub-gms
helm template test . --values values.yaml --values ../../values.yaml | yq '. | select(.kind == "HorizontalPodAutoscaler")'

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

Comment on lines +89 to +95
{{/*
Create image registry, name and tag for a datahub component
*/}}
{{- define "datahub.image" -}}
{{- $registry := .image.registry | default .imageRegistry -}}
{{ $registry }}/{{ .image.repository }}:{{ required "Global or specific tag is required" (.image.tag | default .version) -}}
{{- end -}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the root chart has a similar template. I had to copy this so I could helm template from within the subchart folder.

charts/datahub/values.yaml Outdated Show resolved Hide resolved
.github/workflows/lint-test.yaml Outdated Show resolved Hide resolved
.github/workflows/lint-test.yaml Outdated Show resolved Hide resolved
@7onn 7onn marked this pull request as ready for review November 3, 2024 15:06
@david-leifker david-leifker self-assigned this Nov 22, 2024
@david-leifker
Copy link
Contributor

This looks good to me. I would be interested to know how this works in production, can GMS scale quick enough to help with load spikes. I can see this being helpful for long running ingestion runs for sure.

Thank you!

@david-leifker david-leifker merged commit 2a0dc8c into acryldata:master Nov 29, 2024
1 check passed
@7onn
Copy link
Contributor Author

7onn commented Dec 3, 2024

This looks good to me. I would be interested to know how this works in production, can GMS scale quick enough to help with load spikes. I can see this being helpful for long running ingestion runs for sure.

Thank you!

Hi David, first of all, thanks a lot for your time reviewing my contribution. I highly appreciate it.

In regards to production, I suppose the default setting isn't the most helpful thing with targetCPUUtilizationPercentage: 100 as this would require the app to be under some spike for some time and perhaps even get throttled for a while which could cause the health check to fail and the pod to restart. What I had planned was to release this with targetCPUUtilizationPercentage: 60 to scale out precociously and not even let GMS get throttled by Kubernetes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants