
Scalability by workspace sharding #310

Open. Wants to merge 1 commit into main.

@jdesnoes commented on Jan 7, 2025

Description of your changes

This pull request adds a scalability enhancement to the Terraform provider: workspace and provider configuration sharding, which distributes resources across replicas in multi-replica deployments. In large environments (e.g., 300+ workspaces), this speeds up reconciliation and reduces the time needed to apply changes.

Key features:

  • Replica indexing: each replica uses its replica index to determine which resources it manages. The index is recalculated dynamically whenever pods are added to, updated in, or deleted from the ReplicaSet.
  • Modulo-based sharding: the provider filters resources with a modulo operation over the total number of replicas and keeps only those whose result matches its own replica index, so that each resource is handled by exactly one replica and the workload is spread roughly evenly.
  • Workspace and provider config sharding: during reconciliation, workspaces and provider configs are distributed across replicas according to each replica's unique index.
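The PR description does not include the filtering code itself. The sketch below illustrates the modulo idea in Go under stated assumptions: the shard key is the resource name hashed with FNV-1a (the PR does not say which value enters the modulo), and shardFor and ownedBy are hypothetical helpers, not functions from the provider.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a resource name to a replica index in [0, replicas).
// Assumption: the shard key is the resource name hashed with FNV-1a;
// the PR does not state which value is fed into the modulo operation.
func shardFor(name string, replicas int) int {
	h := fnv.New32a()
	h.Write([]byte(name)) // hash.Hash.Write never returns an error
	return int(h.Sum32() % uint32(replicas))
}

// ownedBy reports whether the replica with index replicaIndex should
// reconcile the named resource. Every name has exactly one owner.
func ownedBy(name string, replicaIndex, replicas int) bool {
	if replicas <= 1 {
		return true // a single replica owns everything
	}
	return shardFor(name, replicas) == replicaIndex
}

func main() {
	workspaces := []string{"ws-network", "ws-database", "ws-frontend", "ws-monitoring"}
	const replicas = 3
	for _, ws := range workspaces {
		fmt.Printf("%s -> replica %d\n", ws, shardFor(ws, replicas))
	}
}
```

Because every replica computes the same hash, each workspace maps to exactly one replica without any coordination between pods; the trade-off of plain modulo hashing is that changing the replica count reassigns most workspaces.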

The feature is disabled by default and can be enabled with the --enable-workspace-sharding flag.
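To keep the feature opt-in, the shard filter can simply be a no-op unless the flag is set. A minimal sketch, assuming a hypothetical ShardFilter type and using the standard library flag package as a stand-in for whatever flag parser the provider actually uses:

```go
package main

import (
	"flag"
	"fmt"
	"hash/fnv"
)

// ShardFilter decides whether this replica should reconcile a resource.
// Hypothetical type for illustration; not the provider's actual API.
type ShardFilter struct {
	Enabled  bool
	Index    int
	Replicas int
}

// Owns returns true for every resource when sharding is disabled, so the
// filter has no effect by default. When enabled, it keeps only the
// resources whose hashed name modulo Replicas equals Index.
func (f ShardFilter) Owns(name string) bool {
	if !f.Enabled || f.Replicas <= 1 {
		return true
	}
	h := fnv.New32a()
	h.Write([]byte(name))
	return int(h.Sum32()%uint32(f.Replicas)) == f.Index
}

func main() {
	// Mirrors the flag name from the PR; the provider itself may parse
	// flags with a different library.
	enable := flag.Bool("enable-workspace-sharding", false, "shard workspaces across replicas")
	flag.Parse()

	f := ShardFilter{Enabled: *enable, Index: 0, Replicas: 3}
	fmt.Println("ws-network owned by this replica:", f.Owns("ws-network"))
}
```

Run without arguments, the filter owns every workspace; with --enable-workspace-sharding it owns only the workspaces in its own shard.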

Limitations:

  • To retrieve the replica index and the total replica count, the provider queries the ReplicaSet and its pods through the Kubernetes API, which requires read permissions (get, list and watch) on the pods and replicasets resources. A dedicated role binding must therefore be set up, so it is useful to set a specific service account name in the DeploymentRuntimeConfig.
  • Each replica needs to know its own pod name to determine its index in the ReplicaSet. The pod name can be injected with the valueFrom and fieldRef keywords in the DeploymentRuntimeConfig.
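One way a replica could derive its index is to sort the names of the ReplicaSet's pods and look up its own name, read from an environment variable populated through the Kubernetes Downward API (valueFrom.fieldRef with fieldPath metadata.name). In this sketch, the variable name POD_NAME, the sort-by-name rule, the replicaIndex helper and the hard-coded pod list are all assumptions; the provider would obtain the real list from the Kubernetes API, which is what the get, list and watch permissions are for.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// replicaIndex returns this pod's position among the ReplicaSet's pods and
// the total replica count. Sorting by name gives every replica the same
// ordering, so indexes are unique and stable as long as the pod set is stable.
// Assumption: the PR does not state its ordering rule; name order is used here.
func replicaIndex(self string, pods []string) (int, int, error) {
	sorted := append([]string(nil), pods...) // copy so the caller's slice is not mutated
	sort.Strings(sorted)
	for i, name := range sorted {
		if name == self {
			return i, len(sorted), nil
		}
	}
	return -1, len(sorted), fmt.Errorf("pod %q not found in replica set", self)
}

func main() {
	// POD_NAME is a hypothetical variable injected via the Downward API
	// (valueFrom.fieldRef.fieldPath: metadata.name).
	self := os.Getenv("POD_NAME")
	if self == "" {
		self = "provider-terraform-7d9f-b2x4q" // fallback for local runs
	}
	// In the provider, these names would come from listing the ReplicaSet's pods.
	pods := []string{
		"provider-terraform-7d9f-zk81m",
		"provider-terraform-7d9f-b2x4q",
		"provider-terraform-7d9f-m3p0c",
	}

	idx, total, err := replicaIndex(self, pods)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("replica %d of %d\n", idx, total)
}
```

Recomputing the index on every pod add, update or delete event, as the PR describes, keeps the indexes contiguous when the ReplicaSet scales up or down.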

This change speeds up reconciliation in multi-replica environments, especially in large-scale use cases with many workspaces.

I have:

  • Run make reviewable to ensure this PR is ready for review.
  • Updated the documentation: not yet; I will do it if there is interest in this feature.

How has this code been tested

The code has been tested in real conditions, using debug log messages to verify that each workspace is sharded to a single, unique replica. I also monitored the duration of the reconciliation process and observed that it was faster, especially in large environments with many workspaces. This confirmed that the sharding mechanism effectively speeds up resource reconciliation.
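The manual check described above can also be expressed as an automated one. The sketch below assigns 300 synthetic workspaces to 3 replicas with a hypothetical FNV-1a-modulo rule (an assumption, not necessarily the PR's exact key or hash) and prints how many workspaces each replica would reconcile; the names and counts are illustrative, not measurements from the PR.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor is a hypothetical assignment rule: FNV-1a hash of the name,
// modulo the number of replicas.
func shardFor(name string, replicas int) int {
	h := fnv.New32a()
	h.Write([]byte(name))
	return int(h.Sum32() % uint32(replicas))
}

// assignAll assigns each workspace to exactly one replica and returns
// the list of workspaces each replica would reconcile.
func assignAll(workspaces []string, replicas int) [][]string {
	shards := make([][]string, replicas)
	for _, ws := range workspaces {
		i := shardFor(ws, replicas)
		shards[i] = append(shards[i], ws)
	}
	return shards
}

func main() {
	workspaces := make([]string, 300)
	for i := range workspaces {
		workspaces[i] = fmt.Sprintf("workspace-%03d", i)
	}
	for i, shard := range assignAll(workspaces, 3) {
		fmt.Printf("replica %d reconciles %d workspaces\n", i, len(shard))
	}
}
```

A check like this catches the two failure modes that matter for sharding: a workspace reconciled by no replica, or by more than one.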

@Upbound-CLA commented on Jan 7, 2025: CLA assistant check. All committers have signed the CLA.
