feat(options): Add consolidation timeout options #1754
base: main
Adds two options for the following timeouts:

- multinodeconsolidation
- singlenodeconsolidation

These are exposed in the following ways:

- `--multi-node-consolidation-timeout` or `MULTI_NODE_CONSOLIDATION_TIMEOUT`
- `--single-node-consolidation-timeout` or `SINGLE_NODE_CONSOLIDATION_TIMEOUT`

The primary way of testing this was by building the image and running it in dev and production clusters within the Grafana Labs fleet.

---

- refs kubernetes-sigs#1733

Signed-off-by: pokom <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Pokom.
Welcome @Pokom!
Hi @Pokom. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Spotted a few changes that are needed.
🚀 let's go!
@@ -119,7 +117,7 @@ func (m *MultiNodeConsolidation) firstNConsolidationOption(ctx context.Context,
 	lastSavedCommand := Command{}
 	lastSavedResults := scheduling.Results{}
 	// Set a timeout
-	timeout := m.clock.Now().Add(MultiNodeConsolidationTimeoutDuration)
+	timeout := m.clock.Now().Add(options.FromContext(ctx).SinglenodeConsolidationTimeout)
This should be:
-	timeout := m.clock.Now().Add(options.FromContext(ctx).SinglenodeConsolidationTimeout)
+	timeout := m.clock.Now().Add(options.FromContext(ctx).MultiNodeConsolidationTimeout)
Fixing this might fix the failing test (specifically this) - not sure 🤔
pkg/operator/options/options.go
Outdated
@@ -96,6 +98,8 @@ func (o *Options) AddFlags(fs *FlagSet) {
 	fs.DurationVar(&o.BatchMaxDuration, "batch-max-duration", env.WithDefaultDuration("BATCH_MAX_DURATION", 10*time.Second), "The maximum length of a batch window. The longer this is, the more pods we can consider for provisioning at one time which usually results in fewer but larger nodes.")
 	fs.DurationVar(&o.BatchIdleDuration, "batch-idle-duration", env.WithDefaultDuration("BATCH_IDLE_DURATION", time.Second), "The maximum amount of time with no new pending pods that if exceeded ends the current batching window. If pods arrive faster than this time, the batching window will be extended up to the maxDuration. If they arrive slower, the pods will be batched separately.")
 	fs.StringVar(&o.FeatureGates.inputStr, "feature-gates", env.WithDefaultString("FEATURE_GATES", "SpotToSpotConsolidation=false"), "Optional features can be enabled / disabled using feature gates. Current options are: SpotToSpotConsolidation")
+	fs.DurationVar(&o.MultinodeConsolidationTimeout, "multi-node-consolidation-timeout", env.WithDefaultDuration("MULTI_NODE_CONSOLIDATION_TIMEOUT", 1*time.Minute), "The maximum amount of time that can be spent doing multinode consolidation before timing out. Defaults to 1 minute")
+	fs.DurationVar(&o.SinglenodeConsolidationTimeout, "single-node-consolidation-timeout", env.WithDefaultDuration("SINGLE_NODE_CONSOLIDATION_TIMEOUT", 3*time.Minute), "The maximum amount of time that can be spent doing single node consolidation before timing out. Defaults to 3 minute")
Nit 😄
-	fs.DurationVar(&o.SinglenodeConsolidationTimeout, "single-node-consolidation-timeout", env.WithDefaultDuration("SINGLE_NODE_CONSOLIDATION_TIMEOUT", 3*time.Minute), "The maximum amount of time that can be spent doing single node consolidation before timing out. Defaults to 3 minute")
+	fs.DurationVar(&o.SinglenodeConsolidationTimeout, "single-node-consolidation-timeout", env.WithDefaultDuration("SINGLE_NODE_CONSOLIDATION_TIMEOUT", 3*time.Minute), "The maximum amount of time that can be spent doing single node consolidation before timing out. Defaults to 3 minutes")
I'm going to remove those altogether since they're redundant if you pass the `--help` flag:
-multi-node-consolidation-timeout duration
The maximum amount of time that can be spent doing multinode consolidation before timing out. Defaults to 1 minute (default 1m0s)
-single-node-consolidation-timeout duration
The maximum amount of time that can be spent doing single node consolidation before timing out. Defaults to 3 minute (default 3m0s)
pkg/operator/options/options.go
Outdated
MultinodeConsolidationTimeout time.Duration
SinglenodeConsolidationTimeout time.Duration
Could we rename these to `MultiNodeConsolidationTimeoutDuration` (capitalised `N` for `Node`) and `SingleNodeConsolidationTimeout`? This would be similar to the previous const and keep consistency with the `MultiNodeConsolidation` struct.
Pull Request Test Coverage Report for Build 11371495375 (Coveralls)
PR needs rebase.
The main problem I have with this is that it locks in an API that mirrors our implementation. If we change the implementation in the future, this makes that tougher to do. I could see this being an "alpha" feature, but I also wonder if there's a way to make this apply to both types without specifically naming them?
That's fair. I'm not tied to any implementation here, and having one variable + config is certainly easier to manage than multiple. I'll be at KubeCon next week, so it'll be a bit of time before I can pick this up again.
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
Fixes #N/A
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.