[develop] Add exp backoff in launch ec2 instances call on throttling #594

Merged
merged 3 commits into from
Nov 8, 2023

@lukeseawalker lukeseawalker commented Nov 7, 2023

Description of changes

  • Add exponential backoff to the launch EC2 instances call when it is throttled.
    This is especially useful during all-or-nothing scaling, in the all-in optimization call, to avoid quitting the all-in call and entering the job loop.
    The longer retry time requires increasing the orphaned_instance_timeout by 3 min, from 120 to 300 secs.
  • Add error code and message on ClientError when launching instances, for RunInstances and CreateFleet API calls
  • Reword debug log message
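
For readers unfamiliar with the pattern, the first two bullets can be sketched roughly as below. This is a minimal illustration, not the code merged in this PR: `launch_with_backoff`, `THROTTLING_CODES`, and the attempt/delay values are hypothetical names and numbers, and the local `ClientError` class is only a stand-in that mirrors the `response["Error"]["Code"]` / `response["Error"]["Message"]` layout of `botocore.exceptions.ClientError` so the snippet runs without AWS dependencies.

```python
import time


class ClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError (same `response` layout)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.response = {"Error": {"Code": code, "Message": message}}


# Assumption: which error codes count as throttling. RequestLimitExceeded is the
# EC2 throttling code; the set used by the real plugin may differ.
THROTTLING_CODES = {"RequestLimitExceeded"}


def launch_with_backoff(launch, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `launch()` and retry with exponential backoff only when throttled.

    `launch` is any zero-argument callable wrapping a RunInstances or
    CreateFleet request. Attempt and delay values are illustrative.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return launch()
        except ClientError as e:
            code = e.response["Error"]["Code"]
            message = e.response["Error"]["Message"]
            # Give up on non-throttling errors, or once retries are exhausted,
            # and surface both the error code and the message to the caller.
            if code not in THROTTLING_CODES or attempt == max_attempts:
                raise RuntimeError(f"Failed to launch instances: {code} - {message}") from e
            # Delay doubles on every throttled attempt, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
```

With these illustrative defaults, a call that is throttled on every attempt waits 1 + 2 + 4 + 8 = 15 secs in total before giving up. Retries extend how long a launch call can take in the same way in the real plugin, which is why the orphaned instance timeout has to grow along with them.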

Tests

  • manually tested on running cluster

References

n/a

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Signed-off-by: Luca Carrogu <[email protected]>

codecov bot commented Nov 7, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (22cddc6) 90.05% compared to head (28606fd) 90.17%.
Report is 1 commit behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #594      +/-   ##
===========================================
+ Coverage    90.05%   90.17%   +0.11%     
===========================================
  Files           16       16              
  Lines         2706     2708       +2     
===========================================
+ Hits          2437     2442       +5     
+ Misses         269      266       -3     
Flag Coverage Δ
unittests 90.17% <100.00%> (+0.11%) ⬆️


Files Coverage Δ
src/slurm_plugin/clustermgtd.py 92.44% <ø> (ø)
src/slurm_plugin/fleet_manager.py 94.95% <100.00%> (+1.43%) ⬆️


@lukeseawalker lukeseawalker marked this pull request as ready for review November 7, 2023 15:02
@lukeseawalker lukeseawalker requested review from a team as code owners November 7, 2023 15:02
@lukeseawalker lukeseawalker enabled auto-merge (rebase) November 7, 2023 15:02
@lukeseawalker lukeseawalker force-pushed the wip/nodeSharingJLS branch 2 times, most recently from 79916fc to d73d4ba November 7, 2023 17:00
Add error code and message on ClientError when launching instances, for RunInstances and CreateFleet API calls

Signed-off-by: Luca Carrogu <[email protected]>
Add exponential backoff to the launch EC2 instances call when it is throttled.
This is especially useful during all-or-nothing scaling, in the all-in optimization call, to avoid quitting the all-in call and entering the job loop.
The longer retry time requires increasing the orphaned_instance_timeout by 1 min, from 120 to 180 secs.

Signed-off-by: Luca Carrogu <[email protected]>
@lukeseawalker lukeseawalker merged commit 322b376 into aws:develop Nov 8, 2023
12 checks passed
@lukeseawalker lukeseawalker changed the title [develop ] Add exp backoff in launch ec2 instances call on throttling [develop] Add exp backoff in launch ec2 instances call on throttling Nov 8, 2023