Draining CC-api VMs should let local-worker jobs finish #496

Open
wants to merge 3 commits into
base: develop

kathap
Contributor

@kathap kathap commented Dec 11, 2024

A short explanation of the proposed change:

During a graceful shutdown of a CC API VM, the local-worker now waits before shutting down if jobs are still running on the local-worker queue. The default grace period is 5 minutes; it can be changed by setting cc.jobs.local.local_worker_grace_period_seconds.
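The waiting behaviour described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: the function name `wait_for_local_jobs`, the `has_running_jobs` callback, and the poll interval are hypothetical, and only the 300-second default comes from the description. It also assumes the wait ends early once the queue is empty.

```python
import time
from typing import Callable

DEFAULT_GRACE_PERIOD_SECONDS = 300  # 5 minutes, the default named in this PR


def wait_for_local_jobs(
    has_running_jobs: Callable[[], bool],  # hypothetical check of the local-worker queue
    grace_period_seconds: float = DEFAULT_GRACE_PERIOD_SECONDS,
    poll_interval_seconds: float = 1.0,  # assumed value, not taken from the PR
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until no jobs are running or the grace period expires.

    Returns True if the queue drained in time, False if the grace
    period ran out while jobs were still running.
    """
    deadline = clock() + grace_period_seconds
    while has_running_jobs():
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval_seconds, remaining))
    return True
```

The `clock` and `sleep` parameters are injected only so the loop can be exercised without real waiting; a drain script would use the defaults.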

The order of the shutdown is now:

  • shutdown nginx
  • shutdown cloud_controller
  • shutdown local_worker

Previously, the order was: local_worker, then nginx, then cloud_controller.
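As a rough illustration of the reordering, the sketch below stops processes in a given sequence. It is a Python sketch under stated assumptions, not the drain script from this PR: the `drain` function and the per-process stop callbacks are hypothetical, and only the process names and the two orderings come from the description.

```python
from typing import Callable, Dict, List

# Orderings as described in this PR; local_worker now stops last.
NEW_SHUTDOWN_ORDER = ["nginx", "cloud_controller", "local_worker"]
OLD_SHUTDOWN_ORDER = ["local_worker", "nginx", "cloud_controller"]


def drain(order: List[str], stop_fns: Dict[str, Callable[[], None]]) -> List[str]:
    """Stop each process in the given order and return the names in the order stopped."""
    stopped = []
    for name in order:
        stop_fns[name]()  # hypothetical per-process stop hook
        stopped.append(name)
    return stopped
```

Stopping local_worker last presumably means the API has already stopped accepting requests by the time the worker begins its grace period, so the worker only has to finish the jobs already on its queue.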

An explanation of the use cases your change solves:
It seems that a graceful shutdown of a CC API VM (e.g., during an update) does not properly drain the worker jobs on the API VM that handle file uploads.
When the CC API VM is restarted or recreated while a local worker on that VM is processing an upload job (transferring files from disk to the blobstore), the package status gets stuck in PROCESSING_UPLOAD. The upload job seems to have the standard timeout of 4h configured, which leads to hanging deployments that are eventually stopped by client-side timeouts.
With the proposed change, if jobs are still running, the local-worker waits for the grace period (5 minutes by default) before shutting down. This gives the upload job more time to finish.

  • Links to any other associated PRs

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the develop branch

  • [] I have run CF Acceptance Tests on bosh lite

@kathap kathap marked this pull request as draft December 11, 2024 15:45
@kathap kathap marked this pull request as ready for review December 16, 2024 10:09