prod_deploy_ami_job fails transiently #403

Closed
robrap opened this issue Aug 17, 2023 · 1 comment
Labels
on-call pipeline-failure Related to deployment pipeline failures.

Comments

robrap commented Aug 17, 2023

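The `prod-edx-edxapp-internal` ELB never becomes healthy before the timeout in `wait_for_healthy_elbs` expires, so the red/black deploy is aborted and traffic to the new ASGs is disabled.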

Example alert:
https://2u-internal.app.opsgenie.com/alert/detail/95f3c8d6-d4a4-4f14-9e86-cedbc56efff6-1692131223910/details

Example stage url:
https://gocd.tools.edx.org/go/pipelines/deploy_to_prod_or_rollback/2334/deploy_ami/1

Example errors:

INFO:tubular.ec2:Number of load balancers remaining with unhealthy instances: 1
INFO:tubular.asgard:Some ASGs are failing ELB health checks. Disabling traffic to all new ASGs.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tubular/asgard.py", line 892, in _red_black_deploy
    ec2.wait_for_healthy_elbs(elbs_to_monitor, ASGARD_ELB_HEALTH_TIMEOUT)
  File "/usr/local/lib/python3.8/dist-packages/tubular/ec2.py", line 603, in wait_for_healthy_elbs
    raise TimeoutException("The following ELBs never became healthy: {}".format(elbs_left))
tubular.exception.TimeoutException: The following ELBs never became healthy: {'prod-edx-edxapp-internal'}
ERROR:tubular.scripts.asgard_deploy:Error Deploying AMI: ami-04807cf4635f34492.
Traceback (most recent call last):
Message: Error performing red/black deploy - deploy was unsuccessful. enabled_asgs: {'prod-edx-EdxappServerASGroup-1AW7ASMFKGSXH': ['prod-edx-EdxappServerASGroup-1AW7ASMFKGSXH-v509'], 'prod-edx-studio': ['prod-edx-studio-v669'], 'prod-edx-WorkerServerASGroup-G78P41HZ1V6X': ['prod-edx-WorkerServerASGroup-G78P41HZ1V6X-v492', 'prod-edx-WorkerServerASGroup-G78P41HZ1V6X-v496']} - disabled_asgs: {'prod-edx-EdxappServerASGroup-1AW7ASMFKGSXH': ['prod-edx-EdxappServerASGroup-1AW7ASMFKGSXH-v510'], 'prod-edx-studio': ['prod-edx-studio-v670'], 'prod-edx-WorkerServerASGroup-G78P41HZ1V6X': ['prod-edx-WorkerServerASGroup-G78P41HZ1V6X-v497']}
  File "/usr/local/lib/python3.8/dist-packages/tubular/scripts/asgard_deploy.py", line 71, in deploy
    deploy_info = asgard.deploy(ami_id)
  File "/usr/local/lib/python3.8/dist-packages/tubular/asgard.py", line 778, in deploy
    raise BackendError("Error performing red/black deploy - deploy was unsuccessful. "
tubular.exception.BackendError: Error performing red/black deploy - deploy was unsuccessful. enabled_asgs: {'prod-edx-EdxappServerASGroup-1AW7ASMFKGSXH': ['prod-edx-EdxappServerASGroup-1AW7ASMFKGSXH-v509'], 'prod-edx-studio': ['prod-edx-studio-v669'], 'prod-edx-WorkerServerASGroup-G78P41HZ1V6X': ['prod-edx-WorkerServerASGroup-G78P41HZ1V6X-v492', 'prod-edx-WorkerServerASGroup-G78P41HZ1V6X-v496']} - disabled_asgs: {'prod-edx-EdxappServerASGroup-1AW7ASMFKGSXH': ['prod-edx-EdxappServerASGroup-1AW7ASMFKGSXH-v510'], 'prod-edx-studio': ['prod-edx-studio-v670'], 'prod-edx-WorkerServerASGroup-G78P41HZ1V6X': ['prod-edx-WorkerServerASGroup-G78P41HZ1V6X-v497']}
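
For context on the failure mode, the traceback above points at a poll-until-healthy loop that gives up after a fixed timeout. The following is only a minimal sketch of that pattern, not tubular's actual implementation: `wait_for_healthy`, `is_healthy`, `poll_interval`, `clock`, and `sleep` are hypothetical names introduced for illustration, and the `TimeoutException` class here is a local stand-in for `tubular.exception.TimeoutException`.

```python
import time
from typing import Callable, Iterable, Set


class TimeoutException(Exception):
    """Local stand-in for tubular.exception.TimeoutException (assumption)."""


def wait_for_healthy(
    names: Iterable[str],
    is_healthy: Callable[[str], bool],  # hypothetical health probe
    timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until every name reports healthy, or raise after `timeout` seconds."""
    remaining: Set[str] = set(names)
    deadline = clock() + timeout
    while remaining:
        # Keep only the load balancers that still report unhealthy.
        remaining = {n for n in remaining if not is_healthy(n)}
        if not remaining:
            return
        if clock() >= deadline:
            raise TimeoutException(
                "The following ELBs never became healthy: {}".format(remaining)
            )
        sleep(poll_interval)
```

With a loop like this, an instance that becomes healthy slightly after the deadline produces exactly the kind of transient failure reported here, which is why a rerun of the stage can succeed without any other change.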
robrap added this to Arch-BOM Aug 17, 2023
robrap converted this from a draft issue Aug 17, 2023
robrap added the pipeline-failure label Aug 21, 2023
rgraber moved this to Prioritized in Arch-BOM Aug 24, 2023
jristau1984 moved this from Prioritized to Backlog in Arch-BOM Jul 1, 2024
@jristau1984

@robrap after almost a year, should this be closed now and a new ticket created if it becomes a problem again?

robrap closed this as completed Jul 2, 2024
github-project-automation bot moved this from Backlog to Done in Arch-BOM Jul 2, 2024