(This was captured during Summer 2023 All Hands Team Check-In)
Make a “fire drill” ticket for each component. Don’t work it alone! Find a partner, do the thing, document it. Link to the resolution documentation in the related alert if possible.
Each fire drill should simulate the outage of a single service, of the kind that any application maintained by the RDSS team might encounter. Consider referencing the canonical list of applications.
The following services and components should have documented fire drills:
Amazon Web Services outage
Amazon Web Services S3 Bucket deletions
Globus infrastructure failure
Globus resource/asset deletions
Sidekiq infrastructure failure
Redis infrastructure failure
PostgreSQL infrastructure failure
Ansible provisioning failure
Capistrano deployment failure
NGINX server failure
Load balancer (NGINX Plus) failure
Application host server failure
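As a rough sketch of the "one ticket per component" step, the script below generates a ticket stub for each entry in the list above. The names `FireDrillTicket` and `build_tickets` are hypothetical, and the checklist wording is only a paraphrase of the steps in this issue. It does not call any issue-tracker API; it just prints titles that could be pasted in by hand.

```python
from dataclasses import dataclass

# Components copied from the list in this issue.
COMPONENTS = [
    "Amazon Web Services outage",
    "Amazon Web Services S3 Bucket deletions",
    "Globus infrastructure failure",
    "Globus resource/asset deletions",
    "Sidekiq infrastructure failure",
    "Redis infrastructure failure",
    "PostgreSQL infrastructure failure",
    "Ansible provisioning failure",
    "Capistrano deployment failure",
    "NGINX server failure",
    "Load balancer (NGINX Plus) failure",
    "Application host server failure",
]

# Checklist paraphrased from the issue text (an assumption about wording).
CHECKLIST = [
    "- [ ] Find a partner (do not work it alone)",
    "- [ ] Run the drill",
    "- [ ] Document the resolution",
    "- [ ] Link the resolution documentation in the related alert, if possible",
]


@dataclass
class FireDrillTicket:
    """Hypothetical container for one ticket stub."""
    title: str
    body: str


def build_tickets(components):
    """Return one ticket stub per component, each with the same checklist."""
    return [
        FireDrillTicket(title=f"Fire drill: {name}", body="\n".join(CHECKLIST))
        for name in components
    ]


if __name__ == "__main__":
    for ticket in build_tickets(COMPONENTS):
        print(ticket.title)
```

Keeping the component list as plain data means it could later be swapped for the canonical list of applications without touching the rest of the script.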
jrgriffiniii changed the title from "Propose and Document Fire Drills for Each Service" to "Propose and Document Fire Drills for Infrastructure Services" on Jun 15, 2023