This repository has been archived by the owner on May 3, 2022. It is now read-only.
Target strategy steps to groups of clusters ('vertical' rollout progression) #252
Labels: enhancement (New feature or request)
(This is not a new issue; I'm just creating a place to write down some of the discussions we've had.)
Related to #244: whatever evolution of the API we implement for #244 should have this issue in mind.
Currently, Shipper exposes new releases on a 'horizontal' basis: a change in capacity or traffic applies all at once across all the clusters selected for the Release.
However, availability zone- or region-based rollout strategies give the user better control over the blast radius.
Here are some examples of rollout issues that Shipper is not currently well suited to protect against:
- Performance degrades in the new release such that the given fleet of pods can no longer service the required traffic, and upstream services begin experiencing timeouts. This kind of issue typically only manifests late in a release, when the contender makes up a majority of the active pods. Today, that kind of issue would be a global outage. Ideally, we would roll out a release to a single availability zone first, which would give us the opportunity to detect the issue before it spreads.
- The new release has a bug which corrupts data, or otherwise changes some AZ-level state. In this instance, even a 10% exposure is enough to cause an outage, and because that exposure applies to every cluster at once, the outage is global.
Ideally, Shipper should extend the strategy language so that users can specify a group of clusters for each step. This raises some tricky questions, though. One I've never had a clear answer to: if step 0 and step 1 have totally disjoint cluster selections, does advancing from 0 to 1 reset the clusters from step 0 to baseline?
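
To make the disjoint-selection question concrete, here is a minimal Python sketch of the two possible answers. It is not Shipper code or a proposed API: `Step`, `contender_capacity`, `capacity_by_cluster`, the `reset_untargeted` flag, and the cluster names are all hypothetical, and "baseline" is assumed to mean 0% contender capacity.

```python
from dataclasses import dataclass

BASELINE = 0  # assumption: "baseline" means 0% contender capacity


@dataclass(frozen=True)
class Step:
    """Hypothetical strategy step that targets a group of clusters."""
    name: str
    clusters: frozenset        # cluster names this step applies to
    contender_capacity: int    # percent of capacity given to the contender


def capacity_by_cluster(steps, current, all_clusters, reset_untargeted):
    """Contender capacity per cluster after advancing to steps[current].

    reset_untargeted=True:  clusters not selected by the current step
                            fall back to BASELINE.
    reset_untargeted=False: clusters keep the value from the most recent
                            step (up to and including current) that
                            selected them.
    """
    result = {cluster: BASELINE for cluster in all_clusters}
    # Under "reset" only the current step counts; under "retain" we
    # replay every step from 0 so earlier selections stay in effect.
    first = current if reset_untargeted else 0
    for step in steps[first:current + 1]:
        for cluster in step.clusters:
            result[cluster] = step.contender_capacity
    return result


# Two steps with totally disjoint cluster selections.
steps = [
    Step("one-az", frozenset({"az-a"}), 100),
    Step("remaining-azs", frozenset({"az-b", "az-c"}), 100),
]
clusters = ["az-a", "az-b", "az-c"]

reset = capacity_by_cluster(steps, 1, clusters, reset_untargeted=True)
retain = capacity_by_cluster(steps, 1, clusters, reset_untargeted=False)
print(reset)   # az-a is back at baseline
print(retain)  # az-a keeps the contender
```

Under the "reset" reading, advancing to step 1 takes the contender out of `az-a` again, which is probably not what someone writing an AZ-by-AZ rollout intends. The "retain" reading matches that intent, but it means a step no longer describes the complete desired state on its own.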