This repository has been archived by the owner on May 3, 2022. It is now read-only.

Target strategy steps to groups of clusters ('vertical' rollout progression) #252

Open
kanatohodets opened this issue Dec 30, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@kanatohodets
Contributor

(this is not a new issue, just creating a place to write down some of the discussions we've had)

Related to #244: whatever evolution of the API we implement for #244 should have this issue in mind.

Currently, Shipper exposes new releases on a 'horizontal' basis: a change in capacity or traffic applies all at once across all the clusters selected for the Release.

However, availability zone- or region-based rollout strategies give the user better control over the blast radius.

Here are some examples of rollout issues that Shipper is not currently well suited to protect against:

  • Performance degrades in the new release such that the given fleet of pods can no longer service the required traffic, and upstream services begin experiencing timeouts. This kind of issue typically manifests only late in a release, when the contender makes up a majority of the active pods. Today, that kind of issue would cause a global outage. Ideally, we would roll out a release to a single availability zone first, which would give us the opportunity to detect the issue before it spreads.

  • The new release has a bug which corrupts data, or otherwise changes some AZ-level state. In this instance, even a 10% exposure is enough to cause an outage.

Ideally, Shipper should extend the strategy language such that it allows users to specify groups of clusters for each step. This has some tricky things to figure out, though. For example, one thing I've never had a clear idea about: if step 0 and step 1 have totally disjoint cluster selections, does that mean that advancing from 0 to 1 resets the clusters from step 0 to baseline?
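
To make the open question concrete, here is a minimal Go sketch. It assumes a hypothetical `Step` type with a `Clusters` field and a hypothetical helper `clustersLeftBehind`; neither exists in Shipper today, and the cluster names are made up. The helper lists the clusters that one step targets and the next step does not, which are exactly the clusters whose fate (keep their state, or reset to baseline) the strategy language would have to define.

```go
package main

import "fmt"

// Step is a hypothetical strategy step that names the clusters it targets.
// This type is an illustration only and is not part of Shipper's API.
type Step struct {
	Name     string
	Clusters []string
}

// clustersLeftBehind returns the clusters targeted by prev but not by next,
// in prev's order. These are the clusters whose state is undefined when
// advancing from prev to next.
func clustersLeftBehind(prev, next Step) []string {
	inNext := make(map[string]bool, len(next.Clusters))
	for _, c := range next.Clusters {
		inNext[c] = true
	}
	var left []string
	for _, c := range prev.Clusters {
		if !inNext[c] {
			left = append(left, c)
		}
	}
	return left
}

func main() {
	// Two steps with totally disjoint cluster selections (names are made up).
	step0 := Step{Name: "first zone", Clusters: []string{"eu-west-a"}}
	step1 := Step{Name: "second zone", Clusters: []string{"eu-west-b"}}

	// Every cluster from step 0 is left behind, so the language must say
	// whether it keeps the contender or returns to baseline.
	fmt.Println(clustersLeftBehind(step0, step1))
}
```

With disjoint selections, every cluster from step 0 ends up in the "left behind" set, which is why the semantics need to be spelled out rather than inferred.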

kanatohodets added the enhancement label Dec 30, 2019
@osdrv
Contributor

osdrv commented Jan 30, 2020

> if step 0 and step 1 have totally disjoint cluster selections, does that mean that advancing from 0 to 1 resets the clusters from step 0 to baseline?

The semantics of a rollout strategy dictate progression when moving from a lower step to a higher one. Therefore, I'd prefer that the strategy declaration progressively extend the cluster selection set. The DSL could be explicit about the extension intent, using lexemes like: extendTo: [clusterA, clusterB].
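
As a rough sketch of how such additive semantics could resolve, assuming a hypothetical `Step` type whose `ExtendTo` field mirrors the proposed keyword (nothing here is existing Shipper API, and `effectiveClusters` is an invented helper): each step's effective cluster set is the union of every `ExtendTo` list up to and including that step, so advancing a step can only add clusters and never drops one.

```go
package main

import "fmt"

// Step is a hypothetical strategy step. ExtendTo mirrors the extendTo
// keyword proposed above; it is an assumption, not part of Shipper's API.
type Step struct {
	Name     string
	ExtendTo []string
}

// effectiveClusters returns, for each step, the cumulative set of clusters
// selected so far, in first-seen order. Clusters named more than once are
// counted only the first time.
func effectiveClusters(steps []Step) [][]string {
	seen := make(map[string]bool)
	var current []string
	result := make([][]string, 0, len(steps))
	for _, s := range steps {
		for _, c := range s.ExtendTo {
			if !seen[c] {
				seen[c] = true
				current = append(current, c)
			}
		}
		// Copy so that later appends cannot alter an earlier step's set.
		snapshot := append([]string(nil), current...)
		result = append(result, snapshot)
	}
	return result
}

func main() {
	steps := []Step{
		{Name: "step 0", ExtendTo: []string{"clusterA"}},
		{Name: "step 1", ExtendTo: []string{"clusterB"}},
	}
	for i, clusters := range effectiveClusters(steps) {
		fmt.Printf("%s: %v\n", steps[i].Name, clusters)
	}
}
```

Under this reading, the disjoint-selection question goes away: a step cannot express "move off clusterA", only "also include clusterB".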
