From 7218f12f88a76fa9c32300da922c3b17acff0b5e Mon Sep 17 00:00:00 2001
From: David Eliahu
Date: Tue, 6 Jul 2021 15:35:00 -0700
Subject: [PATCH] Add docs for scale-to-zero

(cherry picked from commit 296d0b93863d99727b0be9b65cb4830f2a47d7ba)
---
 docs/workloads/async/async.md          | 2 +-
 docs/workloads/async/autoscaling.md    | 4 ++--
 docs/workloads/realtime/autoscaling.md | 4 ++--
 docs/workloads/realtime/realtime.md    | 1 +
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/docs/workloads/async/async.md b/docs/workloads/async/async.md
index 45a32c7d8d..ee06996644 100644
--- a/docs/workloads/async/async.md
+++ b/docs/workloads/async/async.md
@@ -10,7 +10,7 @@ Async APIs are a good fit for users who want to submit longer workloads (such as
 * retrieve status and response via HTTP endpoint
 * autoscale based on queue length
 * avoid cold starts
-* scale to 0
+* scale to zero
 * perform rolling updates
 * automatically recover from failures and spot instance termination

diff --git a/docs/workloads/async/autoscaling.md b/docs/workloads/async/autoscaling.md
index 3ac46beb1c..a747c0ecab 100644
--- a/docs/workloads/async/autoscaling.md
+++ b/docs/workloads/async/autoscaling.md
@@ -6,11 +6,11 @@ Cortex auto-scales AsyncAPIs on a per-API basis based on your configuration.

 ### Autoscaling configuration

-**`min_replicas`**: The lower bound on how many replicas can be running for an API.
+**`min_replicas`** (default: 1): The lower bound on how many replicas can be running for an API. Scale-to-zero is supported.
-**`max_replicas`**: The upper bound on how many replicas can be running for an API.
+**`max_replicas`** (default: 100): The upper bound on how many replicas can be running for an API.
diff --git a/docs/workloads/realtime/autoscaling.md b/docs/workloads/realtime/autoscaling.md
index 089e3581c4..ead284984a 100644
--- a/docs/workloads/realtime/autoscaling.md
+++ b/docs/workloads/realtime/autoscaling.md
@@ -18,11 +18,11 @@ In addition to the autoscaling configuration options (described below), there ar

 ### Autoscaling configuration

-**`min_replicas`**: The lower bound on how many replicas can be running for an API.
+**`min_replicas`** (default: 1): The lower bound on how many replicas can be running for an API. Scale-to-zero is supported (experimental).
-**`max_replicas`**: The upper bound on how many replicas can be running for an API.
+**`max_replicas`** (default: 100): The upper bound on how many replicas can be running for an API.
diff --git a/docs/workloads/realtime/realtime.md b/docs/workloads/realtime/realtime.md
index ac57c7cfd8..c11ae346ca 100644
--- a/docs/workloads/realtime/realtime.md
+++ b/docs/workloads/realtime/realtime.md
@@ -9,6 +9,7 @@ Realtime APIs are a good fit for users who want to run stateless containers as a
 * respond to requests synchronously
 * autoscale based on request volume
 * avoid cold starts
+* scale to zero
 * perform rolling updates
 * automatically recover from failures and spot instance termination
 * perform A/B tests and canary deployments