Add common Retry strategy including jitter (#43)
* Extend common retry strategy + doc

* Improve retry docs
svroonland authored Sep 30, 2020
1 parent 7cf5e1d commit d5474bd
Showing 5 changed files with 142 additions and 62 deletions.
2 changes: 1 addition & 1 deletion docs/docs/docs/circuitbreaker.md
@@ -18,7 +18,7 @@ Make calls to an external system through the CircuitBreaker to safeguard that sy

## Usage example

```scala mdoc
```scala mdoc:silent
import nl.vroste.rezilience.CircuitBreaker._
import nl.vroste.rezilience._
import zio._
4 changes: 2 additions & 2 deletions docs/docs/docs/general_usage.md
@@ -73,7 +73,7 @@ val result3: ZIO[Any, Throwable, Int] =
result1.mapError(policyError => policyError.toException)
```

Similar methods exist on `BulkheadError` and `PolicyError` (see [Bulkhead](./bulkhead) and [Combining Policies](./combining))
Similar methods exist on `BulkheadError` and `PolicyError` (see [Bulkhead](../bulkhead) and [Combining Policies](../combining_policies))

## ZLayer integration
You can apply `rezilience` policies at the level of an individual ZIO effect. But having to wrap all your calls in, say, a rate limiter can clutter your code somewhat. When you are using the [ZIO module pattern](https://zio.dev/docs/howto/howto_use_layers) with `ZLayer`, it is also possible to integrate a `rezilience` policy with a service at the `ZLayer` level, as sketched below. In the spirit of aspect-oriented programming, the code using your service is not cluttered with the rate-limiting aspect.
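
A minimal sketch of wiring a rate limiter in front of a service at the layer level, using ZIO 1.x's `ZLayer.fromServiceManaged`; the `Database` service and its `query` method are illustrative stand-ins for the `Database` used in the full example:

```scala
import nl.vroste.rezilience._
import zio._
import zio.clock.Clock
import zio.duration._

// Illustrative service definition
trait Database {
  def query(sql: String): Task[Int]
}

// Wrap an existing Database so that every call goes through a RateLimiter
val addRateLimiterToDatabase: ZLayer[Clock with Has[Database], Nothing, Has[Database]] =
  ZLayer.fromServiceManaged { db: Database =>
    // Rate limit to at most 10 calls per second
    RateLimiter.make(10, 1.second).map { rateLimiter =>
      new Database {
        def query(sql: String): Task[Int] = rateLimiter(db.query(sql))
      }
    }
  }
```
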
@@ -96,4 +96,4 @@ val env: ZLayer[Clock, Nothing, Database] = (Clock.live ++ databaseLayer) >>> ad

For policies where the result type has a different `E`, you will need to map the error back to your own `E`. One option is to have something like a general `case class UnknownServiceError(e: Exception)` in your service error type, to which you can map the policy errors, as sketched below. If that is not possible for some reason, you can also define a new service type like `ResilientDatabase` where the error types are `PolicyError[E]`.
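
A minimal sketch of the first option, assuming an effect whose error channel is already a `PolicyError` and using the `toException` conversion mentioned above; the names `DatabaseError`, `UnknownServiceError` and `mapPolicyError` are illustrative:

```scala
import nl.vroste.rezilience._
import zio._

// Illustrative service error type with a catch-all case for errors introduced by the policy
sealed trait DatabaseError
case class UnknownServiceError(e: Exception) extends DatabaseError

// Map the policy's error channel back into the service's own error type
def mapPolicyError[R, A](protectedCall: ZIO[R, PolicyError[Throwable], A]): ZIO[R, DatabaseError, A] =
  protectedCall.mapError(policyError => UnknownServiceError(policyError.toException))
```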

See the [full example](rezilience/shared/src/test/scala/nl/vroste/rezilience/examples/ZLayerIntegrationExample.scala) for more.
See the [full example](https://github.com/svroonland/rezilience/blob/master/rezilience/shared/src/test/scala/nl/vroste/rezilience/examples/ZLayerIntegrationExample.scala) for more.
76 changes: 59 additions & 17 deletions docs/docs/docs/retry.md
Original file line number Diff line number Diff line change
@@ -5,30 +5,72 @@ permalink: docs/retry/
---

# Retry
ZIO already has excellent built-in support for retrying effects on failures using a `Schedule`; there is not much this library can add.

Two helper methods are made available:
`Retry` is a policy that retries effects on failure.

* `Retry.exponentialBackoff`
Exponential backoff with a maximum delay and an optional maximum number of recurs. When the maximum delay is reached, subsequent delays are the maximum.

* `Retry.whenCase`
Accepts a partial function and a schedule and will apply the schedule only when the input matches the partial function. This is useful to retry only on certain types of failures/exceptions.

For consistency with the other policies and to support combining policies, there is `Retry.make(schedule)`.

## Common retry strategy

`Retry` implements a common-practice strategy for retrying:

* The first retry is performed immediately. With transient failures, this gives the highest chance of fast success.
* After that, Retry uses an exponential backoff capped at a maximum duration.
* Some random jitter is added to prevent spikes of retries from many call sites applying the same retry strategy.
* An optional maximum number of retries ensures that retrying does not continue forever (a sketch of the corresponding `Schedule` follows this list).
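
A rough sketch of that `Schedule` with the default parameters spelled out (1 second minimum delay, 1 minute cap, 3 retries); the exact composition is `Retry.Schedules.common`, shown further below:

```scala
import nl.vroste.rezilience._
import zio.Schedule
import zio.duration._

// Sketch: an immediate first retry, then jittered exponential backoff capped at the maximum,
// intersected with a bound on the total number of retries
val commonSketch =
  (Schedule.once andThen Retry.Schedules.exponentialBackoff[Any](min = 1.second, max = 1.minute).jittered) &&
    Schedule.recurs(3)
```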

## Usage example

```scala mdoc:silent
import zio._
import zio.duration._
import zio.clock.Clock
import zio.random.Random
import nl.vroste.rezilience._

val myEffect: ZIO[Any, Exception, Unit] = ZIO.unit

val retry: ZManaged[Clock with Random, Nothing, Retry[Any]] = Retry.make(min = 1.second, max = 10.seconds)

retry.use { retryPolicy =>
retryPolicy(myEffect)
}
```
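
The remaining parameters of `Retry.make` default to `factor = 2.0`, `retryImmediately = true` and `maxRetries = Some(3)`; all of them can be overridden. A minimal sketch (the effect name `flakyCall` is illustrative):

```scala
import zio._
import zio.clock.Clock
import zio.duration._
import zio.random.Random
import nl.vroste.rezilience._

// Illustrative effect standing in for a real remote call
val flakyCall: ZIO[Any, Exception, Unit] = ZIO.unit

// Wait before the first retry, back off up to a 30 second cap and give up after 5 retries
val customRetry: ZManaged[Clock with Random, Nothing, Retry[Any]] =
  Retry.make(min = 200.millis, max = 30.seconds, retryImmediately = false, maxRetries = Some(5))

customRetry.use { retryPolicy =>
  retryPolicy(flakyCall)
}
```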

## Custom retry strategy
ZIO already has excellent built-in support for retrying effects on failures using a `Schedule`, and `rezilience` is built on top of that. `Retry` can accept any ZIO [`Schedule`](https://zio.dev/docs/datatypes/datatypes_schedule).

Some Schedule building blocks are available in `Retry.Schedules`:

* `Retry.Schedules.common(min: Duration, max: Duration, factor: Double, retryImmediately: Boolean, maxRetries: Option[Int])`
The strategy with immediate retry, exponential backoff and jitter as outlined above.

* `Retry.Schedules.exponentialBackoff(min: Duration, max: Duration, factor: Double = 2.0)`
Exponential backoff up to a maximum delay. When the maximum delay is reached, subsequent delays are equal to the maximum (see the sketch after this list).

* `Retry.Schedules.whenCase[Env, In, Out](pf: PartialFunction[In, Any])(schedule: Schedule[Env, In, Out])`
Accepts a partial function and a schedule and applies the schedule only when the input matches the partial function. This is useful to retry only on certain types of failures/exceptions.
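
For example, a plain exponential backoff without jitter can be passed to `Retry.make` directly. A minimal sketch (the explicit `[Any]` type argument and the 30-second cap are arbitrary choices):

```scala
import zio._
import zio.clock.Clock
import zio.duration._
import nl.vroste.rezilience._

// A Retry that backs off exponentially from 1 second up to a 30 second cap, without jitter
val backoffOnly: ZManaged[Clock, Nothing, Retry[Any]] =
  Retry.make(Retry.Schedules.exponentialBackoff[Any](min = 1.second, max = 30.seconds))
```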

## Different retry strategies for different errors

By composing ZIO `Schedule`s, you can define different retries for different types of errors:

```scala mdoc:silent
import java.util.concurrent.TimeoutException
import java.net.UnknownHostException

val isTimeout: PartialFunction[Exception, Any] = {
case _ : TimeoutException =>
}

val isUnknownHostException: PartialFunction[Exception, Any] = {
case _ : UnknownHostException =>
}

val retry2 = Retry.make(
Retry.Schedules.whenCase(isTimeout) { Retry.Schedules.common(min = 1.second, max = 1.minute) } ||
Retry.Schedules.whenCase(isUnknownHostException) { Retry.Schedules.common(min = 1.day, max = 5.days) }
)

retry2.use { retryPolicy =>
retryPolicy(myEffect)
}
```
120 changes: 79 additions & 41 deletions rezilience/shared/src/main/scala/nl/vroste/rezilience/Retry.scala
@@ -1,6 +1,7 @@
package nl.vroste.rezilience
import zio.clock.Clock
import zio.duration._
import zio.random.Random
import zio.{ Schedule, ZIO, ZManaged }

trait Retry[-E] { self =>
@@ -24,56 +24,36 @@ trait Retry[-E] { self =>
}

object Retry {
object Schedule {

/**
* Schedule for exponential backoff up to a maximum interval
*
* @param min Minimum backoff time
* @param max Maximum backoff time. When this value is reached, subsequent intervals will be equal to this value.
* @param factor Exponential factor. 2 means doubling, 1 is constant, < 1 means decreasing
* @tparam E Schedule input
*/
def exponentialBackoff[E](
min: Duration,
max: Duration,
factor: Double = 2.0
): Schedule[Any, E, Duration] =
zio.Schedule.exponential(min, factor).whileOutput(_ <= max) andThen zio.Schedule.fixed(max).as(max)

/**
* Apply the given schedule only when inputs match the partial function
*/
def whenCase[Env, In, Out](pf: PartialFunction[In, Any])(
schedule: Schedule[Env, In, Out]
): Schedule[Env, In, (In, Out)] =
zio.Schedule.recurWhile(pf.isDefinedAt) && schedule
}

/**
* Create a Retry from a ZIO Schedule
* @param schedule
* @tparam R
* @tparam E
* @return
*/
def make[R, E](schedule: Schedule[R, E, Any]): ZManaged[Clock with R, Nothing, Retry[E]] =
ZManaged.environment[Clock with R].map(RetryImpl(_, schedule))

/**
* Create a Retry policy with exponential backoff
* Create a Retry policy with a common retry schedule
*
* By default the first retry is done immediately. With transient / random failures this method gives the
* highest chance of fast success.
* After that Retry uses exponential backoff between some minimum and maximum duration. Jitter is added
* to prevent spikes of retries.
* An optional maximum number of retries ensures that retrying does not continue forever.
*
* @param min Minimum retry backoff delay
* @param max Maximum retry backoff delay
* @param max Maximum backoff time. When this value is reached, subsequent intervals will be equal to this value.
* @param factor Factor with which delays increase
* @return
* @param retryImmediately Retry immediately after the first failure
* @param maxRetries Maximum number of retries
*/
def make(
min: Duration = 1.second,
max: Duration = 1.minute,
factor: Double = 2.0
): ZManaged[Clock, Nothing, Retry[Any]] =
ZManaged.environment[Clock].map(RetryImpl(_, Schedule.exponentialBackoff(min, max, factor)))
factor: Double = 2.0,
retryImmediately: Boolean = true,
maxRetries: Option[Int] = Some(3)
): ZManaged[Clock with Random, Nothing, Retry[Any]] =
make(Schedules.common(min, max, factor, retryImmediately, maxRetries))

/**
* Create a Retry from a ZIO Schedule
*/
def make[R, E](schedule: Schedule[R, E, Any]): ZManaged[Clock with R, Nothing, Retry[E]] =
ZManaged.environment[Clock with R].map(RetryImpl(_, schedule))

private case class RetryImpl[-E, ScheduleEnv](
scheduleEnv: Clock with ScheduleEnv,
Expand All @@ -89,4 +70,61 @@ object Retry {
}
)
}

/**
* Convenience methods to create common ZIO schedules for retrying
*/
object Schedules {

/**
* A common-practice schedule for retrying
*
* By default the first retry is done immediately. With transient / random failures this method gives the
* highest chance of fast success.
* After that Retry uses exponential backoff between some minimum and maximum duration. Jitter is added
* to prevent spikes of retries.
* An optional maximum number of retries ensures that retrying does not continue forever.
*
* See also https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
*
* @param min Minimum retry backoff delay
* @param max Maximum backoff time. When this value is reached, subsequent intervals will be equal to this value.
* @param factor Factor with which delays increase
* @param retryImmediately Retry immediately after the first failure
* @param maxRetries Maximum number of retries
*/
def common(
min: Duration = 1.second,
max: Duration = 1.minute,
factor: Double = 2.0,
retryImmediately: Boolean = true,
maxRetries: Option[Int] = Some(3)
): Schedule[Any with Random, Any, (Any, Long)] =
((if (retryImmediately) zio.Schedule.once else zio.Schedule.stop) andThen
exponentialBackoff(min, max, factor).jittered) &&
maxRetries.fold(zio.Schedule.forever)(zio.Schedule.recurs)

/**
* Schedule for exponential backoff up to a maximum interval
*
* @param min Minimum backoff time
* @param max Maximum backoff time. When this value is reached, subsequent intervals will be equal to this value.
* @param factor Exponential factor. 2 means doubling, 1 is constant, < 1 means decreasing
* @tparam E Schedule input
*/
def exponentialBackoff[E](
min: Duration,
max: Duration,
factor: Double = 2.0
): Schedule[Any, E, Duration] =
zio.Schedule.exponential(min, factor).whileOutput(_ <= max) andThen zio.Schedule.fixed(max).as(max)

/**
* Apply the given schedule only when inputs match the partial function
*/
def whenCase[Env, In, Out](pf: PartialFunction[In, Any])(
schedule: Schedule[Env, In, Out]
): Schedule[Env, In, (In, Out)] =
zio.Schedule.recurWhile(pf.isDefinedAt) && schedule
}
}
2 changes: 1 addition & 1 deletion rezilience/shared/src/test/scala/nl/vroste/rezilience/RetrySpec.scala
@@ -11,7 +11,7 @@ object RetrySpec extends DefaultRunnableSpec {
override def spec = suite("Retry")(
testM("widen should not retry unmatched errors") {
Retry
.make(Retry.Schedule.exponentialBackoff(1.second, 2.seconds))
.make(Retry.Schedules.exponentialBackoff(1.second, 2.seconds))
.map(_.widen(Policy.unwrap[Throwable]))
.use { retry =>
for {
