-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid Caffeine exceptions on missing values #56
Avoid Caffeine exceptions on missing values #56
Conversation
We want a way for our `Loading` object to indicate the value for a key is `Missing` (eg we are trying to load an S3 key that does not exist) without having to throw an exception - as Caffeine will noisily log almost every exception it sees, even if the client does not mind missing values, as is the case with the Facia Scala Client. We've introduced the `MissingOrETagged` sealed trait to allow us to represent that a fetch result may be either missing or found, returned with ETagged data. This is an alternative approach to the hacky solution in #32 . Co-authored-by: Jamie B <[email protected]>
@rtyley has published a preview version of this PR with release workflow run #108, based on commit cbc4304: 4.0.0-PREVIEW.suppress-caffeine-exceptions-by-allowing-fetch-results-to-indicate-missing-keys.2024-07-09T1701.cbc4304a Want to make another preview release?Click 'Run workflow' in the GitHub UI, specifying the suppress-caffeine-exceptions-by-allowing-fetch-results-to-indicate-missing-keys branch, or use the GitHub CLI command: gh workflow run release.yml --ref suppress-caffeine-exceptions-by-allowing-fetch-results-to-indicate-missing-keys Want to make a full release after this PR is merged?Click 'Run workflow' in the GitHub UI, leaving the branch as the default, or use the GitHub CLI command: gh workflow run release.yml |
Missing
valuesThe `etag-caching` library has had a small change to its API with guardian/etag-caching#56, so we need make a small code change to adapt to it. # Background ETag Caching was originally introduced to Frontend with #26338.
def get(key: K): Future[V] = read(key).map(_.result) | ||
|
||
def get(key: K): Future[Option[V]] = read(key).map(_.toOption) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only external API change in this PR.
Internally, you can see we are using the toOption
field on the new MissingOrETagged
type - we'll return None
if the value is Missing
. If we have a populated ETaggedData
, we just return the value, because the end user won't care about about the ETag.
def toOption: Option[T] | ||
} | ||
|
||
case object Missing extends MissingOrETagged[Nothing] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like the None
singleton that extends Option[Nothing]
, we take advantage here that in Scala Nothing
extends all types, and so this Missing
singleton is a valid value for any MissingOrETagged[+T]
.
The `etag-caching` library has had a small change to its API with guardian/etag-caching#56, so we need make a small code change to adapt to it. ETag Caching was originally introduced to Frontend with #26338.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_ (thanks to [ETag-caching](https://github.com/guardian/etag-caching)) * Independence from AWS SDK version (v1 vs v2) _(this PR can replace #286 The [ETag-caching](https://github.com/guardian/etag-caching) library itself is also being used in DotCom PROD, introduced with guardian/frontend#26338. ### Usage ```scala import com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching import com.gu.facia.client.{ApiClient, Environment} import software.amazon.awssdk.services.s3.S3AsyncClient val s3AsyncClient = S3AsyncClient.builder().region(...).credentialsProvider(...).build() val apiClient = ApiClient.withCaching( "facia-tool-store", Environment.Prod, S3ObjectFetching.byteArraysWith(s3AsyncClient) ) ``` _PR using this updated version of the FAPI client: https://github.com/guardian/ophan/pull/6741_ ### Independence from AWS SDK version (v1 vs v2) Ideally, the whole of `facia-scala-client` would be independent of AWS SDK version - we'd _like_ consumers of this library to be able to use whatever AWS SDK version they want, without us pulling in dependency on either SDK version. For `facia-scala-client` this is an attainable goal, as the only AWS API action it performs is fetching from S3, and [guardian/etag-caching](https://github.com/guardian/etag-caching) provides the [`S3ByteArrayFetching`](https://github.com/guardian/etag-caching/blob/v6.0.0/aws-s3/base/src/main/scala/com/gu/etagcaching/aws/s3/package.scala#L6-L22) abstraction that encapsulates this action without tying to a specific AWS SDK version. Due to legacy code compatibility, we can't completely remove AWS SDK v1 from `fapi-client` for now, but we _have_ removed it from `fapi-client-core`, which is the new home of `com.gu.facia.client.ApiClient`, which is now a trait, with 2 constructor methods that provide different implementations: * **`ApiClient()`** - legacy, using the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * **`ApiClient.withCaching()`** - provides ETag-based caching and is independent of AWS SDK version - the consumer just needs to provide an appropriate instance of `com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching` (ie with `"com.gu.etag-caching" %% "aws-s3-sdk-v2"` and `S3ObjectFetching.byteArraysWith(s3AsyncClient)`, introduced with guardian/etag-caching#65) ### Solved problems * **Noisy logging associated with absent collection JSON** - the `etag-caching` library has been updated with guardian/etag-caching#56 to avoid excessive logging that would occur in the Facia client, due to it typically trying to access collections that aren't yet persisted: #32.
This change adds these improvements: * Facia data is only re-downloaded & re-parsed if the S3 content has _changed_ (thanks to [ETag-caching](https://github.com/guardian/etag-caching)) * Independence from AWS SDK version (v1 vs v2) _(this PR can replace #286 The [ETag-caching](https://github.com/guardian/etag-caching) library itself is also being used in DotCom PROD, introduced with guardian/frontend#26338. ### Usage ```scala import com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching import com.gu.facia.client.{ApiClient, Environment} import software.amazon.awssdk.services.s3.S3AsyncClient val s3AsyncClient = S3AsyncClient.builder().region(...).credentialsProvider(...).build() val apiClient = ApiClient.withCaching( "facia-tool-store", Environment.Prod, S3ObjectFetching.byteArraysWith(s3AsyncClient) ) ``` _PR using this updated version of the FAPI client: https://github.com/guardian/ophan/pull/6741_ ### Independence from AWS SDK version (v1 vs v2) Ideally, the whole of `facia-scala-client` would be independent of AWS SDK version - we'd _like_ consumers of this library to be able to use whatever AWS SDK version they want, without us pulling in dependency on either SDK version. For `facia-scala-client` this is an attainable goal, as the only AWS API action it performs is fetching from S3, and [guardian/etag-caching](https://github.com/guardian/etag-caching) provides the [`S3ByteArrayFetching`](https://github.com/guardian/etag-caching/blob/v6.0.0/aws-s3/base/src/main/scala/com/gu/etagcaching/aws/s3/package.scala#L6-L22) abstraction that encapsulates this action without tying to a specific AWS SDK version. Due to legacy code compatibility, we can't completely remove AWS SDK v1 from `fapi-client` for now, but we _have_ removed it from `fapi-client-core`, which is the new home of `com.gu.facia.client.ApiClient`, which is now a trait, with 2 constructor methods that provide different implementations: * **`ApiClient()`** - legacy, using the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour * **`ApiClient.withCaching()`** - provides ETag-based caching and is independent of AWS SDK version - the consumer just needs to provide an appropriate instance of `com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching` (ie with `"com.gu.etag-caching" %% "aws-s3-sdk-v2"` and `S3ObjectFetching.byteArraysWith(s3AsyncClient)`, introduced with guardian/etag-caching#65) ### Solved problems * **Noisy logging associated with absent collection JSON** - the `etag-caching` library has been updated with guardian/etag-caching#56 to avoid excessive logging that would occur in the Facia client, due to it typically trying to access collections that aren't yet persisted: #32.
Why?
We want to use the
etag-caching
library with Facia Scala Client (see guardian/facia-scala-client#287) so that Ophan can do more frequent querying of the Fronts API while reducing CPU & network consumption.However, the Facia Scala Client, in normal use, will frequently make requests for Collection JSON entries that don't exist in S3. Code already exists in Facia Scala Client to catch & suppress the exceptions when entires don't exist, but as we now want to now put fetching behind a Caffeine cache (ie with
etag-caching
) we run into the problem that any exception during loading will be logged by Caffeine and in our case this makes excessive logging noise.To solve this problem, we now introduce a way for our
Loading
object to indicate the value for a key isMissing
(eg we are trying to load an S3 key that does not exist) without having to throw an exception. This is an alternative approach to the hacky solution I worked on in #32 !Change to external API of
ETagCache
Before
After
The only code using the
etag-caching
library at the moment is Frontend, we have a small PR to update Frontend to cope with the API change:etag-caching
lib with small API change for missing values frontend#27303Internal changes
The Caffeine cache is now storing values which are the new
MissingOrETagged
type, rather than justETaggedData
.We've introduced the
MissingOrETagged
sealed trait to allow us to represent that a fetch result may be eitherMissing
or found, returned withETaggedData
.See also: