Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid Caffeine exceptions on missing values #56

Conversation

rtyley
Copy link
Member

@rtyley rtyley commented Jul 9, 2024

Why?

We want to use the etag-caching library with Facia Scala Client (see guardian/facia-scala-client#287) so that Ophan can do more frequent querying of the Fronts API while reducing CPU & network consumption.

However, the Facia Scala Client, in normal use, will frequently make requests for Collection JSON entries that don't exist in S3. Code already exists in Facia Scala Client to catch & suppress the exceptions when entires don't exist, but as we now want to now put fetching behind a Caffeine cache (ie with etag-caching) we run into the problem that any exception during loading will be logged by Caffeine and in our case this makes excessive logging noise.

To solve this problem, we now introduce a way for our Loading object to indicate the value for a key is Missing (eg we are trying to load an S3 key that does not exist) without having to throw an exception. This is an alternative approach to the hacky solution I worked on in #32 !

Change to external API of ETagCache

Before

def get(key: K): Future[V]

After

def get(key: K): Future[Option[V]]

The only code using the etag-caching library at the moment is Frontend, we have a small PR to update Frontend to cope with the API change:

Internal changes

The Caffeine cache is now storing values which are the new MissingOrETagged type, rather than just ETaggedData.

We've introduced the MissingOrETagged sealed trait to allow us to represent that a fetch result may be either Missing or found, returned with ETaggedData.

See also:

We want a way for our
`Loading` object to indicate the value for a key is `Missing` (eg we are
trying to load an S3 key that does not exist) without having to throw an
exception - as Caffeine will noisily log almost every exception it sees,
even if the client does not mind missing values, as is the case with
the Facia Scala Client.

We've introduced the `MissingOrETagged` sealed trait to allow us to
represent that a fetch result may be either missing or found, returned
with ETagged data.

This is an alternative approach to the hacky solution in
#32 .

Co-authored-by: Jamie B <[email protected]>
@gu-scala-library-release
Copy link
Contributor

@rtyley has published a preview version of this PR with release workflow run #108, based on commit cbc4304:

4.0.0-PREVIEW.suppress-caffeine-exceptions-by-allowing-fetch-results-to-indicate-missing-keys.2024-07-09T1701.cbc4304a

Want to make another preview release?

Click 'Run workflow' in the GitHub UI, specifying the suppress-caffeine-exceptions-by-allowing-fetch-results-to-indicate-missing-keys branch, or use the GitHub CLI command:

gh workflow run release.yml --ref suppress-caffeine-exceptions-by-allowing-fetch-results-to-indicate-missing-keys

Want to make a full release after this PR is merged?

Click 'Run workflow' in the GitHub UI, leaving the branch as the default, or use the GitHub CLI command:

gh workflow run release.yml

@rtyley rtyley changed the title Suppress Caffeine exceptions: model Missing values Avoid Caffeine exceptions on missing values Jul 11, 2024
rtyley added a commit to guardian/frontend that referenced this pull request Jul 11, 2024
The `etag-caching` library has had a small change to its API with
guardian/etag-caching#56, so we need make a small
code change to adapt to it.

# Background

ETag Caching was originally introduced to Frontend with #26338.
Comment on lines -44 to +48
def get(key: K): Future[V] = read(key).map(_.result)

def get(key: K): Future[Option[V]] = read(key).map(_.toOption)
Copy link
Member Author

@rtyley rtyley Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only external API change in this PR.

Internally, you can see we are using the toOption field on the new MissingOrETagged type - we'll return None if the value is Missing. If we have a populated ETaggedData, we just return the value, because the end user won't care about about the ETag.

def toOption: Option[T]
}

case object Missing extends MissingOrETagged[Nothing] {
Copy link
Member Author

@rtyley rtyley Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like the None singleton that extends Option[Nothing], we take advantage here that in Scala Nothing extends all types, and so this Missing singleton is a valid value for any MissingOrETagged[+T].

image

@rtyley rtyley requested a review from jonflynng July 11, 2024 12:31
@rtyley rtyley merged commit 65040d9 into main Jul 23, 2024
8 checks passed
@rtyley rtyley deleted the suppress-caffeine-exceptions-by-allowing-fetch-results-to-indicate-missing-keys branch July 23, 2024 13:35
rtyley added a commit to guardian/frontend that referenced this pull request Jul 23, 2024
The `etag-caching` library has had a small change to its API with
guardian/etag-caching#56, so we need make a small
code change to adapt to it.

ETag Caching was originally introduced to Frontend with #26338.
rtyley added a commit to guardian/facia-scala-client that referenced this pull request Jan 8, 2025
This change adds these improvements:

* Facia data is only re-downloaded & re-parsed if the S3 content has _changed_ (thanks to [ETag-caching](https://github.com/guardian/etag-caching))
* Independence from AWS SDK version (v1 vs v2) _(this PR can replace #286

The [ETag-caching](https://github.com/guardian/etag-caching) library itself is also being used in DotCom PROD, introduced with guardian/frontend#26338.

### Usage

```scala
import com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching
import com.gu.facia.client.{ApiClient, Environment}
import software.amazon.awssdk.services.s3.S3AsyncClient

val s3AsyncClient = S3AsyncClient.builder().region(...).credentialsProvider(...).build()

val apiClient = ApiClient.withCaching(
  "facia-tool-store",
  Environment.Prod,
  S3ObjectFetching.byteArraysWith(s3AsyncClient)
)
```

_PR using this updated version of the FAPI client: https://github.com/guardian/ophan/pull/6741_

### Independence from AWS SDK version (v1 vs v2)

Ideally, the whole of `facia-scala-client` would be independent of AWS SDK version - we'd _like_ consumers of this library to be able to use whatever AWS SDK version they want, without us pulling in dependency on either SDK version.

For `facia-scala-client` this is an attainable goal, as the only AWS API action it performs is fetching from S3, and [guardian/etag-caching](https://github.com/guardian/etag-caching) provides the [`S3ByteArrayFetching`](https://github.com/guardian/etag-caching/blob/v6.0.0/aws-s3/base/src/main/scala/com/gu/etagcaching/aws/s3/package.scala#L6-L22) abstraction that encapsulates this action without tying to a specific AWS SDK version.

Due to legacy code compatibility, we can't completely remove AWS SDK v1 from `fapi-client` for now, but we _have_ removed it from `fapi-client-core`, which is the new home of `com.gu.facia.client.ApiClient`, which is now a trait, with 2 constructor methods that provide different implementations:

* **`ApiClient()`**  - legacy, using the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour
* **`ApiClient.withCaching()`** - provides ETag-based caching and is independent of AWS SDK version - the consumer just needs to provide an appropriate instance of `com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching` (ie with `"com.gu.etag-caching" %% "aws-s3-sdk-v2"` and `S3ObjectFetching.byteArraysWith(s3AsyncClient)`, introduced with guardian/etag-caching#65)

### Solved problems

* **Noisy logging associated with absent collection JSON** - the `etag-caching` library has been updated with guardian/etag-caching#56 to avoid excessive logging that would occur in the Facia client, due to it typically trying to access collections that aren't yet persisted: #32.
rtyley added a commit to guardian/facia-scala-client that referenced this pull request Jan 8, 2025
This change adds these improvements:

* Facia data is only re-downloaded & re-parsed if the S3 content has _changed_ (thanks to [ETag-caching](https://github.com/guardian/etag-caching))
* Independence from AWS SDK version (v1 vs v2) _(this PR can replace #286

The [ETag-caching](https://github.com/guardian/etag-caching) library itself is also being used in DotCom PROD, introduced with guardian/frontend#26338.

### Usage

```scala
import com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching
import com.gu.facia.client.{ApiClient, Environment}
import software.amazon.awssdk.services.s3.S3AsyncClient

val s3AsyncClient = S3AsyncClient.builder().region(...).credentialsProvider(...).build()

val apiClient = ApiClient.withCaching(
  "facia-tool-store",
  Environment.Prod,
  S3ObjectFetching.byteArraysWith(s3AsyncClient)
)
```

_PR using this updated version of the FAPI client: https://github.com/guardian/ophan/pull/6741_

### Independence from AWS SDK version (v1 vs v2)

Ideally, the whole of `facia-scala-client` would be independent of AWS SDK version - we'd _like_ consumers of this library to be able to use whatever AWS SDK version they want, without us pulling in dependency on either SDK version.

For `facia-scala-client` this is an attainable goal, as the only AWS API action it performs is fetching from S3, and [guardian/etag-caching](https://github.com/guardian/etag-caching) provides the [`S3ByteArrayFetching`](https://github.com/guardian/etag-caching/blob/v6.0.0/aws-s3/base/src/main/scala/com/gu/etagcaching/aws/s3/package.scala#L6-L22) abstraction that encapsulates this action without tying to a specific AWS SDK version.

Due to legacy code compatibility, we can't completely remove AWS SDK v1 from `fapi-client` for now, but we _have_ removed it from `fapi-client-core`, which is the new home of `com.gu.facia.client.ApiClient`, which is now a trait, with 2 constructor methods that provide different implementations:

* **`ApiClient()`**  - legacy, using the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour
* **`ApiClient.withCaching()`** - provides ETag-based caching and is independent of AWS SDK version - the consumer just needs to provide an appropriate instance of `com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching` (ie with `"com.gu.etag-caching" %% "aws-s3-sdk-v2"` and `S3ObjectFetching.byteArraysWith(s3AsyncClient)`, introduced with guardian/etag-caching#65)

### Solved problems

* **Noisy logging associated with absent collection JSON** - the `etag-caching` library has been updated with guardian/etag-caching#56 to avoid excessive logging that would occur in the Facia client, due to it typically trying to access collections that aren't yet persisted: #32.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant