Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ETag-caching and AWS SDK v2 support #287

Merged
merged 1 commit into from
Jan 8, 2025
Merged

Conversation

rtyley
Copy link
Member

@rtyley rtyley commented Aug 4, 2023

This change adds these improvements:

  • Facia data is only re-downloaded & re-parsed if the S3 content has changed (thanks to ETag-caching)
  • Independence from AWS SDK version (v1 vs v2) (this PR can replace Update to AWS SDK v2 #286)

The ETag-caching library itself is also being used in DotCom PROD, introduced with guardian/frontend#26338.

Usage

import com.gu.facia.client.{ApiClient, Environment}
import com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching
import software.amazon.awssdk.services.s3.S3AsyncClient

val s3AsyncClient = S3AsyncClient.builder().region(...).credentialsProvider(...).build()

val apiClient = ApiClient.withCaching(
  "facia-tool-store",
  Environment.Prod,
  S3ObjectFetching.byteArraysWith(s3AsyncClient)
)

PR using this updated version of the FAPI client: https://github.com/guardian/ophan/pull/6741

Independence from AWS SDK version (v1 vs v2)

Ideally, the whole of facia-scala-client would be independent of AWS SDK version - we'd like consumers of this library to be able to use whatever AWS SDK version they want, without us pulling in dependency on either SDK version.

For facia-scala-client this is an attainable goal, as the only AWS API action it performs is fetching from S3, and guardian/etag-caching provides the S3ByteArrayFetching abstraction that encapsulates this action without tying to a specific AWS SDK version.

Due to legacy code compatibility, we can't completely remove AWS SDK v1 from fapi-client for now, but we have removed it from fapi-client-core, which is the new home of com.gu.facia.client.ApiClient, which is now a trait, with 2 constructor methods that provide different implementations:

  • ApiClient() - legacy, using the existing com.gu.facia.client.S3Client abstraction on S3 behaviour
  • ApiClient.withCaching() - provides ETag-based caching and is independent of AWS SDK version - the consumer just needs to provide an appropriate instance of com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching (ie with "com.gu.etag-caching" %% "aws-s3-sdk-v2" and S3ObjectFetching.byteArraysWith(s3AsyncClient), introduced with Improve API discoverability/clarity for devs with S3ByteArrayFetching etag-caching#65)

Solved problems

@rtyley rtyley force-pushed the add-etag-caching-support branch 11 times, most recently from 49f9a7f to 9b755d7 Compare August 10, 2023 11:54
@rtyley rtyley changed the title WIP: Add ETag caching support Add ETag-caching (and AWS SDK v2) support Aug 10, 2023
Comment on lines -13 to -15
trait S3Client {
def get(bucket: String, path: String): Future[FaciaResult]
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the com.gu.facia.client.S3Client trait still exists, it's just moved to the fapi-client-core artifact - couldn't delete it without badly breaking binary compatability for existing users!

@rtyley rtyley force-pushed the add-etag-caching-support branch from 9b755d7 to 54a1fc5 Compare August 10, 2023 12:13
rtyley added a commit to guardian/etag-caching that referenced this pull request Sep 6, 2023
This is a hack, and it probably won't be able to fully conceal that fact.

The facia-scala-client, in normal use, is asked for collections that don't
yet have collection JSON, and may never have it (apparently, if a collection
has never been edited, its collection JSON won't exist). In the past, this
would be handled by generating a `FaciaNotFound` response for
`com.gu.facia.client.S3Client.get()` which `ApiClient` would cheerfully
_not_ throw an exception for:

https://github.com/guardian/facia-scala-client/blob/f9830fd286d409662be850ff441792ec03c1c2e2/facia-json/src/main/scala/com/gu/facia/client/ApiClient.scala#L25-L27

...so encountering the missing resource would not be logged as an exception,
**which is the desired behaviour**, see guardian/facia-scala-client#32

However, as my attempt (guardian/facia-scala-client#287)
to introduce ETag-caching & AWS SDK v2 support rejects the `com.gu.facia.client.S3Client`
trait, in favour of adapting `com.gu.facia.client.ApiClient`, I need to provide
that same exception-supressing behaviour.

The task is complicated because currently `ETagCache` holds a scaffeine cache with
**value** type `ETaggedData[V]` - there's no successful response for the cache loader
to return for keys that can denote 'resource missing' - any successful value must have
an ETag for one thing, as `ETaggedData` requires it, but when a resource is missing,
we have no ETag.

This commit tried to deal with the problem by replacing the `S3Exception` with a
`CancellationException`, which caffeine special-cases as exception to _not_ log:

ben-manes/caffeine@1e52b10

...however this gets even yuckier than we might have thought, because it means that
`facia-scala-client`, as the consumer of `etag-caching`, has to know that it must
now try to catch `CancellationException`, rather than `S3Exception`, when it is
trying to do it's _own_ suppression of exception-logging.
@rtyley
Copy link
Member Author

rtyley commented Sep 13, 2023

As discussed at standup, I'm going to do a release of facia-scala-client as it currently is (v4.0.6), and get that in use, before introducing this new update.

rtyley added a commit to guardian/editors-picks-uploader that referenced this pull request Sep 13, 2023
This is a small upgrade, catching up with the recent dependency updates of
https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6,
before the more extensive update in guardian/facia-scala-client#287
is introduced.

This update has already been tested in Ophan's PromotionPoller with
guardian/ophan#5540, successfully deployed
to Prodution.

I've switched from `fapi-client-play27` to `fapi-client-play28` - the
`editors-picks-uploader` is an AWS Lambda, it doesn't actually run
the Play framework, so specifying a higher Play version doesn't really
affect it! The `fapi-client` uses `play-json` to do its JSON-parsing
(that's all that the `-play2x` suffix indicates), so we might as well use
the latest version that's available (allowing `facia-scala-client` to drop
support for `play27` at some future point).
rtyley added a commit to guardian/story-packages that referenced this pull request Sep 13, 2023
This is a small upgrade, catching up with the recent dependency updates of
https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6,
before the more extensive update in guardian/facia-scala-client#287
is introduced.

This update has already been tested in Ophan's PromotionPoller with
guardian/ophan#5540, successfully deployed
to Prodution.
rtyley added a commit to guardian/facia-tool that referenced this pull request Sep 13, 2023
This is a small upgrade, catching up with the recent dependency updates of
https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6,
before the more extensive update in guardian/facia-scala-client#287
is introduced.

This update has already been tested in Ophan's PromotionPoller with
guardian/ophan#5540, successfully deployed
to Prodution.
rtyley added a commit to guardian/frontend that referenced this pull request Sep 13, 2023
This is a small upgrade, catching up with the recent dependency updates of
https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6,
before the more extensive update in guardian/facia-scala-client#287
is introduced.

This update has already been tested in Ophan's PromotionPoller with
guardian/ophan#5540, successfully deployed
to Prodution.

Note that as https://github.com/guardian/frontend is _already_
using Play 2.8, it should probably be using `fapi-client-play28`,
rather than `fapi-client-play27`, so I've also updated that.
rtyley added a commit to guardian/etag-caching that referenced this pull request Jan 5, 2024
This is a hack, and it probably won't be able to fully conceal that fact.

The facia-scala-client, in normal use, is asked for collections that don't
yet have collection JSON, and may never have it (apparently, if a collection
has never been edited, its collection JSON won't exist). In the past, this
would be handled by generating a `FaciaNotFound` response for
`com.gu.facia.client.S3Client.get()` which `ApiClient` would cheerfully
_not_ throw an exception for:

https://github.com/guardian/facia-scala-client/blob/f9830fd286d409662be850ff441792ec03c1c2e2/facia-json/src/main/scala/com/gu/facia/client/ApiClient.scala#L25-L27

...so encountering the missing resource would not be logged as an exception,
**which is the desired behaviour**, see guardian/facia-scala-client#32

However, as my attempt (guardian/facia-scala-client#287)
to introduce ETag-caching & AWS SDK v2 support rejects the `com.gu.facia.client.S3Client`
trait, in favour of adapting `com.gu.facia.client.ApiClient`, I need to provide
that same exception-supressing behaviour.

The task is complicated because currently `ETagCache` holds a scaffeine cache with
**value** type `ETaggedData[V]` - there's no successful response for the cache loader
to return for keys that can denote 'resource missing' - any successful value must have
an ETag for one thing, as `ETaggedData` requires it, but when a resource is missing,
we have no ETag.

This commit tried to deal with the problem by replacing the `S3Exception` with a
`CancellationException`, which caffeine special-cases as exception to _not_ log:

ben-manes/caffeine@1e52b10

...however this gets even yuckier than we might have thought, because it means that
`facia-scala-client`, as the consumer of `etag-caching`, has to know that it must
now try to catch `CancellationException`, rather than `S3Exception`, when it is
trying to do it's _own_ suppression of exception-logging.
@rtyley rtyley force-pushed the add-etag-caching-support branch from 54a1fc5 to 93211ce Compare January 5, 2024 16:41
@rtyley rtyley force-pushed the add-etag-caching-support branch 2 times, most recently from 98baf91 to b1fbfdb Compare July 11, 2024 13:03
@rtyley rtyley force-pushed the add-etag-caching-support branch from b4f6a4b to 6de8ec1 Compare January 3, 2025 12:28
@gu-scala-library-release
Copy link
Contributor

@rtyley has published a preview version of this PR with release workflow run #58, based on commit 6de8ec1:

13.1.1-PREVIEW.add-etag-caching-support.2025-01-03T1238.6de8ec13

Want to make another preview release?

Click 'Run workflow' in the GitHub UI, specifying the add-etag-caching-support branch, or use the GitHub CLI command:

gh workflow run release.yml --ref add-etag-caching-support

Want to make a full release after this PR is merged?

Click 'Run workflow' in the GitHub UI, leaving the branch as the default, or use the GitHub CLI command:

gh workflow run release.yml

@rtyley rtyley force-pushed the add-etag-caching-support branch 6 times, most recently from adc4af5 to 00f2690 Compare January 7, 2025 11:47
Comment on lines -20 to -27
private def retrieve[A: Format](key: String): Future[Option[A]] = s3Client.get(bucket, key) map {
case FaciaSuccess(bytes) =>
Some(Json.fromJson[A](Json.parse(new String(bytes, Encoding))) getOrElse {
throw new JsonDeserialisationError(s"Could not deserialize JSON in $bucket, $key")
})
case FaciaNotAuthorized(message) => throw new BackendError(message)
case FaciaNotFound(_) => None
case FaciaUnknownError(message) => throw new BackendError(message)
Copy link
Member Author

@rtyley rtyley Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code needed to be refactored a bit to extract the new parseBytes() method (so it could be used for the parsing phase of the new caching code) - consequently lines 22-24 above go into that new method.

@rtyley rtyley force-pushed the add-etag-caching-support branch 2 times, most recently from 62fdb9c to ba8e60c Compare January 7, 2025 11:57
@gu-scala-library-release
Copy link
Contributor

@rtyley has published a preview version of this PR with release workflow run #59, based on commit ba8e60c:

13.1.1-PREVIEW.add-etag-caching-support.2025-01-07T1516.ba8e60c2

Want to make another preview release?

Click 'Run workflow' in the GitHub UI, specifying the add-etag-caching-support branch, or use the GitHub CLI command:

gh workflow run release.yml --ref add-etag-caching-support

Want to make a full release after this PR is merged?

Click 'Run workflow' in the GitHub UI, leaving the branch as the default, or use the GitHub CLI command:

gh workflow run release.yml

@rtyley rtyley marked this pull request as ready for review January 8, 2025 10:36
@rtyley rtyley requested a review from fredex42 January 8, 2025 10:36
releaseVersion := fromAggregatedAssessedCompatibilityWithLatestRelease().value,
// releaseVersion := fromAggregatedAssessedCompatibilityWithLatestRelease().value,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to comment this out due to this issue, as we're introducing the new fapi-client-core artifact:

...we'll re-enable it immediately after the next release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-enabled with 42fa0bd.

@rtyley rtyley force-pushed the add-etag-caching-support branch from ba8e60c to 2175db0 Compare January 8, 2025 10:53
This change adds these improvements:

* Facia data is only re-downloaded & re-parsed if the S3 content has _changed_ (thanks to [ETag-caching](https://github.com/guardian/etag-caching))
* Independence from AWS SDK version (v1 vs v2) _(this PR can replace #286

The [ETag-caching](https://github.com/guardian/etag-caching) library itself is also being used in DotCom PROD, introduced with guardian/frontend#26338.

### Usage

```scala
import com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching
import com.gu.facia.client.{ApiClient, Environment}
import software.amazon.awssdk.services.s3.S3AsyncClient

val s3AsyncClient = S3AsyncClient.builder().region(...).credentialsProvider(...).build()

val apiClient = ApiClient.withCaching(
  "facia-tool-store",
  Environment.Prod,
  S3ObjectFetching.byteArraysWith(s3AsyncClient)
)
```

_PR using this updated version of the FAPI client: https://github.com/guardian/ophan/pull/6741_

### Independence from AWS SDK version (v1 vs v2)

Ideally, the whole of `facia-scala-client` would be independent of AWS SDK version - we'd _like_ consumers of this library to be able to use whatever AWS SDK version they want, without us pulling in dependency on either SDK version.

For `facia-scala-client` this is an attainable goal, as the only AWS API action it performs is fetching from S3, and [guardian/etag-caching](https://github.com/guardian/etag-caching) provides the [`S3ByteArrayFetching`](https://github.com/guardian/etag-caching/blob/v6.0.0/aws-s3/base/src/main/scala/com/gu/etagcaching/aws/s3/package.scala#L6-L22) abstraction that encapsulates this action without tying to a specific AWS SDK version.

Due to legacy code compatibility, we can't completely remove AWS SDK v1 from `fapi-client` for now, but we _have_ removed it from `fapi-client-core`, which is the new home of `com.gu.facia.client.ApiClient`, which is now a trait, with 2 constructor methods that provide different implementations:

* **`ApiClient()`**  - legacy, using the existing `com.gu.facia.client.S3Client` abstraction on S3 behaviour
* **`ApiClient.withCaching()`** - provides ETag-based caching and is independent of AWS SDK version - the consumer just needs to provide an appropriate instance of `com.gu.etagcaching.aws.sdkv2.s3.S3ObjectFetching` (ie with `"com.gu.etag-caching" %% "aws-s3-sdk-v2"` and `S3ObjectFetching.byteArraysWith(s3AsyncClient)`, introduced with guardian/etag-caching#65)

### Solved problems

* **Noisy logging associated with absent collection JSON** - the `etag-caching` library has been updated with guardian/etag-caching#56 to avoid excessive logging that would occur in the Facia client, due to it typically trying to access collections that aren't yet persisted: #32.
@rtyley rtyley force-pushed the add-etag-caching-support branch from 2175db0 to 4db52b9 Compare January 8, 2025 10:59
@rtyley rtyley changed the title Add ETag-caching (and AWS SDK v2) support Add ETag-caching and AWS SDK v2 support Jan 8, 2025
@gu-scala-library-release
Copy link
Contributor

@rtyley has published a preview version of this PR with release workflow run #62, based on commit 4db52b9:

14.0.1-PREVIEW.add-etag-caching-support.2025-01-08T1105.4db52b99

Want to make another preview release?

Click 'Run workflow' in the GitHub UI, specifying the add-etag-caching-support branch, or use the GitHub CLI command:

gh workflow run release.yml --ref add-etag-caching-support

Want to make a full release after this PR is merged?

Click 'Run workflow' in the GitHub UI, leaving the branch as the default, or use the GitHub CLI command:

gh workflow run release.yml

Copy link

@fredex42 fredex42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great 😁

})
private def parseBytes[B: Format](bytes: Array[Byte]): B =
Json.fromJson[B](Json.parse(new String(bytes, Encoding))) getOrElse {
throw JsonDeserialisationError(s"Could not deserialize JSON")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to log out the actual error here - maybe one for a future pr? @rtyley

@rtyley rtyley merged commit 0ae2cb1 into main Jan 8, 2025
10 checks passed
rtyley added a commit to guardian/frontend that referenced this pull request Jan 9, 2025
With guardian/facia-scala-client#287, the Facia client has
been updated so that it can take advantage of ETag-caching, using the
https://github.com/guardian/etag-caching library already used in Frontend since
August 2023 with #26338.

As some changes were necessary to the `etag-caching` library to facilitate
this, we need to update the version of `etag-caching` used by Frontend to v7.0.0.
We also need to use an updated version of CAPI, as the CAPI version used by Facia
was recently updated with guardian/facia-scala-client#341.

Although this change upgrades to the latest version of the Facia client, it doesn't
take advantage of the new ETag-caching support there - that can come in a later PR,
as it does involve switching to AWS SDK v2 (not everywhere in Frontend, just for
where we use the Facia client, but it's still a significant change as the AWS STS
credential creation has got more elaborate with AWS SDK v2).
rtyley added a commit to guardian/frontend that referenced this pull request Jan 10, 2025
With guardian/facia-scala-client#287, the Facia client has
been updated so that it can take advantage of ETag-caching, using the
https://github.com/guardian/etag-caching library already used in Frontend since
August 2023 with #26338.

As some changes were necessary to the `etag-caching` library to facilitate
this, we need to update the version of `etag-caching` used by Frontend to v7.0.0.

Although this change upgrades to the latest version of the Facia client, it doesn't
take advantage of the new ETag-caching support there - that can come in a later PR,
as it does involve switching to AWS SDK v2 (not everywhere in Frontend, just for
where we use the Facia client, but it's still a significant change as AWS STS
credential creation has got more elaborate with AWS SDK v2).
rtyley added a commit to guardian/frontend that referenced this pull request Jan 10, 2025
With guardian/facia-scala-client#287, the Facia client has
been updated so that it can take advantage of ETag-caching, using the
https://github.com/guardian/etag-caching library already used in Frontend since
August 2023 with #26338.

As some changes were necessary to the `etag-caching` library to facilitate
this, we need to update the version of `etag-caching` used by Frontend to v7.0.0.

This does involve switching to AWS SDK v2 - not everywhere in Frontend, just for
where we use the Facia client!
rtyley added a commit to guardian/frontend that referenced this pull request Jan 11, 2025
With guardian/facia-scala-client#287, the Facia client has
been updated so that it can take advantage of ETag-caching, using the
https://github.com/guardian/etag-caching library already used in Frontend since
August 2023 with #26338.

As some changes were necessary to the `etag-caching` library to facilitate
this, we need to update the version of `etag-caching` used by Frontend to v7.0.0.

This does involve switching to AWS SDK v2 - not everywhere in Frontend, just for
where we use the Facia client!
rtyley added a commit to guardian/frontend that referenced this pull request Jan 12, 2025
With guardian/facia-scala-client#287, the Facia client has
been updated so that it can take advantage of ETag-caching, using the
https://github.com/guardian/etag-caching library already used in Frontend since
August 2023 with #26338.

As some changes were necessary to the `etag-caching` library to facilitate
this, we need to update the version of `etag-caching` used by Frontend to v7.0.0.

This does involve switching to AWS SDK v2 - not everywhere in Frontend, just for
where we use the Facia client!
rtyley added a commit to guardian/frontend that referenced this pull request Jan 12, 2025
With guardian/facia-scala-client#287, the Facia client has
been updated so that it can take advantage of ETag-caching, using the
https://github.com/guardian/etag-caching library already used in Frontend since
August 2023 with #26338.

As some changes were necessary to the `etag-caching` library to facilitate
this, we need to update the version of `etag-caching` used by Frontend to v7.0.0.

This does involve switching to AWS SDK v2 - not everywhere in Frontend, just for
where we use the Facia client!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants