Experimental ability to cache `Reference`s #8111

snazy · 2024-02-22T16:09:57Z

Adds ability to cache Reference objects, avoiding round-trips to the backend database, beneficial for read heavy workloads.

Reference-caching can be enabled via two new configuration options: to define the expiration time for References (holding the current HEAD/tip) and to define the expiration time for non-existing References.

Looking up references happens via the name of the reference, usually without the reference type (aka whether it is a branch or a tag), so Nessie has to look up both types - the given name as a branch and the given name as a tag. This is where negative-caching comes into play, because that caches the existing entry and the non-existing "other" reference type. Hence, if you enable reference-caching, it is recommended to also enable negative reference-caching.

Operations that are about to change a reference (committing and reference create/assign/delete operations), always consult the backing database, implicitly refreshing the cache.

Mutliple Nessie (against the same repository) do not communicate with each other. If for example a commit happened against one Nessie instance, the other instances may or may not return the new commit. This is why this feature is still experimental and only useful for Nessie setups with a single Nessie instance. This change will be later with another change to allow distributed cache-eviction, so that other Nessie instances accessing the same repository work fine.

servers/quarkus-common/src/main/java/org/projectnessie/quarkus/config/QuarkusStoreConfig.java

...oned/storage/cache/src/main/java/org/projectnessie/versioned/storage/cache/CacheBackend.java

dimas-b · 2024-05-15T14:59:20Z

...storage/common/src/main/java/org/projectnessie/versioned/storage/common/persist/Persist.java

+   * <p>Database specific implementations of {@link Persist} must not implement this function.
+   */
+  @Nullable
+  default Reference fetchReferenceForUpdate(@Nonnull String name) {


"ForUpdate" looks a bit misleading to me... The client does not have to update it after fetching. How about fetchCurrentReference?

"bypassingCache" would be more correct - but do we care that much?

"ForUpdate" looks really odd in code that just tries to load the latest value 🤷

Why not add a parameter to the old method: fetchReference(String name, boolean refresh), for example?

That caused like 5248 additional changed lines of code ;)

but we could have it as an overloaded method?

I'd prefer to keep the impact low (it's would be rather cosmetic).

Adding fetchReference(String name, boolean refresh) instead of fetchReferenceForUpdate(@Nonnull String name) is about the same in terms of impact, is it not?.. At the same time I find that fetchReference(name, true) is more intuitive (more precise) than fetchReferenceForUpdate() at the call sites. fetchReferenceBypassingCache(name) also works for me, but I think it will bring the term "cache" to implementation that do not care about caching, which is again a bit awkward.

It could also be fetchReference(String name, FetchMode mode) with FetchMode { LOCAL, FRESH }... or similar. WDYT?

...rage/cache/src/main/java/org/projectnessie/versioned/storage/cache/CaffeineCacheBackend.java

Adds ability to cache `Reference` objects, avoiding round-trips to the backend database, beneficial for read heavy workloads. Reference-caching can be enabled via two new configuration options: to define the expiration time for `Reference`s (holding the current HEAD/tip) and to define the expiration time for non-existing `Reference`s. Looking up references happens via the name of the reference, usually without the reference type (aka whether it is a branch or a tag), so Nessie has to look up both types - the given name as a branch and the given name as a tag. This is where negative-caching comes into play, because that caches the existing entry and the non-existing "other" reference type. Hence, if you enable reference-caching, it is recommended to also enable negative reference-caching. Operations that are about to change a reference (committing and reference create/assign/delete operations), always consult the backing database, implicitly refreshing the cache. Mutliple Nessie (against the same repository) do not communicate with each other. If for example a commit happened against one Nessie instance, the other instances may or may not return the new commit. This is why this feature is still experimental and only useful for Nessie setups with a _single_ Nessie instance. This change will be later with another change to allow distributed cache-eviction, so that other Nessie instances accessing the same repository work fine.

dimas-b · 2024-05-15T21:57:16Z

Should we perhaps expose a way for getting a fresh Reference at the client API level?.. mostly as a way to allow clients to avoid references flipping back to an old version on consecutive API calls.

dimas-b · 2024-05-15T21:58:44Z

Also, do we want to keep it "experimental" (PR title) still?

snazy · 2024-05-16T06:54:02Z

Should we perhaps expose a way for getting a fresh Reference at the client API level?.. mostly as a way to allow clients to avoid references flipping back to an old version on consecutive API calls.

I don't understand, TBH. How should a client "flip back" to an old version? The cached value is always up to date (assuming a single-node Nessie instance, later with #8463 on all instances)

Also, do we want to keep it "experimental" (PR title) still?

At least until #8463 is not merged.

snazy force-pushed the cache-references branch 9 times, most recently from 6dcb278 to 036a501 Compare February 27, 2024 09:42

snazy force-pushed the cache-references branch 2 times, most recently from 1fab3a0 to c3f1303 Compare March 18, 2024 10:18

snazy force-pushed the cache-references branch from c3f1303 to a2ca5b3 Compare March 28, 2024 17:50

snazy force-pushed the cache-references branch 7 times, most recently from 9672006 to ee9cca6 Compare May 6, 2024 10:34

snazy changed the title ~~WIP: Experimental ability to cache References~~ Experimental ability to cache References May 6, 2024

snazy marked this pull request as ready for review May 6, 2024 10:34

snazy force-pushed the cache-references branch 2 times, most recently from ba20f8d to b27e9b9 Compare May 14, 2024 07:31

dimas-b reviewed May 15, 2024

View reviewed changes

snazy added 3 commits May 15, 2024 17:25

address possible ID clash

5d8f287

renames

8d9360a

snazy force-pushed the cache-references branch from b27e9b9 to 8d9360a Compare May 15, 2024 15:55

dimas-b approved these changes May 15, 2024

View reviewed changes

snazy merged commit cfb7f69 into projectnessie:main May 16, 2024
16 checks passed

snazy deleted the cache-references branch May 16, 2024 06:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Experimental ability to cache `Reference`s #8111

Experimental ability to cache `Reference`s #8111

snazy commented Feb 22, 2024 •

edited

Loading

dimas-b May 15, 2024

snazy May 15, 2024

dimas-b May 15, 2024

snazy May 15, 2024

dimas-b May 15, 2024

snazy May 15, 2024

dimas-b May 15, 2024

dimas-b commented May 15, 2024

dimas-b commented May 15, 2024

snazy commented May 16, 2024

Experimental ability to cache References #8111

Experimental ability to cache References #8111

Conversation

snazy commented Feb 22, 2024 • edited Loading

dimas-b May 15, 2024

Choose a reason for hiding this comment

snazy May 15, 2024

Choose a reason for hiding this comment

dimas-b May 15, 2024

Choose a reason for hiding this comment

snazy May 15, 2024

Choose a reason for hiding this comment

dimas-b May 15, 2024

Choose a reason for hiding this comment

snazy May 15, 2024

Choose a reason for hiding this comment

dimas-b May 15, 2024

Choose a reason for hiding this comment

dimas-b commented May 15, 2024

dimas-b commented May 15, 2024

snazy commented May 16, 2024

Experimental ability to cache `Reference`s #8111

Experimental ability to cache `Reference`s #8111

snazy commented Feb 22, 2024 •

edited

Loading