Experimental ability to cache References
Adds the ability to cache `Reference` objects, avoiding round-trips to the backend database, which benefits read-heavy workloads.

Reference caching is enabled via two new configuration options: one defines the expiration time for existing `Reference`s (which hold the current HEAD/tip), the other defines the expiration time for non-existing `Reference`s.

References are looked up by name, usually without the reference type (that is, whether it is a branch or a tag), so Nessie has to look up both types: the given name as a branch and the given name as a tag. This is where negative caching comes into play: it caches both the existing entry and the non-existing entry for the "other" reference type. Hence, if you enable reference caching, it is recommended to also enable negative reference caching.
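
The effect of negative caching on the double lookup can be illustrated with a minimal sketch. It is not the code from this commit: the class `NegativeCachingSketch`, the keys `branch:main` and `tag:main`, and the use of `Optional<String>` for the HEAD are assumptions made purely for illustration.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical, simplified reference cache; NOT the Nessie implementation. */
final class NegativeCachingSketch {
  /** A cached lookup result; an empty head means "known to be missing". */
  private static final class Entry {
    final Optional<String> head;
    final long expiresAtNanos;

    Entry(Optional<String> head, long expiresAtNanos) {
      this.head = head;
      this.expiresAtNanos = expiresAtNanos;
    }
  }

  private final Map<String, Entry> entries = new ConcurrentHashMap<>();
  private final long ttlNanos;
  private final long negativeTtlNanos;
  private final LongSupplier clockNanos;
  private final Function<String, Optional<String>> database;
  int databaseReads; // number of round-trips to the (simulated) database

  NegativeCachingSketch(
      Duration ttl,
      Duration negativeTtl,
      LongSupplier clockNanos,
      Function<String, Optional<String>> database) {
    this.ttlNanos = ttl.toNanos();
    this.negativeTtlNanos = negativeTtl.toNanos();
    this.clockNanos = clockNanos;
    this.database = database;
  }

  Optional<String> fetch(String key) {
    long now = clockNanos.getAsLong();
    Entry cached = entries.get(key);
    if (cached != null && cached.expiresAtNanos - now > 0) {
      return cached.head; // hit: either an existing or a "known missing" entry
    }
    databaseReads++;
    Optional<String> head = database.apply(key);
    // Existing and non-existing references expire independently.
    long ttl = head.isPresent() ? ttlNanos : negativeTtlNanos;
    if (ttl > 0) {
      entries.put(key, new Entry(head, now + ttl));
    } else {
      entries.remove(key); // a zero TTL disables caching for this kind of entry
    }
    return head;
  }

  public static void main(String[] args) {
    Map<String, String> db = Map.of("branch:main", "c3f1303");
    NegativeCachingSketch cache =
        new NegativeCachingSketch(
            Duration.ofSeconds(30),
            Duration.ofSeconds(30),
            System::nanoTime,
            key -> Optional.ofNullable(db.get(key)));
    for (int i = 0; i < 3; i++) {
      cache.fetch("branch:main"); // exists
      cache.fetch("tag:main"); // does not exist
    }
    System.out.println("database reads: " + cache.databaseReads); // prints "database reads: 2"
  }
}
```

With the negative TTL set to `Duration.ZERO`, the same loop would issue four database reads, because every `tag:main` lookup misses the cache.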

Operations that are about to change a reference (commits and reference create/assign/delete operations) always consult the backing database, implicitly refreshing the cache.

Multiple Nessie instances (against the same repository) do not communicate with each other. If, for example, a commit happened against one Nessie instance, the other instances may or may not return the new commit. This is why this feature is still experimental and only useful for Nessie setups with a _single_ Nessie instance.
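
The difference between the cached read path and the "for update" path, and the staleness that a second writer can cause, can be sketched as follows. This is a simplified illustration under stated assumptions: `ForUpdateSketch` and its plain `Map`-backed "database" are hypothetical, and only the method names `fetchReference` and `fetchReferenceForUpdate` are taken from the diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical caching facade; NOT the Nessie classes. */
final class ForUpdateSketch {
  private final Map<String, String> database; // stands in for the backing store
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  ForUpdateSketch(Map<String, String> database) {
    this.database = database;
  }

  /** Read path: may return a cached, possibly stale HEAD. */
  String fetchReference(String name) {
    return cache.computeIfAbsent(name, database::get);
  }

  /** Write path: always reads the database and refreshes the cached entry. */
  String fetchReferenceForUpdate(String name) {
    String head = database.get(name);
    if (head != null) {
      cache.put(name, head);
    } else {
      cache.remove(name);
    }
    return head;
  }

  public static void main(String[] args) {
    Map<String, String> db = new HashMap<>();
    db.put("main", "5b9a291");
    ForUpdateSketch instance = new ForUpdateSketch(db);

    System.out.println(instance.fetchReference("main")); // prints "5b9a291"
    db.put("main", "c3f1303"); // a commit made through another instance
    System.out.println(instance.fetchReference("main")); // prints "5b9a291" (stale)
    System.out.println(instance.fetchReferenceForUpdate("main")); // prints "c3f1303"
    System.out.println(instance.fetchReference("main")); // prints "c3f1303" (refreshed)
  }
}
```

The sketch omits expiration entirely, so a stale entry survives until the next "for update" read; with a TTL it would also be replaced once it expires.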
snazy committed Mar 18, 2024
1 parent 5b9a291 commit c3f1303
Showing 34 changed files with 834 additions and 53 deletions.
@@ -20,6 +20,8 @@
import io.smallrye.config.WithConverter;
import io.smallrye.config.WithDefault;
import io.smallrye.config.WithName;
import java.time.Duration;
import java.util.Optional;
import java.util.OptionalDouble;
import java.util.OptionalInt;
import org.projectnessie.versioned.storage.common.config.StoreConfig;
@@ -120,4 +122,12 @@ public interface QuarkusStoreConfig extends StoreConfig {

@WithName(CONFIG_CACHE_CAPACITY_FRACTION_ADJUST_MB)
OptionalInt cacheCapacityFractionAdjustMB();

@WithName(CONFIG_CACHE_REFERENCE_TTL)
@Override
Optional<Duration> cacheReferenceTtl();

@WithName(CONFIG_CACHE_REFERENCE_NEGATIVE_TTL)
@Override
Optional<Duration> cacheReferenceNegativeTtl();
}
@@ -146,12 +146,21 @@ public Persist producePersist(MeterRegistry meterRegistry) {

String cacheInfo;
if (effectiveCacheSizeMB > 0) {
-CacheConfig cacheConfig =
-    CacheConfig.builder()
-        .capacityMb(effectiveCacheSizeMB)
-        .meterRegistry(meterRegistry)
-        .build();
-CacheBackend cacheBackend = PersistCaches.newBackend(cacheConfig);
+CacheConfig.Builder cacheConfig =
+    CacheConfig.builder().capacityMb(effectiveCacheSizeMB).meterRegistry(meterRegistry);
+
+storeConfig
+    .cacheReferenceTtl()
+    .ifPresent(
+        refTtl -> {
+          LOGGER.warn(
+              "Reference caching is an experimental feature but enabled with a TTL of {}",
+              refTtl);
+          cacheConfig.referenceTtl(refTtl);
+        });
+storeConfig.cacheReferenceNegativeTtl().ifPresent(cacheConfig::referenceNegativeTtl);
+
+CacheBackend cacheBackend = PersistCaches.newBackend(cacheConfig.build());
persist = cacheBackend.wrap(persist);
cacheInfo = "with " + effectiveCacheSizeMB + " MB objects cache";
} else {
51 changes: 31 additions & 20 deletions site/docs/try/configuration.md
@@ -174,26 +174,37 @@ into the configured data store:
Usually, only the cache-capacity should be adjusted to the amount of the Java heap "available" for the cache. The
default is conservative, bumping the cache size is recommended.

| Property | Default values | Type | Description |
|--------------------------------------------------------------------|---------------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `nessie.version.store.persist.repository-id` | | `String` | Sets Nessie repository ID (optional). This ID can be used to distinguish multiple Nessie repositories that reside in the same storage instance. |
| `nessie.version.store.persist.parents-per-commit` | `20` | `int` | Sets the number of parent-commit-hashes stored in Nessie store. |
| `nessie.version.store.persist.commit-timeout-millis` | `5000` | `int` | Sets the timeout for CAS-like operations in milliseconds. |
| `nessie.version.store.persist.commit-retries` | `Integer.MAX_VALUE` | `int` | Sets the maximum retries for CAS-like operations. |
| `nessie.version.store.persist.retry-initial-sleep-millis-lower` | `5` | `int` | Configures the initial lower-bound sleep time in milliseconds of the exponential backoff when retrying commit operations. |
| `nessie.version.store.persist.retry-initial-sleep-millis-upper` | `25` | `int` | Configures the initial upper-bound sleep time in milliseconds of the exponential backoff when retrying commit operations. |
| `nessie.version.store.persist.retry-max-sleep-millis` | `250` | `int` | Configures the max sleep time in milliseconds of the exponential backoff when retrying commit operations. |
| `nessie.version.store.persist.max-incremental-index-size` | `50 * 1024` | `int` | Maximum serialized size of key indexes stored inside commit objects. Trade off: bigger incremental indexes reduce the amount of reads, at the expense of "bigger" read results. |
| `nessie.version.store.persist.max-serialized-index-size` | `200 * 1024` | `int` | Maximum serialized size of key indexes stored as separate objects. Trade off: bigger incremental indexes reduce the amount of reads, at the expense of "bigger" read results. |
| `nessie.version.store.persist.max-reference-stripes-per-commit` | `50` | `int` | Maximum number of referenced index objects stored inside commit objects. |
| `nessie.version.store.persist.assumed-wall-clock-drift-micros` | `5_000_000` | `long` | Sets the assumed wall-clock drift between multiple Nessie instances, in microseconds. |
| `nessie.version.store.persist.namespace-validation` | `true` | `boolean` | Whether namespace validation is enabled, changing this to `false` will break the Nessie specification! |
| `nessie.version.store.persist.cache-capacity-mb` | see description | `int` | Fixed amount of heap used to cache objects, set to `0` to disable the cache entirely. Must not be used with fractional cache sizing. See description for `cache-capacity-fraction-of-heap` for the default value. |
| `nessie.version.store.persist.cache-capacity-fraction-of-heap` | see description | `double` | Fraction of Java's max heap size to use for cache objects, set to `0` to disable. Must not be used with fixed cache sizing. If neither this value nor a fixed size is configured, a default of `.7` (70%) is assumed. |
| `nessie.version.store.persist.cache-capacity-fraction-adjust-mb` | `256` | `int` | When using fractional cache sizing, this amount in MB of the heap will always be "kept free" when calculating the cache size. |
| `nessie.version.store.persist.cache-capacity-fraction-min-size-mb` | `64` | `int` | When using fractional cache sizing, this amount in MB is the minimum cache size. |
| `nessie.version.store.persist.ref-previous-head-count` | `20` | `int` | Named references keep a history of up to this amount of previous HEAD pointers, and up to the configured age. |
| `nessie.version.store.persist.ref-previous-head-time-span-seconds` | `300` | `int` | Named references keep a history of previous HEAD pointers with this age in _seconds_, and up to the configured amount. |
| Property | Default values | Type | Description |
|--------------------------------------------------------------------|---------------------|------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `nessie.version.store.persist.repository-id` | | `String` | Sets Nessie repository ID (optional). This ID can be used to distinguish multiple Nessie repositories that reside in the same storage instance. |
| `nessie.version.store.persist.parents-per-commit` | `20` | `int` | Sets the number of parent-commit-hashes stored in Nessie store. |
| `nessie.version.store.persist.commit-timeout-millis` | `5000` | `int` | Sets the timeout for CAS-like operations in milliseconds. |
| `nessie.version.store.persist.commit-retries` | `Integer.MAX_VALUE` | `int` | Sets the maximum retries for CAS-like operations. |
| `nessie.version.store.persist.retry-initial-sleep-millis-lower` | `5` | `int` | Configures the initial lower-bound sleep time in milliseconds of the exponential backoff when retrying commit operations. |
| `nessie.version.store.persist.retry-initial-sleep-millis-upper` | `25` | `int` | Configures the initial upper-bound sleep time in milliseconds of the exponential backoff when retrying commit operations. |
| `nessie.version.store.persist.retry-max-sleep-millis` | `250` | `int` | Configures the max sleep time in milliseconds of the exponential backoff when retrying commit operations. |
| `nessie.version.store.persist.max-incremental-index-size` | `50 * 1024` | `int` | Maximum serialized size of key indexes stored inside commit objects. Trade off: bigger incremental indexes reduce the amount of reads, at the expense of "bigger" read results. |
| `nessie.version.store.persist.max-serialized-index-size` | `200 * 1024` | `int` | Maximum serialized size of key indexes stored as separate objects. Trade off: bigger incremental indexes reduce the amount of reads, at the expense of "bigger" read results. |
| `nessie.version.store.persist.max-reference-stripes-per-commit` | `50` | `int` | Maximum number of referenced index objects stored inside commit objects. |
| `nessie.version.store.persist.assumed-wall-clock-drift-micros` | `5_000_000` | `long` | Sets the assumed wall-clock drift between multiple Nessie instances, in microseconds. |
| `nessie.version.store.persist.namespace-validation` | `true` | `boolean` | Whether namespace validation is enabled, changing this to `false` will break the Nessie specification! |
| `nessie.version.store.persist.cache-capacity-mb` | see description | `int` | Fixed amount of heap used to cache objects, set to `0` to disable the cache entirely. Must not be used with fractional cache sizing. See description for `cache-capacity-fraction-of-heap` for the default value. |
| `nessie.version.store.persist.cache-capacity-fraction-of-heap` | see description | `double` | Fraction of Java's max heap size to use for cache objects, set to `0` to disable. Must not be used with fixed cache sizing. If neither this value nor a fixed size is configured, a default of `.7` (70%) is assumed. |
| `nessie.version.store.persist.cache-capacity-fraction-adjust-mb` | `256` | `int` | When using fractional cache sizing, this amount in MB of the heap will always be "kept free" when calculating the cache size. |
| `nessie.version.store.persist.cache-capacity-fraction-min-size-mb` | `64` | `int` | When using fractional cache sizing, this amount in MB is the minimum cache size. |
| `nessie.version.store.persist.ref-previous-head-count` | `20` | `int` | Named references keep a history of up to this amount of previous HEAD pointers, and up to the configured age. |
| `nessie.version.store.persist.ref-previous-head-time-span-seconds` | `300` | `int` | Named references keep a history of previous HEAD pointers with this age in _seconds_, and up to the configured amount. |
| `nessie.version.store.persist.cache-reference-ttl` | (not present) | `Duration` | **EXPERIMENTAL FEATURE** If present, defines the duration how long information about a named reference shall be cached. Disabled if not present or `PT0S`. |
| `nessie.version.store.persist.cache-reference-negative-ttl` | (not present) | `Duration` | **EXPERIMENTAL FEATURE** If present, defines the duration how long a non-existing reference information shall be cached. Disabled if not present or `PT0S`. |

!!! info
The `Duration` type specifies a time-duration using the string format of the Java `java.time.Duration` type.
Examples: `PT5M` for 5 minutes, `PT30S` for 30 seconds, `PT1M30S` for 90 seconds.
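
The `Duration` strings above can be checked against `java.time.Duration.parse`. The following is only a small sketch: the class `DurationFormatSketch` and its helper `seconds` are hypothetical names, and how Nessie itself converts the configured value is not shown here.

```java
import java.time.Duration;

/** Hypothetical helper that parses the ISO-8601 duration strings used above. */
final class DurationFormatSketch {
  /** Returns the length of the given duration string in whole seconds. */
  static long seconds(String text) {
    return Duration.parse(text).getSeconds();
  }

  public static void main(String[] args) {
    System.out.println(seconds("PT5M")); // prints 300
    System.out.println(seconds("PT30S")); // prints 30
    System.out.println(seconds("PT1M30S")); // prints 90
    System.out.println(Duration.parse("PT0S").isZero()); // prints true
  }
}
```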

!!! warning
Reference caching, enabled via the two `nessie.version.store.persist.cache-reference-*` settings, is still
experimental and only usable when only one Nessie server instance accesses a Nessie repository. Using this
feature is not recommended.

### Authentication settings

@@ -404,13 +404,27 @@ public Reference fetchReference(@Nonnull @javax.annotation.Nonnull String name)
return delegate().fetchReference(name);
}

@Override
@Nullable
@javax.annotation.Nullable
public Reference fetchReferenceForUpdate(@Nonnull @javax.annotation.Nonnull String name) {
return delegate().fetchReferenceForUpdate(name);
}

@Override
@Nonnull
@javax.annotation.Nonnull
public Reference[] fetchReferences(@Nonnull @javax.annotation.Nonnull String[] names) {
return delegate().fetchReferences(names);
}

@Override
@Nonnull
@javax.annotation.Nonnull
public Reference[] fetchReferencesForUpdate(@Nonnull @javax.annotation.Nonnull String[] names) {
return delegate().fetchReferencesForUpdate(names);
}

@Override
@Nonnull
@javax.annotation.Nonnull
@@ -120,6 +120,17 @@ public StoreConfig config() {
return config;
}

@Override
public Reference fetchReferenceForUpdate(@Nonnull String name) {
return fetchReference(name);
}

@Nonnull
@Override
public Reference[] fetchReferencesForUpdate(@Nonnull String[] names) {
return fetchReferences(names);
}

@Override
public Reference fetchReference(@Nonnull String name) {
try {
@@ -173,7 +184,7 @@ public Reference addReference(@Nonnull Reference reference) throws RefAlreadyExi
.otherwise(mutation));

if (success) {
-throw new RefAlreadyExistsException(fetchReference(reference.name()));
+throw new RefAlreadyExistsException(fetchReferenceForUpdate(reference.name()));
}

return reference;
@@ -15,17 +15,24 @@
*/
package org.projectnessie.versioned.storage.cache;

import static org.projectnessie.versioned.storage.common.persist.ObjId.zeroLengthObjId;
import static org.projectnessie.versioned.storage.common.persist.Reference.reference;

import jakarta.annotation.Nonnull;
import org.projectnessie.versioned.storage.common.persist.Backend;
import org.projectnessie.versioned.storage.common.persist.Obj;
import org.projectnessie.versioned.storage.common.persist.ObjId;
import org.projectnessie.versioned.storage.common.persist.Persist;
import org.projectnessie.versioned.storage.common.persist.Reference;

/**
* Provides the cache primitives for a caching {@link Persist} facade, suitable for multiple
* repositories. It is adviseable to have one {@link CacheBackend} per {@link Backend}.
*/
public interface CacheBackend {
Reference NON_EXISTENT_REFERENCE_SENTINEL =
reference("NON_EXISTENT", zeroLengthObjId(), false, -1L, null);

Obj get(@Nonnull String repositoryId, @Nonnull ObjId id);

void put(@Nonnull String repositoryId, @Nonnull Obj obj);
@@ -35,4 +42,12 @@ public interface CacheBackend {
void clear(@Nonnull String repositoryId);

Persist wrap(@Nonnull Persist perist);

Reference getReference(@Nonnull String repositoryId, @Nonnull String name);

void removeReference(@Nonnull String repositoryId, @Nonnull String name);

void putReference(@Nonnull String repositoryId, @Nonnull Reference r);

void putNegative(@Nonnull String repositoryId, @Nonnull String name);
}
@@ -17,6 +17,7 @@

import com.google.errorprone.annotations.CanIgnoreReturnValue;
import io.micrometer.core.instrument.MeterRegistry;
import java.time.Duration;
import java.util.Optional;
import java.util.function.LongSupplier;
import org.immutables.value.Value;
@@ -27,6 +28,10 @@ public interface CacheConfig {

Optional<MeterRegistry> meterRegistry();

Optional<Duration> referenceTtl();

Optional<Duration> referenceNegativeTtl();

@Value.Default
default LongSupplier clockNanos() {
return System::nanoTime;
@@ -43,6 +48,12 @@ interface Builder {
@CanIgnoreReturnValue
Builder meterRegistry(MeterRegistry meterRegistry);

@CanIgnoreReturnValue
Builder referenceTtl(Duration referenceTtl);

@CanIgnoreReturnValue
Builder referenceNegativeTtl(Duration referenceNegativeTtl);

@CanIgnoreReturnValue
Builder clockNanos(LongSupplier clockNanos);
