Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use soft values in the EntityCache; approximate entity size #490

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -71,12 +71,13 @@ public EntityCache(@NotNull PolarisRemoteCache polarisRemoteCache) {
};

// use a Caffeine cache to purge entries when those have not been used for a long time.
// Assuming 1KB per entry, 100K entries is about 100MB.
this.byId =
Caffeine.newBuilder()
.maximumSize(100_000) // Set maximum size to 100,000 elements
.maximumWeight(100 * EntityWeigher.WEIGHT_PER_MB) // Goal is ~100MB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're here - why is this always hard coded and not configurable? Or at least determined based on the actual heap size (which is not as easy as it sounds)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good callout. My concern with making it configurable is that the limit is close to arbitrary.

As you noted in another comment, we have a "goal" of 100MB but that could really be 200MB or 1MB depending on how eviction happens with soft values and the weights. So while raising the limit definitely gets you a bigger cache, it's quite opaque as to what value you would actually want to set.

Also, the EntityCache is ideally very transparent and not something a typical end user would want to concern themselves with.

In light of the above I kept it hard-coded for now but if you feel strongly that we should make it a featureConfiguration I am comfortable doing so

.weigher(EntityWeigher.asWeigher())
.expireAfterAccess(1, TimeUnit.HOURS) // Expire entries after 1 hour of no access
.removalListener(removalListener) // Set the removal listener
.softValues() // Account for memory pressure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either there's a weigher or there are soft-references, the latter is IMHO a bad choice, because it can likely ruin the efficiency of the eviction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs seem to say you can do both, WDYM?

Copy link
Contributor Author

@eric-maynard eric-maynard Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for soft values being a bad choice, in my experience this is quite a common practice and the degradation we're likely to see is far less than the degradation we'd see in a situation where heap space gets exhausted and we do not have soft values.

Assuming there is not significant memory pressure, my expectation is that GC should more or less ignore the SoftReference objects.

If there's a specific performance concern that can be borne out in a benchmark, we should address it.

.build();

// remember the meta store manager
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.polaris.core.persistence.cache;

import com.github.benmanes.caffeine.cache.Weigher;
import org.checkerframework.checker.index.qual.NonNegative;

/**
* A {@link Weigher} implementation that weighs {@link EntityCacheEntry} objects by the approximate
* size of the entity object.
*/
public class EntityWeigher implements Weigher<Long, EntityCacheEntry> {

/** The amount of weight that is expected to roughly equate to 1MB of memory usage */
public static final long WEIGHT_PER_MB = 1024 * 1024;

/* Represents the approximate size of an entity beyond the properties */
private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the value 1000 come from?

Copy link
Contributor Author

@eric-maynard eric-maynard Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this old comment -- the apparent assumption was that entities are generally ~1KB.

This means in the worst case, we will not get a larger number of entries due to this change. It should strictly be smaller.


/** Singleton instance */
private static final EntityWeigher instance = new EntityWeigher();

private EntityWeigher() {}

/** Gets the singleton {@link EntityWeigher} */
public static EntityWeigher getInstance() {
return instance;
}

/**
* Computes the weight of a given entity
*
* @param key The entity's key; not used
* @param value The entity to be cached
* @return The weight of the entity
*/
@Override
public @NonNegative int weigh(Long key, EntityCacheEntry value) {
return APPROXIMATE_ENTITY_OVERHEAD
+ value.getEntity().getProperties().length()
+ value.getEntity().getInternalProperties().length();
Comment on lines +56 to +57
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption here is wrong. There is no guarantee that one character is only one byte, even with "compact strings" - it's likely off by factor 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an approximate size. Even if it's off by a factor of 10, that should be okay. A 1GB cache is greatly preferable to an unbounded cache.

}

/** Factory method to provide a typed Weigher */
public static Weigher<Long, EntityCacheEntry> asWeigher() {
return (Weigher<Long, EntityCacheEntry>) getInstance();
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -16,24 +16,29 @@
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.polaris.core.persistence;
package org.apache.polaris.core.persistence.cache;

import static org.apache.polaris.core.persistence.PrincipalSecretsGenerator.RANDOM_SECRETS;

import java.util.List;
import java.util.stream.Collectors;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.polaris.core.PolarisCallContext;
import org.apache.polaris.core.PolarisDefaultDiagServiceImpl;
import org.apache.polaris.core.PolarisDiagnostics;
import org.apache.polaris.core.entity.PolarisBaseEntity;
import org.apache.polaris.core.entity.PolarisEntity;
import org.apache.polaris.core.entity.PolarisEntitySubType;
import org.apache.polaris.core.entity.PolarisEntityType;
import org.apache.polaris.core.entity.PolarisGrantRecord;
import org.apache.polaris.core.entity.PolarisPrivilege;
import org.apache.polaris.core.persistence.cache.EntityCache;
import org.apache.polaris.core.persistence.cache.EntityCacheByNameKey;
import org.apache.polaris.core.persistence.cache.EntityCacheEntry;
import org.apache.polaris.core.persistence.cache.EntityCacheLookupResult;
import org.apache.polaris.core.entity.TableLikeEntity;
import org.apache.polaris.core.persistence.PolarisMetaStoreManager;
import org.apache.polaris.core.persistence.PolarisMetaStoreManagerImpl;
import org.apache.polaris.core.persistence.PolarisMetaStoreSession;
import org.apache.polaris.core.persistence.PolarisTestMetaStoreManager;
import org.apache.polaris.core.persistence.PolarisTreeMapMetaStoreSessionImpl;
import org.apache.polaris.core.persistence.PolarisTreeMapStore;
import org.assertj.core.api.Assertions;
import org.junit.jupiter.api.Test;
import org.mockito.Mockito;
Expand Down Expand Up @@ -473,4 +478,26 @@ void testRenameAndCacheDestinationBeforeLoadingSource() {
// now the loading by the old name should return null
Assertions.assertThat(cache.getOrLoadEntityByName(callCtx, T4_name)).isNull();
}

/* Helper for `testEntityWeigher` */
private int getEntityWeight(PolarisEntity entity) {
return EntityWeigher.getInstance()
.weigh(-1L, new EntityCacheEntry(diagServices, 1L, entity, List.of(), 1));
}

@Test
void testEntityWeigher() {
var smallEntity = new TableLikeEntity.Builder(TableIdentifier.of("ns.t1"), "").build();
var mediumEntity =
new TableLikeEntity.Builder(TableIdentifier.of("ns.t1"), "")
.setMetadataLocation("a".repeat(10000))
.build();
var largeEntity =
new TableLikeEntity.Builder(TableIdentifier.of("ns.t1"), "")
.setMetadataLocation("a".repeat(1000 * 1000))
.build();

Assertions.assertThat(getEntityWeight(smallEntity)).isLessThan(getEntityWeight(mediumEntity));
Assertions.assertThat(getEntityWeight(mediumEntity)).isLessThan(getEntityWeight(largeEntity));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -828,7 +828,7 @@ void dropEntity(List<PolarisEntityCore> catalogPath, PolarisEntityCore entityToD
}

/** Grant a privilege to a catalog role */
void grantPrivilege(
public void grantPrivilege(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here look unrelated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not; the EntityCacheTest was moved into a new package and so these methods were no longer callable from it if they remained package-private

PolarisBaseEntity role,
List<PolarisEntityCore> catalogPath,
PolarisBaseEntity securable,
Expand Down Expand Up @@ -1006,7 +1006,7 @@ PolarisBaseEntity createTestCatalog(String catalogName) {
*
* @return the identity we found
*/
PolarisBaseEntity ensureExistsByName(
public PolarisBaseEntity ensureExistsByName(
List<PolarisEntityCore> catalogPath,
PolarisEntityType entityType,
PolarisEntitySubType entitySubType,
Expand Down Expand Up @@ -1052,7 +1052,7 @@ PolarisBaseEntity ensureExistsByName(
*
* @return the identity we found
*/
PolarisBaseEntity ensureExistsByName(
public PolarisBaseEntity ensureExistsByName(
List<PolarisEntityCore> catalogPath, PolarisEntityType entityType, String name) {
return this.ensureExistsByName(
catalogPath, entityType, PolarisEntitySubType.NULL_SUBTYPE, name);
Expand All @@ -1067,7 +1067,7 @@ PolarisBaseEntity ensureExistsByName(
* @param internalProps updated internal properties
* @return updated entity
*/
PolarisBaseEntity updateEntity(
public PolarisBaseEntity updateEntity(
List<PolarisEntityCore> catalogPath,
PolarisBaseEntity entity,
String props,
Expand Down Expand Up @@ -1553,7 +1553,7 @@ void validateBootstrap() {
this.ensureGrantRecordExists(principalRole, principal, PolarisPrivilege.PRINCIPAL_ROLE_USAGE);
}

void testCreateTestCatalog() {
public void testCreateTestCatalog() {
// create test catalog
this.createTestCatalog("test");

Expand Down Expand Up @@ -2075,7 +2075,7 @@ public void testPrivileges() {
* @param newCatPath new catalog path
* @param newName new name
*/
void renameEntity(
public void renameEntity(
List<PolarisEntityCore> catPath,
PolarisBaseEntity entity,
List<PolarisEntityCore> newCatPath,
Expand Down