
feat: enable cache for external tables using a cache control on metrics views #6265

Open · wants to merge 12 commits into base: main
Conversation

@k-anshul (Member) commented Dec 12, 2024

No description provided.

@k-anshul k-anshul self-assigned this Dec 12, 2024
Resolved review threads on:
- proto/rill/runtime/v1/resources.proto
- runtime/compilers/rillv1/parse_metrics_view.go
- runtime/metricsview/executor.go
- runtime/query.go

return runtime.ReconcileResult{Err: validateErr}
}

func metricsViewCacheControl(spec *runtimev1.MetricsViewSpec, streaming bool, updatedOn *timestamppb.Timestamp, dialect drivers.Dialect) *runtimev1.MetricsViewSpec_Cache {
Contributor commented:

I think it would be better to move the logic in this function (which basically just applies defaults) to the metricsview_cache_key resolver – to keep the cache key logic in a single place. (It also makes refactoring the defaults easier, because they will then only be applied at runtime, not persisted in the DB as they are here.)

@k-anshul (Member Author) replied:

  1. We still need to validate key_sql, i.e. whether we can resolve a valid key_sql for streaming metrics views. It is also ideal to do that validation in the reconciler and not at query time. Since we are doing some validation anyway, it made sense to also apply defaults.
  2. In the reconciler I always mark caching as enabled or disabled, which resolvers can use to check whether caching should be enabled and to get the metricsViewCacheKey. Otherwise, resolvers would need to query every time and handle an error thrown by the metrics_cache_key resolver.

> It also makes refactoring the defaults easier because they will only be applied at runtime, not persisted in the DB as here

Since the defaults can only be changed with a code change, the metrics view reconcile on restart will make sure the persisted defaults are also updated.

Having said that, I also thought about keeping this in the resolver and was indifferent between the two. I can move this logic to the resolver as well.

@k-anshul (Member Author) replied:

By the way, if we don't apply defaults in the reconciler, then we can't resolve this TODO, since the executor can't call the metrics_cache_key resolver:

// Cacheable returns whether the result of running the given query is cacheable.
func (e *Executor) Cacheable(qry *Query) bool {
// TODO: Get from OLAP instead of hardcoding
return e.olap.Dialect() == drivers.DialectDuckDB
}

Contributor replied:

In order to cache the cache key correctly (this is so confusing), it seems like some of the flows around Cacheable need to be refactored anyway?

In general, the idea is that the metricsview package implements logic related to metrics views (such as validation, query generation, exports, etc.). This is largely in order to centralize the core logic related to metrics views in one package to make it easier to understand how our metrics layer works.

In line with that, it would be nice if the metrics-related cache key resolving could also be implemented in the metricsview package. And then called by upstream resolvers.

Right now we have a nice flow: the resolver cache calls the resolver, which calls the metricsview package.

This easily gets mixed up with the new caching logic. But I think it should be possible to find a way to keep the logic flow here relatively simple. Can you see if you can think of a way to keep it as straightforward as possible?

@k-anshul (Member Author) commented Dec 17, 2024:

I can think of the following ideas:

  1. We discussed resolving this via a refresh schedule on the metrics_view. This could be evaluated; in that case we can compute the key and store it directly in the spec.
  2. Another option is to let the runtime expose APIs to set/get data in the query_cache, which the executor can use to set/get the metrics_cache_key. This could be useful for other use cases as well. For example, as of now we always query the watermark, so it could be cached too (maybe it is not queried for all API calls, just thinking out loud).
  3. Let the executor expose a CacheControl API instead of Cacheable, which the caller can use. If enabled, execute the KeySQL and save the result in the cache using the CacheKey.
type CacheControl struct {
	Enabled  bool
	KeySQL   string
	CacheKey string
}
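A minimal sketch of how a caller might use this CacheControl (option 3). The Executor wiring is omitted, and `resolveWithCacheControl` and `fakeCache` are hypothetical names for illustration, not the actual Rill API:

```go
package main

import (
	"fmt"
)

// CacheControl is the struct proposed above (field names as in the comment).
type CacheControl struct {
	Enabled  bool
	KeySQL   string
	CacheKey string
}

// fakeCache stands in for the runtime query cache (illustrative only).
var fakeCache = map[string]string{}

// resolveWithCacheControl shows how a caller might use CacheControl:
// if caching is disabled, always execute fresh; otherwise look up the
// entry under CacheKey and populate it on a miss.
func resolveWithCacheControl(cc CacheControl, execute func() string) string {
	if !cc.Enabled {
		return execute()
	}
	if v, ok := fakeCache[cc.CacheKey]; ok {
		return v // cache hit
	}
	res := execute()
	fakeCache[cc.CacheKey] = res
	return res
}

func main() {
	cc := CacheControl{Enabled: true, KeySQL: "SELECT MAX(updated_on) FROM t", CacheKey: "mv1:v42"}
	fmt.Println(resolveWithCacheControl(cc, func() string { return "fresh" }))      // miss: executes
	fmt.Println(resolveWithCacheControl(cc, func() string { return "recomputed" })) // hit: returns cached "fresh"
}
```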

@k-anshul (Member Author) commented Dec 17, 2024:

Option 2 sounds most convenient to me. There is a good chance we will come up with similar use cases in the future as well. It also sounds okay logically: if anybody can access the OLAP interface, then anybody should be able to leverage the cache as well. In the future, the OLAP APIs could accept a cache parameter and handle caching results (just thinking out loud).

@begelundmuller (Contributor) commented Dec 17, 2024:

Right now we basically only have caching happening in runtime.Runtime.Resolve (and the legacy runtime.Runtime.Query), which is nice because it's much simpler than when caching happens at multiple levels. I think it would be very nice to avoid caching happening inside the metricsview package – so the guarantee basically becomes that it always returns fresh data, which upstream callers can cache (we may need multi-level caching at some point, but I think we may be able to avoid it for now).

Looking at the resolvers, the Cacheable and Key functions are only called from here (rill/runtime/resolver.go, lines 126 to 136 in 8e7bca1):

// If not cacheable, just resolve and return
if !resolver.Cacheable() {
	return resolver.ResolveInteractive(ctx)
}

// Build cache key based on the resolver's key and refs
ctrl, err := r.Controller(ctx, opts.InstanceID)
if err != nil {
	return nil, err
}
hash := md5.New()
if _, err := hash.Write([]byte(resolver.Key())); err != nil {

What if we replaced these with a CacheKey(ctx context.Context) (key []byte, ok bool) function (where caching should be skipped if !ok)? Then the metrics, metrics_sql and metrics_time_range resolvers could call the metrics_cache_key resolver from their CacheKey implementation – if they pass that call through opts.Runtime.Resolve, then it in turn will also be cached.

Do you think that will work? The advantage of this approach is that there continues only to be caching happening in runtime.Runtime.Resolve, and resolvers are only being called from other resolvers (not from the metricsview package). This doesn't quite account for the legacy runtime.Runtime.Query implementation, but hopefully the above idea can also work for that in a similar way?

When it comes to the actual execution (but not caching) of the metrics_cache_key resolver, that in turn could actually be implemented inside the metricsview package (similar to how the logic for the metrics resolver is implemented there).

Lastly, to comment on this:

> Like as of now we are always querying watermark so it could be cached as well

You are thinking about this, right?

t, err := e.loadWatermark(ctx, executionTime)

I agree that it would be nice to have this cached (it hasn't been urgent since the UI currently always passes fixed time ranges). In this case, we could also avoid multi-level caching by instead moving this to the resolver – e.g. you could have something like this in the metrics resolver (pseudocode):

// Create an executor
executor := ...

// Call watermark resolver for the metrics view (can be cached in the resolver cache)
watermark := opts.Runtime.Resolve(...)

// Bind the query using the provided watermark (to avoid nested, uncached queries for the watermark)
boundQry = executor.BindWatermark(qry, watermark)

// Now execute the bound qry
res := executor.Execute(boundQry)

@k-anshul (Member Author) commented Dec 17, 2024:

Overall the CacheKey API seems good to me, but I am somewhat unconvinced about how it is computed in resolvers.

  1. It feels like too much indirection for CacheKey. It is hard to understand and mentally trace the calls: metrics and other resolvers need to call another resolver via the runtime.Resolve method to resolve the cache key so that data can be cached.
  2. Some APIs on the executor, like Query, can be called directly, but other APIs, like CacheKey, need to go via the resolver path. So the decision to cache something is split across multiple places.
  3. It would be ideal to not let resolvers call other resolvers, to prevent cyclic calls.

Maybe for internal use cases, instead of creating resolvers, there should be an uncomplicated way to get and set things in the query_cache, to avoid too many indirections.

@begelundmuller (Contributor) commented Dec 19, 2024:

> Feels like too much indirection for CacheKey

Agree, but I think it's intrinsic to the idea of a "cached cache key", not this implementation.

> Some APIs on executor like Query can be called directly but other APIs like CacheKey need to go via the resolver path. So the decision to cache something is split across multiple places.

You must have misunderstood Executor.Query – it never does any caching. So on the contrary, this ensures consistency, because caching is never implemented in the metricsview package and is only implemented in one place (the runtime/resolver.go function, plus the legacy runtime/query.go, which is being deprecated).

> It will be ideal to not let resolvers call other resolvers to prevent some cyclic calls.

I think resolvers calling other resolvers can actually be very powerful. Think of resolvers as memoized functions – it's completely fine for functions to call other functions, but of course, you should be careful if you call the same function recursively.

> there should be a uncomplicated way to get and set things in query_cache to avoid too many indirections.

It's subjective, but I'm conversely more worried about the potential for bugs if things get cached at many different levels in the call chain. Having caching only happen in Resolve IMO makes it easier to reason about when things are or are not cached. I guess there's a functional vs. imperative programming way to look at this.

@k-anshul (Member Author) replied:

> Agree, but I think it's intrinsic to the idea of a "cached cache key", not this implementation.

The same indirection will be present if resolvers need to load the watermark as well.

> You must have misunderstood Executor.Query – it never does any caching

No, I didn't mean that. I meant that the metrics resolver and other such resolvers need to call Executor.Query directly but call Executor.CacheKey via runtime.Resolve.

> I think resolvers calling other resolvers can actually be very powerful.

There is no denying that it is powerful. My reservations are more around ease of use and ease of understanding. Creating resolvers for internal usage feels like adding too much bloat. I think there has to be a tradeoff between "perfect" abstractions and ease of understanding/usage. Maybe we are now in "it is subjective" territory.

> It's subjective, but I'm conversely more worried about the potential for bugs if stuff is getting cached at many different levels in the call chain.

I don't know if the current approach is less bug-prone. It is easy to miss the fact that calling Executor.CacheKey in a resolver needs to go via runtime.Resolve and should not be called directly, which is not the case with the other APIs on the executor.

Resolved review threads on:
- runtime/reconcilers/metrics_view.go
- runtime/server/batch_query_test.go
- runtime/metricsview/executor.go
if res.Err() != nil {
return nil, false, err
}
return []byte(key), true, nil
Contributor commented:

Not sure if we want to protect against this, but often the key will be a timestamp, and people may forget to cast it to a string.

An easy way to protect here is to use var key any and then run it through jsonval.ToValue and then json.Marshal – that will almost always give you a safely serialized string representation.

Member Author replied:

Weirdly, I did not find this issue in my testing. On digging further, I realised that the sql package already does the conversion when the source is time.Time and the dest is *string: https://github.com/golang/go/blob/669d87a935536eb14cb2db311a83345359189924/src/database/sql/convert.go#L286
It does this conversion for most common types.

But I think it is good to run it through jsonval.ToValue since it handles more types.

runtime/query.go, comment on lines 83 to 88:
cacheKey, err := r.metricsViewCacheKey(ctx, instanceID, res.Meta.Name.Name, priority)
if err != nil {
// skip caching
// the cache_key_resolver should ideally only return an error if caching is disabled or context is cancelled
return query.Resolve(ctx, r, instanceID, priority)
}
Contributor commented:

Can we have a dedicated skip bool for skipping caching? Because if the err is a context cancellation, that's ideally an error we return immediately. It would also make it easier to notice if an unexpected error is actually returned.

Member Author replied:

Okay. In that case, I think it will be best to expose errCachingDisabled and check for it directly for now. Once the query package is removed, it can be made internal again.
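The sentinel-error approach described here could look roughly like this (the function shapes and names are illustrative, not the actual runtime/query.go code):

```go
package main

import (
	"errors"
	"fmt"
)

// errCachingDisabled is the sentinel the cache key resolver returns when
// caching should be skipped, as opposed to a genuine failure.
var errCachingDisabled = errors.New("caching is disabled")

// metricsViewCacheKey stands in for the cache key resolver.
func metricsViewCacheKey(enabled bool) (string, error) {
	if !enabled {
		return "", errCachingDisabled
	}
	return "key-123", nil
}

// query distinguishes "skip the cache" from genuine errors such as
// context cancellation, which should be propagated immediately.
func query(enabled bool) (string, error) {
	key, err := metricsViewCacheKey(enabled)
	if err != nil {
		if errors.Is(err, errCachingDisabled) {
			return "resolved without cache", nil
		}
		return "", err // e.g. context cancellation: return immediately
	}
	return "resolved with cache key " + key, nil
}

func main() {
	fmt.Println(query(false)) // resolved without cache <nil>
	fmt.Println(query(true))  // resolved with cache key key-123 <nil>
}
```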

return nil, false, nil
}

hasher := md5.New()
Contributor commented:

This is already being hashed in runtime/resolver.go so could we return the full pointer here?

Member Author replied:

The values were already being hashed in the older implementation:

func (r *metricsResolver) Key() string {

Since the key is now composed of two things (the cache_key and the query), I thought of just using an md5 hasher.
Do you mean we should create a new struct and return any from CacheKey instead of bytes?

Resolved review threads on runtime/resolvers/metricsview_cache_key.go.
Comment on lines 126 to 128
func (r *metricsViewCacheKeyResolver) Refs() []*runtimev1.ResourceName {
return []*runtimev1.ResourceName{}
}
Contributor commented:

I think it should have a ref to the metrics view to ensure invalidation if the underlying model is updated when streaming is false?

@k-anshul (Member Author) commented Dec 20, 2024:

Yes. Previously, without the cacheable interface, I was calling this for every metrics view, and adding a ref on the metrics_view would lead to a circular dependency. But now, given it is called by resolvers, it can add a ref on the metrics view.
That said, invalidation due to a model update still happens, because the queries also have the metrics view (and hence the model) as a ref.

Resolved review threads on:
- runtime/resolvers/metricsview_cache_key.go
- runtime/resolvers/sql.go
2 participants