feat: enable cache for external tables using a cache control on metrics views #6265
Conversation
runtime/reconcilers/metrics_view.go (Outdated)
```go
	return runtime.ReconcileResult{Err: validateErr}
}

func metricsViewCacheControl(spec *runtimev1.MetricsViewSpec, streaming bool, updatedOn *timestamppb.Timestamp, dialect drivers.Dialect) *runtimev1.MetricsViewSpec_Cache {
```
---
I think it would be better to move the logic in this function (which basically just applies defaults) to the `metricsview_cache_key` resolver – to keep the cache key logic in a single place. (It also makes refactoring the defaults easier, because they will only be applied at runtime, not persisted in the DB as here.)
---
- We still need to validate `key_sql`, i.e. whether we are able to resolve a valid `key_sql` for streaming metrics views. It would also be ideal to do that validation in the reconciler and not at query time. Since we are doing some validations anyway, it made sense to also apply defaults.
- In the reconciler I always mark the cache as enabled or disabled, which the resolvers can use to check whether caching should be enabled and to get the `metricsViewCacheKey`. Now resolvers will need to query every time and handle an error thrown by the `metrics_cache_key` resolver.

> It also makes refactoring the defaults easier because they will only be applied at runtime, not persisted in the DB as here

Since defaults can only be changed with a code change, reconciling metrics views on restart should ensure the defaults are also updated.

Having said that, I also considered keeping this in the resolver and was indifferent between the two. I can move this logic to the resolver as well.
---
By the way, if we don't apply defaults in the reconciler, then we can't resolve this TODO, since the executor can't call the `metrics_cache_key` resolver.

rill/runtime/metricsview/executor.go, lines 68 to 72 in ba6f2ba:

```go
// Cacheable returns whether the result of running the given query is cacheable.
func (e *Executor) Cacheable(qry *Query) bool {
	// TODO: Get from OLAP instead of hardcoding
	return e.olap.Dialect() == drivers.DialectDuckDB
}
```
---
In order to cache the cache key correctly (this is so confusing), it seems like some of the flows around `Cacheable` need to be refactored anyway?

In general, the idea is that the `metricsview` package implements logic related to metrics views (such as validation, query generation, exports, etc.). This is largely in order to centralize the core logic related to metrics views in one package, to make it easier to understand how our metrics layer works.

In line with that, it would be nice if the metrics-related cache key resolving could also be implemented in the `metricsview` package, and then called by upstream resolvers.

Right now we have a nice flow of the resolver cache calling the resolver calling the `metricsview` package. This easily gets mixed up with this new caching logic. But I think it should be possible to find a way to keep the logic flow here relatively simple. Can you see if you can think of a way to keep it as straightforward as possible?
---
I can think of the following ideas:

- We discussed resolving this via a refresh schedule on the metrics view. This could be evaluated. In that case we can compute the key and store it directly in the spec.
- Another option is to let the runtime expose APIs to set/get data in `query_cache`, which the executor can use to set/get the `metrics_cache_key` in the cache. This could be useful for other use cases as well. For example, right now we always query the watermark, so it could be cached too (maybe it is not queried for all API calls, just thinking out loud).
- Let the executor expose a `CacheControl` API instead of `Cacheable`, which the caller can use. If enabled, execute the `KeySQL` and save the result in the cache using the `CacheKey`.

```go
type CacheControl struct {
	Enabled  bool
	KeySQL   string
	CacheKey string
}
```
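To make the third option concrete, here is a minimal sketch of how a caller might use such a `CacheControl` value. The helper and its parameters (`execKeySQL`, the in-memory `cache` map, `run`) are illustrative stand-ins, not actual rill APIs:

```go
package main

import "fmt"

// CacheControl mirrors the struct proposed above.
type CacheControl struct {
	Enabled  bool
	KeySQL   string
	CacheKey string
}

// resolveWithCache sketches the caller flow: if caching is enabled, execute
// KeySQL to derive a freshness key, then consult the cache before running
// the query. All dependencies are injected so the sketch stays self-contained.
func resolveWithCache(
	cc CacheControl,
	execKeySQL func(sql string) (string, error),
	cache map[string][]byte,
	run func() []byte,
) ([]byte, error) {
	if !cc.Enabled {
		// Caching disabled: always execute fresh.
		return run(), nil
	}
	key, err := execKeySQL(cc.KeySQL)
	if err != nil {
		return nil, err
	}
	// Combine the static cache key with the freshness key from KeySQL.
	fullKey := cc.CacheKey + ":" + key
	if res, ok := cache[fullKey]; ok {
		return res, nil
	}
	res := run()
	cache[fullKey] = res
	return res, nil
}

func main() {
	cache := map[string][]byte{}
	cc := CacheControl{Enabled: true, KeySQL: "SELECT max(updated_on) FROM t", CacheKey: "mv1"}
	execKeySQL := func(sql string) (string, error) { return "2024-01-01", nil }
	calls := 0
	run := func() []byte { calls++; return []byte("result") }
	resolveWithCache(cc, execKeySQL, cache, run)
	resolveWithCache(cc, execKeySQL, cache, run)
	fmt.Println(calls) // the underlying query only runs once
}
```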
---
Option 2 sounds most convenient to me. There's a good chance we may come up with similar use cases in the future as well. Logically it also sounds okay: if anybody can access the OLAP interface, then anybody should be able to leverage the cache as well. In the future, the OLAP APIs could accept a `cache` parameter and handle caching results (just thinking out loud).
---
Right now we basically only have caching happening in `runtime.Runtime.Resolve` (and the legacy `runtime.Runtime.Query`), which is nice because it's much simpler than when caching happens at multiple levels. I think it would be very nice to avoid caching happening inside the `metricsview` package – so the guarantee basically becomes that it always returns fresh data, which upstream callers can cache (we may need multi-level caching at some point, but I think we may be able to avoid it for now).

Looking at the resolvers, the `Cacheable` and `Key` functions are only called from here:
Lines 126 to 136 in 8e7bca1:

```go
// If not cacheable, just resolve and return
if !resolver.Cacheable() {
	return resolver.ResolveInteractive(ctx)
}

// Build cache key based on the resolver's key and refs
ctrl, err := r.Controller(ctx, opts.InstanceID)
if err != nil {
	return nil, err
}
hash := md5.New()
if _, err := hash.Write([]byte(resolver.Key())); err != nil {
```
What if we replaced these with a `CacheKey(ctx context.Context) (key []byte, ok bool)` function (where caching should be skipped if `!ok`)? Then the `metrics`, `metrics_sql` and `metrics_time_range` resolvers could call the `metrics_cache_key` resolver from their `CacheKey` implementation – if they pass that call through `opts.Runtime.Resolve`, then it in turn will also be cached.

Do you think that will work? The advantage of this approach is that caching continues to happen only in `runtime.Runtime.Resolve`, and resolvers are only being called from other resolvers (not from the `metricsview` package). This doesn't quite account for the legacy `runtime.Runtime.Query` implementation, but hopefully the above idea can also work for that in a similar way?
When it comes to the actual execution (but not caching) of the `metrics_cache_key` resolver, that in turn could actually be implemented inside the `metricsview` package (similar to how the logic for the `metrics` resolver is implemented there).
Lastly, to comment on this:

> Like as of now we are always querying watermark so it could be cached as well

You are thinking about this, right?

```go
t, err := e.loadWatermark(ctx, executionTime)
```
I agree that it would be nice to have this cached (it hasn't been urgent since the UI currently always passes fixed time ranges). In this case, we could also avoid multi-level caching by instead moving this to the resolver – e.g. you could have something like this in the `metrics` resolver (pseudocode):

```go
// Create an executor
executor := ...

// Call watermark resolver for the metrics view (can be cached in the resolver cache)
watermark := opts.Runtime.Resolve(...)

// Bind the query using the provided watermark (to avoid nested, uncached queries for the watermark)
boundQry := executor.BindWatermark(qry, watermark)

// Now execute the bound qry
res := executor.Execute(boundQry)
```
---
Overall the `CacheKey` API seems good to me. But I am somewhat unconvinced by how it is computed in resolvers:

- Feels like too much indirection for `CacheKey`. It is hard to understand and trace the calls mentally. `metrics` and other resolvers need to call another resolver via the `runtime.Resolve` method to resolve the cache key so that data can be cached.
- Some APIs on the executor, like `Query`, can be called directly, but other APIs, like `CacheKey`, need to go via the resolver path. So the decision to cache something is split across multiple places.
- It would be ideal to not let resolvers call other resolvers, to prevent cyclic calls.

Maybe for internal use cases, instead of creating resolvers, there should be an uncomplicated way to get and set things in `query_cache` to avoid too many indirections.
---
> Feels like too much indirection for `CacheKey`

Agree, but I think it's intrinsic to the idea of a "cached cache key", not this implementation.

> Some APIs on executor like Query can be called directly but other APIs like CacheKey need to go via the resolver path. So the decision to cache something is split across multiple places.

You must have misunderstood `Executor.Query` – it never does any caching. So on the contrary, this ensures consistency because caching is never implemented in the `metricsview` package, and is only implemented in one place (in `runtime/resolver.go`, plus the legacy `runtime/query.go`, but that's being deprecated).

> It will be ideal to not let resolvers call other resolvers to prevent some cyclic calls.

I think resolvers calling other resolvers can actually be very powerful. Think of resolvers as memoized functions – it's completely fine for functions to call other functions, but of course, you should be careful if you call the same function recursively.

> there should be a uncomplicated way to get and set things in query_cache to avoid too many indirections.

It's subjective, but I'm conversely more worried about the potential for bugs if stuff is getting cached at many different levels in the call chain. Having caching only happen in `Resolve` IMO makes it easier to reason about when things are or are not cached. I guess there's a functional vs. imperative programming way to look at this.
---
> Agree, but I think it's intrinsic to the idea of a "cached cache key", not this implementation.

The same indirection will be present if resolvers need to load the watermark as well.

> You must have misunderstood Executor.Query – it never does any caching

No, I didn't mean that. I meant that the `metrics` resolver and other such resolvers need to call `Executor.Query` directly but call `Executor.CacheKey` via `runtime.Resolve`.

> I think resolvers calling other resolvers can actually be very powerful.

I mean, there is no denying that it is powerful. My reservations are more around ease of usage and ease of understanding. Creating resolvers for internal usage feels like adding too much bloat. I think there has to be a tradeoff between "perfect" abstractions and ease of understanding/usage. Maybe we are now in "it is subjective" territory.

> It's subjective, but I'm conversely more worried about the potential for bugs if stuff is getting cached at many different levels in the call chain.

I don't know if the current approach is less bug-prone. It is easy to miss the fact that calling `Executor.CacheKey` in a resolver needs to go via `runtime.Resolve` and must not be called directly, which is not the case with the other APIs on the executor.
runtime/metricsview/executor.go (Outdated)

```go
if res.Err() != nil {
	return nil, false, err
}
return []byte(key), true, nil
```
---
Not sure if we want to protect against this, but often the key will be a timestamp and people may forget to cast it to a string. An easy way to protect here is to use `var key any` and then run it through `jsonval.ToValue` and then `json.Marshal` – that will almost always give you a safely serialized string representation.
---
Weirdly, I did not find this issue in my testing, and on digging further I realised that the `sql` package already does the conversion when the source is `time.Time` and the dest is `*string`: https://github.com/golang/go/blob/669d87a935536eb14cb2db311a83345359189924/src/database/sql/convert.go#L286

It does these conversions for most common types. But I think it is good to run it through `jsonval.ToValue` since it handles more types.
runtime/query.go (Outdated)

```go
cacheKey, err := r.metricsViewCacheKey(ctx, instanceID, res.Meta.Name.Name, priority)
if err != nil {
	// skip caching
	// the cache_key_resolver should ideally only return an error if caching is disabled or context is cancelled
	return query.Resolve(ctx, r, instanceID, priority)
}
```
---
Can we have a dedicated `skip bool` for skipping caching? Because if the `err` is a context cancellation, that's ideally an error we return immediately. It will also make it easier to understand if an unexpected error is actually returned.
---
Okay. In that case, I think it will be best to expose `errCachingDisabled` and check for it directly for now. Once the `query` package is removed, that can be made internal as well.
```go
	return nil, false, nil
}

hasher := md5.New()
```
---
This is already being hashed in `runtime/resolver.go`, so could we return the full pointer here?
---
The values were already being hashed in the older implementation:

rill/runtime/resolvers/metrics.go, line 94 in 9061a24:

```go
func (r *metricsResolver) Key() string {
```

Since the key is now composed of two things (`cache_key` and `query`), I thought of just using an md5 hasher. Do you mean we create a new struct and just return `any` from `CacheKey` instead of bytes?
```go
func (r *metricsViewCacheKeyResolver) Refs() []*runtimev1.ResourceName {
	return []*runtimev1.ResourceName{}
}
```
---
I think it should have a ref to the metrics view, to ensure invalidation if the underlying model is updated when `streaming` is false?
---
Yes. Previously, when we did not have the cacheable interface, I was calling this for every metrics view, and adding a ref on the metrics view would have led to a circular dependency. But now, given it is called by resolvers, it can add a ref on the metrics view.

Though invalidation due to a model update still happens anyway, because the queries also have the metrics view (and hence the model) as a ref.
Co-authored-by: Benjamin Egelund-Müller <[email protected]>