-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deserialize rules session is unable to take advantage of cached rules #464
Comments
@k13gomez, For the proposal, I have a few thoughts:
On the deserialization side, today(if I remember correctly), we shouldn't be incurring the
I 100% agree, the current state of clara's caching mechanics in general probably need this face lift.
The generation of the With all of that being said, I am a little curious of the use-case. Again, clara's compilation is/has been a known hot spot, please don't take the comments as trying to dissuade from trying to make things better but rather as simple observations. |
@EthanEChristian See a few answers below, thank you for engaging in discussing these topic
The only reason I suggested it might be useful to also cache
We create thousands of sessions per hour (with caching), but then each session we create is serialized and stored, and may be picked up and deserialized several times for running queries. |
@k13gomez, This would likely require some work to deserialize and maintain the original rule-base, but subsequent Sessions could leverage the initial rule-base.I believe that this option was originally added due to a use-case where consumers had a single rule-base and then a multitude of memory(facts) that made distinct sessions, if that makes any sense. I'm not sure if this fits your use-case perfectly, it seems at least partially applicable if you are seeing benefits via the |
@EthanEChristian yes we currently do that, but the problem is our base-rulebase is loaded dynamically at runtime, and we don't know which base-rulebase to use until we deserialize it, I will try to optimize this further for our use case right now because it might be quicker to add a caching layer here by saving the signature of the rulebase along with it and then being able to load it from a cache. |
@k13gomez, Were you thinking about adding another implementation to the Generating an If that would work, then callers shouldn't need to know what rule-base or version of rule-base is needed to deserialize a session ahead of time. Was that what you were thinking as well? |
@EthanEChristian I think i'm going to try putting a caching layer in front of the call to deserialize the rulebase on our implementation, here's how everything would work, and we could bring something similar to be built into clara or add caching at compile-exprs: Rules Session caching:
Base Rulebase caching:
|
I was tinkering around with the caching on the deserialize side here to better walk through how we might cache the entire rulebase. This is a rough POC of what i was thinking: Its a little goofy in the sense that we have nested readers/writers for specifically the rulebase, but in theory it might work out to our benefit as, in the "Cache Hit" scenario, we could avoid transforming the serialized rulebase struct into its actual object and just chuck away the byte array. I haven't done any additional testing(TODO), but i would be interested to hear your thoughts in addition to @mrrodriguez and @WilliamParker. |
So the idea here is that the rulebase upon read is cached using its MD5 hash as a key, and then if there is a cache hit on read we don't bother reading the bytes for the rulebase again but just use the previous rulebase? I've toyed with the idea before of doing something like this, but with the user assigning the name of the session to be used as a key. Something like
Which is basically a similar idea to this. |
@EthanEChristian @WilliamParker I ended up putting a caching layer wrapping the deserialize function, this way the serialization itself is still pretty thin and the I can do the following optimization:
Although this approach is outside clara, it has the advantage of allowing the user to use whatever serializer/deserializer they want and the caching layer still works, I think this aproach could be easily added to clara-rules as a higher order middleware that can be user to instrument a session factory function which invokes
Clarification: for our use case, we always have access to the productions used to make the session, because they are loaded dynamically at runtime so we have to manually create and require their corresponding namespaces if they are missing, so we effectively serialize the productions separate from the session and rulebase so that we can ensure it is properly loaded every time, this makes it trivial to compute an hash of those productions. Serializing and deserializing the productions is very cheap compared to re-compiling them. |
When deserializing a rules session, to restore the session the expressions are compiled every time by calling
compile-exprs
, not taking advantage of caching functionality built intomk-session
, incurring the high cost of recompiling expressions every time a session is deserialized.See here:
https://github.com/cerner/clara-rules/blob/8b688ae5dc7fcd81dd4aaf83250c4b4d25b65b79/src/main/clojure/clara/rules/durability/fressian.clj#L598
Rules are also compiled via
compile-exprs
whenevermk-session
is invoked, so I propose to make a few changes to how caching works for rules:to-beta-graph
,to-alpha-graph
, andcompile-exprs
to be cached, so that whenever rules sessions are deserialized they do not incur the high cost of recompiling expressions if the rulebase has already been previously compiled.md5
hash of the production form automatically computed viadefrule
ordefquery
which can be used to more cheaply detect changes to rules and determine if the result of beta-graph, alpha-graph or compile-exprs is on the cache.Open to discussion, these are some ideas I came up with based on my company's use case; we compile serialized rules at runtime and cache them using a custom cache, we then stored the serialized session, but have had to temporarily move away from deserializing the session because due to how
mk-session
caching works it is actually cheaper to re-run rules with facts than to actually deserialize the rules session (which would require recompiling expressions).@EthanEChristian @WilliamParker @mrrodriguez I am happy to work on the described improvement above, have a clear picture of how I would do it, but would like some initial feedback if this is something you would be open to merging and releasing.
The text was updated successfully, but these errors were encountered: