KV Limits #121
Comments
First we need to figure out what is using the KV so heavily. Can you see in the dashboard whether it's the kernel or some other plugin?
Unfortunately, I can't find any useful information in the analytics. I included in the screenshots everything relevant I could find.
We recently added https://github.com/ubiquity-os-marketplace/generate-vector-embeddings, which reacts to 7 different events. Each run of a plugin equals one KV
I think the root of the problem is that we need to persist data between runs, since the worker is destroyed once its job is done; that is why we used KV in the first place. Alternatives for storing the data that we could consider:
I got the 90% warning again today, and today just started. Yes, I have a feeling it must be the vector embeddings plugin. Perhaps we need to optimize it. @sshivaditya2019 as a heads up, let us know if you have ideas for optimizing your plugin.
Cloudflare KV is used to manage state across "plugin chain" runs. Basically in our config when we define multiple plugins to be invoked by a specific webhook (such as
Is it realistic to check if it's the last event in the plugin chain and not keep track of it anymore? That way we can put the heavy ones at the end, like vector embeddings.
For example, an issue comment is created and we have three plugins. The kernel executes the first and second normally, but it knows the last one is next, so it executes it and does not read or write KV. Once the kernel has executed it, I don't see why it needs to keep track anymore.
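The "skip tracking for the last plugin" idea can be sketched roughly as follows. This is a minimal illustration and not the kernel's actual code: `StateStore`, `CountingStore`, `Plugin`, and `runChain` are hypothetical names, the store is an in-memory stand-in for Cloudflare KV, and the chain runs synchronously in one process for brevity, whereas the real kernel dispatches plugins asynchronously.

```typescript
// Hypothetical types for illustration; none of these names come from the kernel.
interface StateStore {
  put(key: string, value: string): void;
}

type Plugin = { name: string; run: (input: unknown) => unknown };

// In-memory stand-in for Cloudflare KV that counts write operations.
class CountingStore implements StateStore {
  writes = 0;
  private data = new Map<string, string>();

  put(key: string, value: string): void {
    this.writes += 1;
    this.data.set(key, value);
  }
}

// Runs each plugin in order and persists chain state only when another
// plugin still has to run afterwards. The last plugin causes no write.
function runChain(
  chainId: string,
  plugins: Plugin[],
  store: StateStore,
  event: unknown
): unknown {
  let output: unknown = event;
  for (let index = 0; index < plugins.length; index++) {
    output = plugins[index].run(output);
    const isLast = index === plugins.length - 1;
    if (!isLast) {
      store.put(`${chainId}:${index}`, JSON.stringify(output ?? null));
    }
  }
  return output;
}
```

Under these assumptions a chain of three plugins performs two writes instead of three, and a chain with a single plugin performs none, which is why placing the most frequently triggered plugin last would save the most.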
A straightforward way to optimize would be to divide this functionality into several plugins. For instance, "Issue Matching" could function as one action plugin, while "Issue Deduplication" could be another. We could further improve efficiency by batch processing comments, rather than triggering actions every time a comment is edited or deleted.
Alternatively, we could keep the current setup but use a Postgres connection URI instead of the Supabase key and URI. We could also implement the embedding generation as a Postgres function. This would save close to 14 KV operations per invocation.
Another alternative would be to limit access to the anonymous key (clear text) as much as possible (RLS with a policy) and instead pass JWT tokens from the kernel. The worker would then use these tokens to make calls to the Supabase REST API.
Anything on a timer is a no-go. Can you make batch processing event-based? How does breaking it apart into separate plugins help with this? Also, won't it cause a lot of code duplication? That said, I always prefer breaking apart plugins wherever possible for enhanced modularity.
Saving 14 KV operations per invocation is massive. Let's do this immediately!
I don't understand how this helps.
Technically, we don't need to keep track if it's the last plugin or the only plugin in the chain, but that also means we don't get to use the response from the plugin. We currently don't use it, but we might in the future, for example if a plugin returns rewards to the kernel or returns comment HTML for the kernel to post.
It seems janky to have a switch in the config to enable this feature.
There are plugins that run quite a lot, like https://github.com/ubiquity-os-marketplace/automated-merging and https://github.com/ubiquity-os-marketplace/disqualifier, when they would only need to run once a day (this would save hundreds of KV calls). I know you're against cron jobs, but finding something that behaves similarly would be very helpful.
Let's focus on the most prominent problem (the vector embeddings plugin) and then work our way down to optimize others as needed. I have some half-baked ideas for how to handle these "cron suitable" events. I think there's potential for a solution using my
I think
Coming back to this, after having the
My idea would be to add a
disqualifier
Is disqualifier ignoring bot comments? If not, then it's recursively invoking itself. It's poorly implemented and should be redone.
"daemons"
For any "daemon" class plugin, if we want the clock to be frequent but also want to be smart about our KV usage, here is a solution that builds on my previous proposal: we have a "queue job plugin" at the end of our commented event plugin chain. All it does is act as a queue/buffer. It collects a queue of jobs to run, each with a job nonce, and we can set the number of recurring runs per time interval (such as four times a day) from within its configuration.
nonce
The job nonce should essentially deduplicate what would otherwise be redundant jobs, for example following up on a particular issue (which only needs to happen once per interval). It could also be referred to as a job ID, which describes the type of action (a plugin-developer-defined action class name) and where it occurs (perhaps the node ID of an issue or pull).
The benefit of this approach is that if nothing is in the queue, it should not attempt to run. A final optimization (although I realize now it might not be necessary) is that, because it's at the end of the plugin chain, we can stop monitoring KV for any subsequent "daemon" events from the buffer/queue.
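The queue and nonce described above could look roughly like this. It is only a sketch under stated assumptions: `Job`, `JobQueue`, `enqueue`, and `flush` are hypothetical names, the queue lives in memory rather than in any persistent store, and whatever triggers `flush` once per interval is left out.

```typescript
// Hypothetical job shape: an action class name defined by the plugin
// developer plus the node ID of the issue or pull where it applies.
interface Job {
  action: string;
  nodeId: string;
}

class JobQueue {
  // Keyed by nonce, so a redundant job for the same action and node is dropped.
  private jobs = new Map<string, Job>();

  // The nonce (job ID) combines what to do and where to do it.
  static nonce(job: Job): string {
    return `${job.action}:${job.nodeId}`;
  }

  // Returns true if the job was added, false if an identical job is already queued.
  enqueue(job: Job): boolean {
    const key = JobQueue.nonce(job);
    if (this.jobs.has(key)) {
      return false;
    }
    this.jobs.set(key, job);
    return true;
  }

  // Intended to be called once per interval. Runs nothing when the queue
  // is empty and returns the number of jobs that were executed.
  flush(run: (job: Job) => void): number {
    if (this.jobs.size === 0) {
      return 0;
    }
    const pending = Array.from(this.jobs.values());
    this.jobs.clear();
    pending.forEach((job) => run(job));
    return pending.length;
  }
}
```

With this shape, several comments on the same issue within one interval collapse into a single follow-up job, and an interval with no activity costs nothing because `flush` returns early.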
Projected Costs
6,666.6666666667 per $1 of cost
Next Steps
Let's discuss how we can optimize the KV usage of the kernel.