CurrentEventsByTag query on Sqlite has a weird initial delay #345
Comments
That is very weird - is that a Linq2Db discrepancy possibly @to11mtm?
Possibly, but I'd be surprised; MSSqlite is pretty fast. In most work environments I actually tend to use it for writing DAL quasi-integration tests and prototyping tests, so a delay like that is something I feel like I'd have seen. That said, I know #347 fixes some things that could cause pain in SQLite for existing data, but not necessarily for a 'fresh' DB. @kpko, what does your connection string look like?
Hi, I managed to reproduce the issue in a smaller, stripped-down example project: https://github.com/kpko/sqlite-query-delay-example

It took me some time to reproduce the issue because on my first tries it worked just fine and the query ran pretty quickly. I then tried to align the example app more closely with my main project and was confused when everything worked as expected, even when I copied most of the functionality over. Finally I was able to reproduce it when I more or less copied the data in the database, and I found the culprit: once the ordering_id (in the journal and the tag table) is higher, the 10 s delay starts.

When there are two events/tags in the database with ordering ids 1 and 2, there's practically no delay. When I manually edit the database and change the ordering ids to 300 and 301 respectively, the initial delay happens. Afterwards, the stopwatch showed a delay of 00:00:10.4972806.

I tried to figure out at what value this starts, and it seems it can already be reproduced when the ordering id is 2 instead of 1. Is this some kind of Sqlite query issue?
This explains a lot, thank you! If the ordering numbers are non-linear or have a large gap, there will be delays on queries.

Why: consider the case where multiple processes are writing to the DB concurrently. If A starts writing BEFORE B but doesn't finish until AFTER B, it's possible that A succeeds (delay in visibility of ordered elements) or fails (gaps in ordering). So in general, if we see a 'gap' in ordering ids, the logic will 'scan' the range of ids it sees as missing for a certain period of time before finally 'releasing' the current max ordering id. The backoff defaults are configured/shown here. Note that the default backoff for a missing value is 10 queries with 1 second between them, which lines up fairly well with your timings.
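To make the mechanism above concrete, here is a minimal, purely illustrative simulation of that gap-scan backoff. This is not the actual Akka.Persistence.Sql implementation; the function and parameter names are invented for this sketch, and only the defaults (10 tries, 1 second between queries) are taken from the comment above.

```python
# Hypothetical simulation of the gap-backoff described above (NOT the
# real Akka.Persistence.Sql code; names are illustrative).
MAX_TRIES = 10     # polling passes to wait on a gap before giving up
QUERY_DELAY = 1.0  # seconds between polling passes

def visible_max_ordering(ordering_ids, previous_max, tries_so_far):
    """Return (released_max, still_waiting) for one polling pass.

    If the ids seen after previous_max contain a gap, the reader keeps
    re-polling (up to MAX_TRIES) in case a slow concurrent writer fills
    the gap in; only then does it release the higher max ordering id."""
    expected = previous_max + 1
    seen = sorted(i for i in ordering_ids if i > previous_max)
    if not seen:
        return previous_max, False
    contiguous = seen[0] == expected and all(
        b - a == 1 for a, b in zip(seen, seen[1:])
    )
    if contiguous:
        return seen[-1], False   # no gap: release immediately
    if tries_so_far < MAX_TRIES:
        return previous_max, True  # gap: hold back and retry later
    return seen[-1], False       # waited long enough: release anyway

# A fresh journal holding only ids 300 and 301 presents a "gap" of
# 1..299, so the reader polls MAX_TRIES times before releasing id 301,
# i.e. roughly MAX_TRIES * QUERY_DELAY = 10 s of startup delay.
```

This matches the ~10.5 s stopwatch reading reported above: contiguous ids release instantly, while a gap costs the full retry budget once.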
Following up on this: after speaking with Aaron, we are keeping this issue open to continue investigating potential solutions, given the behavior's pain points.
Hi @to11mtm, thank you very much for the explanation. I understand why it's implemented this way now, and for my purposes it is totally fine. I delete event streams once certain workflows in my project complete, which produced gaps in my log table pretty quickly. I hadn't considered the concurrent-writer scenario you mentioned and have no idea how to work around it. Would you like the issue to stay open to work on an alternative solution? Otherwise, please feel free to close it. Again, thank you for your time and the explanation. Keep up the good work :)
We are trying to work towards a better solution for this, so will keep this issue open for now. Thank you! :) |
Hi @to11mtm, I think I'm experiencing a similar issue to the OP, but I'm not sure it has to do with gaps. I'm using MS SQL instead of Postgres, and I am experiencing an initial delay with

When I use
Reducing the backoff parameters (number of tries, query delay, etc.) does affect this initial delay, so it does seem related to this backpressure mechanism.

Now my question is: what is the correct way to use

At the moment, it makes

I can open a new issue if this turns out to be unrelated to the OP request. Thanks for your help!
Would it be possible, as a temporary workaround, to update the backoff logic to disregard gaps for events written more than a day / an hour ago? A delay of tens of seconds might be acceptable; many minutes really is not...
I think that's a good idea |
@ondrejpialek @Aaronontheweb @to11mtm So, I added some additional log statements into a local version of
You can see below a simulation of my local environment, where I have 27,393 events. In fact, my local environment's projection takes 30 s to start, which is consistent with this wait process (3 × 10 s). For our staging environment, we have 1.4 M events and 21 k gaps, which means that most of the 10,000-sized batches will likely encounter a gap. So in the worst case it will need to wait 140 times 10 s with the default settings, which is consistent with what I saw (17 min, a bit less than the worst case). I agree with @ondrejpialek; I think the gap-finding code should ignore any gaps in OrderingId that happened more than MaxTries * QueryDelay ago...
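The arithmetic in that comment can be sketched as a back-of-the-envelope check. This is only an estimate under the stated assumptions (default backoff of 10 tries × 1 s, every batch hitting at least one gap, and the 10,000 batch size from the comment); the helper name is invented for illustration.

```python
# Worst-case replay delay if every batch of a CurrentEventsByTag-style
# replay hits at least one ordering-id gap. Illustrative only; numbers
# come from the discussion above (10 tries, 1 s delay, 10,000 per batch).
MAX_TRIES = 10
QUERY_DELAY = 1.0   # seconds between retries
BATCH_SIZE = 10_000

def worst_case_replay_delay(total_events, batch_size=BATCH_SIZE):
    """Seconds of pure backoff waiting, assuming one full gap-scan
    (MAX_TRIES * QUERY_DELAY) per batch."""
    batches = -(-total_events // batch_size)  # ceiling division
    return batches * MAX_TRIES * QUERY_DELAY

# Local environment: 27,393 events -> 3 batches -> ~30 s, matching the
# observed projection start-up time.
# Staging: 1.4 M events -> 140 batches -> ~1400 s worst case; the
# observed 17 min sits a bit under that bound, as noted above.
```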
Yikes, that would do it.
I will happily accept a PR to that effect - gaps should have a freshness window. We only care about very recent history, so maybe make this configurable? |
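One possible shape for such a freshness window is sketched below. This is purely illustrative, not the actual fix: the function, parameter names, and the idea of tagging gaps with write timestamps are assumptions for the sketch, and the window default simply mirrors the max-tries × query-delay budget discussed above.

```python
# Illustrative "freshness window" for gap scanning (hypothetical code,
# not the library's implementation). The idea: a gap bordered only by
# events written long ago cannot be filled by an in-flight writer, so
# it is not worth waiting on.
from datetime import datetime, timedelta

FRESHNESS_WINDOW = timedelta(seconds=10)  # e.g. max-tries * query-delay

def gaps_worth_waiting_for(gaps, now):
    """gaps: list of (ordering_id, write_timestamp) pairs describing
    the newest event bordering each missing range. Only gaps younger
    than FRESHNESS_WINDOW should block the reader; older gaps are
    permanent (e.g. from deleted event streams) and can be skipped."""
    return [(oid, ts) for oid, ts in gaps if now - ts <= FRESHNESS_WINDOW]
```

Making the window configurable, as suggested above, would just mean passing the timedelta in rather than using a module-level constant.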
@Aaronontheweb @ondrejpialek @to11mtm It turns out that the logic was already in place and correct, but a small bug prevented this 'freshness' window from being applied as intended. I provided a PR here: #415
Version Information
I'm using Akka.Cluster.Hosting v1.5.15 and the associated libraries like Akka 1.5.15.
Describe the performance issue
I'm using an ASP.NET Core background service that runs a PersistenceQuery (CurrentEventsByTag) at application start to retrieve some data and cache it in memory. I want my app to scale up using PostgreSQL but also scale down using Sqlite, which works great with Akka.Persistence.Sql by making the provider itself configurable.
The background service I'm using looks like this and is added to the ASP.NET Core app via AddHostedService.
When using the Sqlite provider (with Microsoft.Data.Sqlite v8.0.1), I get my first event after 11 seconds:
This delay doesn't happen with PostgreSQL or with another query type. Is there a reason the CurrentEventsByTag query runs after a 10-11 s delay? The Sqlite database itself is empty except for the two events in the journal. I'm using the tag table instead of CSV tags.
Could you give me a tip on how to get more information about the persistence query's performance?
Also, I expect the Sqlite provider to be slower than the PostgreSQL provider, and a 10 s delay is not the end of the world for an ASP.NET Core web app - but this doesn't feel like just "bad performance"; it feels more like there's an intentional 10 s delay somewhere, and I would like to understand why.
Environment
I'm using a 16-inch 2021 MacBook Pro (M1) with 16 GB RAM.
.NET SDK:
Version: 8.0.101
Commit: 6eceda187b
Workload version: 8.0.100-manifests.8a1da915
Runtime Environment:
OS Name: Mac OS X
OS Version: 14.0
OS Platform: Darwin
RID: osx-arm64
Base Path: /usr/local/share/dotnet/sdk/8.0.101/