Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes bug with long delay with initialization of JournalSequenceActor #415

Merged
merged 1 commit into from
Jul 9, 2024

Conversation

lucavice
Copy link
Contributor

@lucavice lucavice commented Jul 8, 2024

Fixes #345

Changes

By studying the SqlReadJournal and JournalSequenceActor initialization code, I found a bug that caused a longer than expected delay for the reader to start replaying current events in the journal. The initial intended logic seemed correct, which is:

  1. When the JournalSequenceActor is booted up, it starts by detecting gaps in the ordering numbers in the journal in batches, starting from the very beginning. If a gap is found, it will wait for up to QueryDelay*MaxTries for the gap to be filled, in case another process is writing into the journal.
  2. In order to avoid replaying everything from the very beginning, a message AssumeMaxOrderingId is scheduled at the same time with a delay of QueryDelay*MaxTries, which is the same as the maximum wait for gaps. When this message arrives, if the gaps being analyzed have not caught up yet to the total max of the journal, it will force to update the current ordering being inspected to be the max available in the journal. This makes perfect sense, as it will guarantee that when the SqlReadJournal boots up, it will wait at max QueryDelay*MaxTries for any gaps to be filled, regardless of the length of the journal.

Unfortunately, due to a bug using an incorrect parameter, step 2 was not working as intended. This means that for journal with large number of events and gaps, the initialization could take a very long time.

The change only fixes the supplied parameter with the original intent, which is to force the scanned ordering number to be the max after the configured delay. It was instead using _maxTries, which is simply the number of tries to wait for gap (by default this value is 10).

In my tests, after this change the maximum amount of wait at that I experience at SqlReadJournal initialization is always the expected 10s (which is QueryDelay [1s] * MaxTries [10]). Previously, it could take several minutes or even hours.

Copy link
Member

@to11mtm to11mtm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for digging into this!

Looks good and good write-up. <3

@@ -71,7 +71,7 @@ private bool ReceiveHandler(object message)
o =>
ReceiveHandler(
o,
_maxTries,
a.Max,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦

Nice catch. 😅

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Aaronontheweb Aaronontheweb merged commit 02501ba into akkadotnet:dev Jul 9, 2024
3 checks passed
@lucavice
Copy link
Contributor Author

Thanks everybody for merging quickly and releasing 1.5.26 with the fix. I have deployed it to our staging environment and I can confirm that it now works fine with just the expected small initial delay.

I have also been running load testing on our environment to compare performance between Akka.Persistence.Sql and the older Akka.Persistence.SqlServer. We have seen some fairly dramatic performance improvements on our use case, even without using the Tag Table. If it's of any interest to you, I could share with you some charts, but I'm not sure where (it does not fit the topics of issues?).

@Aaronontheweb
Copy link
Member

If it's of any interest to you, I could share with you some charts, but I'm not sure where (it does not fit the topics of issues?).

@lucavice yes, that would be wonderful if you could share that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CurrentEventsByTag query on Sqlite has a weird initial delay
3 participants