-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes bug with long delay with initialization of JournalSequenceActor #415
Conversation
…rderingId to not working as expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for digging into this!
Looks good and good write-up. <3
@@ -71,7 +71,7 @@ private bool ReceiveHandler(object message) | |||
o => | |||
ReceiveHandler( | |||
o, | |||
_maxTries, | |||
a.Max, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦
Nice catch. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks everybody for merging quickly and releasing 1.5.26 with the fix. I have deployed it to our staging environment and I can confirm that it now works fine with just the expected small initial delay. I have also been running load testing on our environment to compare performance between |
@lucavice yes, that would be wonderful if you could share that |
Fixes #345
Changes
By studying the
SqlReadJournal
andJournalSequenceActor
initialization code, I found a bug that caused a longer than expected delay for the reader to start replaying current events in the journal. The initial intended logic seemed correct, which is:JournalSequenceActor
is booted up, it starts by detecting gaps in the ordering numbers in the journal in batches, starting from the very beginning. If a gap is found, it will wait for up toQueryDelay*MaxTries
for the gap to be filled, in case another process is writing into the journal.AssumeMaxOrderingId
is scheduled at the same time with a delay ofQueryDelay*MaxTries
, which is the same as the maximum wait for gaps. When this message arrives, if the gaps being analyzed have not caught up yet to the total max of the journal, it will force to update the current ordering being inspected to be the max available in the journal. This makes perfect sense, as it will guarantee that when theSqlReadJournal
boots up, it will wait at maxQueryDelay*MaxTries
for any gaps to be filled, regardless of the length of the journal.Unfortunately, due to a bug using an incorrect parameter, step 2 was not working as intended. This means that for journal with large number of events and gaps, the initialization could take a very long time.
The change only fixes the supplied parameter with the original intent, which is to force the scanned ordering number to be the max after the configured delay. It was instead using
_maxTries
, which is simply the number of tries to wait for gap (by default this value is 10).In my tests, after this change the maximum amount of wait at that I experience at
SqlReadJournal
initialization is always the expected 10s (which is QueryDelay [1s] * MaxTries [10]). Previously, it could take several minutes or even hours.