Fix the events service connection #257
Conversation
crates/shared/src/utils.rs
Outdated
// Disconnect and reconnect if no event has been received within quite a long time (set to 1s by default)
This code will be executed after we receive an event, so what we'll detect here is a long period without events that has just ended. What we should do instead is use tokio::timeout with connection.next() and handle the timeout.
Makes sense, I just want to make sure I understand correctly: here we'll wait quite a long time before hitting this line, but with tokio::timeout we'll only wait for the period I set, is that correct?
Pull Request Test Coverage Report for Build 12118062892
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls
crates/shared/src/utils.rs
Outdated
// Simulate idle timeout by stopping the server and waiting
app_handle.abort();
tokio::time::sleep(IDLE_TIMEOUT + Duration::from_millis(500)).await; // Wait longer than the idle timeout
Why can't this just be IDLE_TIMEOUT? Or can we increase IDLE_TIMEOUT to 2 seconds if we want it to be longer?
If these consts are tied (maybe not?) to the settings in non-test code, e.g., the connect function, can we move them to the non-test module near RETRY_PERIOD and replace the hard-coded values with them?
The 500ms is to make sure we wait at least IDLE_TIMEOUT; a smaller add-on would also be fine. Updated to RETRY_PERIOD in 2a9d3ab.
I'm still confused about the purpose of setting the sleep period to at least RETRY_PERIOD if we create a new app immediately afterward.
IIUC, when the period exceeds RETRY_PERIOD, the connection.next() call will enter the Err(_) case, whereas before this PR it would get stuck. However, if we create a new_app_handle immediately after this sleep period, it looks like we are essentially testing what's already covered in test_event_stream_wrapper.
I think we should verify whether .next() returns an error as expected after the sleep, i.e., whether we actually try to reconnect.
As replied in a previous discussion, I think the test needs an additional assertion to make sure the reconnection works.
Closes #255
Pasted here what @Ayiga mentioned regarding the connection issue:
I noticed, while running the inscriptions service, that after Mainnet went through its degraded behavior and recovered, the Block stream never recovered.
This PR:
This PR does not:
Key places to review: