-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the groundwork for failover RPC support #110
base: master
Are you sure you want to change the base?
Conversation
* Remove the `rpc_host` and `rpc_port` fields from the configuration file format and add `primary_rpc_host`, `primary_rpc_port`, `failover_rpc_host`, and `failover_rpc_port`. * Add the `RpcUrl` and `RpcUrlKind` types. * `RpcUrl` contains a host name and a port. * `RpcUrlKind` has variants for primary and failover urls and contains a `RpcUrl`. * Remove `rpc_host` and `rpc_port` and add primary and failover RPC url fields to `config::Config` et. al. * Update `app::Connections::new_http` to automatically use the failover url in the event of an error connecting to the primary. * Add the `home_url` and `foreign_url` to `Connections` containing the url/kind (a `RpcUrlKind`) currently in use. * Update logging. * Update tests.
Thanks! I will look at this PR tomorrow (I am in MSK timezone) and provide my feedback. |
After some further thinking about this problem, here's my take on how it should be implemented to keep it simple and yet flexible so that even future scenarios can be accommodated. Instead of building very specific features (such as failover RPC), we can further utilize the approach that has been used in the bridge in the past couple of months: delegating failure handling to a supervisor process. Currently, an exit code for failed RPC connection doesn't make distinction between home and foreign. However, if we track the side on which the failure has occurred and use different exit codes for different sides, the supervisor can simply restart the bridge with a different config, according to its own logic. Could be more than two failovers, for example, or a check on what's available using curl and then using the appropriate config. This will work particularly well combined with persistent transaction queues. |
bridge/src/bridge/deploy.rs
Outdated
TransactionWithConfirmation(self.app.connections.foreign.clone(), | ||
self.app.config.foreign.poll_interval, | ||
self.app.config.foreign.required_confirmations) | ||
); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these are formatting changes unrelated to the actual change. They obscure the substance of the change, making it harder to understand what's important and what's not.
bridge/src/bridge/deposit_relay.rs
Outdated
api::send_transaction_with_nonce(self.app.connections.foreign.clone(), | ||
self.app.connections.foreign_url.clone(), self.app.clone(), | ||
self.app.config.foreign.clone(), tx, self.foreign_chain_id, | ||
SendRawTransaction(self.app.connections.foreign.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these are formatting changes unrelated to the actual change. They make sense, but they would have been a lot easier to process as a part of separate "formatting" patch, both for review and later reading.
api::send_transaction_with_nonce(self.app.connections.foreign.clone(), | ||
self.app.connections.foreign_url.clone(), self.app.clone(), | ||
self.app.config.foreign.clone(), tx, self.foreign_chain_id, | ||
SendRawTransaction(self.app.connections.foreign.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these are formatting changes unrelated to the actual change.
bridge/src/bridge/withdraw_relay.rs
Outdated
@@ -232,7 +233,8 @@ impl<T: Transport> Stream for WithdrawRelay<T> { | |||
nonce: U256::zero(), | |||
action: Action::Call(contract), | |||
}; | |||
api::send_transaction_with_nonce(t.clone(), app.clone(), home.clone(), tx, chain_id, SendRawTransaction(t.clone())) | |||
api::send_transaction_with_nonce(t.clone(), t_url.clone(), app.clone(), | |||
home.clone(), tx, chain_id, SendRawTransaction(t.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is a formatting change unrelated to the actual change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I made these formatting changes to make the code more readable (and to adhere more closely to the Rust style guidelines. I'll put them all in a separate PR.
// the transport and the url upon success. | ||
fn connect(handle: &Handle, url_primary: &RpcUrl, url_failover: Option<&RpcUrl>, | ||
concurrent_connections: usize) -> Result<(Http, RpcUrlKind), Web3Error> { | ||
match Http::with_event_loop(&url_primary.to_string(), handle, concurrent_connections) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the most important part of my review. My reading of https://github.com/tomusdrw/rust-web3/blob/master/src/transports/http.rs#L78 suggests that this call will never fail if primary is not available as it doesn't attempt the connection, meaning bridge will always try primary, then fail, restart and try it again to no avail.
Would love to be proven wrong, though -- if I misread any part or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right which is why I wanted to make clear that this PR doesn't properly address the issue. It looks like the connection errors are passed up through the spawned future. I'll need to dig around and look into how those are currently handled (see below).
I agree, this is a superior approach in this situation. And yes, this will need some re-architecting. I'll have a closer look at this later in the week and put together some ideas. |
It looks like I'm going to have to put this PR on the back-burner for now. I should be able to get back to it sometime in the next few weeks. Let me know if you come up with any other plans or suggestions, or if there are some simple things I can do to this PR to make it mergeable for now and deal with the larger issues in a separate PR later. |
I think this PR is premature insofar as indicated by the comment in which I mentioned that it is unlikely to actually ever switch over (leaving aside the suggestion for simplifying this whole solution), so let's return to this later or if @akolotov needs it sooner, I can take care of it in the coming days. |
Fixed some typos and made a section more coherent
Changes
rpc_host
andrpc_port
fields from the configuration fileformat and add
primary_rpc_host
,primary_rpc_port
,failover_rpc_host
,and
failover_rpc_port
.RpcUrl
andRpcUrlKind
types.RpcUrl
contains a host name and a port.RpcUrlKind
has variants for primary and failover urls and contains aRpcUrl
.rpc_host
andrpc_port
and add primary and failover RPC url fieldsto
config::Config
et. al.app::Connections::new_http
to automatically use the failover url inthe event of an error connecting to the primary.
home_url
andforeign_url
toConnections
containing theurl/kind (a
RpcUrlKind
) currently in use.Closes #107 (maybe)
I don't have a very sophisticated test setup so this will definitely need more comprehensive testing in a more realistic environment.
@akolotov: If I understand your original issue correctly (please correct me if not), the main purpose of this is for the failover not simply to be available upon startup, but to be used in the event of a disconnection. I'm not familiar enough with the whole project yet to know exactly how and where that happens. This patch only makes use of the failover url on startup (creation of an
App
/Connection
) but does nothing special in the event of a disconnect at a later point. Therefore it may not completely address your issue and should be considered only a partial implementation.I could use your help showing me where and how disconnection is currently is handled so that I can add some more logic to that process. I'm available on voice or Zoom if that's easier. Thanks!