Add the groundwork for failover RPC support #110

c0gent · 2018-06-19T22:02:33Z

Changes

- Remove the `rpc_host` and `rpc_port` fields from the configuration file format and add `primary_rpc_host`, `primary_rpc_port`, `failover_rpc_host`, and `failover_rpc_port`.
- Add the `RpcUrl` and `RpcUrlKind` types.
  - `RpcUrl` contains a host name and a port.
  - `RpcUrlKind` has variants for primary and failover urls and contains a `RpcUrl`.
- Remove `rpc_host` and `rpc_port` and add primary and failover RPC url fields to `config::Config` et al.
- Update `app::Connections::new_http` to automatically use the failover url in the event of an error connecting to the primary.
- Add the `home_url` and `foreign_url` fields to `Connections` containing the url/kind (a `RpcUrlKind`) currently in use.
- Update logging.
- Update tests.
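
For readers who have not opened the diff, the two new types described in the list could look roughly like the sketch below. It is an illustration only: the field names, the derives, the `url` accessor, and the `host:port` display format are assumptions rather than the code in this PR.

```rust
use std::fmt;

/// A host name and a port for an RPC endpoint (field names are assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct RpcUrl {
    pub host: String,
    pub port: u16,
}

impl fmt::Display for RpcUrl {
    /// Formats the url as `host:port` (assumed format).
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}", self.host, self.port)
    }
}

/// Records which configured endpoint is currently in use.
#[derive(Debug, Clone, PartialEq)]
pub enum RpcUrlKind {
    Primary(RpcUrl),
    Failover(RpcUrl),
}

impl RpcUrlKind {
    /// Returns the wrapped url regardless of variant (hypothetical helper).
    pub fn url(&self) -> &RpcUrl {
        match self {
            RpcUrlKind::Primary(url) | RpcUrlKind::Failover(url) => url,
        }
    }
}

fn main() {
    let primary = RpcUrl { host: "http://localhost".to_string(), port: 8545 };
    let in_use = RpcUrlKind::Primary(primary);
    println!("{:?} -> {}", in_use, in_use.url());
}
```

Wrapping the url in the enum lets logging report both the address and whether it is the primary or the failover with a single value.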

Closes #107 (maybe)

I don't have a very sophisticated test setup so this will definitely need more comprehensive testing in a more realistic environment.

@akolotov: If I understand your original issue correctly (please correct me if not), the main purpose of this is for the failover not simply to be available upon startup, but to be used in the event of a disconnection. I'm not familiar enough with the whole project yet to know exactly how and where that happens. This patch only makes use of the failover url on startup (creation of an App/Connection) but does nothing special in the event of a disconnect at a later point. Therefore it may not completely address your issue and should be considered only a partial implementation.

I could use your help showing me where and how disconnection is currently handled so that I can add some more logic to that process. I'm available on voice or Zoom if that's easier. Thanks!


akolotov · 2018-06-19T23:43:12Z

Thanks! I will look at this PR tomorrow (I am in MSK timezone) and provide my feedback.

yrashk · 2018-07-02T02:50:55Z

After some further thinking about this problem, here's my take on how it should be implemented to keep it simple and yet flexible so that even future scenarios can be accommodated.

Instead of building very specific features (such as failover RPC), we can further utilize the approach that has been used in the bridge in the past couple of months: delegating failure handling to a supervisor process. Currently, the exit code for a failed RPC connection doesn't distinguish between home and foreign. However, if we track the side on which the failure has occurred and use different exit codes for different sides, the supervisor can simply restart the bridge with a different config, according to its own logic. There could be more than two failovers, for example, or the supervisor could check what's available using curl and then use the appropriate config.

This will work particularly well combined with persistent transaction queues.
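
A minimal sketch of the exit-code idea, to make the proposal concrete. Everything in it is hypothetical: the `Side` enum, the exit code values, and the `Supervisor` type are illustrative names that do not exist in the bridge, and the endpoint lists are assumed to be non-empty.

```rust
/// Which side of the bridge a failed RPC connection belongs to.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Side {
    Home,
    Foreign,
}

// Hypothetical exit codes; real values would be chosen by the project.
const EXIT_HOME_RPC_FAILED: i32 = 10;
const EXIT_FOREIGN_RPC_FAILED: i32 = 11;

/// Bridge side: map the failing side to a distinct process exit code.
fn exit_code_for(side: Side) -> i32 {
    match side {
        Side::Home => EXIT_HOME_RPC_FAILED,
        Side::Foreign => EXIT_FOREIGN_RPC_FAILED,
    }
}

/// Supervisor side: recover the failing side from an exit code, if any.
fn failed_side(code: i32) -> Option<Side> {
    match code {
        EXIT_HOME_RPC_FAILED => Some(Side::Home),
        EXIT_FOREIGN_RPC_FAILED => Some(Side::Foreign),
        _ => None,
    }
}

/// Tracks which endpoint (and therefore which config) to use per side.
struct Supervisor {
    home: Vec<&'static str>,
    foreign: Vec<&'static str>,
    home_idx: usize,
    foreign_idx: usize,
}

impl Supervisor {
    /// Rotate to the next endpoint on whichever side reported the failure.
    fn on_exit(&mut self, code: i32) {
        match failed_side(code) {
            Some(Side::Home) => self.home_idx = (self.home_idx + 1) % self.home.len(),
            Some(Side::Foreign) => {
                self.foreign_idx = (self.foreign_idx + 1) % self.foreign.len()
            }
            None => {} // unrelated exit code: restart with the same endpoints
        }
    }

    /// The (home, foreign) endpoints to use for the next restart.
    fn current(&self) -> (&'static str, &'static str) {
        (self.home[self.home_idx], self.foreign[self.foreign_idx])
    }
}

fn main() {
    let mut sup = Supervisor {
        home: vec!["home-a", "home-b"],
        foreign: vec!["foreign-a", "foreign-b", "foreign-c"],
        home_idx: 0,
        foreign_idx: 0,
    };
    // A real supervisor would read this code from the exited bridge process.
    sup.on_exit(exit_code_for(Side::Foreign));
    println!("restart with {:?}", sup.current());
}
```

Because the rotation lives in the supervisor, supporting more than two endpoints per side only means a longer list, with no change to the bridge itself.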

yrashk · 2018-07-02T02:56:28Z

bridge/src/bridge/deploy.rs

+								TransactionWithConfirmation(self.app.connections.foreign.clone(),
+									self.app.config.foreign.poll_interval,
+									self.app.config.foreign.required_confirmations)
+							);



It looks like these are formatting changes unrelated to the actual change. They obscure the substance of the change, making it harder to understand what's important and what's not.

yrashk · 2018-07-02T02:57:42Z

bridge/src/bridge/deposit_relay.rs

+							api::send_transaction_with_nonce(self.app.connections.foreign.clone(),
+								self.app.connections.foreign_url.clone(), self.app.clone(),
+								self.app.config.foreign.clone(), tx, self.foreign_chain_id,
+								SendRawTransaction(self.app.connections.foreign.clone()))


It looks like these are formatting changes unrelated to the actual change. They make sense, but they would have been a lot easier to process as part of a separate "formatting" patch, both for review and later reading.

yrashk · 2018-07-02T02:58:27Z

bridge/src/bridge/withdraw_confirm.rs

+							api::send_transaction_with_nonce(self.app.connections.foreign.clone(),
+								self.app.connections.foreign_url.clone(), self.app.clone(),
+								self.app.config.foreign.clone(), tx, self.foreign_chain_id,
+								SendRawTransaction(self.app.connections.foreign.clone()))


It looks like these are formatting changes unrelated to the actual change.

yrashk · 2018-07-02T02:58:54Z

bridge/src/bridge/withdraw_relay.rs

@@ -232,7 +233,8 @@ impl<T: Transport> Stream for WithdrawRelay<T> {
 									nonce: U256::zero(),
 									action: Action::Call(contract),
 								};
-							    api::send_transaction_with_nonce(t.clone(), app.clone(), home.clone(), tx, chain_id, SendRawTransaction(t.clone()))
+							    api::send_transaction_with_nonce(t.clone(), t_url.clone(), app.clone(),
+							    	home.clone(), tx, chain_id, SendRawTransaction(t.clone()))


It looks like this is a formatting change unrelated to the actual change.

c0gent · 2018-07-02

Yes, I made these formatting changes to make the code more readable (and to adhere more closely to the Rust style guidelines). I'll put them all in a separate PR.

yrashk · 2018-07-02T03:08:20Z

bridge/src/app.rs

+		// the transport and the url upon success.
+		fn connect(handle: &Handle, url_primary: &RpcUrl, url_failover: Option<&RpcUrl>,
+				concurrent_connections: usize) -> Result<(Http, RpcUrlKind), Web3Error> {
+			match Http::with_event_loop(&url_primary.to_string(), handle, concurrent_connections) {


This is the most important part of my review. My reading of https://github.com/tomusdrw/rust-web3/blob/master/src/transports/http.rs#L78 suggests that this call will never fail when the primary is unavailable, because it doesn't attempt a connection. That means the bridge will always try the primary, then fail, restart, and try it again to no avail.

Would love to be proven wrong, though -- if I misread any part or something.
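
The concern can be illustrated with a self-contained sketch. It assumes the reading above is correct, and it uses a hypothetical `LazyTransport` stand-in rather than the real rust-web3 `Http` transport; the `reachable` flag only simulates whether a node would answer.

```rust
/// Stand-in for a transport whose constructor only stores the address.
/// This is a hypothetical type, not the rust-web3 `Http` transport.
struct LazyTransport {
    url: String,
    reachable: bool, // simulates whether the node would answer
}

impl LazyTransport {
    /// Construction never contacts the node, so it succeeds even when
    /// the node is down.
    fn new(url: &str, reachable: bool) -> Result<LazyTransport, String> {
        Ok(LazyTransport { url: url.to_string(), reachable })
    }

    /// The first point at which an unreachable node is actually noticed.
    fn request(&self, method: &str) -> Result<String, String> {
        if self.reachable {
            Ok(format!("{} ok from {}", method, self.url))
        } else {
            Err(format!("{}: connection refused", self.url))
        }
    }
}

/// Same shape as the `connect` helper under review: fall back to the
/// failover only if constructing the primary transport fails.
fn connect(
    primary: &str,
    failover: &str,
    primary_up: bool,
) -> Result<(LazyTransport, &'static str), String> {
    match LazyTransport::new(primary, primary_up) {
        Ok(t) => Ok((t, "primary")),
        // Never reached with a lazy constructor.
        Err(_) => LazyTransport::new(failover, true).map(|t| (t, "failover")),
    }
}

fn main() {
    let (transport, kind) =
        connect("http://primary:8545", "http://failover:8545", false).unwrap();
    // The primary is "down", yet it is still the one selected.
    println!("selected: {}", kind);
    println!("first request: {:?}", transport.request("eth_blockNumber"));
}
```

Under this assumption the failover branch is dead code, and the failure only surfaces on the first request, after the choice of endpoint has already been made.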

c0gent · 2018-07-02

I think you're right, which is why I wanted to make clear that this PR doesn't properly address the issue. It looks like the connection errors are passed up through the spawned future. I'll need to dig around and look into how those are currently handled (see below).

c0gent · 2018-07-02T13:08:35Z

> Instead of building very specific features (such as failover RPC), we can further utilize the approach that has been used in the bridge in the past couple of months: delegating failure handling to a supervisor process.

I agree, this is a superior approach in this situation. And yes, this will need some re-architecting. I'll have a closer look at this later in the week and put together some ideas.

c0gent · 2018-07-03T14:07:54Z

It looks like I'm going to have to put this PR on the back-burner for now. I should be able to get back to it sometime in the next few weeks. Let me know if you come up with any other plans or suggestions, or if there are some simple things I can do to this PR to make it mergeable for now and deal with the larger issues in a separate PR later.

yrashk · 2018-07-06T04:41:26Z

I think this PR is premature, as indicated by my comment noting that it is unlikely to ever actually switch over (leaving aside the suggestion for simplifying this whole solution). Let's return to this later, or if @akolotov needs it sooner, I can take care of it in the coming days.

akolotov requested review from akolotov and yrashk June 19, 2018 23:43

yrashk reviewed Jul 2, 2018

Revert some formatting-only changes.

681b975

c0gent force-pushed the failover-rfc branch from d2ebbc5 to 681b975 Compare July 3, 2018 02:21

noot pushed a commit to noot/poa-bridge that referenced this pull request Jul 18, 2018

Merge pull request omni#110 from GriffGreen/patch-1

9dfd398

Fixed some typos and made a section more coherent
