Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat 🛠️ / Refresh connection when validator is down #11

Closed
wants to merge 5 commits into from

Conversation

butonium
Copy link
Contributor

@butonium butonium commented Dec 7, 2022

📜What is this PR for?

Disconnect client when validator is offline and reconnect when it's back up.

🛠️What type of PR is it?

Feature

✔️Issues

@butonium butonium requested a review from janlegner December 7, 2022 04:08
fn validator_check() -> Result<String, String> {
let sys = System::new_all();
for process in sys.processes_by_name("solana-valid") {
if process.run_time() / 60 > DEFAULT_VALIDATOR_WARMUP_MINUTES {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let run_time_minutes = process.run_time() / 60

Copy link
Collaborator

@janlegner janlegner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format the code using rust formatter.

}
.connect()
.await?;
loop {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please simplify this loop. You got some unnecessary breaks/continues, ifs. Try to make it a bit more flat. If you find it difficult, move some parts of the code to a separate function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop should be:

  1. Wait for validator to become ready, if that takes too long, exit
  2. Connect

@@ -79,6 +97,10 @@ async fn mtx_stream(
let mut response_stream = response.into_inner();

loop {
let result = validator_check();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a job for a oneshot channel, you can then process that in the tokio::select. This way it would slow down the loop a lot. Also, the whole feature must be behind a config flag. E.g. when QUIC client is used instead of local RPC forward, this should not be checked at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a watcher loop task that will send a shutdown message via oneshot channel when validator is offline

@butonium butonium requested a review from janlegner December 13, 2022 13:31
Copy link
Collaborator

@janlegner janlegner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Remove all unnecessary .clone()
  • The main fn 1) waits for validator to be online, 2) spawns watcher, 3) spawns grpc client.
  • mtx_stream has 1 extra argument: exit_signal (?) that enables clean exit of the watcher decides so.
  • Add one exatra parameter to the CLI which controls whether the watcher runs (default = false)

@butonium butonium requested a review from janlegner January 4, 2023 09:09
@janlegner janlegner closed this Dec 19, 2023
janlegner pushed a commit that referenced this pull request May 15, 2024
* Added rust toolchain

* Remove auth (#3)

* Remove auth

Remove need for JWT auth for internal deploy

* Remove auth from server

* Implement optional authentication

IF you don't provide the JWT public key then the server enters into non authenticated mode.

* Update domain and company name for the certs (#4)

* Fixes cert validation (#6)

* Add support for connecting to multiple servers (#7)

Adds support for connecting to multiple servers.

* Fix metrics per host (#9)

* Add support for connecting to multiple servers

Adds support for connecting to multiple servers.

* Added metrics per server

Adds source tag to metrics so metrics are reported per server.

* Formatting

* Added basic health check (#10)

* Added basic health check

* Update server/rpc_server.rs

Co-authored-by: Kirill Fomichev <[email protected]>

---------

Co-authored-by: Kirill Fomichev <[email protected]>

* Add authentication identifier (#11)

* Fix reporting name

* Get auth information from header

Get auth information from header so we can populate in metrics.

* Update server/rpc_server.rs

Co-authored-by: Kirill Fomichev <[email protected]>

* Update server/rpc_server.rs

Co-authored-by: Kirill Fomichev <[email protected]>

---------

Co-authored-by: Kirill Fomichev <[email protected]>

* Fix init metrics (#13)

Fix init metrics so we get 0 value.

* Bump version (#14)

* Grpc reconnect on disconnect (#15)

* retry logic

Signed-off-by: Wilfred Almeida <[email protected]>

* refactor: retry on disconnect

Signed-off-by: Wilfred Almeida <[email protected]>

* debugging

Signed-off-by: Wilfred Almeida <[email protected]>

* mtransaction client debugging 2

Signed-off-by: Wilfred Almeida <[email protected]>

* revert metrics changes

Signed-off-by: Wilfred Almeida <[email protected]>

* refactor(client): code cleanup

Signed-off-by: Wilfred Almeida <[email protected]>

* refactor(client): PR changes

Signed-off-by: Wilfred Almeida <[email protected]>

* Fix return value and log

* Fixed reconnection message

* Added some updates to log messages

* Removed duplicate break

---------

Signed-off-by: Wilfred Almeida <[email protected]>
Co-authored-by: linuskendall <[email protected]>

* Add unlimited retries (#16)

* Add more retries

Add unlimited retries with a delay.

* Version bump

* CLose metric channel if metric receiver is closed

* Moved metrics closed

* Update client/main.rs

Co-authored-by: Kirill Fomichev <[email protected]>

---------

Co-authored-by: Kirill Fomichev <[email protected]>

* feat: sendTransaction aliased function (#18)

Signed-off-by: Wilfred Almeida <[email protected]>

* Grpc hot reload (#19)

* feat: sendTransaction aliased function

Signed-off-by: Wilfred Almeida <[email protected]>

* feat: grpc urls file watcher

Signed-off-by: Wilfred Almeida <[email protected]>

* refactor: file watcher to signals

Signed-off-by: Wilfred Almeida <[email protected]>

* refactor: spawn management

Signed-off-by: Wilfred Almeida <[email protected]>

* refactor: task spawn/kill management

Signed-off-by: Wilfred Almeida <[email protected]>

* refactor: PR review fixes

Signed-off-by: Wilfred Almeida <[email protected]>

---------

Signed-off-by: Wilfred Almeida <[email protected]>

* Enforce preflight (#21)

* feat: sendTransaction aliased function

Signed-off-by: Wilfred Almeida <[email protected]>

* feat: grpc urls file watcher

Signed-off-by: Wilfred Almeida <[email protected]>

* refactor: file watcher to signals

Signed-off-by: Wilfred Almeida <[email protected]>

* refactor: spawn management

Signed-off-by: Wilfred Almeida <[email protected]>

* refactor: task spawn/kill management

Signed-off-by: Wilfred Almeida <[email protected]>

* refactor: PR review fixes

Signed-off-by: Wilfred Almeida <[email protected]>

* feat: enforce skipPreflight true

Signed-off-by: Wilfred Almeida <[email protected]>

---------

Signed-off-by: Wilfred Almeida <[email protected]>

---------

Signed-off-by: Wilfred Almeida <[email protected]>
Co-authored-by: Pablo Fontoura <[email protected]>
Co-authored-by: Kirill Fomichev <[email protected]>
Co-authored-by: Wilfred Almeida <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feat 🛠️ / Refresh connection when validator is down
2 participants