Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

routing: Add automatic failover #65

Merged
merged 5 commits into from
Jan 25, 2024
Merged

Conversation

ethan-readyset
Copy link
Contributor

@ethan-readyset ethan-readyset commented Jan 10, 2024

This commit adds an optional automatic failover feature to the gem. If the number of connection errors to ReadySet exceeds a preconfigured threshold, ReadySet is considered to be unhealthy, all queries are routed upstream, and a background task is spawned that periodically checks ReadySet's health. When ReadySet is determined to be healthy again, the task is stopped, and queries are routed back to ReadySet.

Closes #45

@helpotters
Copy link
Contributor

I'm still working through this PR.

I have also employed the use of various code analysis tools such as skunk and rubycritic.

Thanks to your code-review feedback, I'll try to keep the scope in-mind of our goal (v1 release).

@ethowitz ethowitz force-pushed the fallback-to-upstream branch from b6fac49 to 01937d2 Compare January 19, 2024 22:30
@ethan-readyset
Copy link
Contributor Author

Rebased!

Copy link
Contributor

@helpotters helpotters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been reading up on some refactoring stuff! Let me know what you think about my suggestions here. Mostly nitpicks. Everything else seems great, let me know if you found any friction in the specs that I should double-check.

class Healthchecker
UNHEALTHY_ERRORS = [::PG::UnableToSend, ::PG::ConnectionBad].freeze

def initialize(healthcheck_interval:, error_window_period:, error_window_size:, shard:)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

polish: What do you think about reducing this to a configuration settings hash?

healthcheck_interval, error_window_period, and error_window_size seem to have a 'settings' behavior in-common.

      def initialize(healthcheck_config, shard:)
        @healthy = Concurrent::AtomicBoolean.new(true)
        @config = healthcheck_config
        @healthchecks = Health::Healthchecks.new(shard: shard)
        @lock = Mutex.new
        @shard = shard
        @window_counter = Readyset::Utils::WindowCounter.new(
          window_size: @config[:error_window_size],
          time_period: @config[:error_window_period]
        )
      end

This is an untested code-snippet.


Going a bit further, it may be possible to reduce the number of instance-variables as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! Though I think I'd prefer to leave the instance variables the way they are -- I think it makes it clearer what attributes are available on the class

Comment on lines +107 to +116
def is_readyset_connection_error?(exception)
if exception.cause
is_readyset_connection_error?(exception.cause)
else
UNHEALTHY_ERRORS.any? { |e| exception.is_a?(e) } &&
exception.is_a?(Readyset::Error)
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Do you think this could improve readability?

I feel like this could make it a bit clearer as to the three steps it's going through.

      def is_readyset_connection_error?(exception)
        return true if exception.is_a?(Readyset::Error)
        return UNHEALTHY_ERRORS.include?(exception.class) if exception.cause.nil?
        is_readyset_connection_error?(exception.cause)
      end

This snippet passes test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I typically avoid trailing conditionals and prefer standard if ... else statements. IMO they read a bit easier, but I think it's just a matter of opinion 🤷

lib/readyset/health/healthchecker.rb Show resolved Hide resolved
lib/readyset/health/healthchecker.rb Show resolved Hide resolved
lib/readyset/configuration.rb Outdated Show resolved Hide resolved
Comment on lines +14 to +20
def healthy?
connection.execute('SHOW READYSET STATUS').any? do |row|
row['name'] == 'Database Connection' && row['value'] == 'Connected'
end
rescue
false
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: At a glance, this seems closer to a parser 👓 than a ✔️ healthcheck.

Do you think row would be considered referencing something "outside" of the Healthchecks object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm not sure I follow -- can you expound a bit?

The health is being checked by ensuring that 1) we can successfully connect to ReadySet and run SHOW READYSET STATUS and 2) ReadySet is connected to the upstream database as expected

This commit adds an optional automatic failover feature to the gem. If
the number of connection errors to ReadySet exceeds a preconfigured
threshold, ReadySet is considered to be healthy, all queries are routed
upstream, and a background task is spawned that periodically checks
ReadySet's health. When ReadySet is determined to be healthy again, the
task is stopped, and queries are routed back to ReadySet.

Closes #45
@ethowitz ethowitz force-pushed the fallback-to-upstream branch from 6d8cdfc to 8a07b83 Compare January 23, 2024 18:49
@ethan-readyset
Copy link
Contributor Author

@helpotters Thanks for the great suggestions! Addressed the comments and rebased, so this is ready for another pass

Copy link
Contributor

@helpotters helpotters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was definitely a learning experience for me!

@ethan-readyset ethan-readyset added this pull request to the merge queue Jan 25, 2024
Merged via the queue into main with commit 9841c19 Jan 25, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fall back to upstream database if query to ReadySet fails
3 participants