-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
routing: Add automatic failover #65
Conversation
I'm still working through this PR. I have also employed the use of various code analysis tools such as skunk and rubycritic. Thanks to your code-review feedback, I'll try to keep the scope in-mind of our goal (v1 release). |
b6fac49
to
01937d2
Compare
Rebased! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been reading up on some refactoring stuff! Let me know what you think about my suggestions here. Mostly nitpicks. Everything else seems great, let me know if you found any friction in the specs that I should double-check.
lib/readyset/health/healthchecker.rb
Outdated
class Healthchecker | ||
UNHEALTHY_ERRORS = [::PG::UnableToSend, ::PG::ConnectionBad].freeze | ||
|
||
def initialize(healthcheck_interval:, error_window_period:, error_window_size:, shard:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
polish: What do you think about reducing this to a configuration settings hash?
healthcheck_interval
, error_window_period
, and error_window_size
seem to have a 'settings' behavior in-common.
def initialize(healthcheck_config, shard:)
@healthy = Concurrent::AtomicBoolean.new(true)
@config = healthcheck_config
@healthchecks = Health::Healthchecks.new(shard: shard)
@lock = Mutex.new
@shard = shard
@window_counter = Readyset::Utils::WindowCounter.new(
window_size: @config[:error_window_size],
time_period: @config[:error_window_period]
)
end
This is an untested code-snippet.
Going a bit further, it may be possible to reduce the number of instance-variables as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me! Though I think I'd prefer to leave the instance variables the way they are -- I think it makes it clearer what attributes are available on the class
def is_readyset_connection_error?(exception) | ||
if exception.cause | ||
is_readyset_connection_error?(exception.cause) | ||
else | ||
UNHEALTHY_ERRORS.any? { |e| exception.is_a?(e) } && | ||
exception.is_a?(Readyset::Error) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Do you think this could improve readability?
I feel like this could make it a bit clearer as to the three steps it's going through.
def is_readyset_connection_error?(exception)
return true if exception.is_a?(Readyset::Error)
return UNHEALTHY_ERRORS.include?(exception.class) if exception.cause.nil?
is_readyset_connection_error?(exception.cause)
end
This snippet passes test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I typically avoid trailing conditionals and prefer standard if ... else
statements. IMO they read a bit easier, but I think it's just a matter of opinion 🤷
def healthy? | ||
connection.execute('SHOW READYSET STATUS').any? do |row| | ||
row['name'] == 'Database Connection' && row['value'] == 'Connected' | ||
end | ||
rescue | ||
false | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: At a glance, this seems closer to a parser 👓 than a ✔️ healthcheck.
Do you think row
would be considered referencing something "outside" of the Healthchecks
object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'm not sure I follow -- can you expound a bit?
The health is being checked by ensuring that 1) we can successfully connect to ReadySet and run SHOW READYSET STATUS
and 2) ReadySet is connected to the upstream database as expected
This commit adds an optional automatic failover feature to the gem. If the number of connection errors to ReadySet exceeds a preconfigured threshold, ReadySet is considered to be healthy, all queries are routed upstream, and a background task is spawned that periodically checks ReadySet's health. When ReadySet is determined to be healthy again, the task is stopped, and queries are routed back to ReadySet. Closes #45
6d8cdfc
to
8a07b83
Compare
@helpotters Thanks for the great suggestions! Addressed the comments and rebased, so this is ready for another pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR was definitely a learning experience for me!
This commit adds an optional automatic failover feature to the gem. If the number of connection errors to ReadySet exceeds a preconfigured threshold, ReadySet is considered to be unhealthy, all queries are routed upstream, and a background task is spawned that periodically checks ReadySet's health. When ReadySet is determined to be healthy again, the task is stopped, and queries are routed back to ReadySet.
Closes #45