-
-
Notifications
You must be signed in to change notification settings - Fork 363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow specifying arbitrary SSH configuration for nodes #1024
Conversation
It took one more patch, but I can now provision and deploy nodes behind a bastion. Further testing required. |
I can now force reboot nodes and have it wait properly for the machine to be up. A couple more TCP waiting patches to make. |
Are you sure that this also handles the ssh-for-each case? If so, please merge. If not, please try again :) |
|
OK I think this is good to go:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the TCP ports check to use SSH instead is something of a mixed bag, so while this certainly makes handling of jump hosts easier, the timeout
argument to run_command
is only the connection timeout, which could possibly hang ssh
on multiple occasions.
So I'd probably add a few tests for the none
backend with flaky networking to see whether that really is robust enough. The only race condition I know for sure is when the machine reboots, but I remember seeing SSH hangs even during boot, eg. in something like this, which is a similar scenario.
@@ -650,7 +651,9 @@ def _wait_stop(self): | |||
""" | |||
self.log_start("waiting for system to shutdown... ") | |||
dotlog = lambda: self.log_continue(".") # NOQA | |||
wait_for_tcp_port(self.main_ipv4, 22, open=False, callback=dotlog) | |||
while self.try_ssh(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we'd switch to Python 3.x (I think it's long overdue), we could use subprocess.run
with the timeout
argument for logged_exec
and in SSHMaster
, but in the interim it's still possible in Python 3.4 and lower using wait
and kill
. We already have a timeout
argument, so I'd either rename the argument or add another command_timeout
argument. Note that we can't simply get rid of the connect timeout and just use the command timeout, because there might be long-running operations, like switching to a new configuration.
@grahamc are you still planning on working on this? |
I'll give this a try sometime in the next few weeks. I plan to put my new deployments behind a bastion host. |
The bastion functionality works perfectly, so that's very nice, as I couldn't get this to work without the patch (NixOps hangs on However, I did have one case where |
I'd love this functionality as well. Good to merge with @dhess's blessing? |
Unfortunately, I think it's not ready. I may have time to look into a fix for this in the next few weeks. |
Hello! Thank you for this PR. In the past several months, some major changes have taken place in
This is all accumulating in to what I hope will be a NixOps 2.0 My hope is that by adding types and more thorough automated testing, However, because of the major changes, it has become likely that this If you would like to see this merge, please bring it up to date with Thank you again for the work you've done here, I am sorry to be Graham |
😆 Fun to see the conversation between you and you @grahamc ;) |
I'd like to deploy to nodes only available behind a bastion server. This patch letsyou do this:
and ./ssh-config:
Initial testing shows this works, I'll do further testing tomorrow.