-
Notifications
You must be signed in to change notification settings - Fork 327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SCP "connection corrupted" on large file transfer #274
Comments
I'm experiencing the same error when SSHing to an EC2 instance to access my database. The error occurs after some period of time (~30 minutes).
@edwdev could you solved the issue? |
I would like to know more about this as I am facing a similar issue as well. I am not transferring a file but just using session manager to access my RDS database and I frequently get this. I am connecting from my MAC incase there is something with MACs that is causing this issue. |
@rscottwatson I'm using macOS as well, still experiencing the issue |
@asterikx So I added the following to my .ssh/config file and I have not had seen that error yet. Host i-* mi-* No guarantees that this will help but so far it has allowed my connection to stay open longer than it has. |
This is happening to me as well, usually during or after a large file transfer. It is intermittent. |
I also have this issue. Some times it works while other (most of them) doesn't. Already tried @rscottwatson suggestion but got the same results.. |
I'm getting this issue too. I already had ServerAliveInterval & ServerAliveCountMax set. Need to see what data ssh is receiving that is corrupt; I imagine it's a debug or an info message from SSM. |
I'm getting this error regularly too, I'm only using SSH (not SCP) and my connections randomly drop after some time with |
I also have this issue, using SSH (not SCP).
ServerAliveInterval & ServerAliveCountMax also set:
|
I'm experiencing the same behaviors indirectly when using git, trying to clone a very large repository:
|
For my case, the issue was related to instable VPN connection |
I get the same with
archisgore@Archiss-MacBook keys % ssh -i <key redacted>.pem ec2-user@i-<instance redacted>
Bad packet length 631868568.
ssh_dispatch_run_fatal: Connection to UNKNOWN port 65535: Connection corrupted I tried all the above: ClientAliveInterval, ClientAliveCountMax, ServerAliveInternal, ServerAliveCountMax. I restarted What's more, I also opened port 22 using a security group and tried connecting over direct ssh, and get this: archisgore@Archiss-MacBook keys % ssh -v -i "<key redacted>.pem" ec2-user@ec2-<ip redacted>.us-west-2.compute.amazonaws.com
OpenSSH_8.1p1, LibreSSL 2.7.3
debug1: Reading configuration data /Users/archisgore/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 47: Applying options for *
debug1: Connecting to ec2-52-11-122-217.us-west-2.compute.amazonaws.com port 22.
debug1: Connection established.
debug1: identity file <key redacted>.pem type -1
debug1: identity file <key redacted>.pem-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_8.0
debug1: match: OpenSSH_8.0 pat OpenSSH* compat 0x04000000
debug1: Authenticating to ec2-<ip redacted>.us-west-2.compute.amazonaws.com:22 as 'ec2-user'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: [email protected] MAC: <implicit> compression: none
debug1: kex: client->server cipher: [email protected] MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:IfH362yMWxH3EWdQZOTgsyuq0+jjdJvg0Ag+nFQPjvs
debug1: Host 'ec2-<ip redacted>.us-west-2.compute.amazonaws.com' is known and matches the ECDSA host key.
debug1: Found key in /Users/archisgore/.ssh/known_hosts:109
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
debug1: Will attempt key: PolyverseDevelopmentKey.pem explicit
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Next authentication method: publickey
debug1: Trying private key: PolyverseDevelopmentKey.pem
Connection closed by 52.11.122.217 port 22
archisgore@Archiss-MacBook keys % ssh -v -i "<key redacted>.pem" ec2-user@ec2-<ip redacted>.us-west-2.compute.amazonaws.com
OpenSSH_8.1p1, LibreSSL 2.7.3
debug1: Reading configuration data /Users/archisgore/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 47: Applying options for *
debug1: Connecting to ec2-<ip redacted>.us-west-2.compute.amazonaws.com port 22.
debug1: Connection established.
debug1: identity file <key redacted>.pem type -1
debug1: identity file <key redacted>.pem-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_8.0
debug1: match: OpenSSH_8.0 pat OpenSSH* compat 0x04000000
debug1: Authenticating to ec2-<ip redacted>.us-west-2.compute.amazonaws.com:22 as 'ec2-user'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: [email protected] MAC: <implicit> compression: none
debug1: kex: client->server cipher: [email protected] MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:IfH362yMWxH3EWdQZOTgsyuq0+jjdJvg0Ag+nFQPjvs
debug1: Host 'ec2-<ip redacted>.us-west-2.compute.amazonaws.com' is known and matches the ECDSA host key.
debug1: Found key in /Users/archisgore/.ssh/known_hosts:109
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
debug1: Will attempt key: <key redacted>.pem explicit
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Next authentication method: publickey
debug1: Trying private key: <key redacted>.pem
Connection closed by <ip redacted> port 22 |
Thanks for reaching out to us! Documentation for enabling plugin logs. |
Oh hey sorry I take that back. My issue was unrelated. My SSH config was broken, which I verified by opening a direct SSH port. Once I fixed that, SSM worked too. Apologies for the report above. |
Sorry if I'm going slightly offtopic here, but would AWS consider supporting SCP-like functionality natively into the SSM client/cmdline? Case in point: https://twitter.com/braincode/status/1427841930596032513 Our users would migrate fully to SSM if there was such a file transfer facility built-in and as "cloud admins" we'd be happier because we don't have to open SSH ports anymore, all is handled via the official AWS CLI? |
Is there any solution to this issue? I still see it for long running connections. |
hi, may I ask if we have an update on this issue? |
I experienced this on MacOS:
To solve it, I have installed the last version of the OpenSSH with brew: |
I'm still getting this on MacOS with the latest version of openssh available from brew: ssh -V
OpenSSH_9.3p1, OpenSSL 3.1.1 30 May 2023 Bad packet length 3246692971.
ssh_dispatch_run_fatal: Connection to UNKNOWN port 65535: Connection corrupted When it happened, I wasn't transferring anything large, it was just sitting there idle. |
This issue is caused by the SSM Maximum Session Timeout on the SSM config found at: https://us-east-1.console.aws.amazon.com/systems-manager/session-manager/preferences?region=us-east-1 (change your region) The SSM service will terminate the stream after this timeout, causing this 'packet length' error from SSH logs regardless of what is in progress presumably due to the multi-layering at the TCP application-layer. As far as I can see, this is working as expected. Make sure to also enable SSH keep alives so the SSM Idle Session Timeout config also doesn't come into effect but nothing can prevent SSM Maximum Session Timeout config from killing it later on. This can probably be closed unless someone is experiencing this problem before the maximum timeout is reached (by default its 20 minutes). Cheers |
Given @xanather 's comment, I've just installed the latest version and a quick test (scp of large file and |
While the original issue reported here may have been due to the SSM timeout, I don't think this explanation makes sense. The If the SSM session times out, the I think there has to be some other bug at work here to trigger that error, which should just never normally happen. |
To @pseudometric's point above, the presence of the It seems any kind of The real question is probably -- what debug statement is firing? And would it shed light on the real problem? If the statement is not a fatal error message, it should not cause the connection to be corrupted. |
Thanks to
The remainder is addressed squarely to AWS... There is no reliable reproduction to this issue thus far, so "I updated and it's fixed" is not validation of a fix. This kind of issue MUST (https://datatracker.ietf.org/doc/html/rfc2119) be understood and solved at the source. This issue has plagued AWS customers for quite some time now and most have found a workaround. Unreliable behaviour of computers it not a thing most professionals want to deal with, especially if we're paying. If this method is not fit for purpose, please, retire it, or at minimum state as much in your documentation. AWS staff have surely had this issue when attempting to use this transport method, so it's clear you either know about it and avoid it, or you just retry until you succeed. Neither are appropriate responses to paying customers. Due to the intermittency of this issue it is a non-trivial problem to solve, however the only people with appropriate access to do so are AWS employees. This is NOT a problem with SSH, this is clearly a problem with your ssm-agent. AWS, you alone have the means to find and fix this! Thanks in advance, A |
I'm not defending AWS at their lack of support here. But take a look at my comment further above. You can make it consistent by enabling SSH probing and setting the SSM timeouts. It is caused by the SSM session timing OUT. The error message is a different issue. This has worked reliably since I added my comment 5 months ago. |
Fair call, I will attempt to replicate (as AWS should), but I hold that an intermittent issue it not solved by magical updates without explanation Edit: @xanather a couple of questions to help me know what I'm looking for, if you will...
I'm attempting to replicate by transferring a large file (ubuntu-budgie-22.04.1-desktop-amd64.iso) multiple times, and my expectation is it will fail at some unknown point, after some unknown amount of transfers, not related to any obvious timeout (unless you can guide me better?). To be clear, some number of transfers will succeed, and at some point it would be silly to not concede that it "just works", but I'm not sure where that mark should be! For some context: I have found a workaround since some time ago, and no timeouts have been changed at any point in time, and the value there is currently 20 minutes. I've had many a transfer succeed, and also some random failures with the symptoms presented in the OP, sometimes well short of "the timeout", and some transfers that survive much longer. I'm hopeful that the issue @msolters has linked (#358) may help here, but there is a component in here that I don't believe is open source, and it's been more than long enough for AWS to take this seriously for paying customers. |
So I attempted to reproduce the original issue using scp in a loop and have not yet been able to. This morning however, an idle SSH connection suffered from
So while this doesn't strictly fit the issue reported in the OP, it matches the symptoms. FWIW, my SSH configuration for this connection includes
So I would expect the SSH session to stay alive and not hit a timeout on the AWS side. I will try and replicate this again, but due to the intermittent nature I'm not expecting anything conclusive. |
aws/session-manager-plugin#94 appears to address the issue with SSH reporting a bad packet length, at least in some cases. The test applied was to intentionally shut down an EC2 instance while running a simple loop printing the date via SSH. More rigorous testing would be required to confirm this is a solution, but it looks promising. |
As stated earlier, the We certainly know that one cause is because Possibly, SSM sessions timing out trigger debug log statements in What I can confirm is that if you recompile The quiet mode MR looks to be a more robust implementation of that same fundamental fix -- get the garbage out of the protocol stream by moving it out of stdout. In this case, we just provide some additional runtime control over that behaviour. I would love for this fix or some variant to be upstreamed! We should not have to maintain forks of |
aws/amazon-ssm-agent#274 Bad packet length 2089525630. ssh_dispatch_run_fatal: Connection to UNKNOWN port 65535: Connection corrupted
We are encountering a random issue when using SSM proxy to SSH into EC2 instances in our CI/CD pipelines. The command used is:
The error message we occasionally receive is: "bad packet length." Key details:
Has anyone else faced this issue? It's quite frustrating to deal with in a pipeline context. Any insights or solutions would be appreciated! |
When trying to SCP from a local machine to and EC2 instance, using session manager (ProxyCommand is setup in SSH config) - it seems that I have issues sending larger files (60MB for example) and when I get to 8624KB uploaded, I get the following error.
ad packet length 2916819926.
ssh_dispatch_run_fatal: Connection to UNKNOWN port 65535: Connection corrupted
lost connection
My local connection is stable, so am just wondering if I'm hitting any hard limits ?
The text was updated successfully, but these errors were encountered: