roachtest: ssh_problem failed #90695

Open
cockroach-teamcity opened this issue Oct 26, 2022 · 1,304 comments
Labels
branch-master Failures and bugs on the master branch. O-roachtest O-robot Originated from a bot. T-testeng TestEng Team X-infra-flake the automatically generated issue was closed due to an infrastructure problem not a product issue
Milestone

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Oct 26, 2022

roachtest.ssh_problem failed with artifacts on master @ 1b1c8da55be48c174b7b370b305f42622546209f:

test schemachange/random-load failed due to test artifacts and logs in: /artifacts/schemachange/random-load/run_1
(test_impl.go:291).Fatal: output in run_125153.622625054_n1_workload_run_schemachange: ./workload run schemachange --verbose=1 --tolerate-errors=false  --histograms=perf/stats.json --max-ops 5000 --concurrency 20 --txn-log /mnt/data1/cockroach/transactions.json returned: SSH_PROBLEM: exit status 255

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-20896

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Oct 26, 2022
@cockroach-teamcity cockroach-teamcity added this to the 22.2 milestone Oct 26, 2022
@blathers-crl blathers-crl bot added the T-testeng TestEng Team label Oct 26, 2022
@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 0c1c3e7777b28a30ebe41428fb173f0156e8968c:

test schemachange/random-load failed due to test artifacts and logs in: /artifacts/schemachange/random-load/run_1
(test_impl.go:291).Fatal: output in run_133500.318499865_n1_workload_run_schemachange: ./workload run schemachange --verbose=1 --tolerate-errors=false  --histograms=perf/stats.json --max-ops 5000 --concurrency 20 --txn-log /mnt/data1/cockroach/transactions.json returned: SSH_PROBLEM: exit status 255

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 2d926e68000df659f282d4e4477329867b9a3323:

test schemachange/random-load failed due to test artifacts and logs in: /artifacts/schemachange/random-load/run_1
(test_impl.go:291).Fatal: output in run_140012.557459920_n1_workload_run_schemachange: ./workload run schemachange --verbose=1 --tolerate-errors=false  --histograms=perf/stats.json --max-ops 5000 --concurrency 20 --txn-log /mnt/data1/cockroach/transactions.json returned: SSH_PROBLEM: exit status 255
(test_impl.go:314).Errorf: test timed out (0s)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@renatolabs
Contributor

Some notes about the failure above:

Error classification

The failure was classified as an SSH flake but, in reality, the test timed out. You can see it in the message above, as well as by checking the test logs: the workload ran for 10 hours.

To consider (cc @smg260):

  • Improve the "test timed out" message so it reports the time actually spent in the test. This could be done by reusing the default timeout [1] or (even better) displaying the actual duration (for example, by exposing duration [2]).
  • Improve SSH-flake detection. It seems that when the test runner intentionally stops processes started by roachprod, there's a chance we'll get a 255 error. There's probably a better way to deal with this. One quick-and-dirty, short-term solution would be to not mark a failure as an SSH flake if one of the errors indicates that the test timed out.

[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/test_runner.go#L877
[2] https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/test_impl.go#L373-L375
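
The short-term idea in the second bullet can be sketched roughly as follows. This is only an illustration in Python, not roachtest code: the function name `classify` and the two regular expressions are assumptions, modelled on the failure messages quoted earlier in this thread.

```python
import re

# Hypothetical markers, modelled on the messages quoted in this thread.
TIMEOUT_RE = re.compile(r"test timed out")
SSH_RE = re.compile(r"SSH_PROBLEM|TRANSIENT_ERROR\(ssh_problem\)")


def classify(failures):
    """Return 'timeout', 'ssh_flake', or 'other' for a list of failure messages.

    A timeout anywhere in the list wins over an SSH error, since the 255 exit
    may just be a side effect of the runner stopping the processes.
    """
    if any(TIMEOUT_RE.search(f) for f in failures):
        return "timeout"
    if any(SSH_RE.search(f) for f in failures):
        return "ssh_flake"
    return "other"


failures = [
    "(test_impl.go:291).Fatal: returned: SSH_PROBLEM: exit status 255",
    "(test_impl.go:314).Errorf: test timed out (0s)",
]
print(classify(failures))  # timeout
```

With only the first message present, the same function would still report an SSH flake, so genuine flakes keep their current classification.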


Actual test failure (workload timeout)

The workload ran for 10h and never finished. We're passing --max-ops 5000, so it technically should terminate when we reach that number of operations (+ the number of concurrent workers which in this case is 20).

The logs generated by the workload are fairly large (280MB). If we grep for a field in the JSON printed when a worker performs its random number of ops, I think we are able to get a count of the actual number of schema change ops the workload ran:

% rg expectedExecErrors run_140012.557459920_n1_workload_run_schemachange.log | wc -l
    5643

This looks suspicious to me. This number shouldn't be > 5019 (5000 maxOps + enough ops for the remaining 19 workers to realize they should stop after maxOps is reached).
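
For reference, the arithmetic behind that bound can be written out as a small sketch. The helper names are made up for illustration, and `count_ops` simply mirrors the `rg ... | wc -l` pipeline above by counting lines that contain the marker.

```python
def max_expected_ops(max_ops, concurrency):
    # Each of the other workers may finish one more op before noticing the limit.
    return max_ops + (concurrency - 1)


def count_ops(log_lines, marker="expectedExecErrors"):
    # Counts lines containing the marker, like `rg expectedExecErrors <log> | wc -l`.
    return sum(1 for line in log_lines if marker in line)


bound = max_expected_ops(5000, 20)
observed = 5643  # count reported above
print(bound, observed - bound)  # 5019 624
```

So the log contains 624 more entries than the bound allows, assuming each matching line corresponds to one op.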

@fqazi Thoughts? Anything to be done here?

@fqazi
Collaborator

fqazi commented Nov 2, 2022

Let me dig into this. It looks like we somehow kept generating ops, which is super weird.

@fqazi
Collaborator

fqazi commented Nov 2, 2022

@renatolabs It looks like we ran into an infinite loop inside randParentColumnForFkRelation. I'll get a patch out for it shortly.

@smg260
Contributor

smg260 commented Nov 2, 2022

Thanks @renatolabs .

Improve the "test timed out" message so it reports the time actually spent in the test. This could be done by reusing the default timeout [1] or (even better) displaying the actual duration (for example, by exposing duration [2]).

test.log shows the correct actual timeout of 10h (this is the default, and incidentally quite high)

23:59:09 test_runner.go:936: test timed out after 10h0m0s; check __stacks.log and CRDB logs for goroutine dumps

The 0s you see is reported at the time we finally call t.Errorf and it uses the timeout in the TestSpec, which defaults to 0. This should probably use the actual timeout as in the above print.
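
A minimal sketch of that suggestion, in Python rather than the runner's Go and with hypothetical names throughout: treat a zero spec timeout as "unset" and fall back to the default before formatting the message. The 10h default is the one mentioned above; the rest is an assumption.

```python
from datetime import timedelta

DEFAULT_TIMEOUT = timedelta(hours=10)  # default mentioned above


def effective_timeout(spec_timeout, default=DEFAULT_TIMEOUT):
    # A zero spec timeout is assumed to mean "unset", so use the runner default.
    return spec_timeout if spec_timeout > timedelta(0) else default


def timeout_message(spec_timeout):
    return f"test timed out ({effective_timeout(spec_timeout)})"


print(timeout_message(timedelta(0)))  # test timed out (10:00:00)
```

A spec that sets its own timeout, say 20 minutes, would be reported with that value instead of the default.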

Improve SSH-flake detection. It seems that when the test runner intentionally stops processes started by roachprod, there's a chance we'll get a 255 error. There's probably a better way to deal with this. One quick-and-dirty, short-term solution would be to not mark a failure as an SSH flake if one of the errors indicates that the test timed out.

Agreed. However, this is a symptom of us delaying the call to t.Errorf on timeout until teardown has completed [1], [2]. When the nodes are stopped, the long-running ssh command fails with a 255 exit code, and that error is then reported. Because we delayed reporting the real cause (the timeout), the ssh error becomes the primary failure and gets reported as such. We could certainly look at prioritising the timeout error.

[1] - https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/test_runner.go#L933
[2] - https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/test_runner.go#L1051
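
The ordering effect described here can be illustrated with a toy model. Everything in it is hypothetical (the `FailureLog` class, the message strings, the assumption that the first recorded failure is treated as primary); it only shows why recording the timeout before teardown changes what gets reported.

```python
class FailureLog:
    """Hypothetical failure collector: the first recorded failure is the primary one."""

    def __init__(self):
        self.failures = []

    def record(self, msg):
        self.failures.append(msg)

    def primary(self):
        return self.failures[0] if self.failures else None


def run_timed_out_test(report_timeout_before_teardown):
    log = FailureLog()
    if report_timeout_before_teardown:
        log.record("test timed out")
    # Teardown stops the nodes, which kills the long-running ssh command.
    log.record("SSH_PROBLEM: exit status 255")
    if not report_timeout_before_teardown:
        log.record("test timed out")
    return log.primary()


print(run_timed_out_test(False))  # SSH_PROBLEM: exit status 255
print(run_timed_out_test(True))   # test timed out
```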

@fqazi
Collaborator

fqazi commented Nov 2, 2022

There is a secondary bug here inside cockroach's DROP SCHEMA implementation too, which makes this worse. I'll band-aid the infinite loop first, since a faster failure is better here.
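
As a generic illustration of "fail faster instead of looping forever", a retry loop can be capped at a fixed number of attempts. This is not the workload's actual code: `pick_with_retries`, `GaveUp`, and the attempt limit are all made up for the sketch.

```python
class GaveUp(Exception):
    """Raised when no attempt produced a result."""


def pick_with_retries(pick, max_attempts=10):
    """Call pick() until it returns a value, failing after max_attempts tries."""
    for _ in range(max_attempts):
        result = pick()
        if result is not None:
            return result
    raise GaveUp(f"no result after {max_attempts} attempts")


# A picker that never succeeds would spin forever in an unbounded loop.
try:
    pick_with_retries(lambda: None, max_attempts=3)
except GaveUp as e:
    print(e)  # no result after 3 attempts
```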

fqazi added a commit to fqazi/cockroach that referenced this issue Nov 2, 2022
Fixes: cockroachdb#91131
Informs: cockroachdb#90695

Previously, the select statement part of this workload did not
properly handle disk full errors due to spilling being capped.
To address this, this patch adds them into the expected set of
errors. Additionally, this patch cleans up an infinite loop
caused by unknown schema errors; this logic is no longer required
and the test should fail.

Release note: None
fqazi added a commit to fqazi/cockroach that referenced this issue Nov 3, 2022
Fixes: cockroachdb#91131
Informs: cockroachdb#90695

Previously, the select statement part of this workload did not
properly handle disk full errors due to spilling being capped.
To address this, this patch adds them into the expected set of
errors. Additionally, this patch cleans up an infinite loop
caused by unknown schema errors; this logic is no longer required
and the test should fail.

Release note: None
craig bot pushed a commit that referenced this issue Nov 4, 2022
91163: workload/schemachange: fixes to improve stability of the test r=fqazi a=fqazi

These changes do the following:

Fixes: #91131
Informs: #90695

1. Eliminate log spam inside the schema changer workload due to the watchdog thread incorrectly dealing with null values.
2. Expect disk full errors due to large selects, since spilling does have a limit out of the box inside cockroach.
3. Remove a hang inside the workload due to a no-longer-necessary retry loop.

Co-authored-by: Faizan Qazi <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Nov 8, 2022
Fixes: #91131
Informs: #90695

Previously, the select statement part of this workload did not
properly handle disk full errors due to spilling being capped.
To address this, this patch adds them into the expected set of
errors. Additionally, this patch cleans up an infinite loop
caused by unknown schema errors; this logic is no longer required
and the test should fail.

Release note: None
@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:

test jepsen/bank-multitable/majority-ring-start-kill-2 failed due to test artifacts and logs in: /artifacts/jepsen/bank-multitable/majority-ring-start-kill-2/run_1
(test_impl.go:291).Fatal: output in run_104333.097775667_n6_bash: bash -e -c "cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && java -jar cockroachdb-0.2.0-1150b38f-standalone.jar test \
    --tarball file://${PWD}/cockroach.tgz \
    --username ${USER} \
    --ssh-private-key ~/.ssh/id_rsa \
    --os ubuntu \
    --time-limit 300 \
    --concurrency 30 \
    --recovery-time 25 \
    --test-count 1 \
    -n 10.142.1.137 -n 10.142.0.45 -n 10.142.1.138 -n 10.142.1.127 -n 10.142.1.162 \
    --test bank-multitable --nemesis majority-ring --nemesis2 start-kill-2 > invoke.log 2>&1" returned: error persisted after 3 attempts: SSH_PROBLEM: exit status 255

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!
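
For readers unfamiliar with the "error persisted after 3 attempts: SSH_PROBLEM: exit status 255" wording: ssh exits with status 255 when ssh itself hits an error, and otherwise passes through the remote command's exit status. A retry policy consistent with the message might look like the sketch below. The function name and the retry condition are assumptions, not roachprod's actual implementation.

```python
SSH_ERROR_EXIT = 255  # ssh itself failed, as opposed to the remote command


def run_with_retries(run, max_attempts=3):
    """run() returns an exit status; retry only when it looks like an ssh error."""
    status = None
    for _ in range(max_attempts):
        status = run()
        if status != SSH_ERROR_EXIT:
            return status, None
    return status, (
        f"error persisted after {max_attempts} attempts: "
        f"SSH_PROBLEM: exit status {status}"
    )


flaky = iter([255, 255, 0])
print(run_with_retries(lambda: next(flaky)))  # (0, None)
print(run_with_retries(lambda: 255)[1])
# error persisted after 3 attempts: SSH_PROBLEM: exit status 255
```

Note that a remote command can itself exit with 255, which is one reason such failures are only a potential SSH flake.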

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:

test jepsen/bank-multitable/start-kill-2 failed due to test artifacts and logs in: /artifacts/jepsen/bank-multitable/start-kill-2/run_1
(test_impl.go:291).Fatal: output in run_105547.897432309_n6_bash: bash -e -c "cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && java -jar cockroachdb-0.2.0-1150b38f-standalone.jar test \
    --tarball file://${PWD}/cockroach.tgz \
    --username ${USER} \
    --ssh-private-key ~/.ssh/id_rsa \
    --os ubuntu \
    --time-limit 300 \
    --concurrency 30 \
    --recovery-time 25 \
    --test-count 1 \
    -n 10.142.1.174 -n 10.142.0.224 -n 10.142.1.98 -n 10.142.1.172 -n 10.142.0.217 \
    --test bank-multitable --nemesis start-kill-2 > invoke.log 2>&1" returned: error persisted after 3 attempts: SSH_PROBLEM: exit status 255

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:

test jepsen/bank/majority-ring-start-kill-2 failed due to test artifacts and logs in: /artifacts/jepsen/bank/majority-ring-start-kill-2/run_1
(test_impl.go:291).Fatal: output in run_105853.056711258_n6_bash: bash -e -c "cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && java -jar cockroachdb-0.2.0-1150b38f-standalone.jar test \
    --tarball file://${PWD}/cockroach.tgz \
    --username ${USER} \
    --ssh-private-key ~/.ssh/id_rsa \
    --os ubuntu \
    --time-limit 300 \
    --concurrency 30 \
    --recovery-time 25 \
    --test-count 1 \
    -n 10.142.1.166 -n 10.142.1.167 -n 10.142.1.163 -n 10.142.1.171 -n 10.142.1.168 \
    --test bank --nemesis majority-ring --nemesis2 start-kill-2 > invoke.log 2>&1" returned: error persisted after 3 attempts: SSH_PROBLEM: exit status 255

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:

test jepsen/bank/start-kill-2 failed due to test artifacts and logs in: /artifacts/jepsen/bank/start-kill-2/run_1
(test_impl.go:291).Fatal: output in run_111113.812210581_n6_bash: bash -e -c "cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && java -jar cockroachdb-0.2.0-1150b38f-standalone.jar test \
    --tarball file://${PWD}/cockroach.tgz \
    --username ${USER} \
    --ssh-private-key ~/.ssh/id_rsa \
    --os ubuntu \
    --time-limit 300 \
    --concurrency 30 \
    --recovery-time 25 \
    --test-count 1 \
    -n 10.142.1.116 -n 10.142.1.114 -n 10.142.1.125 -n 10.142.1.124 -n 10.142.1.126 \
    --test bank --nemesis start-kill-2 > invoke.log 2>&1" returned: error persisted after 3 attempts: SSH_PROBLEM: exit status 255

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:

test jepsen/g2/majority-ring-start-kill-2 failed due to test artifacts and logs in: /artifacts/jepsen/g2/majority-ring-start-kill-2/run_1
(test_impl.go:291).Fatal: output in run_113040.476678420_n6_bash: bash -e -c "cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && java -jar cockroachdb-0.2.0-1150b38f-standalone.jar test \
    --tarball file://${PWD}/cockroach.tgz \
    --username ${USER} \
    --ssh-private-key ~/.ssh/id_rsa \
    --os ubuntu \
    --time-limit 300 \
    --concurrency 30 \
    --recovery-time 25 \
    --test-count 1 \
    -n 10.142.1.194 -n 10.142.1.199 -n 10.142.1.193 -n 10.142.1.192 -n 10.142.1.198 \
    --test g2 --nemesis majority-ring --nemesis2 start-kill-2 > invoke.log 2>&1" returned: error persisted after 3 attempts: SSH_PROBLEM: exit status 255

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:

test jepsen/g2/start-kill-2 failed due to test artifacts and logs in: /artifacts/jepsen/g2/start-kill-2/run_1
(test_impl.go:291).Fatal: output in run_113653.263800916_n6_bash: bash -e -c "cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && java -jar cockroachdb-0.2.0-1150b38f-standalone.jar test \
    --tarball file://${PWD}/cockroach.tgz \
    --username ${USER} \
    --ssh-private-key ~/.ssh/id_rsa \
    --os ubuntu \
    --time-limit 300 \
    --concurrency 30 \
    --recovery-time 25 \
    --test-count 1 \
    -n 10.142.1.159 -n 10.142.1.157 -n 10.142.1.120 -n 10.142.1.158 -n 10.142.1.155 \
    --test g2 --nemesis start-kill-2 > invoke.log 2>&1" returned: error persisted after 3 attempts: SSH_PROBLEM: exit status 255

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:

test jepsen/multi-register/majority-ring failed due to test artifacts and logs in: /artifacts/jepsen/multi-register/majority-ring/run_1
(test_impl.go:291).Fatal: output in run_120825.027668848_n6_bash: bash -e -c "cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && java -jar cockroachdb-0.2.0-1150b38f-standalone.jar test \
    --tarball file://${PWD}/cockroach.tgz \
    --username ${USER} \
    --ssh-private-key ~/.ssh/id_rsa \
    --os ubuntu \
    --time-limit 300 \
    --concurrency 30 \
    --recovery-time 25 \
    --test-count 1 \
    -n 10.142.0.152 -n 10.142.0.91 -n 10.142.0.96 -n 10.142.0.145 -n 10.142.0.98 \
    --test multi-register --nemesis majority-ring > invoke.log 2>&1" returned: error persisted after 3 attempts: SSH_PROBLEM: exit status 255

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:

test jepsen/multi-register/parts-start-kill-2 failed due to test artifacts and logs in: /artifacts/jepsen/multi-register/parts-start-kill-2/run_1
(test_impl.go:291).Fatal: output in run_121117.835152541_n6_bash: bash -e -c "cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && java -jar cockroachdb-0.2.0-1150b38f-standalone.jar test \
    --tarball file://${PWD}/cockroach.tgz \
    --username ${USER} \
    --ssh-private-key ~/.ssh/id_rsa \
    --os ubuntu \
    --time-limit 300 \
    --concurrency 30 \
    --recovery-time 25 \
    --test-count 1 \
    -n 10.142.0.95 -n 10.142.0.236 -n 10.142.0.110 -n 10.142.0.154 -n 10.142.0.112 \
    --test multi-register --nemesis parts --nemesis2 start-kill-2 > invoke.log 2>&1" returned: error persisted after 3 attempts: SSH_PROBLEM: exit status 255

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 6610d705724a21c836f3521f75972e65d9e9e2d4:

test c2c/shutdown/src/coordinator failed: VMs preempted during the test run: teamcity-17772379-1731652882-33-n9cpu8-0006 (us-central1-f), teamcity-17772379-1731652882-33-n9cpu8-0008 (us-central1-f), teamcity-17772379-1731652882-33-n9cpu8-0002 (us-central1-f), teamcity-17772379-1731652882-33-n9cpu8-0009 (us-central1-f)

**Other Failures:**
(cluster.go:2456).Run: full command output in run_080149.393917226_n9_cockroach-workload-i.log: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/c2c/shutdown/src/coordinator/cpu_arch=arm64/run_1/ssh/ssh_080149.429372771_n9_cockroach-workload-i.log): TRANSIENT_ERROR(ssh_problem): exit status 255
(test_runner.go:1356).runTest: test timed out (20m0s)
test artifacts and logs in: /artifacts/c2c/shutdown/src/coordinator/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • metamorphicLeases=default
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 6610d705724a21c836f3521f75972e65d9e9e2d4:

test allocbench/nodes=7/cpu=8/kv/r=50/ops=skew failed: (cluster.go:2343).Start: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/allocbench/nodes=7/cpu=8/kv/r=50/ops=skew/run_1/ssh/ssh_085058.501200952_n1_upload-start-script.log): TRANSIENT_ERROR(ssh_problem): exit status 255
test artifacts and logs in: /artifacts/allocbench/nodes=7/cpu=8/kv/r=50/ops=skew/run_1

Parameters:

  • arch=amd64
  • cloud=azure
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 6610d705724a21c836f3521f75972e65d9e9e2d4:

test ycsb/E/nodes=3 failed: (cluster.go:2343).Start: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine [email protected]:certs.tar /tmp/certs.tar2053697304
ssh: connect to host 52.146.91.221 port 22: Connection timed out
: exit status 1
test artifacts and logs in: /artifacts/ycsb/E/nodes=3/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=azure
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ e83bc46aa42f2476b4b11b9703b8038c660dc980:

test cdc/crdb-chaos failed: (monitor.go:149).Wait: monitor failure: failed to list cockroach processes: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/cdc/crdb-chaos/cpu_arch=arm64/run_1/ssh/ssh_074303.903524650_n3_list-processes.log): TRANSIENT_ERROR(ssh_problem): exit status 255 [owner=test-eng]
test artifacts and logs in: /artifacts/cdc/crdb-chaos/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=azure
  • coverageBuild=false
  • cpu=16
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • metamorphicLeases=default
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ e83bc46aa42f2476b4b11b9703b8038c660dc980:

test jepsen/multi-register/subcritical-skews-start-kill-2 failed: (cluster.go:2456).Run: full command output in run_092509.930865477_n1-6_sh-c-sudo-DEBIANFRON.log: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/jepsen/multi-register/subcritical-skews-start-kill-2/cpu_arch=arm64/run_1/ssh/ssh_092510.083817775_n1_sh-c-sudo-DEBIANFRON.log): TRANSIENT_ERROR(ssh_problem): exit status 255
test artifacts and logs in: /artifacts/jepsen/multi-register/subcritical-skews-start-kill-2/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=azure
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • metamorphicLeases=default
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 9927a9a1f0827daa734d5eb718017cf260dfe676:

test c2c/mixed-version failed: (mixedversion.go:759).Run: source: mixed-version test failure while running step 8 (run "initialize tpcc workload"): full command output in run_065726.575660328_n9_cockroach-workload-i.log: COMMAND_PROBLEM: exit status 1
(mixedversion.go:759).Run: dest: cluster.PutE: put "pkg/cmd/roachtest/fixtures/1/checkpoint-v24.2.tgz" failed: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine pkg/cmd/roachtest/fixtures/1/checkpoint-v24.2.tgz [email protected]:/mnt/data1/cockroach/fixture.tgz: signal: killed [owner=test-eng]
test artifacts and logs in: /artifacts/c2c/mixed-version/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=azure
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=system-only
  • mvtVersions=v24.2.4 → master
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 9927a9a1f0827daa734d5eb718017cf260dfe676:

	            	  | Wraps: (4) TRANSIENT_ERROR(ssh_problem)
	            	  | Wraps: (5) exit status 255
	            	  | Error types: (1) *hintdetail.withDetail (2) *withstack.withStack (3) *errutil.withPrefix (4) errors.TransientError (5) *exec.ExitError
	            	Wraps: (7) secondary error attachment
	            	  | _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/perturbation/metamorphic/backfill/run_1/ssh/ssh_072847.080651650_n1_cockroach-workload-i.log): TRANSIENT_ERROR(ssh_problem): exit status 255
	            	  | (1) Node 1. Command with error:
	            	  |   | ``````
	            	  |   | ./cockroach workload init kv --db target --splits 10000 {pgurl:1}
	            	  |   | ``````
	            	  |   | <no output>
	            	  | Wraps: (2) attached stack trace
	            	  |   -- stack trace:
	            	  |   | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*remoteSession).errWithDebug
	            	  |   | 	pkg/roachprod/install/session.go:138
	            	  |   | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*remoteSession).Run.func1
	            	  |   | 	pkg/roachprod/install/session.go:168
	            	  |   | runtime.goexit
	            	  |   | 	src/runtime/asm_amd64.s:1695
	            	  | Wraps: (3) _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/perturbation/metamorphic/backfill/run_1/ssh/ssh_072847.080651650_n1_cockroach-workload-i.log)
	            	  | Wraps: (4) TRANSIENT_ERROR(ssh_problem)
	            	  | Wraps: (5) exit status 255
	            	  | Error types: (1) *hintdetail.withDetail (2) *withstack.withStack (3) *errutil.withPrefix (4) errors.TransientError (5) *exec.ExitError
	            	Wraps: (8) Node 1. Command with error:
	            	  | ``````
	            	  | ./cockroach workload init kv --db target --splits 10000 {pgurl:1}
	            	  | ``````
	            	  | stdout: <empty>
	            	  | stderr:I241119 07:08:23.773121 1 workload/cli/run.go:665  [-] 1  random seed: 3078597521994480738
	            	  | I241119 07:08:23.846736 1 workload/workloadsql/workloadsql.go:120  [-] 2  starting 10000 splits
	            	  | I241119 07:08:32.499410 436 workload/workloadsql/workloadsql.go:194  [-] 3  finished 1000 of 10000 splits
	            	  | I241119 07:08:40.352431 436 workload/workloadsql/workloadsql.go:194  [-] 4  finished 2000 of 10000 splits
	            	  | I241119 07:08:48.240923 436 workload/workloadsql/workloadsql.go:194  [-] 5  finished 3000 of 10000 splits
	            	  | I241119 07:08:56.206274 436 workload/workloadsql/workloadsql.go:194  [-] 6  finished 4000 of 10000 splits
	            	  | I241119 07:09:03.881781 436 workload/workloadsql/workloadsql.go:194  [-] 7  finished 5000 of 10000 splits
	            	  | I241119 07:09:11.879761 436 workload/workloadsql/workloadsql.go:194  [-] 8  finished 6000 of 10000 splits
	            	Wraps: (9) attached stack trace
	            	  -- stack trace:
	            	  | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*remoteSession).errWithDebug
	            	  | 	pkg/roachprod/install/session.go:138
	            	  | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*remoteSession).Run.func1
	            	  | 	pkg/roachprod/install/session.go:168
	            	  | runtime.goexit
	            	  | 	src/runtime/asm_amd64.s:1695
	            	Wraps: (10) _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/perturbation/metamorphic/backfill/run_1/ssh/ssh_070822.289498731_n1_cockroach-workload-i.log)
	            	Wraps: (11) TRANSIENT_ERROR(ssh_problem)
	            	Wraps: (12) exit status 255
	            	Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *markers.withMark (4) *withstack.withStack (5) *errutil.withPrefix (6) *secondary.withSecondaryError (7) *secondary.withSecondaryError (8) *hintdetail.withDetail (9) *withstack.withStack (10) *errutil.withPrefix (11) errors.TransientError (12) *exec.ExitError
	Test:       	perturbation/metamorphic/backfill
(require.go:1357).NoError: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/backfill/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=2
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 8eeb7f2ae3b2cede564b46ca47e2353fd147c061:

test acceptance/build-info failed: (cluster.go:2343).Start: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/acceptance/build-info/cpu_arch=arm64/run_1/ssh/ssh_064800.825495339_n4_run-start-script.log): TRANSIENT_ERROR(ssh_problem): exit status 255
test artifacts and logs in: /artifacts/acceptance/build-info/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=azure
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • metamorphicLeases=default
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 8eeb7f2ae3b2cede564b46ca47e2353fd147c061:

test import/tpcc/warehouses=1000/nodes=32 failed: (cluster.go:2343).Start: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/import/tpcc/warehouses=1000/nodes=32/cpu_arch=arm64/run_1/ssh/ssh_085657.097644986_n13_upload-start-script.log): TRANSIENT_ERROR(ssh_problem): exit status 255
test artifacts and logs in: /artifacts/import/tpcc/warehouses=1000/nodes=32/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=azure
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ eb2d2e19eb29d2747d9e267bd0612a69d066adad:

test c2c/mixed-version failed: (mixedversion.go:759).Run: source: mixed-version test failure while running step 8 (run "initialize tpcc workload"): full command output in run_065056.206207043_n9_cockroach-workload-i.log: COMMAND_PROBLEM: exit status 1
(mixedversion.go:759).Run: dest: cluster.PutE: put "pkg/cmd/roachtest/fixtures/1/checkpoint-v24.2.tgz" failed: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine pkg/cmd/roachtest/fixtures/1/checkpoint-v24.2.tgz [email protected]:/mnt/data1/cockroach/fixture.tgz: signal: killed [owner=test-eng]
test artifacts and logs in: /artifacts/c2c/mixed-version/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=azure
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=system-only
  • mvtVersions=v24.2.4 → master
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 5c5c9d6803d47848aa1960dd6642d5f2c1926814:

test allocbench/nodes=7/cpu=8/kv/r=95/ops=skew failed: (cluster.go:2343).Start: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/allocbench/nodes=7/cpu=8/kv/r=95/ops=skew/cpu_arch=arm64/run_1/ssh/ssh_065756.685109766_n1_set-cluster-settings.log): TRANSIENT_ERROR(ssh_problem): exit status 255
test artifacts and logs in: /artifacts/allocbench/nodes=7/cpu=8/kv/r=95/ops=skew/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=azure
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ cea3ff5562160a3bf2802da052da2aaa40e1ccc1:

test schemachange/leasing-benchmark failed: (multiregion_leasing.go:185).func2: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/schemachange/leasing-benchmark/cpu_arch=arm64/run_1/ssh/ssh_083432.887126271_n2_time-p-cockroach-sql.log): TRANSIENT_ERROR(ssh_problem): exit status 255
test artifacts and logs in: /artifacts/schemachange/leasing-benchmark/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=aws
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.ssh_problem failed with artifacts on master @ cea3ff5562160a3bf2802da052da2aaa40e1ccc1:

test transfer-leases/drain failed: (cluster.go:2393).Stop: cluster.StopE: one or more parallel execution failure(s): _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/transfer-leases/drain/run_1/ssh/ssh_084949.540930205_n3_kill.log): TRANSIENT_ERROR(ssh_problem): exit status 255
test artifacts and logs in: /artifacts/transfer-leases/drain/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • metamorphicLeases=expiration
  • runtimeAssertionsBuild=true
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22:

test ycsb/C/nodes=3 failed: (cluster.go:2456).Run: full command output in run_111133.703364017_n4_cockroach-workload-r.log: error persisted after 2 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/ycsb/C/nodes=3/run_1/ssh/ssh_111133.813413744_n4_cockroach-workload-r.log): TRANSIENT_ERROR(ssh_problem): exit status 255
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/ycsb/C/nodes=3/run_1

Parameters:

  • arch=amd64
  • cloud=azure
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22:

test c2c/tpcc/warehouses=1000/duration=60/cutover=30 failed: (cluster.go:2343).Start: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/c2c/tpcc/warehouses=1000/duration=60/cutover=30/cpu_arch=arm64/run_1/ssh/ssh_065321.559794972_n4_run-start-script.log): TRANSIENT_ERROR(ssh_problem): exit status 255
test artifacts and logs in: /artifacts/c2c/tpcc/warehouses=1000/duration=60/cutover=30/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=azure
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • metamorphicLeases=default
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22:

test backup/2TB/n10cpu4 failed: (test_runner.go:854).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64" failed: error persisted after 2 attempts: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64 [email protected]:./cockroach
Warning: Permanently added '34.56.41.37' (ECDSA) to the list of known hosts.
client_loop: send disconnect: Broken pipe
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22. A Side-Eye cluster snapshot was captured on timeout: https://app.side-eye.io/#/snapshots/420.

test admission-control/elastic-io failed: VMs preempted during the test run: teamcity-17904740-1732516957-08-n2cpu8-0002 (us-east1-b)

**Other Failures:**
(cluster.go:2456).Run: full command output in run_064618.935065768_n2_cockroach-workload-r.log: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/admission-control/elastic-io/run_1/ssh/ssh_064618.971510197_n2_cockroach-workload-r.log): TRANSIENT_ERROR(ssh_problem): exit status 255
(test_runner.go:1363).runTest: test timed out (1h0m0s)
test artifacts and logs in: /artifacts/admission-control/elastic-io/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • metamorphicLeases=leader
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22:

test jepsen/bank-multitable/strobe-skews failed: (test_runner.go:854).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64" failed: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64 [email protected]:./cockroach
ssh: connect to host 34.45.240.235 port 22: Connection timed out
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ e2bd414929290acd8f8dadd2453bb7bf118541c8:

test import/tpcc/warehouses=1000/nodes=32 failed: (test_runner.go:854).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-amd64" failed: error persisted after 3 attempts: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-amd64 ubuntu@:./cockroach
ssh: Could not resolve hostname : Name or service not known
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 97965d4a2a614f2ac7fc9b10e6b5f4a92ed1d502:

test perturbation/metamorphic/addNode failed: (test_runner.go:854).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-amd64" failed: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-amd64 [email protected]:./cockroach
Warning: Permanently added '104.196.188.231' (ECDSA) to the list of known hosts.
[email protected]: Permission denied (publickey).
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=32
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=1
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 97965d4a2a614f2ac7fc9b10e6b5f4a92ed1d502:

test costfuzz/rand-multi-region failed: (test_runner.go:854).func4: one or more parallel execution failure(s): error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/_runner-logs/cluster-create/ssh/ssh_083726.214648140_n3_ssh-dist-key.log): TRANSIENT_ERROR(ssh_problem): exit status 255 [owner=test-eng]

Parameters:

  • cloud=azure
  • coverageBuild=false
  • cpu=4
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 97965d4a2a614f2ac7fc9b10e6b5f4a92ed1d502:

test failover/partial/lease-gateway/lease=leader failed: (test_runner.go:854).func4: one or more parallel execution failure(s): error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/_runner-logs/cluster-create/ssh/ssh_092448.971070470_n3_ssh-dist-key.log): TRANSIENT_ERROR(ssh_problem): exit status 255 [owner=test-eng]

Parameters:

  • cloud=azure
  • coverageBuild=false
  • cpu=2
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 97965d4a2a614f2ac7fc9b10e6b5f4a92ed1d502:

test ycsb/D/nodes=3/cpu=32/isolation-level=read-committed failed: (cluster.go:2456).Run: full command output in run_100111.561668996_n4_cockroach-workload-r.log: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/ycsb/D/nodes=3/cpu=32/isolation-level=read-committed/cpu_arch=arm64/run_1/ssh/ssh_100111.606360554_n4_cockroach-workload-r.log): TRANSIENT_ERROR(ssh_problem): exit status 255
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/ycsb/D/nodes=3/cpu=32/isolation-level=read-committed/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=aws
  • coverageBuild=false
  • cpu=32
  • encrypted=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ f2ce52ff7cc3b4680e0fec2ef844baad49042d6a:

test kv95/enc=false/nodes=3 failed: (cluster.go:2456).Run: full command output in run_080639.497538269_n4_cockroach-workload-r.log: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/kv95/enc=false/nodes=3/run_1/ssh/ssh_080639.604817213_n4_cockroach-workload-r.log): TRANSIENT_ERROR(ssh_problem): exit status 255
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/kv95/enc=false/nodes=3/run_1

Parameters:

  • arch=amd64
  • cloud=aws
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ f2ce52ff7cc3b4680e0fec2ef844baad49042d6a:

test failover/non-system/crash/lease=expiration failed: (test_runner.go:854).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64" failed: error persisted after 2 attempts: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64 [email protected]:./cockroach
Warning: Permanently added '104.197.162.254' (ECDSA) to the list of known hosts.
client_loop: send disconnect: Broken pipe
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=2
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 7d48198a57f014a8828194b90098699f70f0695a:

test kv0/enc=false/nodes=3/cpu=32/mt-shared-process failed: (cluster.go:2245).StartServiceForVirtualCluster: cluster.StartE: cluster.Get: get ./certs failed: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine [email protected]:./certs /tmp/roachtest-certs3371446429/certs
client_loop: send disconnect: Broken pipe
: signal: killed
test artifacts and logs in: /artifacts/kv0/enc=false/nodes=3/cpu=32/mt-shared-process/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=aws
  • coverageBuild=false
  • cpu=32
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ 7d48198a57f014a8828194b90098699f70f0695a:

test admission-control/disk-bandwidth-limiter failed: (cluster.go:2456).Run: full command output in run_065615.057148924_n2_cockroach-workload-r.log: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/admission-control/disk-bandwidth-limiter/run_1/ssh/ssh_065615.126339027_n2_cockroach-workload-r.log): TRANSIENT_ERROR(ssh_problem): exit status 255
(cluster.go:2456).Run: context canceled
(admission_control_disk_bandwidth_overload.go:202).3: too many errors retrieving metrics
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • arch=amd64
  • cloud=aws
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • metamorphicLeases=default
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.ssh_problem failed with artifacts on master @ 7d48198a57f014a8828194b90098699f70f0695a:

test follower-reads/mixed-version/single-region failed: (mixedversion.go:759).Run: mixed-version test failure while running step 3 (start shared-process tenant "mixed-version-tenant-wiefz"): TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine [email protected]:tenant-certs-3.tar /tmp/tenant-certs-3.tar4173416115: signal: killed [owner=test-eng]
test artifacts and logs in: /artifacts/follower-reads/mixed-version/single-region/run_1

Parameters:

  • arch=amd64
  • cloud=azure
  • coverageBuild=false
  • cpu=2
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=shared-process
  • mvtVersions=v24.1.0 → v24.2.4 → master
  • runtimeAssertionsBuild=true
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ bcc993d796d03664604bf695e38fd5644d0bc952:

test failover/system-non-liveness/disk-stall/lease=leader failed: (cluster.go:2343).Start: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/failover/system-non-liveness/disk-stall/lease=leader/run_1/ssh/ssh_101341.492028494_n2_upload-start-script.log): TRANSIENT_ERROR(ssh_problem): exit status 255
(cluster.go:2456).Run: context canceled
test artifacts and logs in: /artifacts/failover/system-non-liveness/disk-stall/lease=leader/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=2
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on master @ b1474fa887606008960634b571cf4501efb6281b:

test kv95/enc=false/nodes=3/size=64kb failed: (cluster.go:2456).Run: full command output in run_082200.179144094_n4_cockroach-workload-r.log: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/kv95/enc=false/nodes=3/size=64kb/run_1/ssh/ssh_082200.263148653_n4_cockroach-workload-r.log): TRANSIENT_ERROR(ssh_problem): exit status 255
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/kv95/enc=false/nodes=3/size=64kb/run_1

Parameters:

  • arch=amd64
  • cloud=aws
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

6 participants