roachtest: ssh_problem failed #90695
Comments
roachtest.ssh_problem failed with artifacts on master @ 0c1c3e7777b28a30ebe41428fb173f0156e8968c:
Parameters: |
roachtest.ssh_problem failed with artifacts on master @ 2d926e68000df659f282d4e4477329867b9a3323:
Parameters: |
Some notes about the failure above:
Error classification
The failure was classified as an SSH flake but, in reality, the test timed out. You can see it in the message above, as well as by checking the test logs: the workload ran for 10 hours. To consider (cc @smg260):
[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/test_runner.go#L877
Actual test failure (workload timeout)
The workload ran for 10h and never finished. We're passing
The logs generated by the workload are fairly large (280MB). If we grep for a field in the JSON printed when a worker performs its random number of ops, I think we are able to get a count of the actual number of schema change ops the workload ran:
% rg expectedExecErrors run_140012.557459920_n1_workload_run_schemachange.log | wc -l
5643
This looks suspicious to me. This number shouldn't be > 5019 (5000
@fqazi Thoughts? Anything to be done here? |
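The count above can also be reproduced without ripgrep. What follows is a minimal sketch, assuming (as the comment above does) that every op the workload ran emits one log line containing the `expectedExecErrors` field. The function names and the sample log lines are hypothetical; the 5643 and 5019 figures are the ones quoted above.

```python
import os
import tempfile


def count_ops(log_path, marker="expectedExecErrors"):
    """Count the log lines that contain marker, like `rg marker file | wc -l`."""
    count = 0
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(log_path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            if marker in line:
                count += 1
    return count


def exceeds_bound(observed, max_expected):
    """True when the workload ran more ops than it should have."""
    return observed > max_expected


if __name__ == "__main__":
    # Hypothetical three-line log standing in for the real 280MB file.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write('{"expectedExecErrors": ""}\n')
        tmp.write("unrelated watchdog output\n")
        tmp.write('{"expectedExecErrors": "42P01"}\n')
        path = tmp.name
    try:
        print(count_ops(path))
    finally:
        os.remove(path)
    # Figures quoted in the comment above.
    print(exceeds_bound(5643, 5019))
```

Streaming the file line by line keeps memory use flat regardless of log size.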
Let me dig into this. It looks like we somehow kept generating ops, which is super weird. |
@renatolabs It looks like we ran into an infinite loop inside randParentColumnForFkRelation. I'll get a patch out for it shortly |
Thanks @renatolabs .
The 0s you see is reported at the time we finally call
Agreed. However this is a symptom of us delaying the call to [1] - https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/test_runner.go#L933 |
There is a secondary bug here inside cockroach's DROP SCHEMA implementation too, which makes this worse. I'll band-aid the infinite loop first, since a faster failure is better here. |
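The "faster failure" idea can be illustrated with a bounded retry. This is only a sketch and not the workload's code (which is written in Go): `pick_with_retries`, `candidate_fn`, `is_valid`, and the attempt limit are hypothetical names chosen for illustration. The point is that a helper that retries until it finds a usable candidate should give up after a fixed number of attempts and raise, rather than spin until the test times out.

```python
class NoCandidateError(RuntimeError):
    """Raised when no usable candidate is found within the attempt budget."""


def pick_with_retries(candidate_fn, is_valid, max_attempts=10):
    """Call candidate_fn until is_valid accepts a result, at most max_attempts times."""
    for _ in range(max_attempts):
        candidate = candidate_fn()
        if is_valid(candidate):
            return candidate
    # Failing loudly here surfaces the problem in seconds instead of hours.
    raise NoCandidateError(f"no valid candidate after {max_attempts} attempts")


if __name__ == "__main__":
    # A generator that never yields a valid candidate, mimicking the stuck loop.
    try:
        pick_with_retries(lambda: None, lambda c: c is not None, max_attempts=5)
    except NoCandidateError as err:
        print(f"failed fast: {err}")
```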
Fixes: cockroachdb#91131
Informs: cockroachdb#90695
Previously, the select statement part of this workload did not properly handle disk full errors caused by spilling being capped. To address this, this patch adds them to the expected set of errors. Additionally, this patch removes an infinite loop caused by unknown schema errors; this logic is no longer required and the test should fail instead.
Release note: None
91163: workload/schemachange: fixes to improve stability of the test r=fqazi a=fqazi
These changes do the following:
Fixes: #91131
Informs: #90695
1. Eliminate log spam inside the schema changer workload due to the watchdog thread incorrectly dealing with null values
2. Expecting disk full errors due to large selects, since spilling does have a limit out of the box inside cockroach
3. Reducing a hang inside the workload due to a no longer necessary retry loop.
Co-authored-by: Faizan Qazi <[email protected]>
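Item 2 amounts to widening the set of error codes the workload tolerates for a given statement. A minimal sketch, assuming errors are compared by SQLSTATE code: `53100` is the PostgreSQL `disk_full` code and `42P01` is `undefined_table`, while the helper name and the contents of the set are hypothetical and do not reflect the workload's actual Go implementation.

```python
DISK_FULL = "53100"        # SQLSTATE class 53: disk_full
UNDEFINED_TABLE = "42P01"  # SQLSTATE class 42: undefined_table


def is_expected(error_code, expected_codes):
    """True if the error a statement returned is one the workload anticipated."""
    return error_code in expected_codes


if __name__ == "__main__":
    # Hypothetical expected set for a large SELECT that may exhaust spill space.
    expected_for_select = {DISK_FULL}
    print(is_expected(DISK_FULL, expected_for_select))
    print(is_expected(UNDEFINED_TABLE, expected_for_select))
```

An error outside the expected set would still be reported as a failure, which matches the intent that unknown errors should fail the test rather than be retried.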
roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:
Parameters: |
roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:
Parameters: |
roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:
Parameters: |
roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:
Parameters: |
roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:
Parameters: |
roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:
Parameters: |
roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:
Parameters: |
roachtest.ssh_problem failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:
Parameters: |
roachtest.ssh_problem failed with artifacts on master @ 6610d705724a21c836f3521f75972e65d9e9e2d4:
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 6610d705724a21c836f3521f75972e65d9e9e2d4:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 6610d705724a21c836f3521f75972e65d9e9e2d4:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ e83bc46aa42f2476b4b11b9703b8038c660dc980:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ e83bc46aa42f2476b4b11b9703b8038c660dc980:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 9927a9a1f0827daa734d5eb718017cf260dfe676:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 9927a9a1f0827daa734d5eb718017cf260dfe676:
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 8eeb7f2ae3b2cede564b46ca47e2353fd147c061:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 8eeb7f2ae3b2cede564b46ca47e2353fd147c061:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ eb2d2e19eb29d2747d9e267bd0612a69d066adad:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 5c5c9d6803d47848aa1960dd6642d5f2c1926814:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ cea3ff5562160a3bf2802da052da2aaa40e1ccc1:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for aws clusters Same failure on other branches
|
Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.
roachtest.ssh_problem failed with artifacts on master @ cea3ff5562160a3bf2802da052da2aaa40e1ccc1:
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22:
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22. A Side-Eye cluster snapshot was captured on timeout: https://app.side-eye.io/#/snapshots/420.
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22:
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ e2bd414929290acd8f8dadd2453bb7bf118541c8:
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 97965d4a2a614f2ac7fc9b10e6b5f4a92ed1d502:
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 97965d4a2a614f2ac7fc9b10e6b5f4a92ed1d502:
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 97965d4a2a614f2ac7fc9b10e6b5f4a92ed1d502:
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 97965d4a2a614f2ac7fc9b10e6b5f4a92ed1d502:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for aws clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ f2ce52ff7cc3b4680e0fec2ef844baad49042d6a:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for aws clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ f2ce52ff7cc3b4680e0fec2ef844baad49042d6a:
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 7d48198a57f014a8828194b90098699f70f0695a:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for aws clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 7d48198a57f014a8828194b90098699f70f0695a:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for aws clusters Same failure on other branches
|
Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.
roachtest.ssh_problem failed with artifacts on master @ 7d48198a57f014a8828194b90098699f70f0695a:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for azure clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ bcc993d796d03664604bf695e38fd5644d0bc952:
Parameters:
Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ b1474fa887606008960634b571cf4501efb6281b:
Parameters:
Help
See: roachtest README See: How To Investigate (internal) Grafana is not yet available for aws clusters Same failure on other branches
|
roachtest.ssh_problem failed with artifacts on master @ 1b1c8da55be48c174b7b370b305f42622546209f:
Parameters:
ROACHTEST_cloud=gce
ROACHTEST_cpu=4
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=true
ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-20896