[RB] Always cancel remote bazel run at end of CLI to prevent missing retries #8098

maggie-lou · 2024-12-19T22:30:41Z

A customer reported a bug where the remote bazel invocation was canceled due to a "stream terminated by RST_STREAM" error (which often occurs when an executor loses connection to an app server due to a rollout). That error was immediately returned to the CLI, which reported that the remote run failed. However, the execution was retried and eventually succeeded.

We should just cancel the invocation in this case, so that the status the CLI reports matches the final result of the remote run. We don't want the CLI to report a failed run while a retry can still happen: that may lead the client to rerun the remote bazel command, which in this case would have caused duplicate runs.

If the invocation has already finished running, the CancelExecutions call is a no-op.
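
As a rough illustration of the behavior described above, here is a minimal sketch in Python. It is not the code from this PR: `FakeExecutionService`, `cancel_executions`, and `run_remote` are hypothetical names, and the in-memory service only stands in for the real server. It shows the cancel happening on every exit path, and being a no-op once the invocation has finished.

```python
from dataclasses import dataclass, field


@dataclass
class FakeExecutionService:
    """Hypothetical in-memory stand-in for the remote execution server."""

    finished: set = field(default_factory=set)
    canceled: list = field(default_factory=list)

    def cancel_executions(self, invocation_id: str) -> bool:
        # Canceling an invocation that already completed does nothing.
        if invocation_id in self.finished:
            return False
        self.canceled.append(invocation_id)
        return True


def run_remote(service, invocation_id, wait_for_result):
    """Wait for the remote run, then always cancel before returning."""
    try:
        return wait_for_result(invocation_id)
    finally:
        # Runs on success and on error, so a server-side retry cannot
        # outlive the status that the CLI is about to report.
        service.cancel_executions(invocation_id)
```

With this shape, an error such as a dropped stream still propagates to the caller, but the invocation is canceled first, so no retry continues unnoticed after the CLI has reported a failure.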

Fixes #2 here: https://buildbuddy-corp.slack.com/archives/C07GMM2VBLY/p1734595804049229?thread_ts=1734595606.837269&cid=C07GMM2VBLY


bduffany

> the remote bazel invocation was canceled due to a "stream terminated by RST_STREAM" error (which often occurs when an executor loses connection to an app server due to a rollout)

> We should just cancel the invocation in this case

I think we should try to find a way to avoid failures during a rollout? Workflows don't result in "failed" GitHub statuses when the executors restart, so it seems like a worse experience if remote bazel runs do fail in this case.

Workflows are currently retried on the server, because there is no "client" that would be capable of doing retries. For remote bazel at least, maybe we could transparently retry the invocation in the CLI (instead of retrying on the server) if WaitExecution reports that the execution failed? I'm not sure how to handle this for the Run API though.
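
To make the client-side retry idea concrete, here is a minimal sketch in Python under stated assumptions. `TransientExecutionError`, `run_with_retries`, and `start_and_wait` are hypothetical names, not part of the CLI or the remote execution API; `start_and_wait` stands in for "start the remote run and wait for its result".

```python
class TransientExecutionError(Exception):
    """Hypothetical error for retryable failures, e.g. a dropped stream."""


def run_with_retries(start_and_wait, max_attempts=3):
    """Call start_and_wait until it succeeds or attempts run out.

    start_and_wait receives the 1-based attempt number. Only
    TransientExecutionError is retried; other errors propagate at once.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return start_and_wait(attempt)
        except TransientExecutionError as err:
            # Remember the failure and try again with a fresh execution.
            last_error = err
    raise last_error
```

In this sketch the client owns the retry loop, so the status it finally reports is the status of the last attempt. How this would map onto the Run API is left open, as noted above.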

[RB] Always cancel remote bazel run at end of CLI to prevent missing …

5ac2881

…retries

maggie-lou requested review from bduffany and sluongng December 19, 2024 22:30

bduffany reviewed Dec 20, 2024
