Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cli-repl): only ever run MongoshRepl.onExit once MONGOSH-1943 #2300

Merged
merged 2 commits into from
Dec 13, 2024

Conversation

addaleax
Copy link
Contributor

@addaleax addaleax commented Dec 12, 2024

(Depends on #2299 as the first commit here and on #2298 logically – without #2298, this just fails consistently now instead of flakily)

Previously, a task in our CI was flaky because it was running exit\n as literal input to the shell, expecting it to reliably make the shell exit.

exit as a command leads to the MongoshRepl.onExit function called, which then calls a) .close() and, indirectly, repl.close() on the Node.js REPL instance as well b) the CliRepl.exit function (and through that, indirectly, process.exit()).

However, closing the Node.js REPL instance makes it flush the history file to disk as a typically fast but asynchronous operation, and, after that, emit exit on the REPL; that in turn called MongoshRepl.onExit again (i.e. calling MongoshRepl.onExit leads to another call to the same function, just asynchronously with a short delay).

The flaky test in CI failed because of a combination of issues, including the fact that process.exit() did not necessarily actually exit the process because of coverage tooling interfering with it (which in turn happened because it was running with an altered working directory pointing at a temporary directory where nyc had not created a subdirectory for storing coverage output).

In typical operations, the REPL would flush history quickly, and calling .onExit() again would allow the process to exit normally (by ways of calling CliRepl.exit and process.exit() a second time). However, in the flaky failure case, the history flushing operation would take long enough that the original process.exit() call (which failed due to nyc) would set the internal process._exiting flag before throwing, which in turn prevents process.nextTick() from scheduling callbacks, and which then ultimately prevented the REPL from cleaning itself up, leading to the process timing out instead of exiting.

This commit introduces safeguards against calling .onExit() twice, always returning the same (never-resolving) Promise instance, and explicitly keeps track of whether the REPL was closed or not.

Copy link
Contributor

@gagik gagik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, seems tricky to catch this!

throw new MongoshInternalError(
`mongosh not initialized yet\nCurrentTrace: ${
new Error().stack
}\nClose trace: ${this.closeTrace}\n`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we're ending with a newline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah no, good point. I think I have a tendency to always terminate string that contain multiple newlines with one (just like most editors do for files), but Errors are a bit special in that their messages are usually displayed with an extra newline anyway. Removed in 3bcdc52!

@@ -544,6 +546,7 @@ class MongoshNodeRepl implements EvaluationListener {
});

repl.on('exit', () => {
if (this._runtimeState) this._runtimeState.repl = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the implications of this being set to null? is this to force a quicker garbage collection / flushing? might be worth leaving a comment, though perhaps I could just be lacking context

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, comment is a good idea – also added in 3bcdc52. (The shell can also exit when stdin ends, which would result in us calling this listener here first, before we call .onExit() through some other means)

Previously, a task in our CI was flaky because it was running `exit\n`
as literal input to the shell, expecting it to reliably make the shell exit.

`exit` as a command leads to the `MongoshRepl.onExit` function called,
which then calls a) `.close()` and, indirectly, `repl.close()` on the
Node.js REPL instance as well b) the `CliRepl.exit` function (and through
that, indirectly, `process.exit()`).

However, closing the Node.js REPL instance makes it flush the history
file to disk as a typically fast but asynchronous operation, and,
after that, emit `exit` on the REPL; that in turn called
`MongoshRepl.onExit` again (i.e. calling `MongoshRepl.onExit` leads
to another call to the same function, just asynchronously with
a short delay).

The flaky test in CI failed because of a combination of issues,
including the fact that `process.exit()` did not necessarily actually
exit the process because of coverage tooling interfering with it
(which in turn happened because it was running with an altered
working directory pointing at a temporary directory where `nyc`
had not created a subdirectory for storing coverage output).

In typical operations, the REPL would flush history quickly, and
calling `.onExit()` again would allow the process to exit
normally (by ways of calling `CliRepl.exit` and `process.exit()`
a second time). However, in the flaky failure case, the history
flushing operation would take long enough that the original
`process.exit()` call (which failed due to nyc) would set the
internal `process._exiting` flag before throwing, which in turn
prevents `process.nextTick()` from scheduling callbacks, and which
then ultimately prevented the REPL from cleaning itself up,
leading to the process timing out instead of exiting.

This commit introduces safeguards against calling `.onExit()` twice,
always returning the same (never-resolving) `Promise` instance,
and explicitly keeps track of whether the REPL was closed or not.
@addaleax addaleax force-pushed the 1943-mongoshrepl-onexit branch from df41dba to 3bcdc52 Compare December 13, 2024 13:46
@addaleax addaleax merged commit 18a499e into main Dec 13, 2024
127 of 130 checks passed
@addaleax addaleax deleted the 1943-mongoshrepl-onexit branch December 13, 2024 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants