-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(inputs.nvidia-smi): Add probe_on_startup
option
#15916
feat(inputs.nvidia-smi): Add probe_on_startup
option
#15916
Conversation
4f5b6bf
to
1eec743
Compare
Why not returning a startup error without adding a new option? |
Backwards compatibility reasons, I'm not sure if someone's system might strangely rely on this old behavior. I could envision a situation where nvidia-smi fails when called but if someone did not set What do you think? |
The question is, would an error in executing the command be repairable at runtime? If that's not the case, I'd say go without the option as people still do have a way to fix the situation using another startup-error-behavior. Otherwise, we should add this option... |
I think it depends on the issue at hand. In some cases it could be reparable, like if hot-rebooting a GPU fixes device memory corruption. In other situations like mine, it's not. Personally, I think there is no risk at all to adding this option to at least give people the choice. Not adding the option has some risk, so why not do the zero risk option? |
@srebhan any thoughts on the above? PS thanks for the responsiveness thus far 🙏🏻 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LandonTClipp, adding options that are not necessary complicates things, making it more difficult for the user to find out what to do and sometimes even cause ambiguities. Anyway, I think you do have a point in adding the option so let's go with this.
There are some cases where the nvidia-smi plugin might be found in PATH and executable, but upon running it might always return a non-zero exit code. For various reasons, in the environment I work in, this might be expected. It's thus disruptive for system logs to be polluted with infinite error messages. It's preferable in this situation to check if nvidia-smi returns a good result on plugin startup, and if not, allow the error to be bubbled up and handled according to startup_error_behavior.
4ad28d1
to
cd27c4a
Compare
@srebhan all comments addressed. |
test_on_startup
optionprobe_on_startup
option
Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip. 📦 Click here to get additional PR build artifactsArtifact URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @LandonTClipp! Thanks!
@srebhan I wonder if we should revert this PR in lieu of the upcoming |
…xdata#15916)" This reverts commit b6e59aa.
Summary
There are some cases where the nvidia-smi plugin might be found in PATH and executable, but upon running it might always return a non-zero exit code. For various reasons, in the environment I work in, this might be expected. It's thus disruptive for system logs to be polluted with infinite error messages. It's preferable in this situation to check if nvidia-smi returns a good result on plugin startup, and if not, allow the error to be bubbled up and handled according to
startup_error_behavior
.Checklist
Related issues
resolves #15915