-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NVMe metrics collector based on parallel #204
Conversation
The collector based on parallel allows to collect the metrics quicker when the node has several NVMe drives. Included a Grafana dashboard that visualizes the metrics and for each panel reports a description coming from the NVMe express specifications. Because there are now four files NVMe related everything has been moved to a common directory. Signed-off-by: dobbi84 <[email protected]>
69be6a0
to
e6c8b21
Compare
Thanks for your contribution. However, when nvme_metrics.py was introduced (#155), it was with the intention that it would supersede and eventually replace the shell script variant. Keeping multiple variants of collectors is not a path we want to go down, as it is confusing for end users, and increases the burden on maintainers. Besides, these collectors are typically only run at fairly infrequent intervals. For example, the Debian package of this repo ships with systemd timers than execute some collectors with a 15 minute interval. With such infrequent execution, is the utmost speed and parallelism that important? |
I agree with you, i fell as well on the mistake of using the bash version. I run my timers at 1m interval or less and parallel seemed a painless way of collecting fast from multiple drives. I noticed your effort in the go-nvme and i think could be a good solution to have an exporter that does not parse the I can close this the PR, if you agree. |
I split go-nvme out of another pet project of mine, https://github.com/dswarbrick/smart, but I don't really have the time to actively work on either. These projects were really just to satisfy my curiosity when I started them way back in 2017. Nowadays, I think that Rust would be a more appropriate language in which to write low-level libraries which interact with the kernel. Implementing comprehensive support for the NVMe standard is a mammoth task - just look at https://github.com/linux-nvme/libnvme to get an idea. So, go-nvme exists really more as just a proof of concept. If an NVMe exporter were to be developed (in any language), it would need to run as root to be able to send the NVMe admin commands to devices. Most experienced sysadmins still have nightmares about running HTTP services as root - probably due to some decades-old RCE exploit in Apache 1.3, before dropping privileges became de facto. This is why IMHO for now, parsing the (preferably JSON) output of nvme-cli, smartctl etc. is the preferable solution, since it allows for privilege separation. |
understood thanks for the direction |
The need for |
i think a good start would be: |
adding the standard NVMe SMART log to sysfs would be great. The temperature commands are already converted to degrees C, just make sure we export the |
This discussion is moving off-topic for this PR. I recommend moving the discussion to the libnvme project, since this PR will not be merged. |
The collector based on parallel allows to collect the metrics quicker when the node has several NVMe drives. Included a Grafana dashboard that visualizes the metrics and for each panel reports a description coming from the NVMe express specifications.
Because there are now four files NVMe related everything has been moved to a common directory.