
NVMe metrics collector based on parallel #204

Closed

Conversation


@dobbi84 dobbi84 commented Feb 6, 2024

The parallel-based collector collects the metrics more quickly when the node has several NVMe drives. A Grafana dashboard is included that visualizes the metrics; each panel carries a description taken from the NVM Express specifications.

Because there are now four NVMe-related files, everything has been moved to a common directory.

Signed-off-by: dobbi84 <[email protected]>
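
For context, here is a minimal Python sketch of the underlying idea (query the SMART log of every NVMe controller concurrently and print Prometheus textfile lines). It is not the code in this PR, which uses GNU parallel and shell; the device glob and metric names are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Sketch: query `nvme smart-log` for several drives concurrently."""
import glob
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor


def smart_log(dev):
    # Needs root (or CAP_SYS_ADMIN): smart-log sends an NVMe admin command.
    out = subprocess.run(
        ["nvme", "smart-log", dev, "--output-format=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return dev, json.loads(out)


def main():
    devices = sorted(glob.glob("/dev/nvme[0-9]"))  # controller character devices
    with ThreadPoolExecutor(max_workers=len(devices) or 1) as pool:
        for dev, log in pool.map(smart_log, devices):
            # Key names follow nvme-cli's JSON output; adjust for the installed version.
            print(f'nvme_percentage_used{{device="{dev}"}} {log["percent_used"]}')
            print(f'nvme_power_cycles_total{{device="{dev}"}} {log["power_cycles"]}')


if __name__ == "__main__":
    main()
```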
@dswarbrick
Member

Thanks for your contribution. However, when nvme_metrics.py was introduced (#155), it was with the intention that it would supersede and eventually replace the shell script variant. Keeping multiple variants of collectors is not a path we want to go down, as it is confusing for end users, and increases the burden on maintainers.

Besides, these collectors are typically only run at fairly infrequent intervals. For example, the Debian package of this repo ships with systemd timers that execute some collectors at a 15-minute interval. With such infrequent execution, are utmost speed and parallelism really that important?

@dobbi84
Author

dobbi84 commented Feb 13, 2024

I agree with you; I also fell into the mistake of using the bash version. I run my timers at a 1-minute interval or less, and parallel seemed a painless way of collecting quickly from multiple drives.

I noticed your effort on go-nvme, and I think an exporter that does not parse the nvme-cli output, and is possibly aligned with the NVMe spec release cycle, could be a good solution.

I can close this PR, if you agree.

@dswarbrick
Member

I noticed your effort on go-nvme, and I think an exporter that does not parse the nvme-cli output, and is possibly aligned with the NVMe spec release cycle, could be a good solution.

I split go-nvme out of another pet project of mine, https://github.com/dswarbrick/smart, but I don't really have the time to actively work on either. These projects were really just to satisfy my curiosity when I started them way back in 2017. Nowadays, I think that Rust would be a more appropriate language in which to write low-level libraries which interact with the kernel.

Implementing comprehensive support for the NVMe standard is a mammoth task - just look at https://github.com/linux-nvme/libnvme to get an idea. So, go-nvme exists really more as just a proof of concept.

If an NVMe exporter were to be developed (in any language), it would need to run as root to be able to send the NVMe admin commands to devices. Most experienced sysadmins still have nightmares about running HTTP services as root, probably due to some decades-old RCE exploit in Apache 1.3, before dropping privileges became standard practice. This is why, IMHO, parsing the (preferably JSON) output of nvme-cli, smartctl, etc. is for now the preferable solution, since it allows for privilege separation.
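
As an illustration of that privilege-separation pattern (a sketch under assumed paths and hypothetical script names, not this repository's nvme_metrics.py): the privileged side, run from cron or a systemd timer, only writes a .prom file, and the unprivileged node_exporter textfile collector then serves it.

```python
#!/usr/bin/env python3
"""Sketch: the root-run half writes metrics atomically; the exporter only reads them."""
import os
import sys
import tempfile

# Assumed path; it must match node_exporter's --collector.textfile.directory flag.
OUT_DIR = "/var/lib/node_exporter/textfile_collector"


def write_prom(lines, name="nvme.prom"):
    # Write to a temporary file and rename it into place so the (non-root)
    # node_exporter never scrapes a half-written file.
    fd, tmp = tempfile.mkstemp(dir=OUT_DIR, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
    os.chmod(tmp, 0o644)
    os.rename(tmp, os.path.join(OUT_DIR, name))


if __name__ == "__main__":
    # e.g. pipe in the output of a privileged collector such as the sketch above:
    #   sudo ./nvme_collector.py | sudo ./write_prom.py   (hypothetical names)
    write_prom([line.rstrip("\n") for line in sys.stdin])
```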

@dobbi84
Author

dobbi84 commented Feb 15, 2024

Understood, thanks for the direction.

@igaw

igaw commented Feb 16, 2024

If an NVMe exporter were to be developed (in any language), it would need to run as root to be able to send the NVMe admin commands to devices.

The need for root to run nvme-cli has changed slightly recently. At least nvme list, nvme list-subsys, etc. no longer need root; these commands use the information from sysfs. The nvme smart-log command, though, still issues a request, and that depends on root permission. But given that this is a real use case, we could think about adding the SMART log information to sysfs as well. So I am interested in getting these use cases working without root, though that's just me. I can bring this topic up at the upcoming LSFMM conference. In that case it would be good to know which information needs to be available to non-root users.
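
For illustration, a short sketch of what is already readable without root via sysfs (attribute availability varies by kernel version, so treat this as an assumption): identification data such as model, serial and firmware revision can be read directly, whereas the SMART log still requires the admin command.

```python
#!/usr/bin/env python3
"""Sketch: read basic NVMe controller information from sysfs as an unprivileged user."""
import glob
import os

for ctrl in sorted(glob.glob("/sys/class/nvme/nvme*")):
    info = {}
    for attr in ("model", "serial", "firmware_rev"):
        path = os.path.join(ctrl, attr)
        if os.path.isfile(path):
            with open(path) as f:
                info[attr] = f.read().strip()
    print(os.path.basename(ctrl), info)
```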

@dobbi84
Author

dobbi84 commented Feb 19, 2024

In that case it would be good to know which information needs to be available to non-root users.

I think a good start would be:

  • smart-log: the log page most commonly parsed by existing exporters so far
  • ocp smart-add-log: as @jmhands reported, it provides "Physical media units written", which is needed to calculate WAF (write amplification factor); see the sketch after this list.
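
For reference, a minimal sketch of the WAF arithmetic those two log pages enable; the field provenance noted in the docstring is an assumption, since OCP field names vary between nvme-cli versions.

```python
def write_amplification(physical_media_units_written_bytes, data_units_written):
    """WAF = physical (NAND) bytes written / host bytes written.

    `data_units_written` comes from the standard SMART log and is counted
    in units of 1000 * 512 bytes; `physical_media_units_written_bytes` is
    assumed to come from the OCP smart-add-log page, already in bytes.
    """
    host_bytes = data_units_written * 512_000
    return physical_media_units_written_bytes / host_bytes
```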

@jmhands

jmhands commented Feb 19, 2024

Adding the standard NVMe SMART log to sysfs would be great. The temperature values are already converted to degrees C; just make sure we export Data Units Read and Data Units Written in bytes instead of the standard units of 1000 × 512 B.

@dswarbrick
Member

This discussion is moving off-topic for this PR. I recommend continuing it in the libnvme project, since this PR will not be merged.

@dswarbrick dswarbrick closed this Feb 19, 2024
@dobbi84 dobbi84 deleted the dobbi84/nvme_metrics branch March 26, 2024 09:29