Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field descriptors in OCP SMART Log Page are different to those in standard SMART Log Page #2577

Closed
sbates130272 opened this issue Nov 15, 2024 · 17 comments · Fixed by #2624
Closed

Comments

@sbates130272
Copy link
Contributor

sbates130272 commented Nov 15, 2024

There is a lack of consistency between the field descriptors in the standard SMART log page (which uses all lower case and underscores) and the OCP SMART extended log page (which uses mixed cases and white spaces). This is rather annoying and also poses a bit of a challenge when trying to scrape the OCP metrics via a Prometheus node-exporter text collector.

I think it would be lovely to alter the format of the OCP output (both stdout and JSON) but I am concerned that might cause an issue for people who have already written tools that parse the existing log page output. So I am creating this issue so we can discuss if such a change is welcome or not.

$ nvme smart-log /dev/nvme0n1 -o json
{
  "critical_warning":0,
  "temperature":323,
  <snip>
}

versus

$ nvme ocp smart-add-log /dev/nvme0n1 -o json
{
  <snip>
  "Bad user nand blocks - Raw":0,
  "Bad user nand blocks - Normalized":0,
  <snip>
}

Note this is somewhat related to #2189 but is different enough that I wanted to create a new issue. Depending on how this resolves it may allow us to close the other issue.

@igaw
Copy link
Collaborator

igaw commented Nov 21, 2024

Changing the existing output is likely to cause regression, which I really like to avoid. We could introduce a output format version flag so the user can decide which version should be used. Luckily we have a plugin interface for this already in place.

Obvious it means additional maintenance overhead. When we do the next major version update we just rip out the older version then.

@sbates130272
Copy link
Contributor Author

Note there is a conversation going on in prometheus-community/node-exporter-textfile-collector-scripts#228 that also deals with JSON output format changes and some issues around per controller and per namespace SMART log pages.

@sbates130272
Copy link
Contributor Author

@igaw I will try and put together a PR for this in the next few days. I've noted your comments above and will do something that aligns with those.

@igaw
Copy link
Collaborator

igaw commented Nov 21, 2024

Cool. You don't need to go overboard in the first version. I think we have to figure out how to make this maintainable somehow without duplicating too much.

sbates130272 added a commit to sbates130272/nvme-cli that referenced this issue Nov 22, 2024
The current OCP JSON format for the SMART extended log page is not
condusive to metric collection via tools like Prometheus. So we add a
new output mode that uses all lower case and underscores (instead of
spaces). This should help with metric collection.

Fixes linux-nvme#2577.

Signed-off-by: Stephen Bates <[email protected]>
sbates130272 added a commit to sbates130272/nvme-cli that referenced this issue Nov 22, 2024
The current OCP JSON format for the SMART extended log page is not
condusive to metric collection via tools like Prometheus. So we add a
new output mode that uses all lower case and underscores (instead of
spaces). This should help with metric collection. At the same time we
clean up some of the field names.

Note documentation has NOT yet been updated.

Fixes linux-nvme#2577.

Signed-off-by: Stephen Bates <[email protected]>
@sbates130272
Copy link
Contributor Author

@igaw I threw something together that adds a --non-legacy flag and then selects the JSON fields based on that. Take a look and see if you think it looks semi-sane. Then I can clean it up a bit and add documentation updates.

@igaw
Copy link
Collaborator

igaw commented Dec 2, 2024

Thanks for the PR. I like where this is going. I suggest though, to really introduce a --format-version=1 for the legacy case and --format-version=2 for the new. Because legacy and new will eventually replaced by next generation, followed by the even better one :)

@igaw
Copy link
Collaborator

igaw commented Dec 2, 2024

Hmm, I think it should be called --output-format-version to avoid collisions with other commands.

@sbates130272
Copy link
Contributor Author

Thanks @igaw. I will update the PR with that and include the documentation update. Stay tuned!

sbates130272 added a commit to sbates130272/nvme-cli that referenced this issue Dec 9, 2024
The current OCP JSON format for the SMART extended log page is not
condusive to metric collection via tools like Prometheus. So we add a
new output mode that uses all lower case and underscores (instead of
spaces). This should help with metric collection. At the same time we
clean up some of the field names. We add a new argument
(--output-format-version) to allow us to select which output version
we want. Documentation updated to reflect this change.

Fixes linux-nvme#2577.

Signed-off-by: Stephen Bates <[email protected]>
sbates130272 added a commit to sbates130272/nvme-cli that referenced this issue Dec 9, 2024
The current OCP JSON format for the SMART extended log page is not
condusive to metric collection via tools like Prometheus. So we add a
new output mode that uses all lower case and underscores (instead of
spaces). This should help with metric collection. At the same time we
clean up some of the field names. We add a new argument
(--output-format-version) to allow us to select which output version
we want. Documentation updated to reflect this change.

Fixes linux-nvme#2577.

Signed-off-by: Stephen Bates <[email protected]>
@sbates130272
Copy link
Contributor Author

@igaw can you take a look at my updated branch? I think I caught your requests and added a documentation update.

@igaw
Copy link
Collaborator

igaw commented Dec 10, 2024

Looks pretty good to me. As usual, Steven is updating the same lines right now. As your PR came before #2604. If this is okay everyone I'd merge this one first and Steven needs to update his PR. Hope this is okay. Sorry Steven. @sc108-lee

@igaw
Copy link
Collaborator

igaw commented Dec 11, 2024

I'm wondering if we should mark/treat the output of v2 as experimental for the time so we can work on it until it works for your use case. v1 is the default anyway.

@sbates130272
Copy link
Contributor Author

@igaw I can add an experimental comment to the top of the JSON output when version 2 is selected.

@igaw
Copy link
Collaborator

igaw commented Dec 12, 2024

JSON doesn't allow comments, as far I know. That means it would need something like suggest here. Not sure about this approach though. Maybe just mention the experimental status in documentation?

@sbates130272
Copy link
Contributor Author

I like the documentation idea @igaw. I will do something along those lines.

sbates130272 added a commit to sbates130272/nvme-cli that referenced this issue Dec 16, 2024
The current OCP JSON format for the SMART extended log page is not
condusive to metric collection via tools like Prometheus. So we add a
new output mode that uses all lower case and underscores (instead of
spaces). This should help with metric collection. At the same time we
clean up some of the field names. We add a new argument
(--output-format-version) to allow us to select which output version
we want. Documentation updated to reflect this change and mark this as
experimental.

Fixes linux-nvme#2577.

Signed-off-by: Stephen Bates <[email protected]>
@sbates130272
Copy link
Contributor Author

@igaw take a look at the new commit and let me know if it works for you now.

sbates130272 added a commit to sbates130272/nvme-cli that referenced this issue Dec 16, 2024
The current OCP JSON format for the SMART extended log page is not
condusive to metric collection via tools like Prometheus. So we add a
new output mode that uses all lower case and underscores (instead of
spaces). This should help with metric collection. At the same time we
clean up some of the field names. We add a new argument
(--output-format-version) to allow us to select which output version
we want. Documentation updated to reflect this change and mark this as
experimental.

Fixes linux-nvme#2577.

Signed-off-by: Stephen Bates <[email protected]>
@igaw
Copy link
Collaborator

igaw commented Dec 17, 2024

Looks good to me. Please send a PR so I can merge it. Let's see else we have to change so that the smart log output is easy to process by any monitoring system.

@sbates130272
Copy link
Contributor Author

@igaw PR sent. Thanks!

igaw pushed a commit to sbates130272/nvme-cli that referenced this issue Dec 20, 2024
The current OCP JSON format for the SMART extended log page is not
condusive to metric collection via tools like Prometheus. So we add a
new output mode that uses all lower case and underscores (instead of
spaces). This should help with metric collection. At the same time we
clean up some of the field names. We add a new argument
(--output-format-version) to allow us to select which output version
we want. Documentation updated to reflect this change and mark this as
experimental.

Fixes linux-nvme#2577.

Signed-off-by: Stephen Bates <[email protected]>
igaw pushed a commit that referenced this issue Dec 20, 2024
The current OCP JSON format for the SMART extended log page is not
condusive to metric collection via tools like Prometheus. So we add a
new output mode that uses all lower case and underscores (instead of
spaces). This should help with metric collection. At the same time we
clean up some of the field names. We add a new argument
(--output-format-version) to allow us to select which output version
we want. Documentation updated to reflect this change and mark this as
experimental.

Fixes #2577.

Signed-off-by: Stephen Bates <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants