Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prometheus-smartctl-exporter: fix nvme scanning #205165

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Frostman
Copy link
Member

@Frostman Frostman commented Dec 8, 2022

As discussed a few times, it seems to be the only way to make smartctl work for NVMe devices and so enabling it when smartctl exporter is enabled.

12c26ac

It's independent of #205123 and compatible with both versions

Not the most elegant way to handle it, would be happy to learn a better way

P.S. I've tested it multiple time in a VM and it seems to be working correctly

Description of changes
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@Frostman Frostman requested a review from WilliButz as a code owner December 8, 2022 18:57
@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Dec 8, 2022
@Frostman Frostman requested a review from mweinelt December 8, 2022 18:57
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Dec 8, 2022
@Frostman
Copy link
Member Author

@mweinelt hi, do you still think we need it? thanks

@robryk
Copy link
Contributor

robryk commented Jan 10, 2023

Do we know what causes disk block devices to be created with group set to disk?

Why isn't this change to udev rules something we'd want to happen always, regardless of whether anyone's using smartctl_exporter? If this change causes undesired effects, we shouldn't make it silently even for those who use smartctl_exporter. If it doesn't, I don't see why it's not useful always.

@mweinelt
Copy link
Member

The group assignment comes from udev rule maintained by systemd upstream, see https://github.com/systemd/systemd/blob/0cf2dcf15402c60498165dbd3f14536766c05051/rules.d/50-udev-default.rules.in#L74.

We should probably create an issue an ask whether it would be fine to add nvme char devices into the disk group as well.

@mweinelt
Copy link
Member

mweinelt commented Jan 10, 2023

@mweinelt hi, do you still think we need it? thanks

Yes, we need this, but I'm very unhappy about the placement we're apparently being forced into.

cc @WilliButz

@robryk
Copy link
Contributor

robryk commented Jan 10, 2023

I've created an upstream issue in systemd about the inconsistent permissions: systemd/systemd#26009

@robryk
Copy link
Contributor

robryk commented Jan 10, 2023

The group assignment comes from udev rule maintained by systemd upstream, see https://github.com/systemd/systemd/blob/0cf2dcf15402c60498165dbd3f14536766c05051/rules.d/50-udev-default.rules.in#L74.

Actually, it's unclear for me what causes that file to have effect. It's not present in /etc/udev/rules.d on the running system, and the rules that write out rules.d seem to suggest that neither it is in initrd. Does systemd-udevd look for rules at a relative path from its binary or something equally magical?

@mweinelt
Copy link
Member

mweinelt commented Jul 8, 2023

Given the upstream reaction we'd probably want to move this into a separate group, that encapsulates raw disk access, e.g. rawio, to make sure we're not handing anything raw io permissions, by allowing the disk group to do it.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/smartctl-exporter/19110/13

@Frostman Frostman closed this Jul 28, 2023
@robryk
Copy link
Contributor

robryk commented Aug 27, 2023

What's your plan on dealing with this? It doesn't seem to me that systemd-udevd's built-in rules are likely to change anytime soon (or maybe ever).

@@ -183,6 +183,9 @@ let
"ip46tables -A nixos-fw ${conf.firewallFilter} "
"-m comment --comment ${name}-exporter -j nixos-fw-accept"
]);
services.udev.extraRules = mkIf ("${name}" == "smartctl") ''
SUBSYSTEM=="nvme", KERNEL=="nvme[0-9]*", GROUP="disk"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be fine with a dedicated rawio group here. The disk group already offers something else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an afterthought, it doesn't feel good to create a generic group rawio when configuring smartctl-exporter. I'll push a commit in a moment with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that disk is different, and I think that rawio would be misleading. TTBOMK the only interesting thing offered by top-level devices (as opposed to namespace devices, which are group disk) is creation and management of namespaces. I don't think that's "raw I/O" by any stretch. (Note that namespace devices do offer quering of smart status, but smartctl --scan will query the top-level ones. Arguably that's a sound choice, because smart status is per-device and not per-namespace)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just have a dedicated group like smartctl-exporter-nvme with granted permission created only if exporter is enabled? In that case, it'll be isolated to the problem we're solving here and it'll not mess with the generic groups.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we need to set (via an udev rule) the group owner on nvme devices to something. You can't set it to two different values (udev rules don't really support POSIX ACLs), so setting it to something smartctl-exporter-specific would conflict with anyone else trying to use this.

Given that, it might be useful to set this group unconditionally, so that people don't encounter very mysterious interactions between smartctl-exporter and rest of their configuration (either by it relying on what smartctl-exporter's setup does, or it being in conflict with that).

My own suggestion would be to, unconditionally, set this group to disk, but I expect many would disagree. I think that arguments against that stemming from different kinds of access have a very weak basis, for reasons I gave above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just opposed to mixing it with disk, as that stems from a convention that systemd dictates. I'm also opposed to making the group smartctl-exporter specific, as the exporter shouldn't have a special claim on the resouce in question.

So we are just looking for a proper name. We could narrow it down to just this single use case and go with nvme and possibly some descriptive suffix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think nvme is an okay name for it and having it in a separate group wouldn't hurt and will not imperceptibly add permissions to existing services/users using the disk group. We should have it outside of the exporter in that case and unconditional, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I wasn't able to find a good place for the nvme group creation yet. Maybe any suggestions? Thank you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nixosRules in services/hardware/udev.nix possibly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we need to set (via an udev rule) the group owner on nvme devices to something. You can't set it to two different values (udev rules don't really support POSIX ACLs), so setting it to something smartctl-exporter-specific would conflict with anyone else trying to use this.

#333961 uses a smartctl_exporter specific group and POSIX ACL in the udev rule. The result is we can keep the base/standard root:root ownership of the NVMe device nodes while simultaneously giving smartctl_exporter access. I think this is a pretty clean solution, withouth side effects (AFAICT).

As discussed a few times, it seems to be the only way to make smartctl
work for NVMe devices and so enabling it when smartctl exporter is
enabled.

NixOS@12c26ac
@Frostman Frostman reopened this Aug 27, 2023
@Frostman Frostman force-pushed the prometheus-smartctl-exporter-fix-nvme branch from f387d2d to ff1ba40 Compare August 27, 2023 17:06
@Frostman Frostman force-pushed the prometheus-smartctl-exporter-fix-nvme branch from ff1ba40 to f7d9f11 Compare August 27, 2023 17:11
@justinas
Copy link
Member

Still relevant in 23.11, any progress?

@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 19, 2024
@wegank wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label Apr 5, 2024
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Apr 5, 2024
users.groups = (mkIf (conf.group == "${name}-exporter" && !enableDynamicUser) {
"${name}-exporter" = {};
});
users.groups."${name}-exporter" = (mkIf (conf.group == "${name}-exporter" && !enableDynamicUser) {});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
users.groups."${name}-exporter" = (mkIf (conf.group == "${name}-exporter" && !enableDynamicUser) {});
users.groups."${name}-exporter" = mkIf (conf.group == "${name}-exporter" && !enableDynamicUser) {};

@ibizaman
Copy link

ibizaman commented May 2, 2024

Adding that this worked for me but I did it by using the GROUP="disk" version.

@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 4, 2024
@stale stale bot removed 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md labels Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants