node_exporter systemd unit file incorrectly formatted when using sysctl.include collector #341

Open
0xdeadbeefJERKY opened this issue Apr 24, 2024 · 1 comment

@0xdeadbeefJERKY

Bug Summary

Installing node_exporter on an EC2 instance running the Amazon Linux 2 AMI (systemd 219) fails with the following error:

TASK [prometheus.prometheus.node_exporter : Ensure Node Exporter is enabled on boot] ***
fatal: [default]: FAILED! => {"changed": false, "msg": "Error loading unit file 'node_exporter': org.freedesktop.DBus.Error.InvalidArgs \"Invalid argument\""}

Here's the playbook being used:

- hosts: 127.0.0.1
  vars:
    node_exporter_enabled_collectors:
      - sysctl:
          include:
            vm:
              - overcommit_memory
              - overcommit_ratio
              - dirty_background_bytes
              - dirty_background_bytes
              - dirty_background_ratio
              - dirty_bytes
              - dirty_expire_centisecs
              - dirty_ratio
              - swappiness
  roles:
    - prometheus.prometheus.node_exporter

Upon further investigation, the rendered systemd unit file is malformed: the template wraps the sysctl.include collector argument in single quotes, but the rendered value itself contains single quotes:

#
# Ansible managed
#

[Unit]
Description=Prometheus Node Exporter
After=network-online.target

[Service]
Type=simple
User=node-exp
Group=node-exp
ExecStart=/usr/local/bin/node_exporter \
    '--collector.sysctl.include={'vm': ['overcommit_memory', 'overcommit_ratio', 'dirty_background_bytes', 'dirty_background_bytes', 'dirty_background_ratio', 'dirty_bytes', 'dirty_expire_centisecs', 'dirty_ratio', 'swappiness']}' 

SyslogIdentifier=node_exporter
Restart=always
RestartSec=1
StartLimitInterval=0

ProtectHome=yes
NoNewPrivileges=yes

ProtectSystem=full

[Install]
WantedBy=multi-user.target

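On line 14 of the unit file (the continuation of the ExecStart= directive, which begins on line 13, the line systemd reports below), the single-quoted argument is terminated prematurely by the quote in front of vm. More details can be found by checking the status of the service or using journalctl: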

$ sudo systemctl status node_exporter
● node_exporter.service - Prometheus Node Exporter
   Loaded: error (Reason: Invalid argument)
   Active: failed (Result: resources) since Wed 2024-04-24 15:37:55 UTC; 20s ago
 Main PID: 2443 (code=killed, signal=KILL)

Apr 24 15:37:54 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service: main process exited, code=killed, status=9/KILL
Apr 24 15:37:54 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: Unit node_exporter.service entered failed state.
Apr 24 15:37:54 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service failed.
Apr 24 15:37:55 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service holdoff time over, scheduling restart.
Apr 24 15:37:55 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service failed to schedule restart job: Unit is not loaded properly: Invalid argument.
Apr 24 15:37:55 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: Unit node_exporter.service entered failed state.
Apr 24 15:37:55 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service failed.
Apr 24 15:38:12 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/node_exporter.service:13] Trailing garbage, ignoring.
Apr 24 15:38:12 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service lacks both ExecStart= and ExecStop= setting. Refusing.
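
The premature quote termination can be reproduced outside of Ansible. The following is a minimal sketch, not the role's actual code: it assumes the rendered value is equivalent to Python's str() of the dict (which matches the unit file above), uses a shortened variable list, and introduces a hypothetical helper, closing_quote_index, that finds where the opening quote is first matched.

```python
def closing_quote_index(arg: str) -> int:
    """Return the index of the first character that matches the opening quote.

    Hypothetical helper for illustration only; it is not systemd's parser.
    """
    quote = arg[0]
    if quote not in ("'", '"'):
        raise ValueError("argument does not start with a quote")
    return arg.index(quote, 1)


# Shortened version of the value from the playbook.
sysctl_include = {"vm": ["overcommit_memory", "swappiness"]}

# Assumption: the template output looks like Python's str() of the dict,
# which quotes keys and list items with single quotes.
rendered = str(sysctl_include)
flag = f"--collector.sysctl.include={rendered}"

single_quoted = f"'{flag}'"  # what the current unit file contains
double_quoted = f'"{flag}"'  # what the proposed fix would produce

# The wrapping is intact only if the opening quote is matched by the last character.
single_ok = closing_quote_index(single_quoted) == len(single_quoted) - 1
double_ok = closing_quote_index(double_quoted) == len(double_quoted) - 1

print("single quotes intact:", single_ok)
print("double quotes intact:", double_ok)
```

With single quotes, the opening quote is matched right after the opening brace, so everything from vm onward falls outside the quoted argument. With double quotes, the only matching quote is the final character.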

Proposed Solution

This can be fixed by wrapping each collector argument passed to node_exporter in double quotes instead of single quotes in the node_exporter.service.j2 template. The rendered unit file then looks like this:

#
# Ansible managed
#

[Unit]
Description=Prometheus Node Exporter
After=network-online.target

[Service]
Type=simple
User=node-exp
Group=node-exp
ExecStart=/usr/local/bin/node_exporter \
    "--collector.sysctl.include={'vm': ['overcommit_memory', 'overcommit_ratio', 'dirty_background_bytes', 'dirty_background_bytes', 'dirty_background_ratio', 'dirty_bytes', 'dirty_expire_centisecs', 'dirty_ratio', 'swappiness']}" \

SyslogIdentifier=node_exporter
Restart=always
RestartSec=1
StartLimitInterval=0

ProtectHome=yes
NoNewPrivileges=yes
    
ProtectSystem=full

[Install]
WantedBy=multi-user.target
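
For illustration, the double-quote wrapping can be sketched in Python. This is only a model of what the template change would produce, not the template itself: render_exec_start is a hypothetical helper, the variable list is shortened, and the check for embedded double quotes is an added assumption, since a value containing a double quote would reintroduce the same collision.

```python
def render_exec_start(binary: str, flags: dict) -> str:
    """Render an ExecStart= directive with every argument in double quotes.

    Hypothetical helper; it mirrors the layout of the unit file above.
    """
    rendered_args = []
    for name, value in flags.items():
        # str() of a dict or list typically quotes its strings with single quotes.
        arg = f"--{name}={value}"
        if '"' in arg:
            # Assumption: refuse rather than guess at systemd's escaping rules.
            raise ValueError(f"argument contains a double quote: {arg}")
        rendered_args.append(f'    "{arg}"')
    # Join the arguments with backslash line continuations.
    return f"ExecStart={binary} \\\n" + " \\\n".join(rendered_args)


exec_start = render_exec_start(
    "/usr/local/bin/node_exporter",
    {"collector.sysctl.include": {"vm": ["overcommit_memory", "swappiness"]}},
)
print(exec_start)
```

Note that this only addresses the quoting. Whether node_exporter accepts the dict-style value for sysctl.include is a separate question that this sketch does not cover.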
@VermiumSifell

I'm affected too. Did you manage to solve it without manual intervention?
