HowTo use relayor's Prometheus Integration #239

nusenu · 2023-02-12T19:02:27Z

nusenu
Feb 12, 2023
Maintainer

relayor's prometheus support has recently become much more user-friendly. This post gives a short introduction to using it in your environment: keeping an eye on your relays, and setting up the foundation for nice-looking Grafana dashboards of your relays' metrics and rule-based alerts that tell you when something needs your attention.

If you are new to prometheus I recommend starting with the prometheus documentation first.

Tor's MetricsPort feature provides prometheus metrics for many tor relay properties that can help you understand operational issues, bottlenecks and generally how your relays are doing. It is important to understand that these metrics are sensitive and MUST NOT be made public.

Unlike more mature exporters like node_exporter (an exporter you should also deploy on all your tor servers), tor's built-in exporter (torrc: MetricsPort) does not come with security features like TLS and authentication. To work around this limitation we use a well-established webserver, nginx, as a reverse proxy that provides TLS and authentication, so we can collect (scrape) metrics from tor's MetricsPort over the internet without exposing them to the public.

The prometheus server software does not support conf.d style config folders where we could drop in the configuration needed for tor MetricsPort scraping without knowing about or interfering with the rest of the configuration. Therefore we implemented support for a conf.d style folder using ansible.

In scope tasks for relayor in the prometheus context:

- torrc (MetricsPort/MetricsPortPolicy)
- nginx reverse proxy configuration for MetricsPort
- nginx authentication: htpasswd file generation incl. random password generation
- reload nginx after nginx config changes
- prometheus scrape configuration for tor MetricsPort incl. authentication
- prometheus scrape configuration for blackbox exporter (optional)
- prometheus alert rules for tor (optional)
- reload prometheus after prometheus config changes

Out of scope tasks for relayor:

- prometheus server installation
- nginx installation
- TLS certificates (letsencrypt)
- blackbox exporter installation (optional)
- alertmanager installation (optional)

...other ansible roles are available for that.

Overview

To explain how to use relayor's prometheus feature we will use this example setup: two tor servers, each running two tor relay instances, and one prometheus server that collects metrics from all 4 tor MetricsPorts via nginx.

Before you start using relayor's prometheus features, make sure you have relayor version 23.1.0 or newer.

Overview of the following steps

- prepare prometheus server requirements
- prepare tor server requirements:
  - promexporters folder
  - include promexporters folder in nginx configuration
- enable prometheus features in your ansible playbook

Prepare Prometheus Server Requirements (prometheus.example.com)

If you do not have a prometheus server yet, you might enjoy this ansible role:
https://github.com/prometheus-community/ansible/tree/main/roles/prometheus

Create conf.d Folder

mkdir /etc/prometheus/conf.d
chown root:prometheus /etc/prometheus/conf.d
chmod 0750 /etc/prometheus/conf.d

Prometheus First Config Section

If you already have a prometheus configuration, simply copy it to /etc/prometheus/conf.d/1_prometheus.yml. Make sure it contains no tor scrape_configs and that scrape_configs is its last section, so additional scrape jobs can be appended at the end of the file.

If you do not have a prometheus.yml file yet, create the first section of the prometheus configuration file yourself. Make sure the filename starts with "1_..." so that it sorts before the "tor_..." files when the global prometheus.yml file is assembled.

/etc/prometheus/conf.d/1_prometheus.yml example:

global:
  scrape_interval: 60s
  # scrape_timeout is set to the global default (10s).

rule_files:
  - "/etc/prometheus/rules/*.rules"

scrape_configs:

Also make sure promtool is installed on your prometheus server; relayor uses it to validate the generated prometheus configuration files. The prometheus ansible role installs promtool by default.

relayor will create one configuration file per server in that conf.d folder:

/etc/prometheus/conf.d/tor_server1.example.com.yml
/etc/prometheus/conf.d/tor_server2.example.com.yml

It then assembles the conf.d/* files into the global file /etc/prometheus/prometheus.yml. Before generating the new file, it makes a backup of the existing one in the same folder. Files in the conf.d subfolder are not backed up.
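To make the conf.d idea concrete, here is a minimal shell sketch of the assembly step: concatenate the fragments in sorted order, validate the result if promtool is available, back up the old file, then replace it. This is not relayor's actual implementation (relayor does this via ansible). The function name, the backup file naming and the file mode are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, NOT relayor's code: assemble conf.d fragments
# into a single prometheus.yml.
#   $1 = fragment folder (e.g. /etc/prometheus/conf.d)
#   $2 = target file     (e.g. /etc/prometheus/prometheus.yml)
assemble_prometheus_config() {
    apc_conf_d=$1
    apc_target=$2
    apc_tmp=$(mktemp "${apc_target}.XXXXXX") || return 1

    # Concatenate in C-locale order so "1_..." sorts before "tor_...".
    if ! ( LC_ALL=C; export LC_ALL; cat "$apc_conf_d"/*.yml ) > "$apc_tmp"; then
        rm -f "$apc_tmp"
        return 1
    fi

    # Validate the assembled file when promtool is installed.
    if command -v promtool >/dev/null 2>&1; then
        if ! promtool check config "$apc_tmp"; then
            rm -f "$apc_tmp"
            return 1
        fi
    fi

    # Keep a timestamped backup of the previous file next to it
    # (the naming scheme is an assumption).
    if [ -f "$apc_target" ]; then
        cp -p "$apc_target" "${apc_target}.$(date +%Y%m%d%H%M%S).bak" || return 1
    fi

    # The file contains scrape credentials, so keep it non-world-readable.
    # The group name follows the conf.d example; the chgrp is skipped
    # silently if the group does not exist or we lack permission.
    chmod 0640 "$apc_tmp"
    chgrp prometheus "$apc_tmp" 2>/dev/null || true
    mv "$apc_tmp" "$apc_target"
}
```

On the prometheus server this would be called as `assemble_prometheus_config /etc/prometheus/conf.d /etc/prometheus/prometheus.yml`. Writing to a temporary file first means a failed validation never replaces the working configuration.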

Tor Server Requirements (nginx)

Have nginx and a TLS certificate installed for the hostname of the server (ansible_fqdn).
relayor expects nginx on the default HTTPS port (443). If you want to use a non-default port, set the ansible variable tor_prometheus_scrape_port to your desired value.
relayor places its nginx configuration file in /etc/nginx/promexporters/tor_metricsports_relayor.conf by default; the location is configurable.
Create the folder on the tor server:

mkdir /etc/nginx/promexporters/

Include the configuration in the vhost that is reachable as https://server1.example.com:

include /etc/nginx/promexporters/*.conf;

Ansible Playbook

That is probably the easiest part: enable relayor's prometheus integration in your playbook by adding at least these two variables:

tor_enableMetricsPort: True
tor_prometheus_host: prometheus.example.com

Also make sure the prometheus server is in your ansible inventory file and you have sudo privileges.

Now you can run your playbook and relayor should create all the files shown in the overview diagram.

If everything went well, your prometheus web interface should show one new job per tor relay (4 in total in the example).

job1 target: https://server1.example.com/10-lower-case-random-chars/0
job2 target: https://server1.example.com/10-lower-case-random-chars/1
job3 target: https://server2.example.com/10-lower-case-random-chars/0
job4 target: https://server2.example.com/10-lower-case-random-chars/1

All targets are protected with HTTP basic authentication and random passwords (one per server).
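Because these metrics must not be public, it is worth confirming from an outside machine that a target really rejects requests without credentials. The sketch below is one way to do that with curl. The function names are hypothetical, and it assumes nginx answers unauthenticated requests with HTTP 401, which is the usual behaviour for basic authentication. Replace the placeholder path with the random path of your own target.

```shell
#!/bin/sh
# Sketch: check that a MetricsPort target is not readable without
# credentials. Function names are made up for this example.

# Map an HTTP status code to a verdict.
classify_status() {
    case "$1" in
        401) echo "protected" ;;      # basic auth is being enforced
        200) echo "EXPOSED" ;;        # readable without credentials
        *)   echo "inconclusive" ;;   # e.g. 000 (no connection), 404, 5xx
    esac
}

# Request a URL without credentials and print the verdict.
check_target() {
    ct_code=$(curl --silent --output /dev/null --max-time 10 \
        --write-out '%{http_code}' "$1")
    echo "$1: $(classify_status "$ct_code")"
}

# Example call (placeholder path, not a real one):
#   check_target "https://server1.example.com/<random-path>/0"
```

An "EXPOSED" result means the nginx vhost serves the metrics without authentication and should be fixed before anything else.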

Job names follow this scheme: tor-FQDN-hostname-counter, so for example the first job name is "tor-server1.example.com-0".

Since relayor has complete awareness of all torrc settings, it also enriches the prometheus scrape configuration with a few additional labels that tor does not include by default. They are handy when creating Grafana dashboards:

id (IP_ORPort)
relaytype (exit/nonexit)
tor_nickname

Prometheus Alert Rules (optional)

If you also have an Alertmanager connected to your prometheus server, you can tell relayor to enable the included alert rules by setting this variable in your playbook:

tor_gen_prometheus_alert_rules: True

Blackbox Exporter (optional)

If you have a blackbox_exporter running, you can also monitor all tor ports. Tell relayor where your blackbox_exporter is running (from the point of view of prometheus.example.com) by setting the following variable:

tor_blackbox_exporter_host: 127.0.0.1:9115

relayor requires a simple TCP probe module named tcp_connect in your blackbox_exporter configuration.

This is the minimal /etc/blackbox_exporter.yml configuration that would work with relayor:

modules:
  tcp_connect:
    prober: tcp
    timeout: 5s

Also make sure your blackbox exporter has IPv6 connectivity when your relays have IPv6 enabled.
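Before relying on the scrape jobs, you can test the tcp_connect module by hand through the blackbox exporter's /probe endpoint. The helper below only builds the probe URL; its name is hypothetical, and the relay address 192.0.2.10:9001 in the example is a documentation placeholder, not a value from relayor.

```shell
#!/bin/sh
# Sketch: build a blackbox_exporter probe URL for the tcp_connect module.
#   $1 = blackbox exporter address (host:port)
#   $2 = probe target (host:port of a tor port)
blackbox_probe_url() {
    printf 'http://%s/probe?module=tcp_connect&target=%s\n' "$1" "$2"
}

# Example, run on the prometheus server (placeholder relay address):
#   curl --silent --globoff \
#       "$(blackbox_probe_url 127.0.0.1:9115 192.0.2.10:9001)" \
#       | grep '^probe_success'
# A line ending in 1 means the TCP connect worked. --globoff keeps curl
# from interpreting the brackets of an IPv6 target like [2001:db8::1]:9001.
```

Running the same probe with an IPv6 target is a quick way to check the IPv6 connectivity mentioned above.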

Next Steps

Now that all metrics data is collected on the prometheus server, the natural next step is to create a Grafana dashboard that displays the data and helps you make sense of it. One challenge, though, is that tor's metrics are not well documented yet.

More Alert Rules

Since relayor uses tor's OfflineMasterKeys feature by default, there is always the risk that the operator forgets to renew the signing cert. It would therefore be nice to ship an alert rule that warns operators when their signing cert is about to expire, but tor does not provide the necessary metric yet.
