Monitor Type: collectd/consul
(Source)
Accepts Endpoints: Yes
Multiple Instances Allowed: Yes
Monitors the Consul data store by using the Consul collectd Python plugin, which collects metrics from Consul instances by hitting these endpoints:
- /agent/self
- /agent/metrics
- /catalog/nodes
- /catalog/node/:node
- /status/leader
- /status/peers
- /coordinate/datacenters
- /coordinate/nodes
- /health/state/any
Supports Consul 0.7.0+.
If you are running a Consul version below 0.9.1, configure each Consul agent that is to be monitored to send telemetry to a SignalFx Agent instance by adding the following configuration to the Consul agent's configuration file:
{
  "telemetry": {
    "statsd_address": "<agent host>:<agent port, default 8125>"
  }
}
This monitor should then be configured with the telemetryServer: true option set. This starts a UDP server listening on 0.0.0.0:8125 by default.
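As a minimal sketch of the agent side of this setup, the monitor entry below turns on the telemetry server. The host and port values are assumptions (a Consul agent on the same machine, on Consul's default HTTP API port 8500); telemetryHost and telemetryPort are shown with their documented defaults and can be left out.

```yaml
monitors:
  - type: collectd/consul
    host: 127.0.0.1         # assumed: Consul agent runs on the same host
    port: 8500              # assumed: Consul's default HTTP API port
    telemetryServer: true   # start the UDP listener for Consul telemetry
    telemetryHost: 0.0.0.0  # documented default
    telemetryPort: 8125     # documented default; should match statsd_address in the Consul config
```

The port in the Consul agent's statsd_address needs to match telemetryPort, otherwise the UDP packets never reach the monitor.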
To activate this monitor in the Smart Agent, add the following to your agent config:
monitors:  # All monitor config goes under this key
  - type: collectd/consul
    ...  # Additional config
For a list of monitor options that are common to all monitors, see Common Configuration.
| Config option | Required | Type | Description |
|---|---|---|---|
| pythonBinary | no | string | Path to a python binary that should be used to execute the Python code. If not set, a built-in runtime will be used. Can include arguments to the binary as well. |
| host | yes | string | |
| port | yes | integer | |
| aclToken | no | string | Consul ACL token |
| useHTTPS | no | bool | Set to true to connect to Consul using HTTPS. You can configure the certificate for the server with the caCertificate config option. (default: false) |
| telemetryServer | no | bool | (default: false) |
| telemetryHost | no | string | IP address or DNS name to which Consul is configured to send telemetry UDP packets. Relevant only if telemetryServer is set to true. (default: 0.0.0.0) |
| telemetryPort | no | integer | Port to which Consul is configured to send telemetry UDP packets. Relevant only if telemetryServer is set to true. (default: 8125) |
| enhancedMetrics | no | bool | Set to true to enable collecting all metrics from Consul's runtime telemetry sent via UDP or from the /agent/metrics endpoint. (default: false) |
| caCertificate | no | string | If the Consul server has HTTPS enabled for the API, specifies the path to the CA's certificate. |
| clientCertificate | no | string | If client-side authentication is enabled, specifies the path to the certificate file. |
| clientKey | no | string | If client-side authentication is enabled, specifies the path to the key file. |
| signalFxAccessToken | no | string | |
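To show how the TLS-related options above fit together, here is a hedged sketch of a monitor entry for a Consul agent whose API is served over HTTPS with client-side authentication. The hostname, port, and file paths are all hypothetical placeholders, and the token is left as a placeholder to fill in.

```yaml
monitors:
  - type: collectd/consul
    host: consul.example.com                        # hypothetical hostname
    port: 8501                                      # assumed HTTPS API port; use the port your agent listens on
    useHTTPS: true
    caCertificate: /etc/consul/tls/ca.pem           # hypothetical path to the CA's certificate
    clientCertificate: /etc/consul/tls/client.pem   # hypothetical path; only needed with client-side authentication
    clientKey: /etc/consul/tls/client-key.pem       # hypothetical path; only needed with client-side authentication
    aclToken: "<Consul ACL token>"                  # placeholder
    enhancedMetrics: true                           # collect all runtime telemetry metrics
```

If the Consul API does not require client certificates, clientCertificate and clientKey can be dropped and only caCertificate is needed.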
These are the metrics available for this monitor. Metrics that are categorized as container/host (default) are in bold and italics in the list below.
- consul.dns.stale_queries (gauge): Number of times an agent serves a DNS query based on information from a server that is more than 5 seconds out of date. This metric has the dimensions datacenter, consul_node and consul_mode.
- consul.memberlist.msg.suspect (gauge): This increments when an agent suspects another as failed when executing random probes as part of the gossip protocol. These can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. This metric has the dimensions datacenter, consul_node and consul_mode.
- consul.serf.member.flap (gauge): This metric increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.catalog.nodes.total (gauge): The total number of nodes in the Consul datacenter. This metric is common to the cluster and is therefore reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode; consul_mode indicates whether the reporting Consul agent is a server or a client.
- gauge.consul.catalog.nodes_by_service (gauge): Number of nodes providing a given service. This metric is reported by the leader only. The dimension consul_service indicates which service the metric corresponds to. The metric also has the datacenter and consul_mode dimensions.
- gauge.consul.catalog.services.total (gauge): The total number of services registered with Consul in the given datacenter. This metric is common to the cluster and is therefore reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode; consul_mode indicates whether the reporting Consul agent is a server or a client.
- gauge.consul.catalog.services_by_node (gauge): Number of services registered with a node. This metric is reported by the leader only. The dimension consul_node indicates which node the metric corresponds to. The metric also has the datacenter and consul_mode dimensions.
- gauge.consul.consul.dns.domain_query.AGENT.avg (gauge): This tracks the average time it takes to service forward DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.consul.dns.domain_query.AGENT.max (gauge): This tracks the maximum time it takes to service forward DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.consul.dns.domain_query.AGENT.min (gauge): This tracks the minimum time it takes to service forward DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.consul.dns.ptr_query.AGENT.avg (gauge): This tracks the average time it takes to service reverse DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.consul.dns.ptr_query.AGENT.max (gauge): This tracks the maximum time it takes to service reverse DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.consul.dns.ptr_query.AGENT.min (gauge): This tracks the minimum time it takes to service reverse DNS lookups on the given Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.consul.leader.reconcile.avg (gauge): Time it takes the leader to reconcile the differences between Serf membership and Consul's store. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.consul.rpc.query (gauge): A general measure of all read volume. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.health.nodes.critical (gauge): Number of nodes for which health checks are reporting Critical state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.health.nodes.passing (gauge): Number of nodes for which health checks are reporting Passing state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.health.nodes.warning (gauge): Number of nodes for which health checks are reporting Warning state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.health.services.critical (gauge): Number of services for which health checks are reporting Critical state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.health.services.passing (gauge): Number of services for which health checks are reporting Passing state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.health.services.warning (gauge): Number of services for which health checks are reporting Warning state. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.is_leader (gauge): Indicates whether a Consul server is in the leader or follower state. A follower instance returns a value of 0 and the leader returns a value of 1. Used by a heat map in the dashboard, which makes it easy to tell the leader from the followers visually. This metric comes with the dimension consul_server_state, which can be either leader or follower. It also has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.network.dc.latency.avg (gauge): Average latency between two datacenters. This metric has the additional dimension destination_dc. The latency is calculated between this destination datacenter and the agent's datacenter, given by the datacenter dimension. Only the leader in the source datacenter calculates this metric. The metric also has the dimensions consul_mode and consul_node.
- gauge.consul.network.dc.latency.max (gauge): Maximum latency between two datacenters. This metric has the additional dimension destination_dc. The latency is calculated between this destination datacenter and the agent's datacenter, given by the datacenter dimension. Only the leader in the source datacenter calculates this metric. The metric also has the dimensions consul_mode and consul_node.
- gauge.consul.network.dc.latency.min (gauge): Minimum latency between two datacenters. This metric has the additional dimension destination_dc. The latency is calculated between this destination datacenter and the agent's datacenter, given by the datacenter dimension. Only the leader in the source datacenter calculates this metric. The metric also has the dimensions consul_mode and consul_node.
- gauge.consul.network.node.latency.avg (gauge): Average network latency between the given node and other nodes in the datacenter. The dimension consul_node corresponds to the source node. The metric also has the dimensions datacenter and consul_mode.
- gauge.consul.network.node.latency.max (gauge): Maximum network latency between the given node and other nodes in the datacenter. The dimension consul_node corresponds to the source node. The metric also has the dimensions datacenter and consul_mode.
- gauge.consul.network.node.latency.min (gauge): Minimum network latency between the given node and other nodes in the datacenter. The dimension consul_node corresponds to the source node. The metric also has the dimensions datacenter and consul_mode.
- gauge.consul.peers (gauge): Number of Consul Raft peers, that is, Consul agents in server mode, in a given datacenter. This metric is reported by the leader only. It is reported with the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.apply (gauge): This metric is a general indicator of the write load on the Consul servers. This metric has the global dimensions consul_node, consul_mode and datacenter.
- gauge.consul.raft.commitTime.avg (gauge): This measures the mean time it takes to commit a new entry to the Raft log on the leader. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.commitTime.max (gauge): This measures the maximum time it takes to commit a new entry to the Raft log on the leader. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.commitTime.min (gauge): This measures the minimum time it takes to commit a new entry to the Raft log on the leader. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.leader.dispatchLog.avg (gauge): This measures the mean time it takes for the leader to write log entries to disk. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.leader.dispatchLog.max (gauge): This measures the maximum time it takes for the leader to write log entries to disk. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.leader.dispatchLog.min (gauge): This measures the minimum time it takes for the leader to write log entries to disk. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.leader.lastContact.avg (gauge): This measures the mean time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.leader.lastContact.max (gauge): This measures the maximum time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.leader.lastContact.min (gauge): This measures the minimum time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.replication.appendEntries.rpc.AGENT.avg (gauge): This measures the mean time it takes to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers. This metric is sent by the leader for each follower. The follower's IP or hostname is added to the metric name. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.replication.appendEntries.rpc.AGENT.max (gauge): This measures the maximum time it takes to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers. This metric is sent by the leader for each follower. The follower's IP or hostname is added to the metric name. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.replication.appendEntries.rpc.AGENT.min (gauge): This measures the minimum time it takes to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers. This metric is sent by the leader for each follower. The follower's IP or hostname is added to the metric name. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.state.candidate (gauge): Tracks the number of times the given node enters the candidate state, i.e., the number of times the Consul server starts a leader election. If this increments without a leadership change occurring, it could indicate that a single server is overloaded or is experiencing network connectivity issues. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.raft.state.leader (gauge): This metric increments whenever a Consul server becomes a leader. If there are frequent leadership changes, this may be an indication that the servers are overloaded and aren't meeting the soft real-time requirements for Raft, or that there are networking problems between the servers. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.rpc.query (gauge)
- gauge.consul.runtime.alloc_bytes (gauge): Number of bytes allocated to the Consul process on the node. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.runtime.heap_objects (gauge): Number of heap objects allocated to Consul; indicates memory pressure on a Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.runtime.num_goroutines (gauge): Number of goroutines run by the Consul process on the node. Gives a general indication of load pressure on the Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.serf.events (gauge): Number of Serf events processed by Consul. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.serf.events.consul:new-leader (gauge)
- gauge.consul.serf.member.join (gauge): This metric tracks successful node joins to the Serf memberlist. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.serf.member.left (gauge): This metric tracks successful node leaves from the Serf memberlist. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.serf.queue.Event.avg (gauge): Average number of Serf events in queue yet to be processed by the Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.serf.queue.Event.max (gauge): Maximum number of Serf events in queue yet to be processed by the Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.serf.queue.Event.min (gauge): Minimum number of Serf events in queue yet to be processed by the Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.serf.queue.Query.avg (gauge): Average number of Serf queries in queue yet to be processed by the Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.serf.queue.Query.max (gauge): Maximum number of Serf queries in queue yet to be processed by the Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
- gauge.consul.serf.queue.Query.min (gauge): Minimum number of Serf queries in queue yet to be processed by the Consul agent. This metric has the dimensions datacenter, consul_node and consul_mode.
The following information applies to agent version 4.7.0+ that has enableBuiltInFiltering: true set on the top level of the agent config.
To emit metrics that are not default, you can add those metrics in the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options that do not appear in the above list of metrics do not need to be added to extraMetrics.
To see a list of metrics that will be emitted, you can run agent-status monitors after configuring this monitor in a running agent instance.
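As an illustration of the extraMetrics option described above, the sketch below requests two metrics from the list in this document. Whether these particular metrics are outside the default set is an assumption made for the example, as are the host and port values.

```yaml
enableBuiltInFiltering: true  # top-level agent option

monitors:
  - type: collectd/consul
    host: 127.0.0.1           # assumed
    port: 8500                # assumed: Consul's default HTTP API port
    extraMetrics:             # metrics to emit in addition to the defaults
      - gauge.consul.raft.commitTime.max
      - gauge.consul.serf.queue.Query.max
```

Running agent-status monitors afterwards is a quick way to confirm that the extra metrics are actually being emitted.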
The following information only applies to agent versions older than 4.7.0. If you have a newer agent and have set enableBuiltInFiltering: true at the top level of your agent config, see the section above. See upgrade instructions in Old-style whitelist filtering.
If you have a reference to the whitelist.json in your agent's top-level metricsToExclude config option, and you want to emit metrics that are not in that whitelist, then you need to add an item to the top-level metricsToInclude config option to override that whitelist (see Inclusion filtering). Or you can just copy the whitelist.json, modify it, and reference that in metricsToExclude.
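A hedged sketch of such an override for old-style filtering follows. The filter fields used here (monitorType, metricNames) are an assumption about the old-style filter item syntax and should be checked against the Inclusion filtering documentation; the metric name is taken from the list in this document purely as an example.

```yaml
# Top level of the agent config (agent versions older than 4.7.0)
metricsToInclude:
  - monitorType: collectd/consul            # assumed filter field: limit the override to this monitor
    metricNames:                            # assumed filter field: metrics to let through the whitelist
      - gauge.consul.raft.commitTime.max
```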
The following dimensions may occur on metrics emitted by this monitor. Some dimensions may be specific to certain metrics.
| Name | Description |
|---|---|
| consul_mode | Whether this Consul instance is running as a server or client |
| consul_node | The name of the Consul node |
| datacenter | The name of the Consul datacenter |