Merge pull request #1075 from stackhpc/monitor-swap-usage
Add alerts for low available swap space
seunghun1ee authored May 31, 2024
2 parents 2884d3c + 643aa78 commit f23d52c
Showing 3 changed files with 37 additions and 0 deletions.
18 changes: 18 additions & 0 deletions etc/kayobe/kolla/config/prometheus/system.rules
@@ -24,6 +24,24 @@ groups:
summary: "Prometheus exporter at {{ $labels.instance }} reports low memory"
description: "Available memory is {{ $value }} GiB."

- alert: LowSwapSpace
expr: (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) < {% endraw %}{{ alertmanager_node_free_swap_warning_threshold_ratio }}{% raw %}
for: 1m
labels:
severity: warning
annotations:
summary: "Swap space at {{ $labels.instance }} reports low memory"
description: "Available swap space is {{ $value | humanizePercentage }}. Running out of swap space causes OOM Kills."

- alert: LowSwapSpace
expr: (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) < {% endraw %}{{ alertmanager_node_free_swap_critical_threshold_ratio }}{% raw %}
for: 1m
labels:
severity: critical
annotations:
summary: "Swap space at {{ $labels.instance }} reports low memory"
description: "Available swap space is {{ $value | humanizePercentage }}. Running out of swap space causes OOM Kills."

- alert: HostOomKillDetected
expr: increase(node_vmstat_oom_kill[5m]) > 0
for: 5m
6 changes: 6 additions & 0 deletions etc/kayobe/stackhpc-monitoring.yml
@@ -12,6 +12,12 @@ alertmanager_low_memory_threshold_gib: 5
# link. Change to false to disable this alert.
alertmanager_warn_network_bond_single_link: true

# Thresholds for the LowSwapSpace alerts on swap space depletion (ratio).
# When the ratio of free swap space falls below these values, the warning
# and critical alerts are triggered respectively.
alertmanager_node_free_swap_warning_threshold_ratio: 0.25
alertmanager_node_free_swap_critical_threshold_ratio: 0.1

###############################################################################
# Exporter configuration

@@ -0,0 +1,13 @@
---
features:
- |
Added two alerts (warning and critical) that are triggered when the ratio
(free_swap_space / total_swap_space) falls below a threshold.
Each threshold can be modified by altering the value of
``alertmanager_node_free_swap_warning_threshold_ratio`` and
``alertmanager_node_free_swap_critical_threshold_ratio`` respectively.
Currently this solution is limited to a one-size-fits-all policy,
which can cause unwanted alerts for hosts that use swap heavily.
It is therefore recommended to tune the thresholds or apply silence rules
as needed.
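
As a minimal sketch of the tuning mentioned in the release note, the two variables could be overridden in `etc/kayobe/stackhpc-monitoring.yml`. The values below are illustrative assumptions for a deployment whose hosts routinely use most of their swap, not recommendations made by this commit.

```yaml
# Hypothetical override: lower both ratios so that swap-heavy hosts
# alert later than with the defaults (0.25 and 0.1).
# Warning fires when less than 15% of swap is free.
alertmanager_node_free_swap_warning_threshold_ratio: 0.15
# Critical fires when less than 5% of swap is free.
alertmanager_node_free_swap_critical_threshold_ratio: 0.05
```

Because both rules share the same expression and differ only in threshold, a host below the critical ratio also satisfies the warning rule. Keeping the warning ratio above the critical ratio preserves the intended escalation.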
