Commit

Fix alert expression modifiers
Deezzir committed Oct 1, 2024
1 parent 7286c13 commit 2770f80
Showing 1 changed file with 5 additions and 5 deletions.
src/prometheus_alert_rules/dcgm.yaml: 10 changes (5 additions, 5 deletions)
@@ -2,7 +2,7 @@ groups:
- name: NVIDIA DCGM Throttling Alerts
rules:
- alert: HWPowerBrakeThrottle
-expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and 128 != 0
+expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and 128 != bool 0
for: 3m
labels:
severity: warning
@@ -14,7 +14,7 @@ groups:
- External Power Brake Assertion being triggered (e.g. by the system power supply)
LABELS = {{ $labels }}
- alert: HWThermalThrottle
-expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and 64 != 0
+expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and 64 != bool 0
for: 3m
labels:
severity: warning
@@ -26,7 +26,7 @@ groups:
- Temperature being too high
LABELS = {{ $labels }}
- alert: SWThermalThrottle
-expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and 32 != 0
+expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and 32 != bool 0
for: 3m
labels:
severity: warning
@@ -39,7 +39,7 @@ groups:
- Current memory temperature above the Memory Max Operating Temperature
LABELS = {{ $labels }}
- alert: HWSlowdownThrottle
-expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and 8 != 0
+expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and 8 != bool 0
for: 3m
labels:
severity: warning
@@ -54,7 +54,7 @@ groups:
- May be also reported during PState or clock change
LABELS = {{ $labels }}
- alert: SWPowerThrottle
-expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and 4 != 0
+expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and 4 != bool 0
for: 5m
labels:
severity: warning
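A note on these expressions, based on the PromQL operator documentation rather than on anything stated in the commit: PromQL has no bitwise AND. `and` is a set operator defined only between two instant vectors, and comparison operators bind more tightly than `and`, so `DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and 128 != bool 0` parses as `DCGM_FI_DEV_CLOCK_THROTTLE_REASONS and (128 != bool 0)`. Adding `bool` satisfies the rule that a comparison between two scalars must use the `bool` modifier, but the right-hand side of `and` is then still a scalar (`1`), which the documentation does not allow for set operators. One commonly used way to test a single bit of a bitmask gauge is plain arithmetic: `floor(value / mask) % 2 == 1`, where `mask` is a power of two. The Python sketch below is an illustration, not repository code; it assumes a hypothetical `REASONS` mapping (named after the alerts in this diff) and checks that the floor/modulo test agrees with a real bitwise AND for the five masks these rules use.

```python
import math
from typing import List

# Bit values checked by the alert rules in this commit. The names are taken
# from the alert names; this mapping is illustrative, not repository code.
REASONS = {
    "SWPowerThrottle": 4,
    "HWSlowdownThrottle": 8,
    "SWThermalThrottle": 32,
    "HWThermalThrottle": 64,
    "HWPowerBrakeThrottle": 128,
}


def bit_set_bitwise(value: int, mask: int) -> bool:
    """Reference check with a real bitwise AND (not available in PromQL)."""
    return (value & mask) != 0


def bit_set_arithmetic(value: float, mask: int) -> bool:
    """Same check using only floor, division and modulo, mirroring the
    PromQL expression floor(value / mask) % 2 == 1.
    Only valid when mask is a single power of two."""
    return math.floor(value / mask) % 2 == 1


def active_reasons(value: int) -> List[str]:
    """Names of the alerts whose bit is set in a throttle-reasons value."""
    return [name for name, mask in REASONS.items() if bit_set_arithmetic(value, mask)]


# The two checks agree for every value that fits in the lowest 9 bits.
for v in range(512):
    for m in REASONS.values():
        assert bit_set_bitwise(v, m) == bit_set_arithmetic(v, m)

print(active_reasons(0x44))  # 0x44 = 64 + 4 -> ['SWPowerThrottle', 'HWThermalThrottle']
```

In a rule file, the corresponding sketch for the power-brake alert would be `expr: floor(DCGM_FI_DEV_CLOCK_THROTTLE_REASONS / 128) % 2 == 1`. The final comparison deliberately omits `bool`, so it acts as a filter and returns only the series whose bit is set, which is what an alerting `expr` needs.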
