[CheckMK] Temp & Humidity Alert levels should be set separately per sensor #122

Open
4 tasks
acozine opened this issue Dec 6, 2024 · 0 comments

acozine commented Dec 6, 2024

User story

As operations folks, we want to know when there's an HVAC problem in a data center, but we don't want continual alerts for normal readings.

We have been monitoring these sensors for a few months now, so we have some idea of what the typical readings for temperature and humidity look like. Now we can determine what appropriate alerts look like for each sensor, and implement those levels as checks in CheckMK.

Acceptance criteria

  • Each temperature and humidity alert is set appropriately for its data center, rack, and position
  • We no longer see frequent alerts under normal operating conditions
  • If temperature rises or humidity falls out of normal bounds, we get an alert
  • We have documentation for how to set different alert levels for specific sensors in CheckMK

Implementation notes, if any

Here are the levels we agreed on for the five racks in the "A" data center:

  • set E22 Front Temp to warn at 94 and critical at 97
  • set E22 Front Humidity to warn at 18% and critical at 13%
  • set E23 Rear Temp to warn at 94 and critical at 97
  • set E23 Rear Humidity to warn at 18% and critical at 13%
  • set E24 Rear Humidity to warn at 18% and critical at 13%
  • set E25 Rear Humidity to warn at 18% and critical at 13%
  • set E26 Front Temp to warn at 94 and critical at 97
  • set E26 Front Humidity to warn at 18% and critical at 13%
  • all other sensor settings for this data center can retain the default warning and critical settings

Once those are done, we can review and discuss appropriate settings for the "B" data center. Once we know how to set sensor-specific levels, we can implement those more easily.
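For reference while writing the CheckMK rules, the agreed "A" data center levels can be written down as plain data and sanity-checked. This is only a rough sketch in standalone Python, not CheckMK's rule format or API. The names `Levels`, `OVERRIDES`, and `state_for` are made up for illustration. It assumes temperature levels are upper bounds, humidity levels are lower bounds, and that "at" means the threshold value itself triggers the alert. Temperature units are not stated above, so the values are treated as bare numbers.

```python
from typing import NamedTuple, Optional


class Levels(NamedTuple):
    """Hypothetical container for a warn/critical threshold pair."""
    warn: float
    crit: float


# Values copied from the list above.
TEMP_UPPER = Levels(warn=94, crit=97)      # alert when the reading rises to these
HUMIDITY_LOWER = Levels(warn=18, crit=13)  # alert when the reading falls to these (percent)

# Per-sensor overrides keyed by (rack, position, metric).
# Any sensor not listed here keeps the default levels, which are not modeled.
OVERRIDES = {
    ("E22", "Front", "Temp"): TEMP_UPPER,
    ("E22", "Front", "Humidity"): HUMIDITY_LOWER,
    ("E23", "Rear", "Temp"): TEMP_UPPER,
    ("E23", "Rear", "Humidity"): HUMIDITY_LOWER,
    ("E24", "Rear", "Humidity"): HUMIDITY_LOWER,
    ("E25", "Rear", "Humidity"): HUMIDITY_LOWER,
    ("E26", "Front", "Temp"): TEMP_UPPER,
    ("E26", "Front", "Humidity"): HUMIDITY_LOWER,
}


def state_for(rack: str, position: str, metric: str, reading: float) -> Optional[str]:
    """Return "OK", "WARN", or "CRIT" for an overridden sensor, else None."""
    levels = OVERRIDES.get((rack, position, metric))
    if levels is None:
        return None  # no sensor-specific override; defaults apply
    if metric == "Temp":
        # Upper levels: higher readings are worse (inclusive comparison is an assumption).
        if reading >= levels.crit:
            return "CRIT"
        if reading >= levels.warn:
            return "WARN"
    else:
        # Lower levels: lower readings are worse (inclusive comparison is an assumption).
        if reading <= levels.crit:
            return "CRIT"
        if reading <= levels.warn:
            return "WARN"
    return "OK"
```

Keeping the overrides in one table like this may also make the "B" data center review easier, since new sensors only need new rows.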
