Skip to content

Commit

Permalink
Merge pull request #6460 from ministryofjustice/opensearch-updates
Browse files Browse the repository at this point in the history
docs: ✏️ update kibana to opensearch refs
  • Loading branch information
sj-williams authored Nov 15, 2024
2 parents 0398c7e + 8e895f6 commit fd9ee7b
Show file tree
Hide file tree
Showing 3 changed files with 42 additions and 42 deletions.
6 changes: 3 additions & 3 deletions runbooks/source/grafana-dashboards.html.md.erb
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
title: Grafana Dashboards
weight: 9106
last_reviewed_on: 2024-10-09
last_reviewed_on: 2024-11-15
review_in: 3 months
---

Expand Down Expand Up @@ -36,7 +36,7 @@ kubectl describe node <node_name>

### Fixing "failed to load dashboard" errors

The kibana alert has reported an error similar to:
The OpenSearch alert has reported an error similar to:

> Grafana failed to load one or more dashboards - This could prevent new dashboards from being created ⚠️

Expand Down Expand Up @@ -68,7 +68,7 @@ Contact the user in the given slack-channel and ask them to fix it. Provide the

### Fixing "duplicate dashboard uid" errors

The kibana alert has reported an error similar to:
The OpenSearch alert has reported an error similar to:

> Duplicate Grafana dashboard UIDs found

Expand Down
39 changes: 0 additions & 39 deletions runbooks/source/kibana-podsecurity-violations-alert.html.md.erb

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
---
title: OpenSearch PodSecurity Violations Alert
weight: 191
last_reviewed_on: 2024-11-15
review_in: 3 months
---

# OpenSearch PodSecurity Violations Alert
This runbook will document the OpenSearch PodSecurity (PSA) violations monitor and how to debug the offending namespace and resources.

## OpenSearch Alert/Monitor

[This OpenSearch monitor](https://app-logs.cloud-platform.service.justice.gov.uk/_dashboards/app/alerting#/monitors/t4z3XI8BxtKHqtnhcXO2) has been created that will alert if any PSA violations are detected.

You can see when previous alerts have been triggered under the `Alerts` section on the monitor.

## Checking logs for PSA violations in OpenSearch

To diagnose which namespace(s) are violating and to see the reason in the logs, either go to the [discover section on OpenSearch](https://app-logs.cloud-platform.service.justice.gov.uk/_dashboards/app/data-explorer/discover/) and search for the following query:

```
"violates PodSecurity" AND NOT "smoketest-restricted" AND NOT "smoketest-privileged"
```

Or follow [this link](https://app-logs.cloud-platform.service.justice.gov.uk/_dashboards/app/data-explorer/discover#?_q=(filters:!(),query:(language:kuery,query:'%22violates%20PodSecurity%22%20AND%20NOT%20%22smoketest-restricted%22%20AND%20NOT%20%22smoketest-privileged%22'))&_a=(discover:(columns:!(_source),isDirty:!f,sort:!()),metadata:(indexPattern:bb90f230-0d2e-11ef-bf63-53113938c53a,view:discover))&_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-5h,to:now))) to get the same search.

This will show any logs of PSA violations (excluding smoketests). If no logs appear, then increase the time frame to match when the alert was triggered. You can check this on the monitor under the `Alerts` heading.

In the logs, it will provide information such as the offending namespace and the reason it has been triggered.

## Fixing PSA Violations

To fix a PSA violation and stop the monitor from triggering, gather the namespace and violation reason from the logs and then contact a member of the team that owns the violating namespace with details of what is causing the issue, the user then should resolve this issue.

## Slack Alert

OpenSearch will put a message into the `#low-priority-alarms` slack channel whenever the [PodSecurity Violations monitor](https://app-logs.cloud-platform.service.justice.gov.uk/_dashboards/app/alerting#/monitors/t4z3XI8BxtKHqtnhcXO2) first goes into the `Triggered` status.

The monitor is throttled to only send 1 message every 24 hours per trigger. This means if a namespace is already triggering the monitor then when another violation occurs, then it will not send another message. The best way to check what is triggering the monitor is to use the steps mentioned above under [Checking logs for PSA violation in OpenSearch](#checking-logs-for-psa-violations-in-opensearch).

0 comments on commit fd9ee7b

Please sign in to comment.