Feat: Improve Prometheus Metrics #1338

Open: wants to merge 5 commits into main
20 changes: 13 additions & 7 deletions docs/src/pages/guides/cli-options.md
@@ -245,13 +245,19 @@ Then you can scrape the metrics from `http://localhost:3001/metrics`.

Monika exposes [Prometheus default metrics](https://prometheus.io/docs/instrumenting/writing_clientlibs/#standard-and-runtime-collectors), [Node.js specific metrics](https://github.com/siimon/prom-client/tree/master/lib/metrics), and Monika probe metrics below.

- | Metric Name | Type | Purpose | Label |
- | -------------------------------------- | --------- | -------------------------------------------- | ------------------------------------------- |
- | `monika_probes_total` | Gauge | Collect total probe | - |
- | `monika_request_status_code_info` | Gauge | Collect HTTP status code | `id`, `name`, `url`, `method` |
- | `monika_request_response_time_seconds` | Histogram | Collect duration of probe request in seconds | `id`, `name`, `url`, `method`, `statusCode` |
- | `monika_request_response_size_bytes` | Gauge | Collect size of response size in bytes | `id`, `name`, `url`, `method`, `statusCode` |
- | `monika_alert_total` | Counter | Collect total alert triggered | `id`, `name`, `url`, `method`, `alertQuery` |
+ | Metric Name | Type | Purpose | Labels |
+ | -------------------------------------- | --------- | --------------------------------------------------------------------- | ----------------------------------------------------- |
+ | `monika_alerts_triggered` | Counter | Indicates the count of alerts triggered | `id`, `name`, `url`, `method`, `alertQuery` |
+ | `monika_alerts_triggered_total` | Counter | Indicates the cumulative count of alerts triggered | - |
+ | `monika_probes_running` | Gauge | Indicates whether a probe is running (1) or idle (0) | `id` |
+ | `monika_probes_running_total` | Gauge | Indicates the total count of probes that are currently running checks | - |
+ | `monika_probes_status` | Gauge | Indicates whether a probe is healthy (1) or is having an incident (0) | `id`, `name`, `url`, `method` |
+ | `monika_probes_total` | Gauge | Total count of all probes configured | - |
+ | `monika_request_response_size_bytes` | Gauge | Indicates the size of the probe request's response in bytes | `id`, `name`, `url`, `method`, `statusCode`, `result` |
+ | `monika_request_response_time_seconds` | Histogram | Indicates the duration of the probe request in seconds | `id`, `name`, `url`, `method`, `statusCode`, `result` |
+ | `monika_request_status_code_info` | Gauge | Indicates the HTTP status code of the probe requests' response(s) | `id`, `name`, `url`, `method` |
+ | `monika_notifications_triggered` | Counter | Indicates the count of notifications triggered | `type`, `status` |
+ | `monika_notifications_triggered_total` | Counter | Indicates the cumulative count of notifications triggered | - |

## Repeat

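As a rough illustration of consuming the table above, the sketch below parses Prometheus text-format output and returns the label sets of `monika_probes_status` samples whose value is 0, which the docs describe as a probe having an incident. It is a standalone, dependency-free sketch, not part of Monika or this PR: `parseSamples` and `probesInIncident` are hypothetical names, and the parser only handles simple `name{labels} value` lines rather than the full exposition format.

```typescript
// Standalone sketch, not Monika code: read Prometheus text-format output and
// report which `monika_probes_status` series are 0 (probe having an incident).
// Only simple `name{labels} value [timestamp]` lines are handled.

type Sample = {
  name: string
  labels: Record<string, string>
  value: number
}

// Metric name, optional `{...}` label block, then the sample value.
const SAMPLE_RE = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/
// One `key="value"` pair; backslash escapes inside the value are tolerated (not unescaped).
const LABEL_RE = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g

function parseSamples(text: string): Sample[] {
  const samples: Sample[] = []
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim()
    // Skip blank lines and `# HELP` / `# TYPE` comments.
    if (line === '' || line.startsWith('#')) continue
    const match = SAMPLE_RE.exec(line)
    if (!match) continue
    const labels: Record<string, string> = {}
    for (const [, key, value] of (match[2] ?? '').matchAll(LABEL_RE)) {
      labels[key] = value
    }
    samples.push({ name: match[1], labels, value: Number(match[3]) })
  }
  return samples
}

// Label sets (id, name, url, method) of probes whose status gauge is 0.
function probesInIncident(text: string): Record<string, string>[] {
  return parseSamples(text)
    .filter((s) => s.name === 'monika_probes_status' && s.value === 0)
    .map((s) => s.labels)
}
```

Against a running instance, the input would be the body served at `http://localhost:3001/metrics`, for example fetched with the global `fetch` available in Node.js 18 and later.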
16 changes: 10 additions & 6 deletions packages/notification/index.ts
@@ -34,29 +34,33 @@ async function sendNotifications(
notifications: Notification[],
message: NotificationMessage,
sender?: InputSender
- ): Promise<void> {
+ ): Promise<{ type: string; success: boolean }[]> {
if (sender) {
updateSender(sender)
}

- await Promise.all(
+ // Map notifications to an array of results
+ const results = await Promise.all(
notifications.map(async ({ data, type }) => {
const channel = channels[type]

try {
if (!channel) {
throw new Error('Notification channel is not available')
}

await channel.send(data, message)
+ return { type, success: true }
} catch (error: unknown) {
- const message = getErrorMessage(error)
- throw new Error(
- `Failed to send message using ${type}, please check your ${type} notification config.\nMessage: ${message}`
+ const errorMessage = getErrorMessage(error)
+ console.error(
+ `Failed to send message using ${type}, please check your ${type} notification config.\nMessage: ${errorMessage}`
)
+ return { type, success: false }
}
})
)

+ return results
}

export { sendNotifications }
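With this change `sendNotifications` no longer rejects on the first failing channel: each mapped promise catches its own error, logs it, and resolves to `{ type, success }`, so `Promise.all` always resolves with one result per configured notification. A standalone sketch of that pattern follows; the `Channel` type, the channel map and the `sendAll` function are hypothetical stand-ins for Monika's channel registry, not its real API.

```typescript
// Standalone sketch of the per-channel error handling pattern in this hunk.
// `Channel` and `sendAll` are hypothetical, not Monika's actual types.
type Channel = { send: (message: string) => Promise<void> }
type SendResult = { type: string; success: boolean }

async function sendAll(
  channels: Record<string, Channel | undefined>,
  types: string[],
  message: string
): Promise<SendResult[]> {
  return Promise.all(
    types.map(async (type): Promise<SendResult> => {
      try {
        const channel = channels[type]
        if (!channel) {
          throw new Error('Notification channel is not available')
        }
        await channel.send(message)
        return { type, success: true }
      } catch (error: unknown) {
        // Log and turn the failure into a result instead of rejecting,
        // so results from the other channels are still returned.
        const reason = error instanceof Error ? error.message : String(error)
        console.error(`Failed to send message using ${type}: ${reason}`)
        return { type, success: false }
      }
    })
  )
}
```

`Promise.allSettled` would also avoid the early rejection, but its `{ status, value | reason }` entries do not carry the channel `type` unless the caller zips it back in by index; the explicit `try`/`catch` keeps the type next to each outcome.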
27 changes: 26 additions & 1 deletion src/components/config/get.ts
@@ -22,6 +22,7 @@
* SOFTWARE. *
**********************************************************************************/

+ import { randomUUID } from 'node:crypto'
import { getContext } from '../../context'
import type { Config } from '../../interfaces/config'
import { log } from '../../utils/pino'
@@ -41,7 +42,10 @@ export async function getRawConfig(): Promise<Config> {
return addDefaultNotifications(config)
}

- return config
+ // Add default alerts for Probe not Accessible
+ const finalizedConfig = addDefaultAlerts(config)
+
+ return finalizedConfig
}

// mergeConfigs merges configs by overwriting each other
@@ -82,6 +86,27 @@ async function parseNativeConfig(): Promise<Config[]> {
)
}

+ export const FAILED_REQUEST_ASSERTION = {
+ assertion: '',
+ message: 'Probe not accessible',
+ }
+
+ function addDefaultAlerts(config: Config) {
+ return {
+ ...config,
+ probes: config.probes?.map((probe) => ({
+ ...probe,
+ alerts: [
+ ...(probe.alerts || []),
+ {
+ id: randomUUID(),
+ ...FAILED_REQUEST_ASSERTION,
+ },
+ ],
+ })),
+ }
+ }
+
async function parseNonNativeConfig(): Promise<Config | undefined> {
const { flags } = getContext()
const hasNonNativeConfig =
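`addDefaultAlerts` appends a `Probe not accessible` alert (with an empty `assertion`) and a fresh `randomUUID()` id to every probe, returning a new config object rather than mutating the input. The sketch below reproduces that with simplified, hypothetical `Config`/`Probe`/`Alert` types, and adds one thing the PR does not have: a guard so that a second pass over an already-processed config does not append the default alert again. Whether the same config object can pass through this function twice is not visible in this diff, so treat the guard as an assumption.

```typescript
import { randomUUID } from 'node:crypto'

// Simplified, hypothetical shapes; Monika's real Config/Probe types have more fields.
type Alert = { id: string; assertion: string; message: string }
type Probe = { id: string; alerts?: Alert[] }
type Config = { probes?: Probe[] }

const FAILED_REQUEST_ASSERTION = {
  assertion: '',
  message: 'Probe not accessible',
}

// Variant of the PR's addDefaultAlerts: same output on the first call,
// but a no-op for probes that already carry the default alert.
function addDefaultAlertsOnce(config: Config): Config {
  return {
    ...config,
    probes: config.probes?.map((probe) => {
      const alerts = probe.alerts ?? []
      // Guard added for this sketch (not in the PR).
      const hasDefault = alerts.some(
        (alert) =>
          alert.assertion === FAILED_REQUEST_ASSERTION.assertion &&
          alert.message === FAILED_REQUEST_ASSERTION.message
      )
      return hasDefault
        ? probe
        : {
            ...probe,
            alerts: [...alerts, { id: randomUUID(), ...FAILED_REQUEST_ASSERTION }],
          }
    }),
  }
}
```

Without the guard, as in the PR, each call appends one more `Probe not accessible` alert per probe.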
10 changes: 9 additions & 1 deletion src/components/notification/index.ts
@@ -22,11 +22,13 @@
* SOFTWARE. *
**********************************************************************************/

+ import { getEventEmitter } from '../../utils/events'
import { ValidatedResponse } from '../../plugins/validate-response'
import getIp from '../../utils/ip'
import { getMessageForAlert } from './alert-message'
import { sendNotifications } from '@hyperjumptech/monika-notification'
import type { Notification } from '@hyperjumptech/monika-notification'
+ import events from '../../events'

type SendAlertsProps = {
probeID: string
@@ -54,5 +56,11 @@ export async function sendAlerts({
response: validation.response,
})

- return sendNotifications(notifications, message)
+ const results = await sendNotifications(notifications, message)
+ for (const result of results) {
+ getEventEmitter().emit(events.notifications.sent, {
+ type: result.type,
+ status: result.success ? 'success' : 'failed',
+ })
+ }
}
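`sendAlerts` now turns each `{ type, success }` result into a `NOTIFICATIONS_SENT` event whose `status` is `'success'` or `'failed'`. A minimal sketch of that mapping with Node's `EventEmitter`; `emitNotificationResults` and the payload type name are hypothetical, and the event string is copied from the `notifications.sent` entry this PR adds to `src/events/index.ts`.

```typescript
import { EventEmitter } from 'node:events'

// Hypothetical standalone names; in the PR the event name comes from
// `events.notifications.sent` and the emitter from `getEventEmitter()`.
const NOTIFICATIONS_SENT = 'NOTIFICATIONS_SENT'

type SendResult = { type: string; success: boolean }
type NotificationSentPayload = { type: string; status: 'success' | 'failed' }

// Emit one event per notification result, translating the boolean into a label value.
function emitNotificationResults(emitter: EventEmitter, results: SendResult[]): void {
  for (const result of results) {
    const payload: NotificationSentPayload = {
      type: result.type,
      status: result.success ? 'success' : 'failed',
    }
    emitter.emit(NOTIFICATIONS_SENT, payload)
  }
}
```

Because `EventEmitter.emit` invokes listeners synchronously, the Prometheus listener registered in `initPrometheus` runs before `sendAlerts` resolves, and a listener that throws would propagate the error into `sendAlerts` itself.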
14 changes: 13 additions & 1 deletion src/components/probe/prober/http/index.ts
@@ -159,6 +159,12 @@ export class HTTPProber extends BaseProber {
response,
})

+ getEventEmitter().emit(events.probe.status.changed, {
+ probe: this.probeConfig,
+ requestIndex,
+ status: 'up',
+ })
+
this.logMessage(
true,
getProbeResultMessage({
@@ -226,10 +232,16 @@ export class HTTPProber extends BaseProber {
}
const alertId = getAlertID(url, validation, probeID)

+ getEventEmitter().emit(events.probe.status.changed, {
+ probe: this.probeConfig,
+ requestIndex,
+ status: 'down',
+ })
+
getEventEmitter().emit(events.probe.alert.triggered, {
probe: this.probeConfig,
requestIndex,
- alertQuery: '',
+ alertQuery: triggeredAlert,
})

addIncident({
6 changes: 6 additions & 0 deletions src/components/probe/prober/index.ts
@@ -134,6 +134,7 @@ export abstract class BaseProber implements Prober {

// this probe is definitely in incident state because of fail assertion, so send notification, etc.
this.handleFailedProbe(probeResults)

return
}

@@ -148,6 +149,11 @@ export abstract class BaseProber implements Prober {
requestIndex: index,
response: requestResponse,
})
+ getEventEmitter().emit(events.probe.status.changed, {
+ probe: this.probeConfig,
+ requestIndex: index,
+ status: 'up',
+ })
logResponseTime(requestResponse.responseTime)

if (
6 changes: 6 additions & 0 deletions src/events/index.ts
@@ -34,6 +34,9 @@ export default {
sanitized: 'CONFIG_SANITIZED',
updated: 'CONFIG_UPDATED',
},
+ notifications: {
+ sent: 'NOTIFICATIONS_SENT',
+ },
probe: {
alert: {
triggered: 'PROBE_ALERT_TRIGGERED',
@@ -46,5 +49,8 @@ export default {
notification: {
willSend: 'PROBE_NOTIFICATION_WILL_SEND',
},
+ status: {
+ changed: 'PROBE_STATUS_CHANGED',
+ },
},
}
4 changes: 4 additions & 0 deletions src/loaders/index.ts
@@ -82,6 +82,8 @@ function initPrometheus(prometheusPort: number) {
decrementProbeRunningTotal,
incrementProbeRunningTotal,
resetProbeRunningTotal,
+ collectProbeStatus,
+ collectNotificationSentMetrics,
} = new PrometheusCollector()

// collect prometheus metrics
@@ -93,6 +95,8 @@ function initPrometheus(prometheusPort: number) {
eventEmitter.on(events.probe.ran, incrementProbeRunningTotal)
eventEmitter.on(events.probe.finished, decrementProbeRunningTotal)
eventEmitter.on(events.config.updated, resetProbeRunningTotal)
+ eventEmitter.on(events.probe.status.changed, collectProbeStatus)
+ eventEmitter.on(events.notifications.sent, collectNotificationSentMetrics)

startPrometheusMetricsServer(prometheusPort)
}
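`initPrometheus` wires the two new events to `collectProbeStatus` and `collectNotificationSentMetrics`, whose implementations are not part of this diff. As a hedged, dependency-free stand-in (a `Map` per metric instead of prom-client), the sketch below shows what they plausibly record: a 1/0 value per `id`, `name`, `url`, `method` label set for `monika_probes_status`, and a per-`type`/`status` count plus a running total for the notification counters. The payload shapes follow the `emit` calls in this PR; the `Probe` shape (`name`, `requests[].url`, `requests[].method`) and the class name `InMemoryCollector` are assumptions.

```typescript
// Dependency-free stand-in for the two new collectors (hypothetical; the real
// PrometheusCollector is not shown in this diff).
type ProbeRequest = { url: string; method: string }
type Probe = { id: string; name: string; requests: ProbeRequest[] } // assumed shape
type ProbeStatusChanged = { probe: Probe; requestIndex: number; status: 'up' | 'down' }
type NotificationSent = { type: string; status: 'success' | 'failed' }

// Serialise a label set in Prometheus text style so it can serve as a map key.
function labelKey(labels: Record<string, string>): string {
  return Object.entries(labels)
    .map(([key, value]) => `${key}="${value}"`)
    .join(',')
}

class InMemoryCollector {
  readonly probeStatus = new Map<string, number>()
  readonly notificationsTriggered = new Map<string, number>()
  notificationsTriggeredTotal = 0

  // Gauge semantics from the docs table: 1 = healthy, 0 = incident.
  collectProbeStatus = ({ probe, requestIndex, status }: ProbeStatusChanged): void => {
    const request = probe.requests[requestIndex]
    const key = labelKey({
      id: probe.id,
      name: probe.name,
      url: request?.url ?? '',
      method: request?.method ?? '',
    })
    this.probeStatus.set(key, status === 'up' ? 1 : 0)
  }

  // Counter semantics: values only ever increase.
  collectNotificationSentMetrics = ({ type, status }: NotificationSent): void => {
    const key = labelKey({ type, status })
    this.notificationsTriggered.set(key, (this.notificationsTriggered.get(key) ?? 0) + 1)
    this.notificationsTriggeredTotal += 1
  }
}
```

The handlers are arrow-function properties on purpose: `initPrometheus` destructures the methods off `new PrometheusCollector()`, and a destructured prototype method that relied on `this` would lose its receiver. Whatever the real collector does, its handlers need to survive that destructuring (arrow properties, `bind`, or closures over module-level metrics all work).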