Add Dashboards for the OTel collector #274

Merged
merged 23 commits into from
Sep 17, 2024
18 changes: 18 additions & 0 deletions .chloggen/dashboard-initial-2.yaml
@@ -0,0 +1,18 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: 'enhancement'

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: 'docs'

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: "Added Dynatrace dashboards that can be used to inspect the collector's internal telemetry"

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [ 274 ]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
2,081 changes: 2,081 additions & 0 deletions docs/dashboards/OTel Collector self-monitoring (all collectors).json

Large diffs are not rendered by default.

2,707 changes: 2,707 additions & 0 deletions docs/dashboards/OTel Collector self-monitoring (single collector).json

Large diffs are not rendered by default.

97 changes: 97 additions & 0 deletions docs/dashboards/README.md
@@ -0,0 +1,97 @@
# OpenTelemetry collector self-monitoring dashboards

> [!WARNING]
> The dashboards shared in this repository are in an alpha state and can change significantly.
> They are provided as-is, with no support guarantees.
> Newer versions of these dashboards could look significantly different from earlier versions and add or remove certain metrics.

This folder contains dashboards that can be used to monitor the health of deployed OpenTelemetry collectors. The dashboards are in JSON format and can be uploaded to your Dynatrace tenant by [following the steps in the Dynatrace documentation](https://docs.dynatrace.com/docs/observe-and-explore/dashboards-and-notebooks/dashboards-new/get-started/dashboards-manage#dashboards-upload).

![A screenshot of the dashboard providing an overview of running collectors. Some are running (green), some have recently stopped sending data (yellow), and some have not sent data in a longer time (red)](img/dashboard_overview_1.png)

There are two dashboards:
- [OTel Collector self-monitoring (all collectors)](./OTel%20Collector%20self-monitoring%20(all%20collectors).json) - shows an overview of all detected OpenTelemetry collectors
- [OTel Collector self-monitoring (single collector)](./OTel%20Collector%20self-monitoring%20(single%20collector).json) - lets you inspect one specific collector instance

The dashboards rely on the presence of the `service.instance.id` resource attribute.
This attribute is added automatically by the collector to all exported telemetry.
However, it is not ingested into Dynatrace by default.
To find out how to add it, see [Adding `service.instance.id` to the allow list](#adding-serviceinstanceid-to-the-allow-list).

The dashboards use metrics from the collectors' [internal telemetry](https://opentelemetry.io/docs/collector/internal-telemetry/).
See the [list of internal metrics](https://opentelemetry.io/docs/collector/internal-telemetry/#lists-of-internal-metrics) for an overview of which metrics are available.

## Prerequisites
The dashboards rely on the self-monitoring capabilities of the OTel collector as well as certain attributes on the exported metrics data.
Required attributes are:
- `service.name` (automatically added by the collector and added to data ingested by Dynatrace)
- `service.instance.id` (automatically added by the collector, needs to be [added to the Dynatrace attribute allow list](#adding-serviceinstanceid-to-the-allow-list))

Dynatrace accepts metrics data with Delta temporality via OTLP/HTTP.
Collector and Collector Contrib versions 0.107.0 and above as well as Dynatrace collector versions 0.12.0 and above support exporting metrics data in that format.
Earlier versions ignore the `temporality_preference` flag and would, therefore, require additional processing (cumulative to delta conversion) before ingestion.
It is possible to do this conversion in a collector, but it would make the setup more complicated, so it is omitted from this document for now.
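
For orientation only, the following is a rough sketch of what such a conversion pipeline might look like on an older collector. It is not part of the setup described in this document. It assumes a distribution that bundles the `prometheus` receiver and the `cumulativetodelta` processor (for example, Collector Contrib), and that the collector exposes its internal metrics on `localhost:8888`. The job name and the pipeline name are hypothetical. Metrics scraped this way may carry different names and resource attributes than the dashboards expect, so the dashboards may not work without changes.

```yaml
receivers:
  # Scrape the collector's own Prometheus endpoint (assumed to be localhost:8888).
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector-selfmon   # hypothetical job name
          scrape_interval: 60s
          static_configs:
            - targets: ["localhost:8888"]

processors:
  # Convert cumulative metrics to delta temporality before export.
  cumulativetodelta:

exporters:
  otlphttp:
    endpoint: "${env:DT_ENDPOINT}"
    headers:
      Authorization: "Api-Token ${env:DT_API_TOKEN}"

service:
  telemetry:
    metrics:
      level: detailed
  pipelines:
    metrics/selfmon:   # hypothetical pipeline name
      receivers: [prometheus]
      processors: [cumulativetodelta]
      exporters: [otlphttp]
```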

The dashboards only use metrics that have a `service.name` from this list: `dynatrace-otel-collector,otelcorecol,otelcontribcol,otelcol,otelcol-contrib`.
At the top of the dashboards, you can filter for specific `service.name`s.
You can also edit the variable and add service names if your collector has a different `service.name` and therefore does not show up on the dashboard.
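
As an alternative to editing the dashboard variable, it may be possible to override the `service.name` that the collector reports for its own telemetry. The sketch below assumes your collector version supports the `service::telemetry::resource` setting. The value `otelcol` is only an example taken from the list above.

```yaml
service:
  telemetry:
    resource:
      # Report internal telemetry under a service.name the dashboards already filter for.
      service.name: otelcol
```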

### Adding `service.instance.id` to the allow list
While `service.name` is on the Dynatrace OTLP metrics ingest allow list by default, `service.instance.id` is not.
To add it, follow [this guide](https://docs.dynatrace.com/docs/shortlink/metrics-configuration#allow-list) and add `service.instance.id` (case-sensitive) to the list.
This will ensure that this resource attribute is stored as a dimension on the metrics in Dynatrace.
If `service.instance.id` is not set up correctly, the dashboard indicates this at the top:

![A screenshot of how a missing service.instance.id would look in the dashboard](img/sid-missing.png)

## Sending internal telemetry (self-monitoring data) to Dynatrace
Every OpenTelemetry collector has self-monitoring capabilities, but they need to be activated.
Self-monitoring data can be exported from the collector via the OTLP protocol.
The configuration below assumes the environment variables `DT_ENDPOINT` and `DT_API_TOKEN` to be set.
In order to send data to Dynatrace via OTLP, you will need to supply a Dynatrace endpoint and an ingest token with the `metrics.ingest` scope set.
See the [Dynatrace docs](https://docs.dynatrace.com/docs/extend-dynatrace/opentelemetry/getting-started/otlp-export) for more information.
The `DT_ENDPOINT` environment variable should contain the base URL followed by the base path `/api/v2/otlp` (e.g. `https://{your-environment-id}.live.dynatrace.com/api/v2/otlp`).
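
As a minimal sketch, the two variables could be set in the shell that starts the collector. The `check_dt_endpoint` helper is hypothetical and only illustrates the expected shape of the value: the configuration below appends `/v1/metrics` to `DT_ENDPOINT`, so the variable should end in `/api/v2/otlp` without a trailing slash. The environment ID and the token are placeholders.

```sh
# Hypothetical helper: prints the metrics URL the collector configuration
# will use, or fails if the given endpoint does not end in /api/v2/otlp.
check_dt_endpoint() {
  case "$1" in
    */api/v2/otlp) printf '%s/v1/metrics\n' "$1" ;;
    *) echo "DT_ENDPOINT should end with /api/v2/otlp (no trailing slash)" >&2; return 1 ;;
  esac
}

# Placeholders: substitute your own environment ID and API token.
export DT_ENDPOINT="https://{your-environment-id}.live.dynatrace.com/api/v2/otlp"
export DT_API_TOKEN="<your-api-token>"

check_dt_endpoint "$DT_ENDPOINT"
```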

To send self-monitoring data to Dynatrace, use the following configuration:

```yaml
service:
  # turn on selfmon
  telemetry:
    metrics:
      # metrics verbosity level. Higher verbosity means more metrics.
      # The dashboard relies on metrics at level detailed.
      level: detailed
      # set up OTLP exporter
      readers:
        - periodic:
            interval: 60000
            exporter:
              otlp:
                protocol: http/protobuf
                temporality_preference: delta
                endpoint: "${env:DT_ENDPOINT}/v1/metrics"
                headers:
                  Authorization: "Api-Token ${env:DT_API_TOKEN}"
```

Note that the OTel collector can automatically merge configuration files for you. Assuming the above configuration is stored in a file called `selfmon-config.yaml`, you can start the collector like this:

```sh
./dynatrace-otel-collector --config=your-already-existing-config.yaml --config=selfmon-config.yaml
```

Of course, you can also add the configuration directly to your existing collector configuration.
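
For illustration, a combined file might look roughly like the sketch below. The `otlp` receiver, the `otlphttp` exporter, and the single metrics pipeline are hypothetical stand-ins for whatever your existing configuration already contains. Only the `service::telemetry` section comes from the self-monitoring configuration described in this document.

```yaml
receivers:
  otlp:
    protocols:
      http:

exporters:
  otlphttp:
    endpoint: "${env:DT_ENDPOINT}"
    headers:
      Authorization: "Api-Token ${env:DT_API_TOKEN}"

service:
  # stand-in for your existing pipelines
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]
  # self-monitoring section, added next to the existing pipelines
  telemetry:
    metrics:
      level: detailed
      readers:
        - periodic:
            interval: 60000
            exporter:
              otlp:
                protocol: http/protobuf
                temporality_preference: delta
                endpoint: "${env:DT_ENDPOINT}/v1/metrics"
                headers:
                  Authorization: "Api-Token ${env:DT_API_TOKEN}"
```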

## More screenshots

### Dashboard containing all collectors

![A screenshot of a dashboard showing total numbers for incoming and outgoing telemetry for OpenTelemetry collectors](img/dashboard_overview_2.png)
![A screenshot of a dashboard showing memory and CPU usage metrics for OpenTelemetry collectors](img/dashboard_overview_3.png)

### Single-collector dashboard

![A screenshot of the single-collector dashboard, showing telemetry passing through the collector.](img/dashboard_single_1.png)
![A screenshot of the single-collector dashboard, showing metrics about incoming HTTP and RPC requests.](img/dashboard_single_2.png)
![A screenshot of the single-collector dashboard, showing memory, CPU, and batch processor metrics.](img/dashboard_single_3.png)
Binary file added docs/dashboards/img/dashboard_overview_1.png
Binary file added docs/dashboards/img/dashboard_overview_2.png
Binary file added docs/dashboards/img/dashboard_overview_3.png
Binary file added docs/dashboards/img/dashboard_single_1.png
Binary file added docs/dashboards/img/dashboard_single_2.png
Binary file added docs/dashboards/img/dashboard_single_3.png
Binary file added docs/dashboards/img/sid-missing.png