Docs for operator #655

Merged · 1 commit · Oct 23, 2023
119 changes: 119 additions & 0 deletions internal/docs/Operator-CRD.md
@@ -0,0 +1,119 @@
## Managed Prometheus support for CRD (In private preview)

### Use Prometheus Pod and Service Monitor Custom Resources
The Azure Monitor metrics add-on supports scraping Prometheus metrics using Pod Monitor and Service Monitor custom resources, similar to the OSS Prometheus Operator. Enabling the add-on deploys the Pod Monitor and Service Monitor custom resource definitions so that you can create your own custom resources.
Creating these custom resources lets you configure scrape jobs from any namespace. This is especially useful in the multi-tenancy scenario, where partners don't have access to the kube-system namespace and a single error-prone scrape job could affect the other partners' scrape jobs. The same issue exists without the multi-tenancy model: currently, when the custom config map contains an erroneous scrape job, all the jobs in the custom config map are ignored and the add-on uses just the default scrape jobs.

Documentation for the existing way of configuring scrape jobs through the custom config map:
https://learn.microsoft.com/en-us/azure/azure-monitor/containers/prometheus-metrics-scrape-validate#create-prometheus-configuration-file

### Create a Pod or Service Monitor
The metrics add-on uses the same custom resource definition (CRD) schema for pod and service monitors as Prometheus, except for a change in the group name and API version. If you have existing Prometheus CRDs and custom resources on your cluster, they will not conflict with the CRDs created by the add-on.
Conversely, custom resources created for OSS Prometheus will not be picked up by the managed Prometheus add-on. This is intentional, to keep the scrape jobs isolated.
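
You can confirm that both sets of CRDs can coexist on a cluster. The sample output below is illustrative; the exact CRD names are an assumption based on the group names described above:

```bash
kubectl get crds | grep coreos.com
# Example output (illustrative):
#   podmonitors.azmonitoring.coreos.com        <- created by the metrics add-on
#   servicemonitors.azmonitoring.coreos.com    <- created by the metrics add-on
#   podmonitors.monitoring.coreos.com          <- OSS Prometheus operator, if installed
#   servicemonitors.monitoring.coreos.com      <- OSS Prometheus operator, if installed
```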

Use the Pod and Service Monitor templates and follow the API specification to create your custom resources.
Your pod and service monitors should look like the examples below:


#### Example Service Monitor
```yaml
# Note the API version is azmonitoring.coreos.com/v1 instead of monitoring.coreos.com/v1
apiVersion: azmonitoring.coreos.com/v1
kind: ServiceMonitor

# Can be deployed in any namespace
metadata:
  name: reference-app
  namespace: app-namespace
spec:
  labelLimit: 63
  labelNameLengthLimit: 511
  labelValueLengthLimit: 1023

  # The selector filters endpoints by service labels.
  selector:
    matchLabels:
      app: reference-app

  # Multiple endpoints can be specified. Port requires a named port.
  endpoints:
    - port: metrics
```
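
For reference, below is a minimal Service that the selector above would match. The service name, port number, and target port are illustrative assumptions; only the `app: reference-app` label and the named `metrics` port matter for the Service Monitor to select and scrape it:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: reference-app
  namespace: app-namespace
  labels:
    app: reference-app      # matched by spec.selector.matchLabels in the Service Monitor
spec:
  selector:
    app: reference-app
  ports:
    - name: metrics         # the named port referenced by the Service Monitor endpoint
      port: 8080
      targetPort: 8080
```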

#### Example Pod Monitor

```yaml
# Note the API version is azmonitoring.coreos.com/v1 instead of monitoring.coreos.com/v1
apiVersion: azmonitoring.coreos.com/v1
kind: PodMonitor

# Can be deployed in any namespace
metadata:
  name: reference-app
  namespace: app-namespace
spec:
  labelLimit: 63
  labelNameLengthLimit: 511
  labelValueLengthLimit: 1023

  # The selector specifies which pods to filter for
  selector:

    # Filter by pod labels
    matchLabels:
      environment: test
    matchExpressions:
      - key: app
        operator: In
        values: [app-frontend, app-backend]

  # [Optional] Filter by pod namespace
  namespaceSelector:
    matchNames: [app-frontend, app-backend]

  # [Optional] Labels on the pod with these keys will be added as labels to each metric scraped
  podTargetLabels: [app, region, environment]

  # Multiple pod endpoints can be specified. Port requires a named port.
  podMetricsEndpoints:
    - port: metrics
```
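
Similarly, the pods selected by the Pod Monitor above need matching labels and a named container port. The Deployment below is a hypothetical sketch for illustration; the image, port number, and label values beyond those used by the selector are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-frontend
  namespace: app-frontend        # must be listed in the Pod Monitor's namespaceSelector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-frontend
  template:
    metadata:
      labels:
        app: app-frontend        # satisfies matchExpressions (app In [app-frontend, app-backend])
        environment: test        # satisfies matchLabels
        region: westus           # added to scraped metrics via podTargetLabels
    spec:
      containers:
        - name: app
          image: example.azurecr.io/app-frontend:latest   # illustrative image
          ports:
            - name: metrics      # the named port referenced by podMetricsEndpoints
              containerPort: 8080
```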

### Deploy a Pod or Service Monitor

You can deploy the pod or service monitor the same way as any other Kubernetes resource. Save your pod or service monitor as a yaml file and run:
```bash
kubectl apply -f <file name>.yaml
```
Any validation errors will be shown after running the command.
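
For example, assuming the Service Monitor above was saved as `service-monitor.yaml` (the file name is illustrative), a successful apply reports the resource as created:

```bash
kubectl apply -f service-monitor.yaml
# Expected output (format may vary by kubectl version):
# servicemonitor.azmonitoring.coreos.com/reference-app created
```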

#### Verify Deployment
```bash
kubectl get podmonitors --all-namespaces
kubectl get servicemonitors --all-namespaces
kubectl describe podmonitor <pod monitor name> -n <pod monitor namespace>
kubectl describe servicemonitor <service monitor name> -n <service monitor namespace>
```
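
For the Service Monitor example above, the list should include the new resource. The output below is illustrative; the exact columns depend on the CRD and kubectl version:

```bash
kubectl get servicemonitors --all-namespaces
# NAMESPACE       NAME            AGE
# app-namespace   reference-app   2m
```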

Verify that the API version is `azmonitoring.coreos.com/v1`.
Verify in the logs that the custom resource was detected:

```bash
kubectl get pods -n kube-system | grep ama-metrics-operator-targets
kubectl logs -n kube-system <ama-metrics-operator-targets pod name> -c targetallocator
```


The log output should contain a message similar to the following, indicating that the custom resource was picked up successfully:


```bash
{"level":"info","ts":"2023-08-30T17:51:06Z","logger":"allocator","msg":"ScrapeConfig hash is different, updating new scrapeconfig detected for ","source":"EventSourcePrometheusCR"}
```

### Delete a Pod or Service Monitor
You can delete a pod or service monitor the same way as any other Kubernetes resource:

```bash
kubectl delete podmonitor <podmonitor name> -n <podmonitor namespace>
kubectl delete servicemonitor <servicemonitor name> -n <servicemonitor namespace>
```

28 changes: 28 additions & 0 deletions internal/docs/Operator-Sharding.md
@@ -0,0 +1,28 @@
## Managed Prometheus support for Sharding (In private preview)


### Overview
The Managed Prometheus add-on depends on the OSS [OpenTelemetry Operator](https://github.com/open-telemetry/opentelemetry-operator), specifically its [Target Allocator (TA)](https://github.com/open-telemetry/opentelemetry-operator/tree/main/cmd/otel-allocator#target-allocator) component.

The link below documents how the TA handles sharding:

https://github.com/open-telemetry/opentelemetry-operator/tree/main/cmd/otel-allocator#even-distribution-of-prometheus-targets

As part of the add-on, we set the number of shards (collector instances) based on our telemetry and usage, or on customer-specific requests (this is not customer configurable today), so that target scraping is distributed evenly across the collector instances (ama-metrics replicas).
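
To check how many collector instances (shards) are currently running on a cluster, you can list the ama-metrics pods. Note that this pattern also matches the ama-metrics-node daemonset pods, which are not shards; the pod naming is an assumption based on the add-on's defaults:

```bash
# Each ama-metrics replica pod corresponds to one collector instance (shard)
kubectl get pods -n kube-system | grep ama-metrics
```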

### Resource utilization
The sharding capability deploys multiple instances of the ama-metrics replica set. Each collector instance retrieves the targets allocated to it by the TA by querying the service that exposes the per-instance targets.
The default resource requests and limits for the ama-metrics replica set are:
```yaml
resources:
limits:
cpu: 7
memory: 14Gi
requests:
cpu: 150m
memory: 500Mi
```
When we shard to multiple instances, the limits and requests are the same for every instance, but we can expect lower per-instance resource utilization because the scraping load is split across the collector instances.
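
To observe the per-instance utilization after sharding, you can compare actual usage against these requests and limits. This assumes the Kubernetes metrics API (for example, metrics-server) is available on the cluster:

```bash
# Show current CPU and memory usage for each ama-metrics pod
kubectl top pods -n kube-system | grep ama-metrics
```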