From 193086fe678828c26a5f099919aca7fe7cc7e309 Mon Sep 17 00:00:00 2001
From: Rashmi Chandrashekar
Date: Mon, 23 Oct 2023 15:32:10 -0700
Subject: [PATCH] docs for operator

---
 internal/docs/Operator-CRD.md      | 119 +++++++++++++++++++++++++++++
 internal/docs/Operator-Sharding.md |  28 +++++++
 2 files changed, 147 insertions(+)
 create mode 100644 internal/docs/Operator-CRD.md
 create mode 100644 internal/docs/Operator-Sharding.md

diff --git a/internal/docs/Operator-CRD.md b/internal/docs/Operator-CRD.md
new file mode 100644
index 000000000..94f9f6dd7
--- /dev/null
+++ b/internal/docs/Operator-CRD.md
@@ -0,0 +1,119 @@
+## Managed Prometheus support for CRDs (In private preview)
+
+### Use Prometheus Pod and Service Monitor Custom Resources
+The Azure Monitor metrics add-on supports scraping Prometheus metrics using Pod Monitors and Service Monitors, similar to the OSS Prometheus Operator. Enabling the add-on deploys the Pod Monitor and Service Monitor custom resource definitions so that you can create your own custom resources.
+Creating these custom resources makes it easy to configure scrape jobs in any namespace. This is especially useful in the multi-tenancy scenario, where partners don't have access to the kube-system namespace and a single error-prone scrape job could affect the other partners' scrape jobs. The same problem exists without the multi-tenancy model: currently, when the custom config map contains an erroneous scrape job, all the jobs in the custom config map are ignored and the add-on uses just the default scrape jobs.
+
+Documentation for the existing custom config map approach to scrape job configuration:
+https://learn.microsoft.com/en-us/azure/azure-monitor/containers/prometheus-metrics-scrape-validate#create-prometheus-configuration-file
+
+### Create a Pod or Service Monitor
+The metrics add-on uses the same custom resource definition (CRD) for Pod Monitors and Service Monitors as Prometheus, except for a change in the group name and API version. If you have existing Prometheus CRDs and custom resources on your cluster, they will not conflict with the CRDs created by the add-on.
+At the same time, custom resources created for the OSS Prometheus Operator will not be picked up by the managed Prometheus add-on. This is intentional, to keep the two sets of scrape jobs isolated.
+
+Use the Pod and Service Monitor templates and follow the API specification to create your custom resources.
+Your Pod and Service Monitors should look like the examples below:
+
+#### Example Service Monitor
+```yaml
+# Note the API version is azmonitoring.coreos.com/v1 instead of monitoring.coreos.com/v1
+apiVersion: azmonitoring.coreos.com/v1
+kind: ServiceMonitor
+
+# Can be deployed in any namespace
+metadata:
+  name: reference-app
+  namespace: app-namespace
+spec:
+  labelLimit: 63
+  labelNameLengthLimit: 511
+  labelValueLengthLimit: 1023
+
+  # The selector filters endpoints by service labels.
+  selector:
+    matchLabels:
+      app: reference-app
+
+  # Multiple endpoints can be specified. Port requires a named port.
+  endpoints:
+  - port: metrics
+```
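+
+The `port` field above refers to a port *name*, not a number. As an illustration (this Service is hypothetical and not part of the add-on's templates), the Service selected by the Service Monitor needs a named port such as `metrics` for the endpoint above to match:
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: reference-app
+  namespace: app-namespace
+  labels:
+    app: reference-app   # matched by the Service Monitor's selector
+spec:
+  selector:
+    app: reference-app   # selects the application pods
+  ports:
+  - name: metrics        # the named port referenced by endpoints[].port above
+    port: 8080
+    targetPort: 8080
+```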
+
+#### Example Pod Monitor
+```yaml
+# Note the API version is azmonitoring.coreos.com/v1 instead of monitoring.coreos.com/v1
+apiVersion: azmonitoring.coreos.com/v1
+kind: PodMonitor
+
+# Can be deployed in any namespace
+metadata:
+  name: reference-app
+  namespace: app-namespace
+spec:
+  labelLimit: 63
+  labelNameLengthLimit: 511
+  labelValueLengthLimit: 1023
+
+  # The selector specifies which pods to filter for
+  selector:
+
+    # Filter by pod labels
+    matchLabels:
+      environment: test
+    matchExpressions:
+    - key: app
+      operator: In
+      values: [app-frontend, app-backend]
+
+  # [Optional] Filter by pod namespace
+  namespaceSelector:
+    matchNames: [app-frontend, app-backend]
+
+  # [Optional] Labels on the pod with these keys will be added as labels to each metric scraped
+  podTargetLabels: [app, region, environment]
+
+  # Multiple pod endpoints can be specified. Port requires a named port.
+  podMetricsEndpoints:
+  - port: metrics
+```
+
+### Deploy a Pod or Service Monitor
+
+You can deploy a Pod or Service Monitor the same way as any other Kubernetes resource. Save your Pod or Service Monitor as a YAML file and run:
+```bash
+kubectl apply -f <pod-or-service-monitor>.yaml
+```
+Any validation errors are shown after running the command.
+
+#### Verify Deployment
+```bash
+kubectl get podmonitors --all-namespaces
+kubectl get servicemonitors --all-namespaces
+kubectl describe podmonitor <pod-monitor-name> -n <namespace>
+kubectl describe servicemonitor <service-monitor-name> -n <namespace>
+```
+
+Verify that the API version is `azmonitoring.coreos.com/v1`.
+Verify in the logs that the custom resource was detected:
+
+```bash
+kubectl get pods -n kube-system | grep ama-metrics-operator-targets
+kubectl logs <ama-metrics-operator-targets-pod-name> -n kube-system -c targetallocator
+```
+
+The log output should contain a similar message indicating that the deployment was successful:
+
+```bash
+{"level":"info","ts":"2023-08-30T17:51:06Z","logger":"allocator","msg":"ScrapeConfig hash is different, updating new scrapeconfig detected for ","source":"EventSourcePrometheusCR"}
+```
+
+#### Delete a Pod or Service Monitor
+You can delete a Pod or Service Monitor the same way as any other Kubernetes resource:
+```bash
+kubectl delete podmonitor <pod-monitor-name> -n <namespace>
+kubectl delete servicemonitor <service-monitor-name> -n <namespace>
+```

diff --git a/internal/docs/Operator-Sharding.md b/internal/docs/Operator-Sharding.md
new file mode 100644
index 000000000..1f449f6ba
--- /dev/null
+++ b/internal/docs/Operator-Sharding.md
@@ -0,0 +1,28 @@
+## Managed Prometheus support for Sharding (In private preview)
+
+### Overview
+The Managed Prometheus add-on takes a dependency on the OSS [OpenTelemetry Operator](https://github.com/open-telemetry/opentelemetry-operator), specifically on its [Target Allocator (TA)](https://github.com/open-telemetry/opentelemetry-operator/tree/main/cmd/otel-allocator#target-allocator) component.
+
+The link below documents how the TA handles sharding:
+
+https://github.com/open-telemetry/opentelemetry-operator/tree/main/cmd/otel-allocator#even-distribution-of-prometheus-targets
+
+As part of the add-on, we set the number of shards (collector instances) based on our telemetry and usage, or on customer-specific requests (this is not customer configurable today), in order to evenly distribute target scraping across the collector instances (ama-metrics replicas).
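+
+To see how many collector shards (ama-metrics replicas) are currently running on a cluster, one option is to list the replica pods in kube-system (a minimal sketch; exact pod names depend on the add-on version):
+
+```bash
+# Each ama-metrics replica pod scrapes the targets assigned to it by the TA,
+# so the number of replica pods corresponds to the number of shards.
+kubectl get pods -n kube-system | grep ama-metrics
+```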
+
+### Resource utilization
+The sharding capability deploys multiple instances of the ama-metrics replicaset. Each instance retrieves the targets allocated to it by the TA by querying the service that exposes the per-collector target list.
+
+The default resource requests and limits for the ama-metrics replicaset are:
+```yaml
+resources:
+  limits:
+    cpu: 7
+    memory: 14Gi
+  requests:
+    cpu: 150m
+    memory: 500Mi
+```
+When we shard to multiple instances, the limits and requests are the same for every instance, but we can expect the actual resource utilization of each instance to drop, since the scraping work is split across the collector instances.
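+
+To observe the drop in per-instance utilization after sharding, one option (assuming a metrics server such as metrics-server is available on the cluster) is:
+
+```bash
+# Shows CPU and memory usage for each ama-metrics replica pod;
+# with more shards, each replica should consume a smaller share of the total.
+kubectl top pods -n kube-system | grep ama-metrics
+```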