After declaring all the main elements for your monitoring stack setup, it's time to define the remaining components and prepare the global Kustomize project that puts them together.
Your monitoring stack setup is still missing several resources. So, start by creating the usual `resources` folder at the root of this monitoring Kustomize project.

```bash
$ mkdir -p $HOME/k8sprjs/monitoring/resources
```
You have to enable the two storage volumes you configured in the first part of this guide as persistent volume resources. Do the following.
- Generate two new YAML files under the `resources` folder, one per persistent volume.

  ```bash
  $ touch $HOME/k8sprjs/monitoring/resources/{data-grafana,data-prometheus}.persistentvolume.yaml
  ```
- Copy each YAML below into its corresponding file.
- In `data-grafana.persistentvolume.yaml`:

  ```yaml
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: data-grafana
  spec:
    capacity:
      storage: 1.9G
    volumeMode: Filesystem
    accessModes:
    - ReadWriteOnce
    storageClassName: local-path
    persistentVolumeReclaimPolicy: Retain
    local:
      path: /mnt/monitoring-ssd/grafana-data/k3smnt
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k3sagent01
  ```
- In `data-prometheus.persistentvolume.yaml`:

  ```yaml
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: data-prometheus
  spec:
    capacity:
      storage: 9.8G
    volumeMode: Filesystem
    accessModes:
    - ReadWriteOnce
    storageClassName: local-path
    persistentVolumeReclaimPolicy: Retain
    local:
      path: /mnt/monitoring-ssd/prometheus-data/k3smnt
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k3sagent02
  ```
Both PVs above are like the ones you declared for the Nextcloud platform, so I won't repeat here what I already explained about them. Just remember the following.

- Ensure that names and capacities align with what you've declared in the corresponding persistent volume claims.
- Verify that the paths exist on the corresponding K3s agent nodes; see the sketch right after this list.
- The `nodeAffinity` specification has to point, in its `values` list, to the right node on each PV.
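A quick way to verify the second point is to check each path directly on its node. A minimal sketch, assuming you can SSH into your agent nodes as in previous guides (adjust users and hostnames to your own setup):

```bash
# Hypothetical checks: each ls must find the mount point folder
# expected by its PV, on the node named in its nodeAffinity block.
$ ssh k3sagent01 'ls -ld /mnt/monitoring-ssd/grafana-data/k3smnt'
$ ssh k3sagent02 'ls -ld /mnt/monitoring-ssd/prometheus-data/k3smnt'
```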
Although Prometheus is the core element of this setup, I decided to identify the whole thing with the monitoring moniker. Therefore, let's declare the namespace with that term as follows.
- Create a file for the namespace element under the `resources` folder.

  ```bash
  $ touch $HOME/k8sprjs/monitoring/resources/monitoring.namespace.yaml
  ```
- Put in `monitoring.namespace.yaml` the declaration below.

  ```yaml
  apiVersion: v1
  kind: Namespace
  metadata:
    name: monitoring
  ```
Your Prometheus server will call the Kubernetes APIs available in your K3s cluster to scrape all the available metrics from resources like nodes, pods, deployments, and more. So you need to grant the proper read-only privileges to your monitoring stack with an RBAC policy declared in a `ClusterRole` resource, as done next.
- Generate the file `monitoring.clusterrole.yaml` within the `resources` directory.

  ```bash
  $ touch $HOME/k8sprjs/monitoring/resources/monitoring.clusterrole.yaml
  ```
- In your new `monitoring.clusterrole.yaml` file, copy the following YAML.

  ```yaml
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: monitoring
  rules:
  - apiGroups: [""]
    resources:
    - nodes
    - nodes/proxy
    - services
    - endpoints
    - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
    - extensions
    resources:
    - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
  ```
Notice the list of `resources` this `ClusterRole` grants access to, and also that all the `verbs` indicated correspond to read-only actions.

BEWARE!
`ClusterRole` resources are not namespaced, so you won't see a `namespace` parameter in them.
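If you want to check for yourself which kinds are cluster-scoped, you can ask the Kubernetes API directly; `clusterroles` and `clusterrolebindings` should both show up in the output of the command below.

```bash
# Lists only the resource kinds that are NOT namespaced.
$ kubectl api-resources --namespaced=false | grep -i clusterrole
```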
The `ClusterRole` you've just created won't be enforced unless you bind it to a user or set of users. Here you'll bind it to the `default` `ServiceAccount` user within the `monitoring` namespace, which is the one that always exists by default within any namespace and the one used by services unless another is created and specified.
- Produce a new `monitoring.clusterrolebinding.yaml` file in the `resources` directory.

  ```bash
  $ touch $HOME/k8sprjs/monitoring/resources/monitoring.clusterrolebinding.yaml
  ```
- Put the YAML below in `monitoring.clusterrolebinding.yaml`.

  ```yaml
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    name: monitoring
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: monitoring
  subjects:
  - kind: ServiceAccount
    name: default
    namespace: monitoring
  ```
Above, you can see how the `monitoring` `ClusterRole` is bound to the users specified in the `subjects` list which, in this case, only contains the `default` `ServiceAccount` user of the `monitoring` namespace.
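Later, once you've applied the whole monitoring Kustomize project, you could verify that this binding behaves as intended by impersonating the `default` `ServiceAccount` with `kubectl auth can-i`. A sketch of such checks:

```bash
# Should answer "yes": the ClusterRole grants the list verb on pods.
$ kubectl auth can-i list pods --as=system:serviceaccount:monitoring:default
# Should answer "no": no write verbs were granted anywhere.
$ kubectl auth can-i delete pods --as=system:serviceaccount:monitoring:default
```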
To clone the `Secret` of your wildcard certificate, you'll have to do exactly the same as in previous setups like Gitea's or Nextcloud's: modify the patch YAML file you have within the `cert-manager` Kustomize project to add the desired new namespace after the ones already present in the Reflector annotations.
- Edit the `wildcard.deimos.cloud-tls.certificate.cert-manager.reflector.namespaces.yaml` file found in the `certificates/patches` directory under your `cert-manager` Kustomize project. Its full path in this guide is `$HOME/k8sprjs/cert-manager/certificates/patches/wildcard.deimos.cloud-tls.certificate.cert-manager.reflector.namespaces.yaml`. There, just concatenate the `monitoring` namespace to both `reflector` annotations. The file should end up looking like the one below.

  ```yaml
  # Certificate wildcard.deimos.cloud-tls patch for Reflector-managed namespaces
  apiVersion: cert-manager.io/v1
  kind: Certificate
  metadata:
    name: wildcard.deimos.cloud-tls
    namespace: certificates
  spec:
    secretTemplate:
      annotations:
        reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "kube-system,nextcloud,gitea,monitoring"
        reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "kube-system,nextcloud,gitea,monitoring"
  ```
- Check the Kustomize output of the `certificates` project (`kubectl kustomize $HOME/k8sprjs/cert-manager/certificates | less`) to ensure that it looks like the one below.

  ```yaml
  apiVersion: v1
  kind: Namespace
  metadata:
    name: certificates
  ---
  apiVersion: cert-manager.io/v1
  kind: Certificate
  metadata:
    name: wildcard.deimos.cloud-tls
    namespace: certificates
  spec:
    dnsNames:
    - '*.deimos.cloud'
    - deimos.cloud
    duration: 8760h
    isCA: false
    issuerRef:
      group: cert-manager.io
      kind: ClusterIssuer
      name: cluster-issuer-selfsigned
    privateKey:
      algorithm: ECDSA
      encoding: PKCS8
      rotationPolicy: Always
      size: 384
    renewBefore: 720h
    secretName: wildcard.deimos.cloud-tls
    secretTemplate:
      annotations:
        reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
        reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: kube-system,nextcloud,gitea,monitoring
        reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
        reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: kube-system,nextcloud,gitea,monitoring
    subject:
      organizations:
      - Deimos
  ---
  apiVersion: cert-manager.io/v1
  kind: ClusterIssuer
  metadata:
    name: cluster-issuer-selfsigned
  spec:
    selfSigned: {}
  ```
- After validating the output, apply the project on your cluster.

  ```bash
  $ kubectl apply -k $HOME/k8sprjs/cert-manager/certificates
  ```
BEWARE!
Don't forget that the cert-manager system won't automatically apply this modification to the annotations in the secret already generated for your wildcard certificate. This is fine at this point but, later, after you've deployed the whole monitoring stack, you'll have to apply the change to the certificate's secret to make Reflector clone it into the new `monitoring` namespace.
Next, you must tie everything together with the mandatory `kustomization.yaml` file required for your monitoring setup. Do as explained next.
- Under the `monitoring` folder, generate a `kustomization.yaml` file.

  ```bash
  $ touch $HOME/k8sprjs/monitoring/kustomization.yaml
  ```
- Put the following YAML declaration in that new `kustomization.yaml`.

  ```yaml
  # Monitoring stack setup
  apiVersion: kustomize.config.k8s.io/v1beta1
  kind: Kustomization

  namespace: monitoring

  commonLabels:
    platform: monitoring

  namePrefix: mntr-

  resources:
  - resources/data-grafana.persistentvolume.yaml
  - resources/data-prometheus.persistentvolume.yaml
  - resources/monitoring.namespace.yaml
  - resources/monitoring.clusterrole.yaml
  - resources/monitoring.clusterrolebinding.yaml
  - components/agent-kube-state-metrics
  - components/agent-prometheus-node-exporter
  - components/server-prometheus
  - components/ui-grafana
  ```
You declared a very similar file for your Nextcloud platform, so go back to my explanation in that guide if you don't remember anything about this YAML. Beyond that, notice that the `namePrefix` for all the resources in this monitoring stack will be `mntr-`.

- As in other cases, before you apply this `kustomization.yaml` file, you have to be sure that the output of this Kustomize project is correct. Be aware that the output is quite big, so dump it into a file with a meaningful name like `monitoring.k.output.yaml`.

  ```bash
  $ kubectl kustomize $HOME/k8sprjs/monitoring > monitoring.k.output.yaml
  ```
- Compare the dumped Kustomize output in your `monitoring.k.output.yaml` file with the one below.

  ```yaml
  apiVersion: v1
  kind: Namespace
  metadata:
    labels:
      platform: monitoring
    name: monitoring
  ---
  apiVersion: v1
  automountServiceAccountToken: false
  kind: ServiceAccount
  metadata:
    labels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: kube-state-metrics
      app.kubernetes.io/version: 2.5.0
      platform: monitoring
    name: mntr-agent-kube-state-metrics
    namespace: monitoring
  ---
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    labels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: kube-state-metrics
      app.kubernetes.io/version: 2.5.0
      platform: monitoring
    name: mntr-agent-kube-state-metrics
  rules:
  - apiGroups:
    - ""
    resources:
    - configmaps
    - secrets
    - nodes
    - pods
    - services
    - resourcequotas
    - replicationcontrollers
    - limitranges
    - persistentvolumeclaims
    - persistentvolumes
    - namespaces
    - endpoints
    verbs:
    - list
    - watch
  - apiGroups:
    - apps
    resources:
    - statefulsets
    - daemonsets
    - deployments
    - replicasets
    verbs:
    - list
    - watch
  - apiGroups:
    - batch
    resources:
    - cronjobs
    - jobs
    verbs:
    - list
    - watch
  - apiGroups:
    - autoscaling
    resources:
    - horizontalpodautoscalers
    verbs:
    - list
    - watch
  - apiGroups:
    - authentication.k8s.io
    resources:
    - tokenreviews
    verbs:
    - create
  - apiGroups:
    - authorization.k8s.io
    resources:
    - subjectaccessreviews
    verbs:
    - create
  - apiGroups:
    - policy
    resources:
    - poddisruptionbudgets
    verbs:
    - list
    - watch
  - apiGroups:
    - certificates.k8s.io
    resources:
    - certificatesigningrequests
    verbs:
    - list
    - watch
  - apiGroups:
    - storage.k8s.io
    resources:
    - storageclasses
    - volumeattachments
    verbs:
    - list
    - watch
  - apiGroups:
    - admissionregistration.k8s.io
    resources:
    - mutatingwebhookconfigurations
    - validatingwebhookconfigurations
    verbs:
    - list
    - watch
  - apiGroups:
    - networking.k8s.io
    resources:
    - networkpolicies
    - ingresses
    verbs:
    - list
    - watch
  - apiGroups:
    - coordination.k8s.io
    resources:
    - leases
    verbs:
    - list
    - watch
  ---
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    labels:
      platform: monitoring
    name: mntr-monitoring
  rules:
  - apiGroups:
    - ""
    resources:
    - nodes
    - nodes/proxy
    - services
    - endpoints
    - pods
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - extensions
    resources:
    - ingresses
    verbs:
    - get
    - list
    - watch
  - nonResourceURLs:
    - /metrics
    verbs:
    - get
  ---
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    labels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: kube-state-metrics
      app.kubernetes.io/version: 2.5.0
      platform: monitoring
    name: mntr-agent-kube-state-metrics
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: mntr-agent-kube-state-metrics
  subjects:
  - kind: ServiceAccount
    name: mntr-agent-kube-state-metrics
    namespace: monitoring
  ---
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    labels:
      platform: monitoring
    name: mntr-monitoring
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: mntr-monitoring
  subjects:
  - kind: ServiceAccount
    name: default
    namespace: monitoring
  ---
  apiVersion: v1
  data:
    prometheus.rules.yaml: |-
      groups:
      - name: example_alert_rule_unique_name
        rules:
        - alert: HighRequestLatency
          expr: job:request_latency_seconds:mean5m{job="node-exporter"} > 0.5
          for: 10m
          labels:
            severity: page
          annotations:
            summary: High request latency
    prometheus.yaml: |
      # Prometheus main configuration file

      global:
        scrape_interval: 60s
        evaluation_interval: 60s

      rule_files:
        - /etc/prometheus/prometheus_alerts.rules.yaml

      alerting:
        alertmanagers:
          - scheme: http
            static_configs:
              - targets:
                # - "mntr-alertmanager.monitoring.svc.deimos.cluster.io:9093"

      scrape_configs:
        - job_name: 'kubernetes-apiservers'
          scrape_interval: 180s
          kubernetes_sd_configs:
            - role: endpoints
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
            - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
              action: keep
              regex: default;kubernetes;https

        - job_name: 'kubernetes-nodes'
          scrape_interval: 120s
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
            - role: node
          relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - target_label: __address__
              replacement: kubernetes.default.svc.deimos.cluster.io:443
            - source_labels: [__meta_kubernetes_node_name]
              regex: (.+)
              target_label: __metrics_path__
              replacement: /api/v1/nodes/${1}/proxy/metrics

        - job_name: 'kubernetes-pods'
          scrape_interval: 240s
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
              action: replace
              target_label: __metrics_path__
              regex: (.+)
            - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
              action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - source_labels: [__meta_kubernetes_namespace]
              action: replace
              target_label: kubernetes_namespace
            - source_labels: [__meta_kubernetes_pod_name]
              action: replace
              target_label: kubernetes_pod_name

        - job_name: 'kubernetes-cadvisor'
          scrape_interval: 180s
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
            - role: node
          relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - target_label: __address__
              replacement: kubernetes.default.svc.deimos.cluster.io:443
            - source_labels: [__meta_kubernetes_node_name]
              regex: (.+)
              target_label: __metrics_path__
              replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

        - job_name: 'kubernetes-service-endpoints'
          scrape_interval: 45s
          kubernetes_sd_configs:
            - role: endpoints
          relabel_configs:
            - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
              action: keep
              regex: true
            - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
              action: replace
              target_label: __scheme__
              regex: (https?)
            - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
              action: replace
              target_label: __metrics_path__
              regex: (.+)
            - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
              action: replace
              target_label: __address__
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - source_labels: [__meta_kubernetes_namespace]
              action: replace
              target_label: kubernetes_namespace
            - source_labels: [__meta_kubernetes_service_name]
              action: replace
              target_label: kubernetes_name
          tls_config:
            insecure_skip_verify: true

        - job_name: 'kube-state-metrics'
          scrape_interval: 50s
          static_configs:
            - targets: ['mntr-agent-kube-state-metrics.monitoring.svc.deimos.cluster.io:8080']

        - job_name: 'node-exporter'
          scrape_interval: 55s
          kubernetes_sd_configs:
            - role: endpoints
          relabel_configs:
            - source_labels: [__meta_kubernetes_endpoints_name]
              regex: 'node-exporter'
              action: keep
  kind: ConfigMap
  metadata:
    labels:
      app: server-prometheus
      platform: monitoring
    name: mntr-server-prometheus-6mdmdtddbk
    namespace: monitoring
  ---
  apiVersion: v1
  kind: Service
  metadata:
    labels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: kube-state-metrics
      app.kubernetes.io/version: 2.5.0
      platform: monitoring
    name: mntr-agent-kube-state-metrics
    namespace: monitoring
  spec:
    clusterIP: None
    ports:
    - name: http-metrics
      port: 8080
      targetPort: http-metrics
    - name: telemetry
      port: 8081
      targetPort: telemetry
    selector:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: kube-state-metrics
      app.kubernetes.io/version: 2.5.0
      platform: monitoring
  ---
  apiVersion: v1
  kind: Service
  metadata:
    annotations:
      prometheus.io/port: "9100"
      prometheus.io/scrape: "true"
    labels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
      platform: monitoring
    name: mntr-agent-prometheus-node-exporter
    namespace: monitoring
  spec:
    ports:
    - name: node-exporter
      port: 9100
      protocol: TCP
      targetPort: 9100
    selector:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
      platform: monitoring
  ---
  apiVersion: v1
  kind: Service
  metadata:
    labels:
      app: server-prometheus
      platform: monitoring
    name: mntr-server-prometheus
    namespace: monitoring
  spec:
    ports:
    - name: http
      port: 443
      protocol: TCP
      targetPort: 9090
    selector:
      app: server-prometheus
      platform: monitoring
    type: ClusterIP
  ---
  apiVersion: v1
  kind: Service
  metadata:
    annotations:
      prometheus.io/port: "3000"
      prometheus.io/scrape: "true"
    labels:
      app: ui-grafana
      platform: monitoring
    name: mntr-ui-grafana
    namespace: monitoring
  spec:
    ports:
    - name: http
      port: 443
      protocol: TCP
      targetPort: 3000
    selector:
      app: ui-grafana
      platform: monitoring
    type: ClusterIP
  ---
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    labels:
      platform: monitoring
    name: mntr-data-grafana
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 1.9G
    local:
      path: /mnt/monitoring-ssd/grafana-data/k3smnt
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k3sagent01
    persistentVolumeReclaimPolicy: Retain
    storageClassName: local-path
    volumeMode: Filesystem
  ---
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    labels:
      platform: monitoring
    name: mntr-data-prometheus
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 9.8G
    local:
      path: /mnt/monitoring-ssd/prometheus-data/k3smnt
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k3sagent02
    persistentVolumeReclaimPolicy: Retain
    storageClassName: local-path
    volumeMode: Filesystem
  ---
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    labels:
      app: server-prometheus
      platform: monitoring
    name: mntr-data-server-prometheus
    namespace: monitoring
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 9.8G
    storageClassName: local-path
    volumeName: mntr-data-prometheus
  ---
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    labels:
      app: ui-grafana
      platform: monitoring
    name: mntr-data-ui-grafana
    namespace: monitoring
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 1.9G
    storageClassName: local-path
    volumeName: mntr-data-grafana
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: kube-state-metrics
      app.kubernetes.io/version: 2.5.0
      platform: monitoring
    name: mntr-agent-kube-state-metrics
    namespace: monitoring
  spec:
    replicas: 1
    selector:
      matchLabels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.5.0
        platform: monitoring
    template:
      metadata:
        labels:
          app.kubernetes.io/component: exporter
          app.kubernetes.io/name: kube-state-metrics
          app.kubernetes.io/version: 2.5.0
          platform: monitoring
      spec:
        automountServiceAccountToken: true
        containers:
        - image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.5.0
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            timeoutSeconds: 5
          name: server
          ports:
          - containerPort: 8080
            name: http-metrics
          - containerPort: 8081
            name: telemetry
          readinessProbe:
            httpGet:
              path: /
              port: 8081
            initialDelaySeconds: 5
            timeoutSeconds: 5
          resources:
            limits:
              cpu: 500m
              memory: 128Mi
            requests:
              cpu: 250m
              memory: 64Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            readOnlyRootFilesystem: true
            runAsUser: 65534
        nodeSelector:
          kubernetes.io/os: linux
        serviceAccountName: mntr-agent-kube-state-metrics
        tolerations:
        - effect: NoExecute
          operator: Exists
  ---
  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    labels:
      app: server-prometheus
      platform: monitoring
    name: mntr-server-prometheus
    namespace: monitoring
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: server-prometheus
        platform: monitoring
    serviceName: mntr-server-prometheus
    template:
      metadata:
        labels:
          app: server-prometheus
          platform: monitoring
      spec:
        containers:
        - args:
          - --storage.tsdb.retention.time=12h
          - --config.file=/etc/prometheus/prometheus.yaml
          - --storage.tsdb.path=/prometheus
          image: prom/prometheus:v2.35.0
          name: server
          ports:
          - containerPort: 9090
            name: http
          resources:
            limits:
              cpu: 1000m
              memory: 512Mi
            requests:
              cpu: 500m
              memory: 256Mi
          volumeMounts:
          - mountPath: /etc/prometheus/prometheus.yaml
            name: server-prometheus-config
            subPath: prometheus.yaml
          - mountPath: /etc/prometheus/prometheus.rules.yaml
            name: server-prometheus-config
            subPath: prometheus.rules.yaml
          - mountPath: /prometheus
            name: server-prometheus-storage
        securityContext:
          fsGroup: 65534
          runAsGroup: 65534
          runAsNonRoot: true
          runAsUser: 65534
        volumes:
        - configMap:
            defaultMode: 420
            items:
            - key: prometheus.yaml
              path: prometheus.yaml
            - key: prometheus.rules.yaml
              path: prometheus.rules.yaml
            name: mntr-server-prometheus-6mdmdtddbk
          name: server-prometheus-config
        - name: server-prometheus-storage
          persistentVolumeClaim:
            claimName: mntr-data-server-prometheus
  ---
  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    labels:
      app: ui-grafana
      platform: monitoring
    name: mntr-ui-grafana
    namespace: monitoring
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: ui-grafana
        platform: monitoring
    serviceName: mntr-ui-grafana
    template:
      metadata:
        labels:
          app: ui-grafana
          platform: monitoring
      spec:
        containers:
        - image: grafana/grafana:8.5.2
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: 3000
            timeoutSeconds: 1
          name: server
          ports:
          - containerPort: 3000
            name: http
            protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /robots.txt
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 2
          resources:
            limits:
              cpu: 500m
              memory: 256Mi
            requests:
              cpu: 250m
              memory: 128Mi
          volumeMounts:
          - mountPath: /var/lib/grafana
            name: ui-grafana-storage
        securityContext:
          fsGroup: 472
          supplementalGroups:
          - 0
        volumes:
        - name: ui-grafana-storage
          persistentVolumeClaim:
            claimName: mntr-data-ui-grafana
  ---
  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    labels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
      platform: monitoring
    name: mntr-agent-prometheus-node-exporter
    namespace: monitoring
  spec:
    selector:
      matchLabels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: node-exporter
        platform: monitoring
    template:
      metadata:
        labels:
          app.kubernetes.io/component: exporter
          app.kubernetes.io/name: node-exporter
          platform: monitoring
      spec:
        containers:
        - args:
          - --path.sysfs=/host/sys
          - --path.rootfs=/host/root
          - --no-collector.wifi
          - --no-collector.hwmon
          - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
          - --collector.netclass.ignored-devices=^(veth.*)$
          image: prom/node-exporter:v1.3.1
          name: server
          ports:
          - containerPort: 9100
            protocol: TCP
          resources:
            limits:
              cpu: 250m
              memory: 180Mi
            requests:
              cpu: 102m
              memory: 180Mi
          volumeMounts:
          - mountPath: /host/sys
            mountPropagation: HostToContainer
            name: sys
            readOnly: true
          - mountPath: /host/root
            mountPropagation: HostToContainer
            name: root
            readOnly: true
        tolerations:
        - effect: NoExecute
          operator: Exists
        volumes:
        - hostPath:
            path: /sys
          name: sys
        - hostPath:
            path: /
          name: root
  ---
  apiVersion: traefik.containo.us/v1alpha1
  kind: IngressRoute
  metadata:
    labels:
      app: server-prometheus
      platform: monitoring
    name: mntr-server-prometheus
    namespace: monitoring
  spec:
    entryPoints:
    - websecure
    routes:
    - kind: Rule
      match: Host(`prometheus.deimos.cloud`) || Host(`prm.deimos.cloud`)
      services:
      - kind: Service
        name: mntr-server-prometheus
        port: 443
        scheme: http
    tls:
      secretName: wildcard.deimos.cloud-tls
  ---
  apiVersion: traefik.containo.us/v1alpha1
  kind: IngressRoute
  metadata:
    labels:
      app: ui-grafana
      platform: monitoring
    name: mntr-ui-grafana
    namespace: monitoring
  spec:
    entryPoints:
    - websecure
    routes:
    - kind: Rule
      match: Host(`grafana.deimos.cloud`) || Host(`gfn.deimos.cloud`)
      services:
      - kind: Service
        name: mntr-ui-grafana
        port: 443
        scheme: http
    tls:
      secretName: wildcard.deimos.cloud-tls
  ```
Corroborate that the resources' `name`s have been changed correctly by Kustomize and that they appear where they should. In particular, make sure that the `server-prometheus` and `ui-grafana` services' names carry the `mntr-` prefix in the `IngressRoute` resources, a prefix that was added in their respective previous guides. A quick way of checking this is sketched right below.
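Rather than eyeballing the whole dump, you can also grep it for the renamed bits; something like the line below, assuming the dump file is in your current directory, should show the `mntr-` prefix on every resource name, including the service references inside the `IngressRoute` resources.

```bash
# Every renamed resource and reference should carry the mntr- prefix.
$ grep 'name: mntr-' monitoring.k.output.yaml
```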
- After validating the Kustomize output, you can apply the project on your K3s cluster.

  ```bash
  $ kubectl apply -k $HOME/k8sprjs/monitoring
  ```
- Right after executing the previous command, remember that you can monitor its progress with `kubectl` (in a different shell).

  ```bash
  $ watch kubectl -n monitoring get pvc,cm,secret,deployment,replicaset,statefulset,pod,svc
  ```
Notice that, in the command above, I've omitted the `pv` parameter (for showing persistent volumes) that I've used in other guides. This is because persistent volumes are not namespaced, so `kubectl get` with the `pv` option would show all the existing ones, including those of the Nextcloud and Gitea platforms that you may have running at this point. That would make the output too long to fit on one screen, as happened to me.

The output of this `watch kubectl` command should end up being similar to the following.

```
Every 2,0s: kubectl -n monitoring get pvc,cm,secret,deployment,replicaset,statefulset,pod,svc    jv11dev: Tue Jun 14 13:32:33 2022

NAME                                                STATUS   VOLUME                 CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/mntr-data-server-prometheus   Bound    mntr-data-prometheus   9800M      RWO            local-path     50s
persistentvolumeclaim/mntr-data-ui-grafana          Bound    mntr-data-grafana      1900M      RWO            local-path     50s

NAME                                          DATA   AGE
configmap/kube-root-ca.crt                    1      52s
configmap/mntr-server-prometheus-58dk66cf6g   2      51s

NAME                                               TYPE                                  DATA   AGE
secret/mntr-agent-kube-state-metrics-token-dmddr   kubernetes.io/service-account-token   3      51s
secret/default-token-bhc66                         kubernetes.io/service-account-token   3      51s

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mntr-agent-kube-state-metrics   1/1     1            1           50s

NAME                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/mntr-agent-kube-state-metrics-6b6f798cbf   1         1         1       50s

NAME                                      READY   AGE
statefulset.apps/mntr-server-prometheus   1/1     50s
statefulset.apps/mntr-ui-grafana          1/1     50s

NAME                                                 READY   STATUS    RESTARTS   AGE
pod/mntr-agent-prometheus-node-exporter-n4sgc        1/1     Running   0          49s
pod/mntr-agent-prometheus-node-exporter-lwrm6        1/1     Running   0          49s
pod/mntr-server-prometheus-0                         1/1     Running   0          49s
pod/mntr-agent-prometheus-node-exporter-jtdtf        1/1     Running   0          49s
pod/mntr-agent-kube-state-metrics-6b6f798cbf-ck6l4   1/1     Running   0          50s
pod/mntr-ui-grafana-0                                1/1     Running   0          49s

NAME                                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/mntr-agent-kube-state-metrics         ClusterIP   None            <none>        8080/TCP,8081/TCP   51s
service/mntr-agent-prometheus-node-exporter   ClusterIP   10.43.229.121   <none>        9100/TCP            51s
service/mntr-server-prometheus                ClusterIP   10.43.101.126   <none>        443/TCP             51s
service/mntr-ui-grafana                       ClusterIP   10.43.23.222    <none>        443/TCP             51s
```
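If you still want to keep an eye on the persistent volumes without the clutter of other platforms' PVs, you can filter the listing by the `mntr-` prefix in a separate command.

```bash
# PVs are cluster-scoped, hence no -n flag; grep narrows the
# listing down to the monitoring stack's volumes only.
$ kubectl get pv | grep mntr-
```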
BEWARE!
Notice how, in my output above, the secret corresponding to the wildcard certificate (`wildcard.deimos.cloud-tls`) is not present yet in the `monitoring` namespace. Until it is, you won't be able to reach your Prometheus and Grafana web interfaces: you'll get an `Internal Server Error` if you try to browse to them. That error happens within the Traefik service, because it tries to use a secret that's not available where it's expected.
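You can quickly confirm whether the secret has landed in the namespace before trying the web interfaces. While it hasn't, querying for it directly will just return an error.

```bash
# Expect an error like 'secrets "wildcard.deimos.cloud-tls" not found'
# until Reflector has cloned the secret into the monitoring namespace.
$ kubectl -n monitoring get secret wildcard.deimos.cloud-tls
```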
At this point, you need to update your wildcard certificate's secret exactly as you did for the Nextcloud or Gitea deployments. So, execute the cert-manager command with `kubectl` as follows.

```bash
$ kubectl cert-manager -n certificates renew wildcard.deimos.cloud-tls
Manually triggered issuance of Certificate certificates/wildcard.deimos.cloud-tls
```
After a moment, the certificate's secret should be updated, and Reflector will replicate the secret into your `monitoring` namespace. Check it out with `kubectl`.

```
$ kubectl -n monitoring get secrets
NAME                                        TYPE                                  DATA   AGE
mntr-agent-kube-state-metrics-token-dmddr   kubernetes.io/service-account-token   3      24h
default-token-bhc66                         kubernetes.io/service-account-token   3      24h
wildcard.deimos.cloud-tls                   kubernetes.io/tls                     3      12m
```
Above, you can see how the `wildcard.deimos.cloud-tls` secret has the most recent `AGE` of all the secrets present in my `monitoring` namespace.
With the certificate in place and the whole deployment done, you can try browsing to the web interface of your Prometheus server and finish its setup. In this guide's case, you would browse to `https://prometheus.deimos.cloud`, accept the risk of the untrusted certificate (your wildcard one), and reach a page like the one below.
The page you reach is the `Graph` section of the Prometheus web interface. `Graph` is where you can run manual queries on any stats Prometheus has stored. Give Prometheus some time (about five minutes) to find and connect with the Prometheus-compatible endpoints currently available in your K3s cluster. Then, unfold the `Status` menu and click on `Targets`.
The `Targets` page lists all the Prometheus-compatible endpoints found in your Kubernetes cluster, which are the ones defined in the Prometheus configuration. Remember that a bunch of these stats come from endpoints declared in `Service` resources you annotated with `prometheus.io` tags, like the ones recalled below. This page also shows the status of each detected endpoint and its related labels.
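As a reminder, those annotations look like the fragment below, lifted from the node exporter's `Service` in the Kustomize output you validated earlier.

```yaml
metadata:
  annotations:
    prometheus.io/port: "9100"
    prometheus.io/scrape: "true"
```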
As you've seen, the interface is rather simple and essentially meant for read-only operations. Since manually querying the statistics of your Kubernetes cluster can be cumbersome, it's better to use a more graphical interface like Grafana to get a more user-friendly representation of all those statistics Prometheus gets from your cluster.
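Still, the `Graph` section is handy for quick ad hoc checks. A couple of example queries you could try there, assuming the node-exporter and kube-state-metrics targets are already up (the metric names below are the standard ones exposed by those exporters, not something declared in this guide):

```
# Percentage of available memory per node, from node-exporter metrics.
100 * node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

# Pods currently not in the Running phase, from kube-state-metrics.
sum by (namespace) (kube_pod_status_phase{phase!="Running"})
```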
Grafana is already running in your K3s cluster, but it still needs some configuring, so let's get to it.
- Browse to its URL, which in this guide is `https://grafana.deimos.cloud`. Accept the risk of your wildcard certificate, and then you should be automatically redirected to the login page.
- Enter `admin` both as username and as password. Right after login, you'll be asked to change the password. Do it, or skip this step altogether.
- At this point you'll have reached your Grafana Home dashboard. This dashboard is essentially empty, since you don't have any data source connected nor any dashboard created.
The very first thing you must configure is the connection to a data source from which Grafana can get data to show. In this case, you'll connect to your Prometheus server.
- In the options available on the left bar, hover over the `Configuration` option to unfold it. The very first option in the list is the one you're looking for, `Data sources`.
- Click on `Data sources` to reach the corresponding configuration page.
- Press the `Add data source` button. The first thing you'll see is a list of data source types to choose from. See how the very first option offered happens to be Prometheus.
- Choose the Prometheus option and you'll reach the form page below. Notice that there are two tabs on this page, and that you're in the `Settings` one.
- Remain in the `Settings` tab and fill in the form as indicated next.
  - `Name`: put something meaningful here, like `Prometheus Deimos Cloud server`.
  - `HTTP` section:
    - `URL`: here you must specify the internal FQDN of your Prometheus `Service` resource, which in this guide is `mntr-server-prometheus.monitoring.svc.deimos.cluster.io`, with the `443` port concatenated to it. Since, within the cluster, your Prometheus server answers plain HTTP requests, the full URL is as shown next: `http://mntr-server-prometheus.monitoring.svc.deimos.cluster.io:443`

      BEWARE!
      You might be thinking that your Prometheus server is configured to listen on the `9090` port, but remember that you'll connect to its `Service` resource, which is configured to listen on the `443` port and reroute traffic to the `9090` port; you can double-check that plumbing with the sketch shown after these steps.
      Also notice how the protocol specified in the URL above is `http`, although `443` is the default port for `https`. You do this because, in this case, you're using the `443` port just as a regular `http` port for that service within the internal cluster networking. The HTTPS part is taken care of by the related Traefik `IngressRoute` you declared for handling only external connections to Prometheus.
  - Leave all the rest of the fields in the form with their default values.
- Go to the bottom of the form and click on `Save & test`. Right after pressing the button, you should see a success message above the buttons line.
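By the way, if the `Save & test` check fails, or you just want to verify the port rerouting described in the BEWARE note above, you can probe the Prometheus `Service` from inside the cluster with a throwaway pod. A sketch, assuming the public `curlimages/curl` image is reachable from your cluster:

```bash
# Temporary pod curling Prometheus's health endpoint through its
# Service: port 443 on the Service side, rerouted to 9090 in the pod.
$ kubectl -n monitoring run curl-test --rm -it --restart=Never \
    --image=curlimages/curl -- \
    curl -s http://mntr-server-prometheus.monitoring.svc.deimos.cluster.io:443/-/healthy
```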
Now you have an active Prometheus data source, but you still need a dashboard to visualize the data it provides in Grafana.
- Return to the top of your Prometheus data source form and click on the `Dashboards` tab. You'll get to see the following list of dashboards.
- Since you're using a Prometheus server of the 2.x branch, choose `Prometheus 2.0 Stats` from the dashboards list by pressing its corresponding `Import` button. The action should be immediate, and as a result the item will switch its `Import` button for a `Re-import` one.
- Go to `Dashboards` > `Browse`. On this page you'll find listed your newly imported `Prometheus 2.0 Stats` dashboard.
- Click on the `Prometheus 2.0 Stats` one to enter the following dashboard. As you can see, this dashboard only manages to show the `scrape duration` statistics and nothing else, but you can try editing every other block to make them show what they're supposed to display, or some other data.
Since it's not the intention of this guide series to go as deep as explaining how any of the deployed applications work, I'll leave it to you to discover how to import other dashboards or even configure your own custom ones. A good starting point would be the official "marketplace" Grafana has for them.
As you've seen, a basic installation of Prometheus doesn't come with any kind of security. If you want to enforce a user login, you can do as you already did when you configured access to the Traefik web dashboard in the G031 guide: enable a basic auth login directly in the IngressRoute of your Prometheus server. This will somewhat protect external access to your Prometheus dashboard, while leaving connections through the internal networking of your cluster unaffected. To enforce more advanced security methods, you'll have to check the official Prometheus documentation and see what security options are available.
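As a pointer, the Traefik approach boils down to declaring a `basicAuth` middleware and referencing it from the route. A minimal sketch, assuming a hypothetical `mntr-prometheus-basicauth` `Secret` that you'd create beforehand with `htpasswd`-formatted credentials stored under a `users` key:

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: prometheus-basicauth
  namespace: monitoring
spec:
  basicAuth:
    # Secret holding the htpasswd-style user list; hypothetical name.
    secret: mntr-prometheus-basicauth
```

Then, in the Prometheus `IngressRoute`, the route would get a `middlewares` list referencing that `Middleware` by name, just as you did for the Traefik dashboard in the G031 guide.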
Unlike Prometheus, Grafana already comes with an integrated user authentication and management system. You can find its page under `Configuration` > `Users`. Click on the `Users` option and you'll reach the user management page of your Grafana setup. See that there's only the `admin` user you've used before, so it would be better to create at least one more user with lesser privileges and use it as your regular one.
You can find the Kustomize project for this monitoring stack deployment in the following attached folder.

- `k8sprjs/monitoring`

```
$HOME/k8sprjs/monitoring
$HOME/k8sprjs/monitoring/resources
$HOME/k8sprjs/monitoring/kustomization.yaml
$HOME/k8sprjs/monitoring/resources/data-grafana.persistentvolume.yaml
$HOME/k8sprjs/monitoring/resources/data-prometheus.persistentvolume.yaml
$HOME/k8sprjs/monitoring/resources/monitoring.clusterrolebinding.yaml
$HOME/k8sprjs/monitoring/resources/monitoring.clusterrole.yaml
$HOME/k8sprjs/monitoring/resources/monitoring.namespace.yaml
```
<< Previous (G035. Deploying services 04. Monitoring stack Part 5) | +Table Of Contents+ | Next (G036. Host and K3s cluster) >>