Skip to content

Commit

Permalink
Arc: upgrade node-exporter (#982)
Browse files Browse the repository at this point in the history
[comment]: # (Note that your PR title should follow the conventional
commit format: https://conventionalcommits.org/en/v1.0.0/#summary)
# PR Description
- Upgrade node-exporter image: 1.7.0 -> 1.8.2 and corresponding chart:
4.26.0 -> 2.39.0
- Update README for dependent charts upgrades to latest instructions for
node-exporter and kube-state-metrics since before it was the 1P way
- Update README for releases to include step to cleanup trivy ignore
file

# Validation
- Tested upgrade scenario from old chart to new
- Node-exporter charts are working and no errors in the logs

---------

Co-authored-by: Vishwanath <[email protected]>
  • Loading branch information
gracewehner and vishiy authored Sep 26, 2024
1 parent d3bf1af commit ef4f473
Show file tree
Hide file tree
Showing 11 changed files with 195 additions and 72 deletions.
8 changes: 4 additions & 4 deletions internal/docs/BUILDANDRELEASE.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,7 @@ Each merge into `main` will push the image to the public mcr and deploy to the d
## Release Process
- **PR 1**: Bump the version in the VERSION file following semantic versioning.
- Clean the `.trivyignore` file. Run the scan on the image using the Github Action and add back in the still existing vulnerabilities to the file.
- Add the latest `addon-token-adapter-linux` and `addon-token-adapter-windows` versions in the values-template.yaml file by checking the version [here](https://msazure.visualstudio.com/CloudNativeCompute/_git/aks-rp?path=%2Fccp%2Fcharts%2Fkube-control-plane%2Ftemplates%2F_images.tpl&_a=contents&version=GBmaster).
- If you know your PR with the last feature changes will be the last one before the release, you can do this then.
- **Build 1**: The `values.yaml` and `Chart.yaml` templates for the HELM chart will automatically be replaced with the image tag and the HELM chart version during the CI/CD build.
Expand All @@ -75,9 +76,8 @@ Each merge into `main` will push the image to the public mcr and deploy to the d
- Once pushed, you can manually start the `Deploy to prod clusters` stage to deploy the image to our prod clusters.
- **E2E Conformance Tests**: Ask for our conformance tests to be run in the [Arc Conformance teams channel](https://teams.microsoft.com/l/channel/19%3arlnJ5tIxEMP-Hhe-pRPPp9C6iYQ1CwAelt4zTqyC_NI1%40thread.tacv2/General?groupId=a077ab34-99ea-490c-b204-358d31c24fbe&tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47). Follow the instructions in the [Arc test README](../../otelcollector/test/arc-conformance/README.md#testing-on-the-arc-conformance-matrix).
- **PR 2**: Get the chart semver or container image tag from the commit used for **Build 1** and update the release notes with the changelog. Link to a similar PR [here](https://github.com/Azure/prometheus-collector/pull/298)
- **PR 3**: Make a PR to update the [Geneva docs](https://msazure.visualstudio.com/One/_git/EngSys-MDA-GenevaDocs?path=%2Fdocumentation%2Fmetrics%2FPrometheus&version=GBmaster&_a=contents) with any changes made in `/otelcollector/deploy/eng.ms/docs/Prometheus`
- **PR 4**: Make changes in AgentBaker for this new image version. Link to similar PR [here](https://github.com/Azure/AgentBaker/pull/2285/files)
- **PR 5**: Update prometheus-addon image in AKS-RP.
- **PR 3**: Make changes in AgentBaker for this new image version. Link to similar PR [here](https://github.com/Azure/AgentBaker/pull/2285/files)
- **PR 4**: Update prometheus-addon image in AKS-RP.
First update the files here - https://msazure.visualstudio.com/DefaultCollection/CloudNativeCompute/_git/aks-rp?path=/toolkit/versioning/manifests/addon/azure-monitor-metrics/azure-monitor-metrics-linux.yaml
https://msazure.visualstudio.com/DefaultCollection/CloudNativeCompute/_git/aks-rp?path=/toolkit/versioning/manifests/addon/azure-monitor-metrics/azure-monitor-metrics-windows.yaml
https://msazure.visualstudio.com/DefaultCollection/CloudNativeCompute/_git/aks-rp?path=/toolkit/versioning/manifests/addon/azure-monitor-metrics/azure-monitor-metrics-ksm.yaml
Expand All @@ -89,5 +89,5 @@ https://msazure.visualstudio.com/CloudNativeCompute/_git/aks-rp?path=/toolkit/ve
- To generate snapshots(required when you update the image and/or chart) –
- [Re-Render Test Snapshots](https://msazure.visualstudio.com/CloudNativeCompute/_git/aks-rp?path=/ccp/charts/tests/addon-adapter-charts&version=GBmaster&_a=contents&anchor=re-render-test-snapshots)
- [Re-Render Addon Chart Snapshots](https://msazure.visualstudio.com/CloudNativeCompute/_git/aks-rp?path=/ccp/charts/tests/addon-charts/README.md&version=GBmaster&_a=contents)
- **PR 6**: Toggle Monitoring clusters for Control Plane image. Link to similar PR [here](https://msazure.visualstudio.com/DefaultCollection/CloudNativeCompute/_git/aks-rp/pullrequest/10083525?_a=files)
- **PR 5**: Toggle Monitoring clusters for Control Plane image. Link to similar PR [here](https://msazure.visualstudio.com/DefaultCollection/CloudNativeCompute/_git/aks-rp/pullrequest/10083525?_a=files)
- **Arc**: Start Arc release to Canary regions. The new version will be automatically deployed to each region batch every 24 hours.
44 changes: 22 additions & 22 deletions internal/docs/DEPENDENTCHARTS.md
Original file line number Diff line number Diff line change
@@ -1,14 +1,10 @@
- **Main branch builds:** ![Builds on main branch](https://github.com/Azure/prometheus-collector/actions/workflows/build-and-push-image-and-chart.yml/badge.svg?branch=main&event=push)

- **PR builds:** ![PRs to main branch](https://github.com/Azure/prometheus-collector/actions/workflows/build-and-push-image-and-chart.yml/badge.svg?branch=main&event!=push)


# Instructions for taking newer versions for our dependent charts

We have dependency on kube-state-metrics and prometheus-node-exporter external charts. The source for both the dependent charts are under otelcollector/deploy/dependentcharts in respective folders.
We have dependency on `kube-state-metrics` and `prometheus-node-exporter` external charts. The source for both the dependent charts are under otelcollector/deploy/dependentcharts in respective folders.

We will take periodic updated charts (and images) for these dependencies. Below is the outline for steps involved in updating these dependencies to a later version. MSFT OSS Upstream team will produce safe images for each release for the above 2 projects. We (Container Insights team) will consume that image and produce charts for these 2 projects.

#### Step 1 : Check and look for updated versions in the below repos for these charts -
## Check for updated versions in the below repos for these charts
- [Kube-state-metrics](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-state-metrics)
- [Prometheus node exporter](https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus-node-exporter/)

Expand All @@ -18,19 +14,23 @@ Chart version and image version are different. You can check the latest chart &

The OSS MSFT repository for kube-state-metrics is [here](https://azcuindexer.azurewebsites.net/repositories/oss/kubernetes/kube-state-metrics) and for Prometheus-node-exporter is [here](https://azcuindexer.azurewebsites.net/repositories/oss/prometheus/node-exporter). Ensure that the image tag used in the chart, is indeed available in MSFT OSS container repository for the corresponding chart.

After taking the latest chart(s), only change required is changing the default value for `image.repository` in values.yaml to the below -

kube-state-metrics : `mcr.microsoft.com/oss/prometheus/node-exporter`
prometheus-node-exporter : `mcr.microsoft.com/oss/kubernetes/kube-state-metrics`
## Node Exporter Chart
This chart is only used for our Arc agent. AKS now handles and owns node-exporter installation/upgrades.

#### Step 2 : Create a PR for chart update only. Please keep this PR seperate from other changes.
#### Step 3 : After PR is approved and merged, trigger chart build & push thru the action `build-and-push-dependent-helm-charts`. The parameter is 1 chart name. i.e `prometheus-node-exporter` or `kube-state-metrics` depending on what is being updated/refreshed. If you want to update both, you would trigger this action twice (one after another). Currently, these charts will be packaged and pushed to our cidev ACR repository, which will be reconciled with MCR.
#### Step 4 : Update 'build-and-push-image-and-chart' workflow to scan for the updated images thru trivy
#### Step 5 : Once dependent chart(s) is/are packaged and pushed to our mcr, update our Prometheus collector charts' Chart-template.yaml with the correct chart version(s) for the dependency(ies) updated, and creatre a PR.


>Update the following variables in the [release pipeline](https://github-private.visualstudio.com/azure/_releaseDefinition?definitionId=79&_a=definition-variables)
> a. KSMChartTag - with the new version
> b. NEChartTag - with the new version
> c. PushNewKSMChart - to true for the said release (remember to set it back to false after the release is done!)
> d. PushNewNEChart - to true for the said release (remember to set it back to false after the release is done!)
1. Create a branch for chart update only. Copy the new node-exporter chart into `otelcollector/deploy/dependentcharts/prometheus-node-exporter`.
2. Trigger chart build & push through the Github action `build-and-push-dependent-helm-charts`. The parameter is 1 chart name. i.e `prometheus-node-exporter`. Currently, these charts will be packaged and pushed to our cidev ACR repository, which will be reconciled with MCR. The image tag will be the chart version in the branch
3. Once the chart is pushed to MCR, update `otelcollector/deploy/addon-chart/azure-monitor-metrics-addon/Chart-template.yaml` with the correct dependent chart version. Include this in the branch and PR with the node-exporter chart changes.
4. Test that upgrading the helm chart from the existing version to the new one succeeds without conflicts. You may need to revert the `selector labels` if these were changed.
5. Update the following variables in the [release pipeline](https://github-private.visualstudio.com/azure/_releaseDefinition?definitionId=79&_a=definition-variables):
- `NEChartTag` - with the new chart version
- `PushNewNEChart` - to true for the said release (remember to set it back to false after the release is done!)

## Kube-State-Metrics Chart
`Kube-state-metrics` is now included directly in our chart templates so that it replicates what we have in the AKS-RP repo. This will be used by both AKS and Arc.

The relevant files are prefixed with `ama-metrics-ksm` in `otelcollector/deploy/addon-chart/azure-monitor-metrics-addon/templates`.

1. Check if any changes in the `kube-state-metrics` chart are relevant to be added to our templates. Selector labels should not be changed to prevent upgrade issues.
2. Change the `KubeStateMetrics.ImageTag` value to the corresponding version to the chart in `otelcollector/deploy/addon-chart/azure-monitor-metrics-addon/values-template.yaml`. This tag is different from the chart version.
3. Create a PR with these template changes.
4. Test and release as usual.
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ version: ${IMAGE_TAG}
appVersion: "${IMAGE_TAG}"
dependencies:
- name: prometheus-node-exporter
version: "4.26.0"
version: "4.39.0"
repository: oci://${MCR_REGISTRY}${MCR_REPOSITORY_HELM_DEPENDENCIES}
condition: AzureMonitorMetrics.ArcExtension

Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@ keywords:
- prometheus
- exporter
type: application
version: 4.26.0
appVersion: 1.7.0
version: 4.39.0
appVersion: 1.8.2
home: https://github.com/prometheus/node_exporter/
sources:
- https://github.com/prometheus/node_exporter/
Expand Down

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,8 @@ app.kubernetes.io/part-of: {{ include "prometheus-node-exporter.name" . }}
{{- with .Chart.AppVersion }}
app.kubernetes.io/version: {{ . | quote }}
{{- end }}
{{- with .Values.podLabels }}
{{ toYaml . }}
{{- with .Values.commonLabels }}
{{ tpl (toYaml .) $ }}
{{- end }}
{{- if .Values.releaseLabel }}
release: {{ .Release.Name }}
Expand Down Expand Up @@ -183,3 +183,20 @@ labelNameLengthLimit: {{ . }}
labelValueLengthLimit: {{ . }}
{{- end }}
{{- end }}

{{/* Sets sidecar volumeMounts */}}
{{- define "prometheus-node-exporter.sidecarVolumeMounts" -}}
{{- range $_, $mount := $.Values.sidecarVolumeMount }}
- name: {{ $mount.name }}
mountPath: {{ $mount.mountPath }}
readOnly: {{ $mount.readOnly }}
{{- end }}
{{- range $_, $mount := $.Values.sidecarHostVolumeMounts }}
- name: {{ $mount.name }}
mountPath: {{ $mount.mountPath }}
readOnly: {{ $mount.readOnly }}
{{- if $mount.mountPropagation }}
mountPropagation: {{ $mount.mountPropagation }}
{{- end }}
{{- end }}
{{- end }}
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,6 @@ apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ include "prometheus-node-exporter.fullname" . }}
namespace: {{ include "prometheus-node-exporter.namespace" . }}
labels:
{{- include "prometheus-node-exporter.labels" . | nindent 4 }}
rules:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,9 @@ spec:
{{- end }}
labels:
{{- include "prometheus-node-exporter.labels" . | nindent 8 }}
{{- with .Values.podLabels }}
{{- tpl (toYaml .) $ | nindent 8 }}
{{- end }}
spec:
automountServiceAccountToken: {{ ternary true false (or .Values.serviceAccount.automountServiceAccountToken .Values.kubeRBACProxy.enabled) }}
{{- with .Values.securityContext }}
Expand All @@ -40,6 +43,9 @@ spec:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "prometheus-node-exporter.serviceAccountName" . }}
{{- with .Values.terminationGracePeriodSeconds }}
terminationGracePeriodSeconds: {{ . }}
{{- end }}
containers:
{{- $servicePort := ternary .Values.kubeRBACProxy.port .Values.service.port .Values.kubeRBACProxy.enabled }}
- name: node-exporter
Expand All @@ -50,7 +56,7 @@ spec:
- --path.sysfs=/host/sys
{{- if .Values.hostRootFsMount.enabled }}
- --path.rootfs=/host/root
{{- if semverCompare ">=1.4.0" (coalesce .Values.version .Values.image.tag .Chart.AppVersion) }}
{{- if semverCompare ">=1.4.0-0" (coalesce .Values.version .Values.image.tag .Chart.AppVersion) }}
- --path.udev.data=/host/root/run/udev/data
{{- end }}
{{- end }}
Expand Down Expand Up @@ -124,12 +130,24 @@ spec:
resources:
{{- toYaml . | nindent 12 }}
{{- end }}
{{- if .Values.terminationMessageParams.enabled }}
{{- with .Values.terminationMessageParams }}
terminationMessagePath: {{ .terminationMessagePath }}
terminationMessagePolicy: {{ .terminationMessagePolicy }}
{{- end }}
{{- end }}
volumeMounts:
- name: proc
mountPath: /host/proc
{{- with .Values.hostProcFsMount.mountPropagation }}
mountPropagation: {{ . }}
{{- end }}
readOnly: true
- name: sys
mountPath: /host/sys
{{- with .Values.hostSysFsMount.mountPropagation }}
mountPropagation: {{ . }}
{{- end }}
readOnly: true
{{- if .Values.hostRootFsMount.enabled }}
- name: root
Expand Down Expand Up @@ -160,24 +178,10 @@ spec:
- name: {{ .name }}
mountPath: {{ .mountPath }}
{{- end }}
{{- with .Values.sidecars }}
{{- toYaml . | nindent 8 }}
{{- if or $.Values.sidecarVolumeMount $.Values.sidecarHostVolumeMounts }}
volumeMounts:
{{- range $_, $mount := $.Values.sidecarVolumeMount }}
- name: {{ $mount.name }}
mountPath: {{ $mount.mountPath }}
readOnly: {{ $mount.readOnly }}
{{- end }}
{{- range $_, $mount := $.Values.sidecarHostVolumeMounts }}
- name: {{ $mount.name }}
mountPath: {{ $mount.mountPath }}
readOnly: {{ $mount.readOnly }}
{{- if $mount.mountPropagation }}
mountPropagation: {{ $mount.mountPropagation }}
{{- end }}
{{- end }}
{{- end }}
{{- range .Values.sidecars }}
{{- $overwrites := dict "volumeMounts" (concat (include "prometheus-node-exporter.sidecarVolumeMounts" $ | fromYamlArray) (.volumeMounts | default list) | default list) }}
{{- $defaults := dict "image" (include "prometheus-node-exporter.image" $) "securityContext" $.Values.containerSecurityContext "imagePullPolicy" $.Values.image.pullPolicy }}
- {{- toYaml (merge $overwrites . $defaults) | nindent 10 }}
{{- end }}
{{- if .Values.kubeRBACProxy.enabled }}
- name: kube-rbac-proxy
Expand All @@ -187,7 +191,7 @@ spec:
{{- end }}
- --secure-listen-address=:{{ .Values.service.port}}
- --upstream=http://127.0.0.1:{{ $servicePort }}/
- --proxy-endpoints-port=8888
- --proxy-endpoints-port={{ .Values.kubeRBACProxy.proxyEndpointsPort }}
- --config-file=/etc/kube-rbac-proxy-config/config-file.yaml
volumeMounts:
- name: kube-rbac-proxy-config
Expand All @@ -204,18 +208,34 @@ spec:
{{- if .Values.kubeRBACProxy.enableHostPort }}
hostPort: {{ .Values.service.port }}
{{- end }}
- containerPort: 8888
- containerPort: {{ .Values.kubeRBACProxy.proxyEndpointsPort }}
{{- if .Values.kubeRBACProxy.enableProxyEndpointsHostPort }}
hostPort: {{ .Values.kubeRBACProxy.proxyEndpointsPort }}
{{- end }}
name: "http-healthz"
readinessProbe:
httpGet:
scheme: HTTPS
port: 8888
port: {{ .Values.kubeRBACProxy.proxyEndpointsPort }}
path: healthz
initialDelaySeconds: 5
timeoutSeconds: 5
{{- if .Values.kubeRBACProxy.resources }}
resources:
{{ toYaml .Values.kubeRBACProxy.resources | nindent 12 }}
{{- toYaml .Values.kubeRBACProxy.resources | nindent 12 }}
{{- end }}
{{- if .Values.terminationMessageParams.enabled }}
{{- with .Values.terminationMessageParams }}
terminationMessagePath: {{ .terminationMessagePath }}
terminationMessagePolicy: {{ .terminationMessagePolicy }}
{{- end }}
{{- end }}
{{- with .Values.kubeRBACProxy.env }}
env:
{{- range $key, $value := $.Values.kubeRBACProxy.env }}
- name: {{ $key }}
value: {{ $value | quote }}
{{- end }}
{{- end }}
{{- if .Values.kubeRBACProxy.containerSecurityContext }}
securityContext:
Expand All @@ -228,6 +248,7 @@ spec:
{{- end }}
hostNetwork: {{ .Values.hostNetwork }}
hostPID: {{ .Values.hostPID }}
hostIPC: {{ .Values.hostIPC }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
Expand All @@ -240,6 +261,9 @@ spec:
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.restartPolicy }}
restartPolicy: {{ . }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
Expand All @@ -260,6 +284,9 @@ spec:
- name: {{ $mount.name }}
hostPath:
path: {{ $mount.hostPath }}
{{- with $mount.type }}
type: {{ . }}
{{- end }}
{{- end }}
{{- range $_, $mount := .Values.sidecarVolumeMount }}
- name: {{ $mount.name }}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,9 @@ metadata:
namespace: {{ include "prometheus-node-exporter.namespace" . }}
labels:
{{- include "prometheus-node-exporter.labels" $ | nindent 4 }}
{{- with .Values.service.labels }}
{{- toYaml . | nindent 4 }}
{{- end }}
{{- with .Values.service.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
Expand All @@ -14,10 +17,16 @@ spec:
{{- if .Values.service.ipDualStack.enabled }}
ipFamilies: {{ toYaml .Values.service.ipDualStack.ipFamilies | nindent 4 }}
ipFamilyPolicy: {{ .Values.service.ipDualStack.ipFamilyPolicy }}
{{- end }}
{{- if .Values.service.externalTrafficPolicy }}
externalTrafficPolicy: {{ .Values.service.externalTrafficPolicy }}
{{- end }}
type: {{ .Values.service.type }}
{{- if and (eq .Values.service.type "ClusterIP") .Values.service.clusterIP }}
clusterIP: "{{ .Values.service.clusterIP }}"
{{- end }}
ports:
- port: {{ .Values.service.port }}
- port: {{ .Values.service.servicePort | default .Values.service.port }}
{{- if ( and (eq .Values.service.type "NodePort" ) (not (empty .Values.service.nodePort)) ) }}
nodePort: {{ .Values.service.nodePort }}
{{- end }}
Expand Down
Loading

0 comments on commit ef4f473

Please sign in to comment.