[kube-prometheus-stack] ServiceMonitor issues with config-reloader #4164

Closed
jkroepke opened this issue Jan 22, 2024 · 2 comments · Fixed by #4230
jkroepke (Member) commented Jan 22, 2024

It looks like version 0.71.0 of Prometheus Operator starts the config-reloader sidecar container of Alertmanager and Prometheus with TLS enabled if TLS is enabled for the main container (e.g., via the .alertmanager.alertmanagerSpec.web.tlsConfig and .prometheus.prometheusSpec.web.tlsConfig values of the kube-prometheus-stack Helm chart):

$ kubectl logs alertmanager-prometheus-kube-prometheus-alertmanager-0 -n monitoring -c config-reloader
level=info ts=… caller=main.go:137 msg="Starting prometheus-config-reloader" version="(version=0.71.0, …d0)"
level=info ts=2024-01-22T04:21:35.063518686Z caller=main.go:138 build_context="(go=go1.21.5, platform=linux/amd64, user=Action-Run-ID-7500027263, date=20240112-09:04:19, tags=unknown)"
level=info ts=2024-01-22T04:21:35.063805761Z caller=reloader.go:246 msg="reloading via HTTP"
level=info ts=2024-01-22T04:21:35.066813295Z caller=main.go:193 msg="Starting web server for metrics" listen=:8080
level=info ts=2024-01-22T04:21:35.071097943Z caller=tls_config.go:313 msg="Listening on" address=[::]:8080
level=info ts=2024-01-22T04:21:35.071741226Z caller=tls_config.go:349 msg="TLS is enabled." http2=true address=[::]:8080
level=info ts=2024-01-22T04:21:35.074918397Z caller=reloader.go:424 msg="Reload triggered" cfg_in=/etc/alertmanager/config/alertmanager.yaml.gz cfg_out=/etc/alertmanager/config_out/alertmanager.env.yaml watched_dirs=/etc/alertmanager/config
level=info ts=2024-01-22T04:21:35.074990138Z caller=reloader.go:282 msg="started watching config file and directories for changes" cfg=/etc/alertmanager/config/alertmanager.yaml.gz out=/etc/alertmanager/config_out/alertmanager.env.yaml dirs=/etc/alertmanager/config

vs. Prometheus Operator 0.70.0:

$ kubectl logs alertmanager-prometheus-kube-prometheus-alertmanager-0 -n monitoring -c config-reloader
level=info ts=2024-01-22T05:12:45.974903978Z caller=main.go:137 msg="Starting prometheus-config-reloader" version="(version=0.70.0, branch=refs/tags/v0.70.0, revision=c2c673f7123f3745a2a982b4a2bdc43a11f50fad)"
level=info ts=2024-01-22T05:12:45.974944151Z caller=main.go:138 build_context="(go=go1.21.4, platform=linux/amd64, user=Action-Run-ID-7048794395, date=20231130-15:42:49, tags=unknown)"
level=info ts=2024-01-22T05:12:45.975229132Z caller=reloader.go:246 msg="reloading via HTTP"
level=info ts=2024-01-22T05:12:45.975562214Z caller=main.go:193 msg="Starting web server for metrics" listen=:8080
level=info ts=2024-01-22T05:12:45.975798399Z caller=tls_config.go:274 msg="Listening on" address=[::]:8080
level=info ts=2024-01-22T05:12:45.975813392Z caller=tls_config.go:277 msg="TLS is disabled." http2=false address=[::]:8080
level=info ts=2024-01-22T05:12:45.982538439Z caller=reloader.go:424 msg="Reload triggered" cfg_in=/etc/alertmanager/config/alertmanager.yaml.gz cfg_out=/etc/alertmanager/config_out/alertmanager.env.yaml watched_dirs=/etc/alertmanager/config
level=info ts=2024-01-22T05:12:45.982677219Z caller=reloader.go:282 msg="started watching config file and directories for changes" cfg=/etc/alertmanager/config/alertmanager.yaml.gz out=/etc/alertmanager/config_out/alertmanager.env.yaml dirs=/etc/alertmanager/config
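
For context, here is what enabling web TLS through those values can look like. This is only a minimal sketch; the secret names and keys are assumptions based on this report:

# Minimal sketch of kube-prometheus-stack values that enable TLS for the
# Alertmanager and Prometheus web endpoints (secret names and keys assumed).
alertmanager:
  alertmanagerSpec:
    web:
      tlsConfig:
        cert:
          secret:
            name: alertmanager-tls
            key: tls.crt
        keySecret:
          name: alertmanager-tls
          key: tls.key
prometheus:
  prometheusSpec:
    web:
      tlsConfig:
        cert:
          secret:
            name: prometheus-tls
            key: tls.crt
        keySecret:
          name: prometheus-tls
          key: tls.key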

But the ServiceMonitors that monitor the config-reloader sidecar containers of Alertmanager and Prometheus use a hard-coded http scheme (https://github.com/prometheus-community/helm-charts/blob/kube-prometheus-stack-56.0.1/charts/kube-prometheus-stack/templates/alertmanager/servicemonitor.yaml#L55 and https://github.com/prometheus-community/helm-charts/blob/kube-prometheus-stack-56.0.1/charts/kube-prometheus-stack/templates/prometheus/servicemonitor.yaml#L48), which makes scraping of those targets fail and triggers Prometheus alerts (see the AlertmanagerMembersInconsistent and AlertmanagerClusterDown rules deployed by the kube-prometheus-stack Helm chart).
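
For illustration, a conditional scheme for the reloader endpoint in the chart template could look roughly like the sketch below. This is only a sketch, not the actual template and not necessarily the approach taken in #4230; the check is simplified and the tlsConfig entries are the user-provided ones from the manifests further down:

# Hypothetical excerpt of the reloader-web endpoint in
# templates/alertmanager/servicemonitor.yaml: render HTTPS plus a tlsConfig
# when web TLS is configured for Alertmanager, otherwise keep plain HTTP.
# (Simplified check: the operator enables reloader TLS when web.tlsConfig is set.)
    - path: /metrics
      port: reloader-web
      {{- if .Values.alertmanager.alertmanagerSpec.web }}
      scheme: https
      tlsConfig:
        ca:
          secret:
            key: ca.crt
            name: alertmanager-tls   # user-specific secret, not a chart default
        serverName: prometheus-kube-prometheus-alertmanager.monitoring.svc
      {{- else }}
      scheme: http
      {{- end }}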

If I manually change the prometheus-kube-prometheus-alertmanager ServiceMonitor (deployed by version 56.0.1 of the kube-prometheus-stack Helm chart) from

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: monitoring
  labels:
    app: kube-prometheus-stack-alertmanager
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 56.0.1
    chart: kube-prometheus-stack-56.0.1
    heritage: Helm
    release: prometheus
  name: prometheus-kube-prometheus-alertmanager
  namespace: monitoring
spec:
  endpoints:
    - enableHttp2: true
      path: /metrics
      port: http-web
      scheme: https
      tlsConfig:
        ca:
          secret:
            key: ca.crt
            name: alertmanager-tls
        serverName: prometheus-kube-prometheus-alertmanager.monitoring.svc
    - path: /metrics
      port: reloader-web
      scheme: http
  namespaceSelector:
    matchNames:
      - monitoring
  selector:
    matchLabels:
      app: kube-prometheus-stack-alertmanager
      release: prometheus
      self-monitor: "true"

to

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: monitoring
  labels:
    app: kube-prometheus-stack-alertmanager
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 56.0.1
    chart: kube-prometheus-stack-56.0.1
    heritage: Helm
    release: prometheus
  name: prometheus-kube-prometheus-alertmanager
  namespace: monitoring
spec:
  endpoints:
    - enableHttp2: true
      path: /metrics
      port: http-web
      scheme: https
      tlsConfig:
        ca:
          secret:
            key: ca.crt
            name: alertmanager-tls
        serverName: prometheus-kube-prometheus-alertmanager.monitoring.svc
    - path: /metrics
      port: reloader-web
      scheme: https
      tlsConfig:
        ca:
          secret:
            key: ca.crt
            name: alertmanager-tls
        serverName: prometheus-kube-prometheus-alertmanager.monitoring.svc
  namespaceSelector:
    matchNames:
      - monitoring
  selector:
    matchLabels:
      app: kube-prometheus-stack-alertmanager
      release: prometheus
      self-monitor: "true"

and the prometheus-kube-prometheus-prometheus ServiceMonitor from

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: monitoring
  labels:
    app: kube-prometheus-stack-prometheus
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 56.0.1
    chart: kube-prometheus-stack-56.0.1
    heritage: Helm
    release: prometheus
  name: prometheus-kube-prometheus-prometheus
  namespace: monitoring
spec:
  endpoints:
    - path: /metrics
      port: http-web
      scheme: https
      tlsConfig:
        ca:
          secret:
            key: ca.crt
            name: prometheus-tls
        serverName: prometheus-kube-prometheus-prometheus.monitoring.svc
    - path: /metrics
      port: reloader-web
      scheme: http
  namespaceSelector:
    matchNames:
      - monitoring
  selector:
    matchLabels:
      app: kube-prometheus-stack-prometheus
      release: prometheus
      self-monitor: "true"

to

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: monitoring
  labels:
    app: kube-prometheus-stack-prometheus
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 56.0.1
    chart: kube-prometheus-stack-56.0.1
    heritage: Helm
    release: prometheus
  name: prometheus-kube-prometheus-prometheus
  namespace: monitoring
spec:
  endpoints:
    - path: /metrics
      port: http-web
      scheme: https
      tlsConfig:
        ca:
          secret:
            key: ca.crt
            name: prometheus-tls
        serverName: prometheus-kube-prometheus-prometheus.monitoring.svc
    - path: /metrics
      port: reloader-web
      scheme: https
      tlsConfig:
        ca:
          secret:
            key: ca.crt
            name: prometheus-tls
        serverName: prometheus-kube-prometheus-prometheus.monitoring.svc
  namespaceSelector:
    matchNames:
      - monitoring
  selector:
    matchLabels:
      app: kube-prometheus-stack-prometheus
      release: prometheus
      self-monitor: "true"

then my issue with failed scraping and firing Prometheus alerts is resolved. Unfortunately, I need these fixed ServiceMonitors to be deployed by the kube-prometheus-stack Helm chart itself; I cannot change the ServiceMonitors after they are deployed by CI/CD.

Originally posted by @mabrarov in #4151 (comment)
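
For reference, the manual edit above can also be applied as a one-off patch. This is only a sketch; the endpoint index and the CA secret name are taken from the manifests above and may differ in other setups:

# Hypothetical workaround: switch the reloader-web endpoint (index 1 in the
# manifests above) of the Alertmanager ServiceMonitor to HTTPS.
kubectl patch servicemonitor prometheus-kube-prometheus-alertmanager -n monitoring \
  --type=json -p '[
    {"op": "replace", "path": "/spec/endpoints/1/scheme", "value": "https"},
    {"op": "add", "path": "/spec/endpoints/1/tlsConfig", "value": {
      "ca": {"secret": {"key": "ca.crt", "name": "alertmanager-tls"}},
      "serverName": "prometheus-kube-prometheus-alertmanager.monitoring.svc"}}
  ]'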

jkroepke changed the title to "[kube-prometheus-stack] ServiceMonitor issues with config-reloader" on Jan 22, 2024
rebeccarapthap commented

We are also impacted by this issue while upgrading to the 56.6.2 KPS Helm chart. I see the fix is in the bugfix version 56.6.3; by when can we expect the release?

jkroepke (Member, Author) commented

Hey, there is no fix merged yet. Waiting for #4230.

Once it is merged, a release will be published anyway.
