feat(helm): update rook-ceph group to v1.16.0 (minor) #746

Open
wants to merge 1 commit into
base: main

chii-bot[bot]
Contributor

@chii-bot chii-bot bot commented Aug 31, 2022

This PR contains the following updates:

Package Update Change
rook-ceph minor v1.9.12 -> v1.16.0
rook-ceph-cluster minor v1.9.12 -> v1.16.0
rook/ceph minor v1.9.13 -> v1.16.0

⚠ Dependency Lookup Warnings ⚠

Warnings were logged while processing this repo. Please check the Dependency Dashboard for more information.


Release Notes

rook/rook

v1.16.0

Compare Source

Upgrade Guide

To upgrade from previous versions of Rook, see the Rook upgrade guide.

Breaking Changes

  • Removed support for Ceph Quincy (v17) since it has reached end of life. Reef (v18) and Squid (v19) are the currently supported Ceph versions.
  • Rook has removed CSI network "holder" pods. If there are pods named csi-*plugin-holder-* in the Rook operator namespace, see the detailed documentation to disable them before upgrading to v1.16.
  • The minimum K8s version is increased to v1.27.

Features

v1.15.7

Compare Source

Improvements

Rook v1.15.7 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.15.6

Compare Source

Improvements

Rook v1.15.6 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.15.5

Compare Source

Improvements

Rook v1.15.5 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.15.4

Compare Source

Improvements

Rook v1.15.4 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.15.3

Compare Source

Improvements

Rook v1.15.3 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.15.2

Compare Source

Improvements

Rook v1.15.2 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.15.1

Compare Source

Improvements

Rook v1.15.1 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.15.0

Compare Source

Upgrade Guide

To upgrade from previous versions of Rook, see the Rook upgrade guide.

Breaking Changes

  • Minimum version of Kubernetes supported is increased to K8s v1.26.
  • During CephBlockPool updates, Rook will now return an error if an invalid device class is specified. Pools with invalid device classes may start failing until the correct device class is specified. For more details, see #​14057.
  • Rook has deprecated CSI network "holder" pods. If there are pods named csi-*plugin-holder-* in the Rook operator namespace, see the detailed documentation to disable them. This deprecation process will be required before upgrading to the future Rook v1.16.
  • Ceph COSI driver images have been updated. This impacts existing COSI Buckets, BucketClaims, and BucketAccesses. Update existing clusters following the guide here.
  • CephObjectStore, CephObjectStoreUser, and OBC endpoint behavior has changed when CephObjectStore spec.hosting configurations are set. Use the new spec.hosting.advertiseEndpoint config to define required behavior as documented.
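
To make the last item more concrete, here is a minimal sketch of a CephObjectStore that sets spec.hosting.advertiseEndpoint. The object store name and namespace mirror the manifests in this PR, the DNS name is a placeholder, and the dnsName, port, and useTls field names are assumed from the Rook CRD reference, so verify them against the object store documentation for the chart version in use.

```yaml
# Sketch only: field names under advertiseEndpoint are assumptions to verify.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: ceph-objectstore
  namespace: default
spec:
  hosting:
    advertiseEndpoint:
      dnsName: s3.example.com   # placeholder hostname, not from this repo
      port: 443
      useTls: true
```

With a fragment like this, the endpoint advertised to CephObjectStoreUsers and OBCs would be expected to follow the advertiseEndpoint value rather than the internal service address.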

Features

  • Added support for Ceph Squid (v19), in addition to Reef (v18) and Quincy (v17). Quincy support will be removed in Rook v1.16.
  • Ceph-CSI driver v3.12, including new options for RBD, log rotation, and updated sidecar images.
  • Allow updating the device class of OSDs, if allowDeviceClassUpdate: true is set in the CephCluster CR.
  • Allow updating the weight of an OSD, if allowOsdCrushWeightUpdate: true is set in the CephCluster CR.
  • Use fully-qualified image names (docker.io/rook/ceph) in operator manifests and helm charts.
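
The two opt-in settings named above could look roughly like this in a CephCluster CR. The cluster name and namespace match the diff in this PR; placing both flags under spec.storage is an assumption based on the Rook example manifests and should be confirmed against the CephCluster CRD reference.

```yaml
# Sketch only: both flags are assumed to live under spec.storage.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: default
  namespace: default
spec:
  storage:
    # Let Rook change the device class of existing OSDs.
    allowDeviceClassUpdate: true
    # Let Rook change the CRUSH weight of existing OSDs.
    allowOsdCrushWeightUpdate: true
```

Leaving both flags unset keeps the previous behavior, where these OSD properties are not changed after creation.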

Experimental Features

  • CephObjectStore support for keystone authentication for S3 and Swift. See the Object store documentation to configure.
  • CSI operator: CSI settings are moving to CRs managed by a new operator. Once enabled, Rook will convert the settings previously defined in the operator configmap or env vars into the new CRs managed by the CSI operator. There are two steps to enable:

v1.14.12

Compare Source

Improvements

Rook v1.14.12 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.14.11

Compare Source

Improvements

Rook v1.14.11 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.14.10

Compare Source

Improvements

Rook v1.14.10 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.14.9

Compare Source

Improvements

Rook v1.14.9 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.14.8

Compare Source

Improvements

Rook v1.14.8 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.14.7

Compare Source

What's Changed

monitoring: fix CephPoolGrowthWarning expression (#​14346, @​matofeder)
monitoring: Set honor labels on the service monitor (#​14339, @​travisn)

Full Changelog: rook/rook@v1.14.6...v1.14.7

v1.14.6

Compare Source

What's Changed

v1.14.5

Compare Source

Improvements

Rook v1.14.5 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.14.4

Compare Source

Improvements

Rook v1.14.4 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.14.3

Compare Source

Improvements

Rook v1.14.3 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.14.2

Compare Source

Improvements

Rook v1.14.2 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.14.1

Compare Source

Improvements

Rook v1.14.1 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.14.0

Compare Source

Upgrade Guide

To upgrade from previous versions of Rook, see the Rook upgrade guide.

Breaking Changes

  • The minimum supported version of Kubernetes is v1.25. Upgrade to Kubernetes v1.25 or higher before upgrading Rook.
  • The image repository and tag settings are specified separately in the helm chart values.yaml for the CSI images. Helm users previously specifying the CSI images with the image setting will need to update their values.yaml with the separate repository and tag settings.
  • Rook is beginning the process of deprecating CSI network "holder" pods. If there are pods named csi-*plugin-holder-* in the Rook operator namespace, see the holder pod deprecation documentation to disable them. Migration of affected clusters is optional for v1.14, but will be required in a future release.
  • The Rook operator config CSI_ENABLE_READ_AFFINITY was removed. v1.13 clusters that have modified this value to be "true" must set the option as desired in each CephCluster as documented here before upgrading to v1.14.
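
A hedged sketch of the two values changes described above follows. The first document is a fragment of the rook-ceph operator chart values showing the split repository and tag settings; the image tag is a placeholder. The second is a CephCluster fragment that sets read affinity per cluster; the spec.csi.readAffinity.enabled path is assumed from the Rook CRD reference and should be checked before use.

```yaml
# Document 1: rook-ceph operator chart values (sketch).
# Before v1.14 this was a single setting:
#   csi:
#     cephcsi:
#       image: quay.io/cephcsi/cephcsi:<tag>
csi:
  cephcsi:
    repository: quay.io/cephcsi/cephcsi
    tag: v3.11.0   # placeholder tag, use the one your release expects
---
# Document 2: CephCluster fragment replacing CSI_ENABLE_READ_AFFINITY (sketch).
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: default
  namespace: default
spec:
  csi:
    readAffinity:
      enabled: true
```

Charts that never overrode the CSI images or the read affinity option should need neither change.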

Features

  • Kubernetes versions v1.25 through v1.29 are supported. K8s v1.30 will be supported as soon as released.
  • Ceph daemon pods using the default service account now use a new rook-ceph-default service account.
  • A custom Ceph application can be applied to a CephBlockPool CR.
  • Object stores can be created with shared metadata and data pools. Isolation between object stores is enabled via RADOS namespaces. This configuration is recommended to limit the number of pools when multiple object stores are created.
  • Support for VolumeSnapshotGroup is available for the RBD and CephFS CSI drivers.
  • Support for virtual style hosting for s3 buckets is added in the CephObjectStore, by adding hosting.dnsNames to the object store.
  • A static prefix can be specified for the CSI drivers and OBC provisioner (the default prefix is the rook-ceph namespace).
  • Azure Key Vault KMS support is added for storing OSD encryption keys.
  • Additional status columns added to the kubectl output for Rook CRDs.
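
As an illustration of the virtual-host-style bucket support mentioned in this list, the fragment below adds hosting.dnsNames to the object store defined in this repository. The hostname is a placeholder, and a matching wildcard DNS record (for example *.s3.example.com) is assumed to exist outside of Rook.

```yaml
# Sketch only: hostname is a placeholder.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: ceph-objectstore
  namespace: default
spec:
  hosting:
    dnsNames:
      - s3.example.com   # buckets would be reachable as <bucket>.s3.example.com
```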

v1.13.10

Compare Source

Improvements

Rook v1.13.10 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.13.9

Compare Source

Improvements

Rook v1.13.9 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.13.8

Compare Source

Improvements

Rook v1.13.8 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.13.7

Compare Source

Improvements

Rook v1.13.7 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.13.6

Compare Source

Improvements

Rook v1.13.6 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.13.5

Compare Source

Improvements

Rook v1.13.5 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.13.4

Compare Source

Improvements

Rook v1.13.4 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.13.3

Compare Source

Improvements

Rook v1.13.3 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.13.2

Compare Source

Improvements

Rook v1.13.2 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.13.1

Compare Source

Improvements

Rook v1.13.1 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.13.0

Compare Source

Upgrade Guide

To upgrade from previous versions of Rook, see the Rook upgrade guide.

Breaking Changes

  • Removed support for Ceph Pacific (v16). Ceph Quincy (v17) and Ceph Reef (v18) are the only currently supported versions.
  • The minimum supported Kubernetes version is v1.23
  • The minimum supported Ceph-CSI driver is 3.9
  • The admission controller is removed. If the admission controller is enabled (it is disabled by default), it is recommended to be disabled before the upgrade. See the upgrade guide for more details.

Features

  • Added experimental cephConfig to the CephCluster CR to allow setting Ceph config options in the Ceph MON config store via the CRD. These settings supersede the ceph.conf override settings.
  • CephCSI v3.10 is now the default CSI driver version.
  • The default CephFS SubvolumeGroup has pinning enabled by default to distribute load across MDS ranks in predictable and stable ways.
  • The Ceph exporter daemon is updated to use a Ceph keyring with reduced privileges instead of the admin keyring.
  • If the host network setting changes in the CephCluster CR, the mons will now automatically failover to enable the new configuration.
  • Allow for additional advanced maintenance and troubleshooting of Ceph daemons, by respecting the label ceph.rook.io/do-not-reconcile for all Ceph daemons. This is helpful when using the debug command in the kubectl rook-ceph plugin.
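
The experimental cephConfig setting from the first item can be sketched as follows. The option shown (osd_pool_default_size) is only an example of a Ceph config key, its value is quoted because the values are assumed to be strings, and the "global" section name is assumed to map to the Ceph config "who" target; none of this is taken from the repository being updated.

```yaml
# Sketch only: the chosen option and value are illustrative.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: default
  namespace: default
spec:
  cephConfig:
    global:
      osd_pool_default_size: "3"
```

Per the release note, settings applied this way supersede the same keys in the rook-config-override ConfigMap, so keeping a given option in only one place avoids confusion.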

v1.12.11

Compare Source

Improvements

Rook v1.12.11 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.12.10

Compare Source

Improvements

Rook v1.12.10 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.12.9

Compare Source

Improvements

Rook v1.12.9 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.

v1.12.8

Compare Source

Improvements

Rook v1.12.8 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about these updates again.


  • If you want to rebase/retry this PR, click this checkbox.

This PR has been generated by Renovate Bot.

@chii-bot chii-bot bot requested a review from toboshii as a code owner August 31, 2022 22:21
@chii-bot chii-bot bot added the renovate/container, renovate/helm, type/minor, area/cluster, and size/XS labels Aug 31, 2022
@chii-bot
Contributor Author

chii-bot bot commented Aug 31, 2022

Path: cluster/core/rook-ceph/cluster/helm-release.yaml
Version: v1.9.12 -> v1.16.0

@@ -73,11 +73,25 @@
 # imagePullSecrets:
 # - name: my-registry-secret
 ---
+# Source: rook-ceph-cluster/templates/rbac.yaml
+# Service account for other components
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+ name: rook-ceph-default
+ namespace: default # namespace:cluster
+ labels:
+ operator: rook
+ storage-backend: ceph
+# imagePullSecrets:
+# - name: my-registry-secret
+---
 # Source: rook-ceph-cluster/templates/configmap.yaml
 kind: ConfigMap
 apiVersion: v1
 metadata:
 name: rook-config-override
+ namespace: default # namespace:cluster
 data:
 config: |2
 [global]
@@ -96,16 +110,17 @@
 pool: ceph-blockpool
 clusterID: default
 csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
- csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
+ csi.storage.k8s.io/controller-expand-secret-namespace: 'default'
 csi.storage.k8s.io/fstype: ext4
 csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
- csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
+ csi.storage.k8s.io/node-stage-secret-namespace: 'default'
 csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
- csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
+ csi.storage.k8s.io/provisioner-secret-namespace: 'default'
 imageFeatures: layering
 imageFormat: "2"
 reclaimPolicy: Delete
 allowVolumeExpansion: true
+volumeBindingMode: Immediate
 ---
 # Source: rook-ceph-cluster/templates/cephfilesystem.yaml
 apiVersion: storage.k8s.io/v1
@@ -120,14 +135,15 @@
 pool: ceph-filesystem-data0
 clusterID: default
 csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
- csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
+ csi.storage.k8s.io/controller-expand-secret-namespace: 'default'
 csi.storage.k8s.io/fstype: ext4
 csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
- csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
+ csi.storage.k8s.io/node-stage-secret-namespace: 'default'
 csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
- csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
+ csi.storage.k8s.io/provisioner-secret-namespace: 'default'
 reclaimPolicy: Delete
 allowVolumeExpansion: true
+volumeBindingMode: Immediate
 ---
 # Source: rook-ceph-cluster/templates/cephobjectstore.yaml
 apiVersion: storage.k8s.io/v1
@@ -136,6 +152,7 @@
 name: ceph-bucket
 provisioner: default.ceph.rook.io/bucket
 reclaimPolicy: Delete
+volumeBindingMode: Immediate
 parameters:
 objectStoreName: ceph-objectstore
 objectStoreNamespace: default
@@ -179,10 +196,10 @@
 namespace: default # namespace:cluster
 rules:
 # this is needed for rook's "key-management" CLI to fetch the vault token from the secret when
- # validating the connection details
+ # validating the connection details and for key rotation operations.
 - apiGroups: [""]
 resources: ["secrets"]
- verbs: ["get"]
+ verbs: ["get", "update"]
 - apiGroups: [""]
 resources: ["configmaps"]
 verbs: ["get", "list", "watch", "create", "update", "delete"]
@@ -191,23 +208,6 @@
 verbs: ["get", "list", "create", "update", "delete"]
 ---
 # Source: rook-ceph-cluster/templates/rbac.yaml
-kind: Role
-apiVersion: rbac.authorization.k8s.io/v1
-metadata:
- name: rook-ceph-rgw
- namespace: default # namespace:cluster
-rules:
- # Placeholder role so the rgw service account will
- # be generated in the csv. Remove this role and role binding
- # when fixing https://github.com/rook/rook/issues/10141.
- - apiGroups:
- - ""
- resources:
- - configmaps
- verbs:
- - get
----
-# Source: rook-ceph-cluster/templates/rbac.yaml
 # Aspects of ceph-mgr that operate within the cluster's namespace
 kind: Role
 apiVersion: rbac.authorization.k8s.io/v1
@@ -242,9 +242,31 @@
 - apiGroups:
 - ceph.rook.io
 resources:
- - "*"
+ - cephclients
+ - cephclusters
+ - cephblockpools
+ - cephfilesystems
+ - cephnfses
+ - cephobjectstores
+ - cephobjectstoreusers
+ - cephobjectrealms
+ - cephobjectzonegroups
+ - cephobjectzones
+ - cephbuckettopics
+ - cephbucketnotifications
+ - cephrbdmirrors
+ - cephfilesystemmirrors
+ - cephfilesystemsubvolumegroups
+ - cephblockpoolradosnamespaces
+ - cephcosidrivers
 verbs:
- - "*"
+ - get
+ - list
+ - watch
+ - create
+ - update
+ - delete
+ - patch
 - apiGroups:
 - apps
 resources:
@@ -339,102 +361,6 @@
 - update
 ---
 # Source: rook-ceph-cluster/templates/rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-default-psp
- namespace: default # namespace:cluster
- labels:
- operator: rook
- storage-backend: ceph
- app.kubernetes.io/part-of: rook-ceph-operator
- app.kubernetes.io/managed-by: Helm
- app.kubernetes.io/created-by: helm
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: default
- namespace: default # namespace:cluster
----
-# Source: rook-ceph-cluster/templates/rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-osd-psp
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-osd
- namespace: default # namespace:cluster
----
-# Source: rook-ceph-cluster/templates/rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-rgw-psp
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-rgw
- namespace: default # namespace:cluster
----
-# Source: rook-ceph-cluster/templates/rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-mgr-psp
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-mgr
- namespace: default # namespace:cluster
----
-# Source: rook-ceph-cluster/templates/rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-cmd-reporter-psp
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-cmd-reporter
- namespace: default # namespace:cluster
----
-# Source: rook-ceph-cluster/templates/rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-purge-osd-psp
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-purge-osd
- namespace: default # namespace:cluster
----
-# Source: rook-ceph-cluster/templates/rbac.yaml
 # Allow the operator to create resources in this cluster's namespace
 kind: RoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
@@ -467,22 +393,6 @@
 namespace: default # namespace:cluster
 ---
 # Source: rook-ceph-cluster/templates/rbac.yaml
-# Allow the rgw pods in this namespace to work with configmaps
-kind: RoleBinding
-apiVersion: rbac.authorization.k8s.io/v1
-metadata:
- name: rook-ceph-rgw
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: Role
- name: rook-ceph-rgw
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-rgw
- namespace: default # namespace:cluster
----
-# Source: rook-ceph-cluster/templates/rbac.yaml
 # Allow the ceph mgr to access resources scoped to the CephCluster namespace necessary for mgr modules
 kind: RoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
@@ -582,6 +492,7 @@
 kind: Ingress
 metadata:
 name: default-dashboard
+ namespace: default # namespace:cluster
 spec:
 rules:
 - host: rook.${SECRET_DOMAIN}
@@ -599,11 +510,14 @@
 - hosts:
 - rook.${SECRET_DOMAIN}
 ---
+
+---
 # Source: rook-ceph-cluster/templates/cephblockpool.yaml
 apiVersion: ceph.rook.io/v1
 kind: CephBlockPool
 metadata:
 name: ceph-blockpool
+ namespace: default # namespace:cluster
 spec:
 failureDomain: host
 replicated:
@@ -614,12 +528,13 @@
 kind: CephCluster
 metadata:
 name: default
+ namespace: default # namespace:cluster
 spec:
 monitoring:
 enabled: true
 cephVersion:
 allowUnsupported: false
- image: quay.io/ceph/ceph:v16.2.10
+ image: quay.io/ceph/ceph:v19.2.0
 cleanupPolicy:
 allowUninstallWithVolumes: false
 confirmation: ""
@@ -636,8 +551,6 @@
 urlPrefix: /
 dataDirHostPath: /var/lib/rook
 disruptionManagement:
- machineDisruptionBudgetNamespace: openshift-machine-api
- manageMachineDisruptionBudgets: false
 managePodBudgets: true
 osdMaintenanceTimeout: 30
 pgHealthCheckTimeout: 0
@@ -659,16 +572,24 @@
 disabled: false
 osd:
 disabled: false
+ logCollector:
+ enabled: true
+ maxLogSize: 500M
+ periodicity: daily
 mgr:
 allowMultiplePerNode: false
 count: 2
- modules:
- - enabled: true
- name: pg_autoscaler
+ modules: null
 mon:
 allowMultiplePerNode: false
 count: 3
 network:
+ connections:
+ compression:
+ enabled: false
+ encryption:
+ enabled: false
+ requireMsgr2: false
 provider: host
 priorityClassNames:
 mgr: system-cluster-critical
@@ -678,49 +599,48 @@
 resources:
 cleanup:
 limits:
- cpu: 500m
 memory: 1Gi
 requests:
 cpu: 500m
 memory: 100Mi
 crashcollector:
 limits:
- cpu: 500m
 memory: 60Mi
 requests:
 cpu: 100m
 memory: 60Mi
+ exporter:
+ limits:
+ memory: 128Mi
+ requests:
+ cpu: 50m
+ memory: 50Mi
 logcollector:
 limits:
- cpu: 500m
 memory: 1Gi
 requests:
 cpu: 100m
 memory: 100Mi
 mgr:
 limits:
- cpu: 1000m
 memory: 1Gi
 requests:
 cpu: 500m
 memory: 512Mi
 mgr-sidecar:
 limits:
- cpu: 500m
 memory: 100Mi
 requests:
 cpu: 100m
 memory: 40Mi
 mon:
 limits:
- cpu: 2000m
 memory: 2Gi
 requests:
 cpu: 1000m
 memory: 1Gi
 osd:
 limits:
- cpu: 2000m
 memory: 4Gi
 requests:
 cpu: 1000m
@@ -747,6 +667,7 @@
 name: k8s-worker03
 useAllDevices: false
 useAllNodes: false
+ upgradeOSDRequiresHealthyPGs: false
 waitTimeoutForHealthyOSDInMinutes: 10
 ---
 # Source: rook-ceph-cluster/templates/cephfilesystem.yaml
@@ -754,6 +675,7 @@
 kind: CephFilesystem
 metadata:
 name: ceph-filesystem
+ namespace: default # namespace:cluster
 spec:
 dataPools:
 - failureDomain: host
@@ -769,37 +691,55 @@
 priorityClassName: system-cluster-critical
 resources:
 limits:
- cpu: 2000m
 memory: 4Gi
 requests:
 cpu: 1000m
 memory: 4Gi
 ---
+# Source: rook-ceph-cluster/templates/cephfilesystem.yaml
+apiVersion: ceph.rook.io/v1
+kind: CephFilesystemSubVolumeGroup
+metadata:
+ name: ceph-filesystem-csi # lets keep the svg crd name same as `filesystem name + csi` for the default csi svg
+ namespace: default # namespace:cluster
+spec:
+ # The name of the subvolume group. If not set, the default is the name of the subvolumeGroup CR.
+ name: csi
+ # filesystemName is the metadata name of the CephFilesystem CR where the subvolume group will be created
+ filesystemName: ceph-filesystem
+ # reference https://docs.ceph.com/en/latest/cephfs/fs-volumes/#pinning-subvolumes-and-subvolume-groups
+ # only one out of (export, distributed, random) can be set at a time
+ # by default pinning is set with value: distributed=1
+ # for disabling default values set (distributed=0)
+ pinning:
+ distributed: 1 # distributed=<0, 1> (disabled=0)
+ # export: # export=<0-256> (disabled=-1)
+ # random: # random=[0.0, 1.0](disabled=0.0)
+---
 # Source: rook-ceph-cluster/templates/cephobjectstore.yaml
 apiVersion: ceph.rook.io/v1
 kind: CephObjectStore
 metadata:
 name: ceph-objectstore
+ namespace: default # namespace:cluster
 spec:
 dataPool:
 erasureCoded:
 codingChunks: 1
 dataChunks: 2
 failureDomain: host
+ parameters:
+ bulk: "true"
 gateway:
 instances: 1
 port: 80
 priorityClassName: system-cluster-critical
 resources:
 limits:
- cpu: 2000m
 memory: 2Gi
 requests:
 cpu: 1000m
 memory: 1Gi
- healthCheck:
- bucket:
- interval: 60s
 metadataPool:
 failureDomain: host
 replicated:
@@ -817,810 +757,881 @@
 namespace: default
 spec:
 # Import the raw prometheus rules since they have descriptions that should not be processed with the helm templates
- # copied from https://github.com/ceph/ceph/blob/master/monitoring/ceph-mixin/prometheus_alerts.yml
+ # Copied from https://github.com/ceph/ceph/blob/master/monitoring/ceph-mixin/prometheus_alerts.yml
+ # Attention: This is not a 1:1 copy of ceph-mixin alerts. This file contains several Rook-related adjustments.
+ # List of main adjustments:
+ # - Alerts related to cephadm are excluded
+ # - The PrometheusJobMissing alert is adjusted for the rook-ceph-mgr job, and the PrometheusJobExporterMissing alert is added
 groups:
- - name: cluster health
- rules:
- - alert: CephHealthError
- expr: ceph_health_status == 2
- for: 5m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.2.1
- annotations:
- summary: Cluster is in the ERROR state
- description: >
- The cluster state has been HEALTH_ERROR for more than 5 minutes. Please check "ceph health detail" for more information.
-
- - alert: CephHealthWarning
- expr: ceph_health_status == 1
- for: 15m
- labels:
- severity: warning
- type: ceph_default
- annotations:
- summary: Cluster is in the WARNING state
- description: >
- The cluster state has been HEALTH_WARN for more than 15 minutes. Please check "ceph health detail" for more information.
-
- - name: mon
+ - name: "cluster health"
 rules:
- - alert: CephMonDownQuorumAtRisk
- expr: ((ceph_health_detail{name="MON_DOWN"} == 1) * on() (count(ceph_mon_quorum_status == 1) == bool (floor(count(ceph_mon_metadata) / 2) + 1))) == 1
- for: 30s
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.3.1
+ - alert: "CephHealthError"
 annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-down
- summary: Monitor quorum is at risk
- description: |
- {{ $min := query "floor(count(ceph_mon_metadata) / 2) +1" | first | value }}Quorum requires a majority of monitors (x {{ $min }}) to be active
- Without quorum the cluster will become inoperable, affecting all services and connected clients.
-
- The following monitors are down:
- {{- range query "(ceph_mon_quorum_status == 0) + on(ceph_daemon) group_left(hostname) (ceph_mon_metadata * 0)" }}
- - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }}
- {{- end }}
- - alert: CephMonDown
- expr: (count(ceph_mon_quorum_status == 0) <= (count(ceph_mon_metadata) - floor(count(ceph_mon_metadata) / 2) + 1))
- for: 30s
- labels:
- severity: warning
- type: ceph_default
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-down
- summary: One or more monitors down
- description: |
- {{ $down := query "count(ceph_mon_quorum_status == 0)" | first | value }}{{ $s := "" }}{{ if gt $down 1.0 }}{{ $s = "s" }}{{ end }}There are {{ $down }} monitor{{ $s }} down.
- Quorum is still intact, but the loss of an additional monitor will make your cluster inoperable.
-
- The following monitors are down:
- {{- range query "(ceph_mon_quorum_status == 0) + on(ceph_daemon) group_left(hostname) (ceph_mon_metadata * 0)" }}
- - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }}
- {{- end }}
- - alert: CephMonDiskspaceCritical
- expr: ceph_health_detail{name="MON_DISK_CRIT"} == 1
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.3.2
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-disk-crit
- summary: Filesystem space on at least one monitor is critically low
- description: |
- The free space available to a monitor's store is critically low.
- You should increase the space available to the monitor(s). The default directory
- is /var/lib/ceph/mon-*/data/store.db on traditional deployments, and under
- /var/lib/rook/mon-*/data/store.db on the mon pod's worker node for Rook.
- Look for old, rotated versions of *.log and MANIFEST*. Do NOT touch any *.sst files.
- Also check any other directories under /var/lib/rook and other directories on the
- same filesystem, often /var/log and /var/tmp are culprits. Your monitor hosts are;
- {{- range query "ceph_mon_metadata"}}
- - {{ .Labels.hostname }}
- {{- end }}
- - alert: CephMonDiskspaceLow
- expr: ceph_health_detail{name="MON_DISK_LOW"} == 1
- for: 5m
- labels:
- severity: warning
- type: ceph_default
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-disk-low
- summary: Disk space on at least one monitor is approaching full
- description: |
- The space available to a monitor's store is approaching full (>70% is the default).
- You should increase the space available to the monitor(s). The default directory
- is /var/lib/ceph/mon-*/data/store.db on traditional deployments, and under
- /var/lib/rook/mon-*/data/store.db on the mon pod's worker node for Rook.
- Look for old, rotated versions of *.log and MANIFEST*. Do NOT touch any *.sst files.
- Also check any other directories under /var/lib/rook and other directories on the
- same filesystem, often /var/log and /var/tmp are culprits. Your monitor hosts are;
- {{- range query "ceph_mon_metadata"}}
- - {{ .Labels.hostname }}
- {{- end }}
- - alert: CephMonClockSkew
- expr: ceph_health_detail{name="MON_CLOCK_SKEW"} == 1
- for: 1m
- labels:
- severity: warning
- type: ceph_default
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-clock-skew
- summary: Clock skew detected among monitors
- description: |
- Ceph monitors rely on closely synchronized time to maintain
- quorum and cluster consistency. This event indicates that time on at least
- one mon has drifted too far from the lead mon.
-
- Review cluster status with ceph -s. This will show which monitors
- are affected. Check the time sync status on each monitor host with
- "ceph time-sync-status" and the state and peers of your ntpd or chrony daemon.
- - name: osd
+ description: "The cluster state has been HEALTH_ERROR for more than 5 minutes. Please check 'ceph health detail' for more information."
+ summary: "Ceph is in the ERROR state"
+ expr: "ceph_health_status == 2"
+ for: "5m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.2.1"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephHealthWarning"
+ annotations:
+ description: "The cluster state has been HEALTH_WARN for more than 15 minutes. Please check 'ceph health detail' for more information."
+ summary: "Ceph is in the WARNING state"
+ expr: "ceph_health_status == 1"
+ for: "15m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - name: "mon"
 rules:
- - alert: CephOSDDownHigh
- expr: count(ceph_osd_up == 0) / count(ceph_osd_up) * 100 >= 10
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.4.1
+ - alert: "CephMonDownQuorumAtRisk"
 annotations:
- summary: More than 10% of OSDs are down
- description: |
- {{ $value | humanize }}% or {{ with query "count(ceph_osd_up == 0)" }}{{ . | first | value }}{{ end }} of {{ with query "count(ceph_osd_up)" }}{{ . | first | value }}{{ end }} OSDs are down (>= 10%).
-
- The following OSDs are down:
- {{- range query "(ceph_osd_up * on(ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0" }}
- - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }}
- {{- end }}
- - alert: CephOSDHostDown
- expr: ceph_health_detail{name="OSD_HOST_DOWN"} == 1
- for: 5m
- labels:
- severity: warning
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.4.8
- annotations:
- summary: An OSD host is offline
- description: |
- The following OSDs are down:
- {{- range query "(ceph_osd_up * on(ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0" }}
- - {{ .Labels.hostname }} : {{ .Labels.ceph_daemon }}
- {{- end }}
- - alert: CephOSDDown
- expr: ceph_health_detail{name="OSD_DOWN"} == 1
- for: 5m
- labels:
- severity: warning
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.4.2
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#osd-down
- summary: An OSD has been marked down
- description: |
- {{ $num := query "count(ceph_osd_up == 0)" | first | value }}{{ $s := "" }}{{ if gt $num 1.0 }}{{ $s = "s" }}{{ end }}{{ $num }} OSD{{ $s }} down for over 5mins.
-
- The following OSD{{ $s }} {{ if eq $s "" }}is{{ else }}are{{ end }} down:
- {{- range query "(ceph_osd_up * on(ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0"}}
- - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }}
- {{- end }}
- - alert: CephOSDNearFull
- expr: ceph_health_detail{name="OSD_NEARFULL"} == 1
- for: 5m
- labels:
- severity: warning
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.4.3
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#osd-nearfull
- summary: OSD(s) running low on free space (NEARFULL)
- description: |
- One or more OSDs have reached the NEARFULL threshold
-
- Use 'ceph health detail' and 'ceph osd df' to identify the problem.
- To resolve, add capacity to the affected OSD's failure domain, restore down/out OSDs, or delete unwanted data.
- - alert: CephOSDFull
- expr: ceph_health_detail{name="OSD_FULL"} > 0
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.4.6
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#osd-full
- summary: OSD full, writes blocked
- description: |
- An OSD has reached the FULL threshold. Writes to pools that share the
- affected OSD will be blocked.
-
- Use 'ceph health detail' and 'ceph osd df' to identify the problem.
- To resolve, add capacity to the affected OSD's failure domain, restore down/out OSDs, or delete unwanted data.
- - alert: CephOSDBackfillFull
- expr: ceph_health_detail{name="OSD_BACKFILLFULL"} > 0
- for: 1m
- labels:
- severity: warning
- type: ceph_default
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#osd-backfillfull
- summary: OSD(s) too full for backfill operations
- description: "An OSD has reached the BACKFILL FULL threshold. This will prevent rebalance operations\nfrom completing. \nUse 'ceph health detail' and 'ceph osd df' to identify the problem.\n\nTo resolve, add capacity to the affected OSD's failure domain, restore down/out OSDs, or delete unwanted data.\n"
- - alert: CephOSDTooManyRepairs
- expr: ceph_health_detail{name="OSD_TOO_MANY_REPAIRS"} == 1
- for: 30s
- labels:
- severity: warning
- type: ceph_default
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#osd-too-many-repairs
- summary: OSD reports a high number of read errors
- description: |
- Reads from an OSD have used a secondary PG to return data to the client, indicating
- a potential failing disk.
- - alert: CephOSDTimeoutsPublicNetwork
- expr: ceph_health_detail{name="OSD_SLOW_PING_TIME_FRONT"} == 1
- for: 1m
- labels:
- severity: warning
- type: ceph_default
- annotations:
- summary: Network issues delaying OSD heartbeats (public network)
- description: |
- OSD heartbeats on the cluster's 'public' network (frontend) are running slow. Investigate the network
- for latency or loss issues. Use 'ceph health detail' to show the affected OSDs.
- - alert: CephOSDTimeoutsClusterNetwork
- expr: ceph_health_detail{name="OSD_SLOW_PING_TIME_BACK"} == 1
- for: 1m
- labels:
- severity: warning
- type: ceph_default
- annotations:
- summary: Network issues delaying OSD heartbeats (cluster network)
- description: |
- OSD heartbeats on the cluster's 'cluster' network (backend) are running slow. Investigate the network
- for latency or loss issues. Use 'ceph health detail' to show the affected OSDs.
- - alert: CephOSDInternalDiskSizeMismatch
- expr: ceph_health_detail{name="BLUESTORE_DISK_SIZE_MISMATCH"} == 1
- for: 1m
- labels:
- severity: warning
- type: ceph_default
+ description: "{{ $min := query \"floor(count(ceph_mon_metadata) / 2) + 1\" | first | value }}Quorum requires a majority of monitors (x {{ $min }}) to be active. Without quorum the cluster will become inoperable, affecting all services and connected clients. The following monitors are down: {{- range query \"(ceph_mon_quorum_status == 0) + on(ceph_daemon) group_left(hostname) (ceph_mon_metadata * 0)\" }} - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }} {{- end }}"
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-down"
+ summary: "Monitor quorum is at risk"
+ expr: |
+ (
+ (ceph_health_detail{name="MON_DOWN"} == 1) * on() (
+ count(ceph_mon_quorum_status == 1) == bool (floor(count(ceph_mon_metadata) / 2) + 1)
+ )
+ ) == 1
+ for: "30s"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.3.1"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephMonDown"
 annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#bluestore-disk-size-mismatch
- summary: OSD size inconsistency error
 description: |
- One or more OSDs have an internal inconsistency between metadata and the size of the device.
- This could lead to the OSD(s) crashing in future. You should redeploy the affected OSDs.
- - alert: CephDeviceFailurePredicted
- expr: ceph_health_detail{name="DEVICE_HEALTH"} == 1
- for: 1m
+ {{ $down := query "count(ceph_mon_quorum_status == 0)" | first | value }}{{ $s := "" }}{{ if gt $down 1.0 }}{{ $s = "s" }}{{ end }}You have {{ $down }} monitor{{ $s }} down. Quorum is still intact, but the loss of an additional monitor will make your cluster inoperable. The following monitors are down: {{- range query "(ceph_mon_quorum_status == 0) + on(ceph_daemon) group_left(hostname) (ceph_mon_metadata * 0)" }} - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }} {{- end }}
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-down"
+ summary: "One or more monitors down"
+ expr: |
+ count(ceph_mon_quorum_status == 0) <= (count(ceph_mon_metadata) - floor(count(ceph_mon_metadata) / 2) + 1)
+ for: "30s"
 labels:
- severity: warning
- type: ceph_default
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#id2
- summary: Device(s) predicted to fail soon
- description: |
- The device health module has determined that one or more devices will fail
- soon. To review device status use 'ceph device ls'. To show a specific
- device use 'ceph device info <dev id>'.
-
- Mark the OSD out so that data may migrate to other OSDs. Once
- the OSD has drained, destroy the OSD, replace the device, and redeploy the OSD.
- - alert: CephDeviceFailurePredictionTooHigh
- expr: ceph_health_detail{name="DEVICE_HEALTH_TOOMANY"} == 1
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.4.7
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephMonDiskspaceCritical"
+ annotations:
+ description: "The free space available to a monitor's store is critically low. You should increase the space available to the monitor(s). The default directory is /var/lib/ceph/mon-*/data/store.db on traditional deployments, and /var/lib/rook/mon-*/data/store.db on the mon pod's worker node for Rook. Look for old, rotated versions of *.log and MANIFEST*. Do NOT touch any *.sst files. Also check any other directories under /var/lib/rook and other directories on the same filesystem, often /var/log and /var/tmp are culprits. Your monitor hosts are; {{- range query \"ceph_mon_metadata\"}} - {{ .Labels.hostname }} {{- end }}"
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-disk-crit"
+ summary: "Filesystem space on at least one monitor is critically low"
+ expr: "ceph_health_detail{name=\"MON_DISK_CRIT\"} == 1"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.3.2"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephMonDiskspaceLow"
+ annotations:
+ description: "The space available to a monitor's store is approaching full (>70% is the default). You should increase the space available to the monitor(s). The default directory is /var/lib/ceph/mon-*/data/store.db on traditional deployments, and /var/lib/rook/mon-*/data/store.db on the mon pod's worker node for Rook. Look for old, rotated versions of *.log and MANIFEST*. Do NOT touch any *.sst files. Also check any other directories under /var/lib/rook and other directories on the same filesystem, often /var/log and /var/tmp are culprits. Your monitor hosts are; {{- range query \"ceph_mon_metadata\"}} - {{ .Labels.hostname }} {{- end }}"
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-disk-low"
+ summary: "Drive space on at least one monitor is approaching full"
+ expr: "ceph_health_detail{name=\"MON_DISK_LOW\"} == 1"
+ for: "5m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephMonClockSkew"
+ annotations:
+ description: "Ceph monitors rely on closely synchronized time to maintain quorum and cluster consistency. This event indicates that the time on at least one mon has drifted too far from the lead mon. Review cluster status with ceph -s. This will show which monitors are affected. Check the time sync status on each monitor host with 'ceph time-sync-status' and the state and peers of your ntpd or chrony daemon."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-clock-skew"
+ summary: "Clock skew detected among monitors"
+ expr: "ceph_health_detail{name=\"MON_CLOCK_SKEW\"} == 1"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - name: "osd"
+ rules:
+ - alert: "CephOSDDownHigh"
 annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#device-health-toomany
- summary: Too many devices are predicted to fail, unable to resolve
- description: |
- The device health module has determined that devices predicted to
- fail can not be remediated automatically, since too many OSDs would be removed from the
- cluster to ensure performance and availabililty. Prevent data
- integrity issues by adding new OSDs so that data may be relocated.
- - alert: CephDeviceFailureRelocationIncomplete
- expr: ceph_health_detail{name="DEVICE_HEALTH_IN_USE"} == 1
- for: 1m
- labels:
- severity: warning
- type: ceph_default
+ description: "{{ $value | humanize }}% or {{ with query \"count(ceph_osd_up == 0)\" }}{{ . | first | value }}{{ end }} of {{ with query \"count(ceph_osd_up)\" }}{{ . | first | value }}{{ end }} OSDs are down (>= 10%). The following OSDs are down: {{- range query \"(ceph_osd_up * on(ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0\" }} - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }} {{- end }}"
+ summary: "More than 10% of OSDs are down"
+ expr: "count(ceph_osd_up == 0) / count(ceph_osd_up) * 100 >= 10"
+ for: "5m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.4.1"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephOSDHostDown"
+ annotations:
+ description: "The following OSDs are down: {{- range query \"(ceph_osd_up * on(ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0\" }} - {{ .Labels.hostname }} : {{ .Labels.ceph_daemon }} {{- end }}"
+ summary: "An OSD host is offline"
+ expr: "ceph_health_detail{name=\"OSD_HOST_DOWN\"} == 1"
+ for: "5m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.4.8"
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephOSDDown"
+ annotations:
+ description: |
+ {{ $num := query "count(ceph_osd_up == 0)" | first | value }}{{ $s := "" }}{{ if gt $num 1.0 }}{{ $s = "s" }}{{ end }}{{ $num }} OSD{{ $s }} down for over 5mins. The following OSD{{ $s }} {{ if eq $s "" }}is{{ else }}are{{ end }} down: {{- range query "(ceph_osd_up * on(ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0"}} - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }} {{- end }}
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#osd-down"
+ summary: "An OSD has been marked down"
+ expr: "ceph_health_detail{name=\"OSD_DOWN\"} == 1"
+ for: "5m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.4.2"
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephOSDNearFull"
+ annotations:
+ description: "One or more OSDs have reached the NEARFULL threshold. Use 'ceph health detail' and 'ceph osd df' to identify the problem. To resolve, add capacity to the affected OSD's failure domain, restore down/out OSDs, or delete unwanted data."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#osd-nearfull"
+ summary: "OSD(s) running low on free space (NEARFULL)"
+ expr: "ceph_health_detail{name=\"OSD_NEARFULL\"} == 1"
+ for: "5m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.4.3"
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephOSDFull"
+ annotations:
+ description: "An OSD has reached the FULL threshold. Writes to pools that share the affected OSD will be blocked. Use 'ceph health detail' and 'ceph osd df' to identify the problem. To resolve, add capacity to the affected OSD's failure domain, restore down/out OSDs, or delete unwanted data."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#osd-full"
+ summary: "OSD full, writes blocked"
+ expr: "ceph_health_detail{name=\"OSD_FULL\"} > 0"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.4.6"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephOSDBackfillFull"
+ annotations:
+ description: "An OSD has reached the BACKFILL FULL threshold. This will prevent rebalance operations from completing. Use 'ceph health detail' and 'ceph osd df' to identify the problem. To resolve, add capacity to the affected OSD's failure domain, restore down/out OSDs, or delete unwanted data."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#osd-backfillfull"
+ summary: "OSD(s) too full for backfill operations"
+ expr: "ceph_health_detail{name=\"OSD_BACKFILLFULL\"} > 0"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephOSDTooManyRepairs"
+ annotations:
+ description: "Reads from an OSD have used a secondary PG to return data to the client, indicating a potential failing drive."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#osd-too-many-repairs"
+ summary: "OSD reports a high number of read errors"
+ expr: "ceph_health_detail{name=\"OSD_TOO_MANY_REPAIRS\"} == 1"
+ for: "30s"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephOSDTimeoutsPublicNetwork"
+ annotations:
+ description: "OSD heartbeats on the cluster's 'public' network (frontend) are running slow. Investigate the network for latency or loss issues. Use 'ceph health detail' to show the affected OSDs."
+ summary: "Network issues delaying OSD heartbeats (public network)"
+ expr: "ceph_health_detail{name=\"OSD_SLOW_PING_TIME_FRONT\"} == 1"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephOSDTimeoutsClusterNetwork"
+ annotations:
+ description: "OSD heartbeats on the cluster's 'cluster' network (backend) are slow. Investigate the network for latency issues on this subnet. Use 'ceph health detail' to show the affected OSDs."
+ summary: "Network issues delaying OSD heartbeats (cluster network)"
+ expr: "ceph_health_detail{name=\"OSD_SLOW_PING_TIME_BACK\"} == 1"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephOSDInternalDiskSizeMismatch"
+ annotations:
+ description: "One or more OSDs have an internal inconsistency between metadata and the size of the device. This could lead to the OSD(s) crashing in future. You should redeploy the affected OSDs."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#bluestore-disk-size-mismatch"
+ summary: "OSD size inconsistency error"
+ expr: "ceph_health_detail{name=\"BLUESTORE_DISK_SIZE_MISMATCH\"} == 1"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephDeviceFailurePredicted"
+ annotations:
+ description: "The device health module has determined that one or more devices will fail soon. To review device status use 'ceph device ls'. To show a specific device use 'ceph device info <dev id>'. Mark the OSD out so that data may migrate to other OSDs. Once the OSD has drained, destroy the OSD, replace the device, and redeploy the OSD."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#id2"
+ summary: "Device(s) predicted to fail soon"
+ expr: "ceph_health_detail{name=\"DEVICE_HEALTH\"} == 1"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephDeviceFailurePredictionTooHigh"
+ annotations:
+ description: "The device health module has determined that devices predicted to fail can not be remediated automatically, since too many OSDs would be removed from the cluster to ensure performance and availability. Prevent data integrity issues by adding new OSDs so that data may be relocated."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#device-health-toomany"
+ summary: "Too many devices are predicted to fail, unable to resolve"
+ expr: "ceph_health_detail{name=\"DEVICE_HEALTH_TOOMANY\"} == 1"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.4.7"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephDeviceFailureRelocationIncomplete"
+ annotations:
+ description: "The device health module has determined that one or more devices will fail soon, but the normal process of relocating the data on the device to other OSDs in the cluster is blocked. \nEnsure that the cluster has available free space. It may be necessary to add capacity to the cluster to allow data from the failing device to successfully migrate, or to enable the balancer."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#device-health-in-use"
+ summary: "Device failure is predicted, but unable to relocate data"
+ expr: "ceph_health_detail{name=\"DEVICE_HEALTH_IN_USE\"} == 1"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephOSDFlapping"
+ annotations:
+ description: "OSD {{ $labels.ceph_daemon }} on {{ $labels.hostname }} was marked down and back up {{ $value | humanize }} times once a minute for 5 minutes. This may indicate a network issue (latency, packet loss, MTU mismatch) on the cluster network, or the public network if no cluster network is deployed. Check the network stats on the listed host(s)."
+ documentation: "https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-osd#flapping-osds"
+ summary: "Network issues are causing OSDs to flap (mark each other down)"
+ expr: "(rate(ceph_osd_up[5m]) * on(ceph_daemon) group_left(hostname) ceph_osd_metadata) * 60 > 1"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.4.4"
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephOSDReadErrors"
+ annotations:
+ description: "An OSD has encountered read errors, but the OSD has recovered by retrying the reads. This may indicate an issue with hardware or the kernel."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#bluestore-spurious-read-errors"
+ summary: "Device read errors detected"
+ expr: "ceph_health_detail{name=\"BLUESTORE_SPURIOUS_READ_ERRORS\"} == 1"
+ for: "30s"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephPGImbalance"
 annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#device-health-in-use
- summary: Device failure is predicted, but unable to relocate data
- description: |
- The device health module has determined that one or more devices will fail
- soon, but the normal process of relocating the data on the device to other
- OSDs in the cluster is blocked.
-
- Ensure that the cluster has available free space. It may be necessary to add
- capacity to the cluster to allow the data from the failing device to
- successfully migrate, or to enable the balancer.
- - alert: CephOSDFlapping
- expr: |
- (
- rate(ceph_osd_up[5m])
- * on(ceph_daemon) group_left(hostname) ceph_osd_metadata
- ) * 60 > 1
- labels:
- severity: warning
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.4.4
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-osd#flapping-osds
- summary: Network issues are causing OSDs to flap (mark each other down)
- description: >
- OSD {{ $labels.ceph_daemon }} on {{ $labels.hostname }} was marked down and back up {{ $value | humanize }} times once a minute for 5 minutes. This may indicate a network issue (latency, packet loss, MTU mismatch) on the cluster network, or the public network if no cluster network is deployed. Check network stats on the listed host(s).
-
- - alert: CephOSDReadErrors
- expr: ceph_health_detail{name="BLUESTORE_SPURIOUS_READ_ERRORS"} == 1
- for: 30s
- labels:
- severity: warning
- type: ceph_default
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#bluestore-spurious-read-errors
- summary: Device read errors detected
- description: >
- An OSD has encountered read errors, but the OSD has recovered by retrying the reads. This may indicate an issue with hardware or the kernel.
-
- # alert on high deviation from average PG count
- - alert: CephPGImbalance
+ description: "OSD {{ $labels.ceph_daemon }} on {{ $labels.hostname }} deviates by more than 30% from average PG count."
+ summary: "PGs are not balanced across OSDs"
 expr: |
 abs(
- (
- (ceph_osd_numpg > 0) - on (job) group_left avg(ceph_osd_numpg > 0) by (job)
- ) / on (job) group_left avg(ceph_osd_numpg > 0) by (job)
+ ((ceph_osd_numpg > 0) - on (job) group_left avg(ceph_osd_numpg > 0) by (job)) /
+ on (job) group_left avg(ceph_osd_numpg > 0) by (job)
 ) * on (ceph_daemon) group_left(hostname) ceph_osd_metadata > 0.30
- for: 5m
+ for: "5m"
 labels:
- severity: warning
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.4.5
- annotations:
- summary: PGs are not balanced across OSDs
- description: >
- OSD {{ $labels.ceph_daemon }} on {{ $labels.hostname }} deviates by more than 30% from average PG count.
-
- # alert on high commit latency...but how high is too high
- - name: mds
+ oid: "1.3.6.1.4.1.50495.1.2.1.4.5"
+ severity: "warning"
+ type: "ceph_default"
+ - name: "mds"
 rules:
- - alert: CephFilesystemDamaged
- expr: ceph_health_detail{name="MDS_DAMAGE"} > 0
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.5.1
- annotations:
- documentation: https://docs.ceph.com/en/latest/cephfs/health-messages#cephfs-health-messages
- summary: CephFS filesystem is damaged.
- description: >
- Filesystem metadata has been corrupted. Data may be inaccessible. Analyze metrics from the MDS daemon admin socket, or escalate to support.
-
- - alert: CephFilesystemOffline
- expr: ceph_health_detail{name="MDS_ALL_DOWN"} > 0
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.5.3
- annotations:
- documentation: https://docs.ceph.com/en/latest/cephfs/health-messages/#mds-all-down
- summary: CephFS filesystem is offline
- description: >
- All MDS ranks are unavailable. The MDS daemons managing metadata are down, rendering the filesystem offline.
-
- - alert: CephFilesystemDegraded
- expr: ceph_health_detail{name="FS_DEGRADED"} > 0
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.5.4
- annotations:
- documentation: https://docs.ceph.com/en/latest/cephfs/health-messages/#fs-degraded
- summary: CephFS filesystem is degraded
- description: >
- One or more metadata daemons (MDS ranks) are failed or in a damaged state. At best the filesystem is partially available, at worst the filesystem is completely unusable.
-
- - alert: CephFilesystemMDSRanksLow
- expr: ceph_health_detail{name="MDS_UP_LESS_THAN_MAX"} > 0
- for: 1m
- labels:
- severity: warning
- type: ceph_default
- annotations:
- documentation: https://docs.ceph.com/en/latest/cephfs/health-messages/#mds-up-less-than-max
- summary: MDS daemon count is lower than configured
- description: >
- The filesystem's "max_mds" setting defines the number of MDS ranks in the filesystem. The current number of active MDS daemons is less than this value.
-
- - alert: CephFilesystemInsufficientStandby
- expr: ceph_health_detail{name="MDS_INSUFFICIENT_STANDBY"} > 0
- for: 1m
- labels:
- severity: warning
- type: ceph_default
- annotations:
- documentation: https://docs.ceph.com/en/latest/cephfs/health-messages/#mds-insufficient-standby
- summary: Ceph filesystem standby daemons too few
- description: >
- The minimum number of standby daemons required by standby_count_wanted is less than the current number of standby daemons. Adjust the standby count or increase the number of MDS daemons.
-
- - alert: CephFilesystemFailureNoStandby
- expr: ceph_health_detail{name="FS_WITH_FAILED_MDS"} > 0
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.5.5
- annotations:
- documentation: https://docs.ceph.com/en/latest/cephfs/health-messages/#fs-with-failed-mds
- summary: MDS daemon failed, no further standby available
- description: >
- An MDS daemon has failed, leaving only one active rank and no available standby. Investigate the cause of the failure or add a standby MDS.
-
- - alert: CephFilesystemReadOnly
- expr: ceph_health_detail{name="MDS_HEALTH_READ_ONLY"} > 0
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.5.2
- annotations:
- documentation: https://docs.ceph.com/en/latest/cephfs/health-messages#cephfs-health-messages
- summary: CephFS filesystem in read only mode due to write error(s)
- description: >
- The filesystem has switched to READ ONLY due to an unexpected error when writing to the metadata pool.
-
- Analyze the output from the MDS daemon admin socket, or escalate to support.
-
- - name: mgr
+ - alert: "CephFilesystemDamaged"
+ annotations:
+ description: "Filesystem metadata has been corrupted. Data may be inaccessible. Analyze metrics from the MDS daemon admin socket, or escalate to support."
+ documentation: "https://docs.ceph.com/en/latest/cephfs/health-messages#cephfs-health-messages"
+ summary: "CephFS filesystem is damaged."
+ expr: "ceph_health_detail{name=\"MDS_DAMAGE\"} > 0"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.5.1"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephFilesystemOffline"
+ annotations:
+ description: "All MDS ranks are unavailable. The MDS daemons managing metadata are down, rendering the filesystem offline."
+ documentation: "https://docs.ceph.com/en/latest/cephfs/health-messages/#mds-all-down"
+ summary: "CephFS filesystem is offline"
+ expr: "ceph_health_detail{name=\"MDS_ALL_DOWN\"} > 0"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.5.3"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephFilesystemDegraded"
+ annotations:
+ description: "One or more metadata daemons (MDS ranks) are failed or in a damaged state. At best the filesystem is partially available, at worst the filesystem is completely unusable."
+ documentation: "https://docs.ceph.com/en/latest/cephfs/health-messages/#fs-degraded"
+ summary: "CephFS filesystem is degraded"
+ expr: "ceph_health_detail{name=\"FS_DEGRADED\"} > 0"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.5.4"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephFilesystemMDSRanksLow"
+ annotations:
+ description: "The filesystem's 'max_mds' setting defines the number of MDS ranks in the filesystem. The current number of active MDS daemons is less than this value."
+ documentation: "https://docs.ceph.com/en/latest/cephfs/health-messages/#mds-up-less-than-max"
+ summary: "Ceph MDS daemon count is lower than configured"
+ expr: "ceph_health_detail{name=\"MDS_UP_LESS_THAN_MAX\"} > 0"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephFilesystemInsufficientStandby"
+ annotations:
+ description: "The minimum number of standby daemons required by standby_count_wanted is less than the current number of standby daemons. Adjust the standby count or increase the number of MDS daemons."
+ documentation: "https://docs.ceph.com/en/latest/cephfs/health-messages/#mds-insufficient-standby"
+ summary: "Ceph filesystem standby daemons too few"
+ expr: "ceph_health_detail{name=\"MDS_INSUFFICIENT_STANDBY\"} > 0"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephFilesystemFailureNoStandby"
+ annotations:
+ description: "An MDS daemon has failed, leaving only one active rank and no available standby. Investigate the cause of the failure or add a standby MDS."
+ documentation: "https://docs.ceph.com/en/latest/cephfs/health-messages/#fs-with-failed-mds"
+ summary: "MDS daemon failed, no further standby available"
+ expr: "ceph_health_detail{name=\"FS_WITH_FAILED_MDS\"} > 0"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.5.5"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephFilesystemReadOnly"
+ annotations:
+ description: "The filesystem has switched to READ ONLY due to an unexpected error when writing to the metadata pool. Either analyze the output from the MDS daemon admin socket, or escalate to support."
+ documentation: "https://docs.ceph.com/en/latest/cephfs/health-messages#cephfs-health-messages"
+ summary: "CephFS filesystem in read only mode due to write error(s)"
+ expr: "ceph_health_detail{name=\"MDS_HEALTH_READ_ONLY\"} > 0"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.5.2"
+ severity: "critical"
+ type: "ceph_default"
+ - name: "mgr"
 rules:
- - alert: CephMgrModuleCrash
- expr: ceph_health_detail{name="RECENT_MGR_MODULE_CRASH"} == 1
- for: 5m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.6.1
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#recent-mgr-module-crash
- summary: A manager module has recently crashed
- description: >
- One or more mgr modules have crashed and have yet to be acknowledged by an administrator. A crashed module may impact functionality within the cluster. Use the 'ceph crash' command to determine which module has failed, and archive it to acknowledge the failure.
-
- - alert: CephMgrPrometheusModuleInactive
- expr: up{job="ceph"} == 0
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.6.2
- annotations:
- summary: The mgr/prometheus module is not available
- description: >
- The mgr/prometheus module at {{ $labels.instance }} is unreachable. This could mean that the module has been disabled or the mgr daemon itself is down.
-
- Without the mgr/prometheus module metrics and alerts will no longer function. Open a shell to an admin node or toolbox pod and use 'ceph -s' to to determine whether the mgr is active. If the mgr is not active, restart it, otherwise you can determine the mgr/prometheus module status with 'ceph mgr module ls'. If it is not listed as enabled, enable it with 'ceph mgr module enable prometheus'.
-
- - name: pgs
+ - alert: "CephMgrModuleCrash"
+ annotations:
+ description: "One or more mgr modules have crashed and have yet to be acknowledged by an administrator. A crashed module may impact functionality within the cluster. Use the 'ceph crash' command to determine which module has failed, and archive it to acknowledge the failure."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#recent-mgr-module-crash"
+ summary: "A manager module has recently crashed"
+ expr: "ceph_health_detail{name=\"RECENT_MGR_MODULE_CRASH\"} == 1"
+ for: "5m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.6.1"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephMgrPrometheusModuleInactive"
+ annotations:
+ description: "The mgr/prometheus module at {{ $labels.instance }} is unreachable. This could mean that the module has been disabled or the mgr daemon itself is down. Without the mgr/prometheus module metrics and alerts will no longer function. Open a shell to an admin node or toolbox pod and use 'ceph -s' to to determine whether the mgr is active. If the mgr is not active, restart it, otherwise you can determine module status with 'ceph mgr module ls'. If it is not listed as enabled, enable it with 'ceph mgr module enable prometheus'."
+ summary: "The mgr/prometheus module is not available"
+ expr: "up{job=\"ceph\"} == 0"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.6.2"
+ severity: "critical"
+ type: "ceph_default"
+ - name: "pgs"
 rules:
- - alert: CephPGsInactive
- expr: ceph_pool_metadata * on(pool_id,instance) group_left() (ceph_pg_total - ceph_pg_active) > 0
- for: 5m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.7.1
- annotations:
- summary: One or more placement groups are inactive
- description: >
- {{ $value }} PGs have been inactive for more than 5 minutes in pool {{ $labels.name }}. Inactive placement groups are not able to serve read/write requests.
-
- - alert: CephPGsUnclean
- expr: ceph_pool_metadata * on(pool_id,instance) group_left() (ceph_pg_total - ceph_pg_clean) > 0
- for: 15m
- labels:
- severity: warning
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.7.2
- annotations:
- summary: One or more placement groups are marked unclean
- description: >
- {{ $value }} PGs have been unclean for more than 15 minutes in pool {{ $labels.name }}. Unclean PGs have not recovered from a previous failure.
-
- - alert: CephPGsDamaged
- expr: ceph_health_detail{name=~"PG_DAMAGED|OSD_SCRUB_ERRORS"} == 1
- for: 5m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.7.4
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-damaged
- summary: Placement group damaged; manual intervention needed
- description: >
- Scrubs have flagged at least one PG as damaged or inconsistent.
-
- Check to see which PG is affected, and attempt a manual repair if necessary. To list problematic placement groups, use 'ceph health detail' or 'rados list-inconsistent-pg <pool>'. To repair PGs use the 'ceph pg repair <pg_num>' command.
-
- - alert: CephPGRecoveryAtRisk
- expr: ceph_health_detail{name="PG_RECOVERY_FULL"} == 1
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.7.5
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-recovery-full
- summary: OSDs are too full for recovery
- description: >
- Data redundancy is at risk since one or more OSDs are at or above the 'full' threshold. Add capacity to the cluster, restore down/out OSDs, or delete unwanted data.
-
- - alert: CephPGUnavailableBlockingIO
- # PG_AVAILABILITY, but an OSD is not in a DOWN state
- expr: ((ceph_health_detail{name="PG_AVAILABILITY"} == 1) - scalar(ceph_health_detail{name="OSD_DOWN"})) == 1
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.7.3
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-availability
- summary: PG is unavailable, blocking I/O
- description: >
- Data availability is reduced, impacting the cluster's ability to service I/O. One or more placement groups (PGs) are in a state that blocks I/O.
-
- - alert: CephPGBackfillAtRisk
- expr: ceph_health_detail{name="PG_BACKFILL_FULL"} == 1
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.7.6
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-backfill-full
- summary: Backfill operations are blocked due to lack of free space
- description: >
- Data redundancy may be at risk due to lack of free space within the cluster. One or more OSDs have breached their 'backfillfull' threshold. Add more capacity, or delete unwanted data.
-
- - alert: CephPGNotScrubbed
- expr: ceph_health_detail{name="PG_NOT_SCRUBBED"} == 1
- for: 5m
- labels:
- severity: warning
- type: ceph_default
+ - alert: "CephPGsInactive"
 annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-not-scrubbed
- summary: Placement group(s) have not been scrubbed
- description: |
- One or more PGs have not been scrubbed recently. Scrubs check metadata integrity,
- protecting against bit-rot. They check that metadata
- is consistent across data replicas. When PGs miss their scrub interval, it may
- indicate that the scrub window is too small, or PGs were not in a 'clean' state during the
- scrub window.
-
- You can manually initiate a scrub with: ceph pg scrub <pgid>
- - alert: CephPGsHighPerOSD
- expr: ceph_health_detail{name="TOO_MANY_PGS"} == 1
- for: 1m
- labels:
- severity: warning
- type: ceph_default
+ description: "{{ $value }} PGs have been inactive for more than 5 minutes in pool {{ $labels.name }}. Inactive placement groups are not able to serve read/write requests."
+ summary: "One or more placement groups are inactive"
+ expr: "ceph_pool_metadata * on(pool_id,instance) group_left() (ceph_pg_total - ceph_pg_active) > 0"
+ for: "5m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.7.1"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephPGsUnclean"
+ annotations:
+ description: "{{ $value }} PGs have been unclean for more than 15 minutes in pool {{ $labels.name }}. Unclean PGs have not recovered from a previous failure."
+ summary: "One or more placement groups are marked unclean"
+ expr: "ceph_pool_metadata * on(pool_id,instance) group_left() (ceph_pg_total - ceph_pg_clean) > 0"
+ for: "15m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.7.2"
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephPGsDamaged"
+ annotations:
+ description: "During data consistency checks (scrub), at least one PG has been flagged as being damaged or inconsistent. Check to see which PG is affected, and attempt a manual repair if necessary. To list problematic placement groups, use 'rados list-inconsistent-pg <pool>'. To repair PGs use the 'ceph pg repair <pg_num>' command."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-damaged"
+ summary: "Placement group damaged, manual intervention needed"
+ expr: "ceph_health_detail{name=~\"PG_DAMAGED|OSD_SCRUB_ERRORS\"} == 1"
+ for: "5m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.7.4"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephPGRecoveryAtRisk"
+ annotations:
+ description: "Data redundancy is at risk since one or more OSDs are at or above the 'full' threshold. Add more capacity to the cluster, restore down/out OSDs, or delete unwanted data."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-recovery-full"
+ summary: "OSDs are too full for recovery"
+ expr: "ceph_health_detail{name=\"PG_RECOVERY_FULL\"} == 1"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.7.5"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephPGUnavailableBlockingIO"
+ annotations:
+ description: "Data availability is reduced, impacting the cluster's ability to service I/O. One or more placement groups (PGs) are in a state that blocks I/O."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-availability"
+ summary: "PG is unavailable, blocking I/O"
+ expr: "((ceph_health_detail{name=\"PG_AVAILABILITY\"} == 1) - scalar(ceph_health_detail{name=\"OSD_DOWN\"})) == 1"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.7.3"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephPGBackfillAtRisk"
+ annotations:
+ description: "Data redundancy may be at risk due to lack of free space within the cluster. One or more OSDs have reached the 'backfillfull' threshold. Add more capacity, or delete unwanted data."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-backfill-full"
+ summary: "Backfill operations are blocked due to lack of free space"
+ expr: "ceph_health_detail{name=\"PG_BACKFILL_FULL\"} == 1"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.7.6"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephPGNotScrubbed"
+ annotations:
+ description: "One or more PGs have not been scrubbed recently. Scrubs check metadata integrity, protecting against bit-rot. They check that metadata is consistent across data replicas. When PGs miss their scrub interval, it may indicate that the scrub window is too small, or PGs were not in a 'clean' state during the scrub window. You can manually initiate a scrub with: ceph pg scrub <pgid>"
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-not-scrubbed"
+ summary: "Placement group(s) have not been scrubbed"
+ expr: "ceph_health_detail{name=\"PG_NOT_SCRUBBED\"} == 1"
+ for: "5m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephPGsHighPerOSD"
+ annotations:
+ description: "The number of placement groups per OSD is too high (exceeds the mon_max_pg_per_osd setting).\n Check that the pg_autoscaler has not been disabled for any pools with 'ceph osd pool autoscale-status', and that the profile selected is appropriate. You may also adjust the target_size_ratio of a pool to guide the autoscaler based on the expected relative size of the pool ('ceph osd pool set cephfs.cephfs.meta target_size_ratio .1') or set the pg_autoscaler mode to 'warn' and adjust pg_num appropriately for one or more pools."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks/#too-many-pgs"
+ summary: "Placement groups per OSD is too high"
+ expr: "ceph_health_detail{name=\"TOO_MANY_PGS\"} == 1"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephPGNotDeepScrubbed"
+ annotations:
+ description: "One or more PGs have not been deep scrubbed recently. Deep scrubs protect against bit-rot. They compare data replicas to ensure consistency. When PGs miss their deep scrub interval, it may indicate that the window is too small or PGs were not in a 'clean' state during the deep-scrub window."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-not-deep-scrubbed"
+ summary: "Placement group(s) have not been deep scrubbed"
+ expr: "ceph_health_detail{name=\"PG_NOT_DEEP_SCRUBBED\"} == 1"
+ for: "5m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - name: "nodes"
+ rules:
+ - alert: "CephNodeRootFilesystemFull"
 annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks/#too-many-pgs
- summary: Placement groups per OSD is too high
- description: |
- The number of placement groups per OSD is too high (exceeds the mon_max_pg_per_osd setting).
-
- Check that the pg_autoscaler has not been disabled for any pools with 'ceph osd pool autoscale-status',
- and that the profile selected is appropriate. You may also adjust the target_size_ratio of a pool to guide
- the autoscaler based on the expected relative size of the pool
- ('ceph osd pool set cephfs.cephfs.meta target_size_ratio .1') or set the pg_autoscaler
- mode to "warn" and adjust pg_num appropriately for one or more pools.
- - alert: CephPGNotDeepScrubbed
- expr: ceph_health_detail{name="PG_NOT_DEEP_SCRUBBED"} == 1
- for: 5m
- labels:
- severity: warning
- type: ceph_default
+ description: "Root volume is dangerously full: {{ $value | humanize }}% free."
+ summary: "Root filesystem is dangerously full"
+ expr: "node_filesystem_avail_bytes{mountpoint=\"/\"} / node_filesystem_size_bytes{mountpoint=\"/\"} * 100 < 5"
+ for: "5m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.8.1"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephNodeNetworkPacketDrops"
 annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#pg-not-deep-scrubbed
- summary: Placement group(s) have not been deep scrubbed
- description: |
- One or more PGs have not been deep scrubbed recently. Deep scrubs
- protect against bit-rot. They compare data
- replicas to ensure consistency. When PGs miss their deep scrub interval, it may indicate
- that the window is too small or PGs were not in a 'clean' state during the deep-scrub
- window.
-
- You can manually initiate a deep scrub with: ceph pg deep-scrub <pgid>
- - name: nodes
- rules:
- - alert: CephNodeRootFilesystemFull
- expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100 < 5
- for: 5m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.8.1
- annotations:
- summary: Root filesystem is dangerously full
- description: >
- Root volume is dangerously full: {{ $value | humanize }}% free.
-
- # alert on packet errors and drop rate
- - alert: CephNodeNetworkPacketDrops
+ description: "Node {{ $labels.instance }} experiences packet drop > 0.5% or > 10 packets/s on interface {{ $labels.device }}."
+ summary: "One or more NICs reports packet drops"
 expr: |
 (
- increase(node_network_receive_drop_total{device!="lo"}[1m]) +
- increase(node_network_transmit_drop_total{device!="lo"}[1m])
+ rate(node_network_receive_drop_total{device!="lo"}[1m]) +
+ rate(node_network_transmit_drop_total{device!="lo"}[1m])
 ) / (
- increase(node_network_receive_packets_total{device!="lo"}[1m]) +
- increase(node_network_transmit_packets_total{device!="lo"}[1m])
- ) >= 0.0001 or (
- increase(node_network_receive_drop_total{device!="lo"}[1m]) +
- increase(node_network_transmit_drop_total{device!="lo"}[1m])
+ rate(node_network_receive_packets_total{device!="lo"}[1m]) +
+ rate(node_network_transmit_packets_total{device!="lo"}[1m])
+ ) >= 0.0050000000000000001 and (
+ rate(node_network_receive_drop_total{device!="lo"}[1m]) +
+ rate(node_network_transmit_drop_total{device!="lo"}[1m])
 ) >= 10
 labels:
- severity: warning
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.8.2
- annotations:
- summary: One or more NICs reports packet drops
- description: >
- Node {{ $labels.instance }} experiences packet drop > 0.01% or > 10 packets/s on interface {{ $labels.device }}.
-
- - alert: CephNodeNetworkPacketErrors
+ oid: "1.3.6.1.4.1.50495.1.2.1.8.2"
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephNodeNetworkPacketErrors"
+ annotations:
+ description: "Node {{ $labels.instance }} experiences packet errors > 0.01% or > 10 packets/s on interface {{ $labels.device }}."
+ summary: "One or more NICs reports packet errors"
 expr: |
 (
- increase(node_network_receive_errs_total{device!="lo"}[1m]) +
- increase(node_network_transmit_errs_total{device!="lo"}[1m])
+ rate(node_network_receive_errs_total{device!="lo"}[1m]) +
+ rate(node_network_transmit_errs_total{device!="lo"}[1m])
 ) / (
- increase(node_network_receive_packets_total{device!="lo"}[1m]) +
- increase(node_network_transmit_packets_total{device!="lo"}[1m])
+ rate(node_network_receive_packets_total{device!="lo"}[1m]) +
+ rate(node_network_transmit_packets_total{device!="lo"}[1m])
 ) >= 0.0001 or (
- increase(node_network_receive_errs_total{device!="lo"}[1m]) +
- increase(node_network_transmit_errs_total{device!="lo"}[1m])
+ rate(node_network_receive_errs_total{device!="lo"}[1m]) +
+ rate(node_network_transmit_errs_total{device!="lo"}[1m])
 ) >= 10
 labels:
- severity: warning
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.8.3
- annotations:
- summary: One or more NICs reports packet errors
- description: >
- Node {{ $labels.instance }} experiences packet errors > 0.01% or > 10 packets/s on interface {{ $labels.device }}.
-
- # Restrict to device names beginning with '/' to skip false alarms from
- # tmpfs, overlay type filesystems
- - alert: CephNodeDiskspaceWarning
+ oid: "1.3.6.1.4.1.50495.1.2.1.8.3"
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephNodeNetworkBondDegraded"
+ annotations:
+ description: "Bond {{ $labels.master }} is degraded on Node {{ $labels.instance }}."
+ summary: "Degraded Bond on Node {{ $labels.instance }}"
 expr: |
- predict_linear(node_filesystem_free_bytes{device=~"/.*"}[2d], 3600 * 24 * 5) *
- on(instance) group_left(nodename) node_uname_info < 0
- labels:
- severity: warning
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.8.4
- annotations:
- summary: Host filesystem free space is low
- description: >
- Mountpoint {{ $labels.mountpoint }} on {{ $labels.nodename }} will be full in less than 5 days based on the 48 hour trailing fill rate.
-
- - alert: CephNodeInconsistentMTU
- expr: node_network_mtu_bytes{device!="lo"} * (node_network_up{device!="lo"} > 0) != on() group_left() (quantile(0.5, node_network_mtu_bytes{device!="lo"}))
+ node_bonding_slaves - node_bonding_active != 0
 labels:
- severity: warning
- type: ceph_default
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephNodeDiskspaceWarning"
+ annotations:
+ description: "Mountpoint {{ $labels.mountpoint }} on {{ $labels.nodename }} will be full in less than 5 days based on the 48 hour trailing fill rate."
+ summary: "Host filesystem free space is getting low"
+ expr: "predict_linear(node_filesystem_free_bytes{device=~\"/.*\"}[2d], 3600 * 24 * 5) *on(instance) group_left(nodename) node_uname_info < 0"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.8.4"
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephNodeInconsistentMTU"
+ annotations:
+ description: "Node {{ $labels.instance }} has a different MTU size ({{ $value }}) than the median of devices named {{ $labels.device }}."
+ summary: "MTU settings across Ceph hosts are inconsistent"
+ expr: "node_network_mtu_bytes * (node_network_up{device!=\"lo\"} > 0) == scalar( max by (device) (node_network_mtu_bytes * (node_network_up{device!=\"lo\"} > 0)) != quantile by (device) (.5, node_network_mtu_bytes * (node_network_up{device!=\"lo\"} > 0)) )or node_network_mtu_bytes * (node_network_up{device!=\"lo\"} > 0) == scalar( min by (device) (node_network_mtu_bytes * (node_network_up{device!=\"lo\"} > 0)) != quantile by (device) (.5, node_network_mtu_bytes * (node_network_up{device!=\"lo\"} > 0)) )"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - name: "pools"
+ rules:
+ - alert: "CephPoolGrowthWarning"
 annotations:
- summary: MTU settings across hosts are inconsistent
- description: >
- Node {{ $labels.instance }} has a different MTU size ({{ $value }}) than the median value on device {{ $labels.device }}.
-
- - name: pools
+ description: "Pool '{{ $labels.name }}' will be full in less than 5 days assuming the average fill-up rate of the past 48 hours."
+ summary: "Pool growth rate may soon exceed capacity"
+ expr: "(predict_linear(ceph_pool_percent_used[2d], 3600 * 24 * 5) * on(pool_id, instance, pod) group_right() ceph_pool_metadata) >= 95"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.9.2"
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephPoolBackfillFull"
+ annotations:
+ description: "A pool is approaching the near full threshold, which will prevent recovery/backfill operations from completing. Consider adding more capacity."
+ summary: "Free space in a pool is too low for recovery/backfill"
+ expr: "ceph_health_detail{name=\"POOL_BACKFILLFULL\"} > 0"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephPoolFull"
+ annotations:
+ description: "A pool has reached its MAX quota, or OSDs supporting the pool have reached the FULL threshold. Until this is resolved, writes to the pool will be blocked. Pool Breakdown (top 5) {{- range query \"topk(5, sort_desc(ceph_pool_percent_used * on(pool_id) group_right ceph_pool_metadata))\" }} - {{ .Labels.name }} at {{ .Value }}% {{- end }} Increase the pool's quota, or add capacity to the cluster first then increase the pool's quota (e.g. ceph osd pool set quota <pool_name> max_bytes <bytes>)"
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#pool-full"
+ summary: "Pool is full - writes are blocked"
+ expr: "ceph_health_detail{name=\"POOL_FULL\"} > 0"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.9.1"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephPoolNearFull"
+ annotations:
+ description: "A pool has exceeded the warning (percent full) threshold, or OSDs supporting the pool have reached the NEARFULL threshold. Writes may continue, but you are at risk of the pool going read-only if more capacity isn't made available. Determine the affected pool with 'ceph df detail', looking at QUOTA BYTES and STORED. Increase the pool's quota, or add capacity to the cluster first then increase the pool's quota (e.g. ceph osd pool set quota <pool_name> max_bytes <bytes>). Also ensure that the balancer is active."
+ summary: "One or more Ceph pools are nearly full"
+ expr: "ceph_health_detail{name=\"POOL_NEAR_FULL\"} > 0"
+ for: "5m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - name: "healthchecks"
 rules:
- - alert: CephPoolGrowthWarning
- expr: |
- (predict_linear((max(ceph_pool_percent_used) without (pod, instance))[2d:1h], 3600 * 24 * 5) * on(pool_id)
- group_right ceph_pool_metadata) >= 95
- labels:
- severity: warning
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.9.2
- annotations:
- summary: Pool growth rate may soon exceed capacity
- description: >
- Pool '{{ $labels.name }}' will be full in less than 5 days assuming the average fill-up rate of the past 48 hours.
-
- - alert: CephPoolBackfillFull
- expr: ceph_health_detail{name="POOL_BACKFILLFULL"} > 0
- labels:
- severity: warning
- type: ceph_default
+ - alert: "CephSlowOps"
 annotations:
- summary: Free space in a pool is too low for recovery/backfill
- description: >
- A pool is approaching the near full threshold, which will prevent recovery/backfill from completing. Consider adding more capacity.
-
- - alert: CephPoolFull
- expr: ceph_health_detail{name="POOL_FULL"} > 0
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.9.1
+ description: "{{ $value }} OSD requests are taking too long to process (osd_op_complaint_time exceeded)"
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#slow-ops"
+ summary: "OSD operations are slow to complete"
+ expr: "ceph_healthcheck_slow_ops > 0"
+ for: "30s"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "CephDaemonSlowOps"
+ annotations:
+ description: "{{ $labels.ceph_daemon }} operations are taking too long to process (complaint time exceeded)"
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#slow-ops"
+ summary: "{{ $labels.ceph_daemon }} operations are slow to complete"
+ expr: "ceph_daemon_health_metrics{type=\"SLOW_OPS\"} > 0"
+ for: "30s"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - name: "hardware"
+ rules:
+ - alert: "HardwareStorageError"
 annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#pool-full
- summary: Pool is full - writes are blocked
- description: |
- A pool has reached its MAX quota, or OSDs supporting the pool
- have reached the FULL threshold. Until this is resolved, writes to
- the pool will be blocked.
- Pool Breakdown (top 5)
- {{- range query "topk(5, sort_desc(ceph_pool_percent_used * on(pool_id) group_right ceph_pool_metadata))" }}
- - {{ .Labels.name }} at {{ .Value }}%
- {{- end }}
- Increase the pool's quota, or add capacity to the cluster
- then increase the pool's quota (e.g. ceph osd pool set quota <pool_name> max_bytes <bytes>)
- - alert: CephPoolNearFull
- expr: ceph_health_detail{name="POOL_NEAR_FULL"} > 0
- for: 5m
- labels:
- severity: warning
- type: ceph_default
+ description: "Some storage devices are in error. Check `ceph health detail`."
+ summary: "Storage devices error(s) detected"
+ expr: "ceph_health_detail{name=\"HARDWARE_STORAGE\"} > 0"
+ for: "30s"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.13.1"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "HardwareMemoryError"
+ annotations:
+ description: "DIMM error(s) detected. Check `ceph health detail`."
+ summary: "DIMM error(s) detected"
+ expr: "ceph_health_detail{name=\"HARDWARE_MEMORY\"} > 0"
+ for: "30s"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.13.2"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "HardwareProcessorError"
+ annotations:
+ description: "Processor error(s) detected. Check `ceph health detail`."
+ summary: "Processor error(s) detected"
+ expr: "ceph_health_detail{name=\"HARDWARE_PROCESSOR\"} > 0"
+ for: "30s"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.13.3"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "HardwareNetworkError"
+ annotations:
+ description: "Network error(s) detected. Check `ceph health detail`."
+ summary: "Network error(s) detected"
+ expr: "ceph_health_detail{name=\"HARDWARE_NETWORK\"} > 0"
+ for: "30s"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.13.4"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "HardwarePowerError"
+ annotations:
+ description: "Power supply error(s) detected. Check `ceph health detail`."
+ summary: "Power supply error(s) detected"
+ expr: "ceph_health_detail{name=\"HARDWARE_POWER\"} > 0"
+ for: "30s"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.13.5"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "HardwareFanError"
+ annotations:
+ description: "Fan error(s) detected. Check `ceph health detail`."
+ summary: "Fan error(s) detected"
+ expr: "ceph_health_detail{name=\"HARDWARE_FANS\"} > 0"
+ for: "30s"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.13.6"
+ severity: "critical"
+ type: "ceph_default"
+ - name: "PrometheusServer"
+ rules:
+ - alert: "PrometheusJobMissing"
 annotations:
- summary: One or more Ceph pools are nearly full
- description: |
- A pool has exceeded the warning (percent full) threshold, or OSDs
- supporting the pool have reached the NEARFULL threshold. Writes may
- continue, but you are at risk of the pool going read-only if more capacity
- isn't made available.
-
- Determine the affected pool with 'ceph df detail', looking
- at QUOTA BYTES and STORED. Increase the pool's quota, or add
- capacity to the cluster then increase the pool's quota
- (e.g. ceph osd pool set quota <pool_name> max_bytes <bytes>).
- Also ensure that the balancer is active.
- - name: healthchecks
+ description: "The prometheus job that scrapes from Ceph MGR is no longer defined, this will effectively mean you'll have no metrics or alerts for the cluster. Please review the job definitions in the prometheus.yml file of the prometheus instance."
+ summary: "The scrape job for Ceph MGR is missing from Prometheus"
+ expr: "absent(up{job=\"rook-ceph-mgr\"})"
+ for: "30s"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.12.1"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "PrometheusJobExporterMissing"
+ annotations:
+ description: "The prometheus job that scrapes from Ceph Exporter is no longer defined, this will effectively mean you'll have no metrics or alerts for the cluster. Please review the job definitions in the prometheus.yml file of the prometheus instance."
+ summary: "The scrape job for Ceph Exporter is missing from Prometheus"
+ expr: "sum(absent(up{job=\"rook-ceph-exporter\"})) and sum(ceph_osd_metadata{ceph_version=~\"^ceph version (1[89]|[2-9][0-9]).*\"}) > 0"
+ for: "30s"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.12.1"
+ severity: "critical"
+ type: "ceph_default"
+ - name: "rados"
 rules:
- - alert: CephSlowOps
- expr: ceph_healthcheck_slow_ops > 0
- for: 30s
- labels:
- severity: warning
- type: ceph_default
- annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#slow-ops
- summary: OSD operations are slow to complete
- description: >
- {{ $value }} OSD requests are taking too long to process (osd_op_complaint_time exceeded)
-
- # Object related events
- - name: rados
+ - alert: "CephObjectMissing"
+ annotations:
+ description: "The latest version of a RADOS object can not be found, even though all OSDs are up. I/O requests for this object from clients will block (hang). Resolving this issue may require the object to be rolled back to a prior version manually, and manually verified."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks#object-unfound"
+ summary: "Object(s) marked UNFOUND"
+ expr: "(ceph_health_detail{name=\"OBJECT_UNFOUND\"} == 1) * on() (count(ceph_osd_up == 1) == bool count(ceph_osd_metadata)) == 1"
+ for: "30s"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.10.1"
+ severity: "critical"
+ type: "ceph_default"
+ - name: "generic"
 rules:
- - alert: CephObjectMissing
- expr: (ceph_health_detail{name="OBJECT_UNFOUND"} == 1) * on() (count(ceph_osd_up == 1) == bool count(ceph_osd_metadata)) == 1
- for: 30s
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.10.1
+ - alert: "CephDaemonCrash"
 annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#object-unfound
- summary: Object(s) marked UNFOUND
- description: |
- The latest version of a RADOS object can not be found, even though all OSDs are up. I/O
- requests for this object from clients will block (hang). Resolving this issue may
- require the object to be rolled back to a prior version manually, and manually verified.
- # Generic
- - name: generic
+ description: "One or more daemons have crashed recently, and need to be acknowledged. This notification ensures that software crashes do not go unseen. To acknowledge a crash, use the 'ceph crash archive <id>' command."
+ documentation: "https://docs.ceph.com/en/latest/rados/operations/health-checks/#recent-crash"
+ summary: "One or more Ceph daemons have crashed, and are pending acknowledgement"
+ expr: "ceph_health_detail{name=\"RECENT_CRASH\"} == 1"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.1.2"
+ severity: "critical"
+ type: "ceph_default"
+ - name: "rbdmirror"
 rules:
- - alert: CephDaemonCrash
- expr: ceph_health_detail{name="RECENT_CRASH"} == 1
- for: 1m
- labels:
- severity: critical
- type: ceph_default
- oid: 1.3.6.1.4.1.50495.1.2.1.1.2
+ - alert: "CephRBDMirrorImagesPerDaemonHigh"
 annotations:
- documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks/#recent-crash
- summary: One or more Ceph daemons have crashed, and are pending acknowledgement
- description: |
- One or more daemons have crashed recently, and need to be acknowledged. This notification
- ensures that software crashes do not go unseen. To acknowledge a crash, use the
- 'ceph crash archive <id>' command.
+ description: "Number of image replications per daemon is not supposed to go beyond threshold 100"
+ summary: "Number of image replications are now above 100"
+ expr: "sum by (ceph_daemon, namespace) (ceph_rbd_mirror_snapshot_image_snapshots) > 100"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.10.2"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephRBDMirrorImagesNotInSync"
+ annotations:
+ description: "Both local and remote RBD mirror images should be in sync."
+ summary: "Some of the RBD mirror images are not in sync with the remote counter parts."
+ expr: "sum by (ceph_daemon, image, namespace, pool) (topk by (ceph_daemon, image, namespace, pool) (1, ceph_rbd_mirror_snapshot_image_local_timestamp) - topk by (ceph_daemon, image, namespace, pool) (1, ceph_rbd_mirror_snapshot_image_remote_timestamp)) != 0"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.10.3"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephRBDMirrorImagesNotInSyncVeryHigh"
+ annotations:
+ description: "More than 10% of the images have synchronization problems"
+ summary: "Number of unsynchronized images are very high."
+ expr: "count by (ceph_daemon) ((topk by (ceph_daemon, image, namespace, pool) (1, ceph_rbd_mirror_snapshot_image_local_timestamp) - topk by (ceph_daemon, image, namespace, pool) (1, ceph_rbd_mirror_snapshot_image_remote_timestamp)) != 0) > (sum by (ceph_daemon) (ceph_rbd_mirror_snapshot_snapshots)*.1)"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.10.4"
+ severity: "critical"
+ type: "ceph_default"
+ - alert: "CephRBDMirrorImageTransferBandwidthHigh"
+ annotations:
+ description: "Detected a heavy increase in bandwidth for rbd replications (over 80%) in the last 30 min. This might not be a problem, but it is good to review the number of images being replicated simultaneously"
+ summary: "The replication network usage has been increased over 80% in the last 30 minutes. Review the number of images being replicated. This alert will be cleaned automatically after 30 minutes"
+ expr: "rate(ceph_rbd_mirror_journal_replay_bytes[30m]) > 0.80"
+ for: "1m"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.10.5"
+ severity: "warning"
+ type: "ceph_default"
+ - name: "nvmeof"
+ rules:
+ - alert: "NVMeoFSubsystemNamespaceLimit"
+ annotations:
+ description: "Subsystems have a max namespace limit defined at creation time. This alert means that no more namespaces can be added to {{ $labels.nqn }}"
+ summary: "{{ $labels.nqn }} subsystem has reached its maximum number of namespaces "
+ expr: "(count by(nqn) (ceph_nvmeof_subsystem_namespace_metadata)) >= ceph_nvmeof_subsystem_namespace_limit"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFTooManyGateways"
+ annotations:
+ description: "You may create many gateways, but 4 is the tested limit"
+ summary: "Max supported gateways exceeded "
+ expr: "count(ceph_nvmeof_gateway_info) > 4.00"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFMaxGatewayGroupSize"
+ annotations:
+ description: "You may create many gateways in a gateway group, but 2 is the tested limit"
+ summary: "Max gateways within a gateway group ({{ $labels.group }}) exceeded "
+ expr: "count by(group) (ceph_nvmeof_gateway_info) > 2.00"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFSingleGatewayGroup"
+ annotations:
+ description: "Although a single member gateway group is valid, it should only be used for test purposes"
+ summary: "The gateway group {{ $labels.group }} consists of a single gateway - HA is not possible "
+ expr: "count by(group) (ceph_nvmeof_gateway_info) == 1"
+ for: "5m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFHighGatewayCPU"
+ annotations:
+ description: "Typically, high CPU may indicate degraded performance. Consider increasing the number of reactor cores"
+ summary: "CPU used by {{ $labels.instance }} NVMe-oF Gateway is high "
+ expr: "label_replace(avg by(instance) (rate(ceph_nvmeof_reactor_seconds_total{mode=\"busy\"}[1m])),\"instance\",\"$1\",\"instance\",\"(.*):.*\") > 80.00"
+ for: "10m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFGatewayOpenSecurity"
+ annotations:
+ description: "It is good practice to ensure subsystems use host security to reduce the risk of unexpected data loss"
+ summary: "Subsystem {{ $labels.nqn }} has been defined without host level security "
+ expr: "ceph_nvmeof_subsystem_metadata{allow_any_host=\"yes\"}"
+ for: "5m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFTooManySubsystems"
+ annotations:
+ description: "Although you may continue to create subsystems in {{ $labels.gateway_host }}, the configuration may not be supported"
+ summary: "The number of subsystems defined to the gateway exceeds supported values "
+ expr: "count by(gateway_host) (label_replace(ceph_nvmeof_subsystem_metadata,\"gateway_host\",\"$1\",\"instance\",\"(.*):.*\")) > 16.00"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFVersionMismatch"
+ annotations:
+ description: "This may indicate an issue with deployment. Check cephadm logs"
+ summary: "The cluster has different NVMe-oF gateway releases active "
+ expr: "count(count by(version) (ceph_nvmeof_gateway_info)) > 1"
+ for: "1h"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFHighClientCount"
+ annotations:
+ description: "The supported limit for clients connecting to a subsystem is 32"
+ summary: "The number of clients connected to {{ $labels.nqn }} is too high "
+ expr: "ceph_nvmeof_subsystem_host_count > 32.00"
+ for: "1m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFHighHostCPU"
+ annotations:
+ description: "High CPU on a gateway host can lead to CPU contention and performance degradation"
+ summary: "The CPU is high ({{ $value }}%) on NVMeoF Gateway host ({{ $labels.host }}) "
+ expr: "100-((100*(avg by(host) (label_replace(rate(node_cpu_seconds_total{mode=\"idle\"}[5m]),\"host\",\"$1\",\"instance\",\"(.*):.*\")) * on(host) group_right label_replace(ceph_nvmeof_gateway_info,\"host\",\"$1\",\"instance\",\"(.*):.*\")))) >= 80.00"
+ for: "10m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFInterfaceDown"
+ annotations:
+ description: "A NIC used by one or more subsystems is in a down state"
+ summary: "Network interface {{ $labels.device }} is down "
+ expr: "ceph_nvmeof_subsystem_listener_iface_info{operstate=\"down\"}"
+ for: "30s"
+ labels:
+ oid: "1.3.6.1.4.1.50495.1.2.1.14.1"
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFInterfaceDuplex"
+ annotations:
+ description: "Until this is resolved, performance from the gateway will be degraded"
+ summary: "Network interface {{ $labels.device }} is not running in full duplex mode "
+ expr: "ceph_nvmeof_subsystem_listener_iface_info{duplex!=\"full\"}"
+ for: "30s"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFHighReadLatency"
+ annotations:
+ description: "High latencies may indicate a constraint within the cluster e.g. CPU, network. Please investigate"
+ summary: "The average read latency over the last 5 mins has reached 10 ms or more on {{ $labels.gateway }}"
+ expr: "label_replace((avg by(instance) ((rate(ceph_nvmeof_bdev_read_seconds_total[1m]) / rate(ceph_nvmeof_bdev_reads_completed_total[1m])))),\"gateway\",\"$1\",\"instance\",\"(.*):.*\") > 0.01"
+ for: "5m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
+ - alert: "NVMeoFHighWriteLatency"
+ annotations:
+ description: "High latencies may indicate a constraint within the cluster e.g. CPU, network. Please investigate"
+ summary: "The average write latency over the last 5 mins has reached 20 ms or more on {{ $labels.gateway }}"
+ expr: "label_replace((avg by(instance) ((rate(ceph_nvmeof_bdev_write_seconds_total[5m]) / rate(ceph_nvmeof_bdev_writes_completed_total[5m])))),\"gateway\",\"$1\",\"instance\",\"(.*):.*\") > 0.02"
+ for: "5m"
+ labels:
+ severity: "warning"
+ type: "ceph_default"
 ---
 
 ---
-apiVersion: snapshot.storage.k8s.io/v1beta1
+apiVersion: snapshot.storage.k8s.io/v1
 kind: VolumeSnapshotClass
 metadata:
 name: csi-rbdplugin-snapclass
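
Reviewer note: this diff moves `VolumeSnapshotClass` from `snapshot.storage.k8s.io/v1beta1` to `v1`, and the operator chart drops its `policy/v1beta1` `PodSecurityPolicy`. Before merging, it may be worth checking whether any other manifests in the repo still use those API versions. The following is a minimal sketch, not part of the PR: the `REMOVED_APIS` table and the `find_removed_apis` helper are hypothetical names, the table covers only the two changes visible in this diff, and the parsing is a plain-text scan of top-level `apiVersion`/`kind` fields rather than a real YAML parser.

```python
import re

# Hypothetical lookup table: (apiVersion, kind) -> replacement apiVersion.
# It lists only the two changes visible in this diff; None means the kind
# was dropped with no direct replacement.
REMOVED_APIS = {
    ("policy/v1beta1", "PodSecurityPolicy"): None,
    ("snapshot.storage.k8s.io/v1beta1", "VolumeSnapshotClass"): "snapshot.storage.k8s.io/v1",
}


def find_removed_apis(manifest_text):
    """Return (apiVersion, kind, replacement) for each flagged YAML document.

    Only unindented (top-level) apiVersion/kind lines are considered, so
    nested fields with the same names are ignored.
    """
    findings = []
    # Split the multi-document stream on lines that contain only '---'.
    for doc in re.split(r"^---[ \t]*$", manifest_text, flags=re.MULTILINE):
        api = re.search(r"^apiVersion:[ \t]*(\S+)", doc, flags=re.MULTILINE)
        kind = re.search(r"^kind:[ \t]*(\S+)", doc, flags=re.MULTILINE)
        if not api or not kind:
            continue
        key = (api.group(1), kind.group(1))
        if key in REMOVED_APIS:
            findings.append((key[0], key[1], REMOVED_APIS[key]))
    return findings


# Sample input assembled from resource names that appear in this PR.
sample = """\
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: 00-rook-privileged
---
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbdplugin-snapclass
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rook-ceph-default
"""

for api_version, kind, replacement in find_removed_apis(sample):
    print(api_version, kind, "->", replacement or "(removed)")
```

Running it against the output of `helm template` or against the files under `cluster/` would be one way to use it; a proper YAML parser would be more robust for quoted values or unusual formatting.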

@chii-bot
Contributor Author

chii-bot bot commented Aug 31, 2022

Path: cluster/core/rook-ceph/operator/helm-release.yaml
Version: v1.9.12 -> v1.16.0

@@ -1,86 +1,4 @@
 ---
-# Source: rook-ceph/templates/psp.yaml
-# We expect most Kubernetes teams to follow the Kubernetes docs and have these PSPs.
-# * privileged (for kube-system namespace)
-# * restricted (for all logged in users)
-#
-# PSPs are applied based on the first match alphabetically. `rook-ceph-operator` comes after
-# `restricted` alphabetically, so we name this `00-rook-privileged`, so it stays somewhere
-# close to the top and so `rook-system` gets the intended PSP. This may need to be renamed in
-# environments with other `00`-prefixed PSPs.
-#
-# More on PSP ordering: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#policy-order
-apiVersion: policy/v1beta1
-kind: PodSecurityPolicy
-metadata:
- name: 00-rook-privileged
- annotations:
- seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'runtime/default'
- seccomp.security.alpha.kubernetes.io/defaultProfileName: 'runtime/default'
-spec:
- privileged: true
- allowedCapabilities:
- # required by CSI
- - SYS_ADMIN
- - MKNOD
- fsGroup:
- rule: RunAsAny
- # runAsUser, supplementalGroups - Rook needs to run some pods as root
- # Ceph pods could be run as the Ceph user, but that user isn't always known ahead of time
- runAsUser:
- rule: RunAsAny
- supplementalGroups:
- rule: RunAsAny
- # seLinux - seLinux context is unknown ahead of time; set if this is well-known
- seLinux:
- rule: RunAsAny
- volumes:
- # recommended minimum set
- - configMap
- - downwardAPI
- - emptyDir
- - persistentVolumeClaim
- - secret
- - projected
- # required for Rook
- - hostPath
- # allowedHostPaths can be set to Rook's known host volume mount points when they are fully-known
- # allowedHostPaths:
- # - pathPrefix: "/run/udev" # for OSD prep
- # readOnly: false
- # - pathPrefix: "/dev" # for OSD prep
- # readOnly: false
- # - pathPrefix: "/var/lib/rook" # or whatever the dataDirHostPath value is set to
- # readOnly: false
- # Ceph requires host IPC for setting up encrypted devices
- hostIPC: true
- # Ceph OSDs need to share the same PID namespace
- hostPID: true
- # hostNetwork can be set to 'false' if host networking isn't used
- hostNetwork: true
- hostPorts:
- # Ceph messenger protocol v1
- - min: 6789
- max: 6790 # <- support old default port
- # Ceph messenger protocol v2
- - min: 3300
- max: 3300
- # Ceph RADOS ports for OSDs, MDSes
- - min: 6800
- max: 7300
- # # Ceph dashboard port HTTP (not recommended)
- # - min: 7000
- # max: 7000
- # Ceph dashboard port HTTPS
- - min: 8443
- max: 8443
- # Ceph mgr Prometheus Metrics
- - min: 9283
- max: 9283
- # port for CSIAddons
- - min: 9070
- max: 9070
----
 # Source: rook-ceph/templates/cluster-rbac.yaml
 # Service account for Ceph OSDs
 apiVersion: v1
@@ -155,6 +73,19 @@
 # imagePullSecrets:
 # - name: my-registry-secret
 ---
+# Source: rook-ceph/templates/cluster-rbac.yaml
+# Service account for other components
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+ name: rook-ceph-default
+ namespace: default # namespace:cluster
+ labels:
+ operator: rook
+ storage-backend: ceph
+# imagePullSecrets:
+# - name: my-registry-secret
+---
 # Source: rook-ceph/templates/serviceaccount.yaml
 # Service account for the Rook-Ceph operator
 apiVersion: v1
@@ -211,6 +142,20 @@
 # imagePullSecrets:
 # - name: my-registry-secret
 ---
+# Source: rook-ceph/templates/serviceaccount.yaml
+# Service account for Ceph COSI driver
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+ name: objectstorage-provisioner
+ namespace: default # namespace:operator
+ labels:
+ app.kubernetes.io/part-of: container-object-storage-interface
+ app.kubernetes.io/component: driver-ceph
+ app.kubernetes.io/name: cosi-driver-ceph
+# imagePullSecrets:
+# - name: my-registry-secret
+---
 # Source: rook-ceph/templates/configmap.yaml
 # Operator settings that can be updated without an operator restart
 # Operator settings that require an operator restart are found in the operator env vars
@@ -218,36 +163,53 @@
 apiVersion: v1
 metadata:
 name: rook-ceph-operator-config
+ namespace: default # namespace:operator
 data:
 ROOK_LOG_LEVEL: "INFO"
 ROOK_CEPH_COMMANDS_TIMEOUT_SECONDS: "15"
 ROOK_OBC_WATCH_OPERATOR_NAMESPACE: "true"
+ ROOK_CEPH_ALLOW_LOOP_DEVICES: "false"
+ ROOK_ENABLE_DISCOVERY_DAEMON: "false"
 ROOK_CSI_ENABLE_RBD: "true"
 ROOK_CSI_ENABLE_CEPHFS: "true"
+ ROOK_CSI_DISABLE_DRIVER: "false"
 CSI_ENABLE_CEPHFS_SNAPSHOTTER: "true"
+ CSI_ENABLE_NFS_SNAPSHOTTER: "true"
 CSI_ENABLE_RBD_SNAPSHOTTER: "true"
 CSI_PLUGIN_ENABLE_SELINUX_HOST_MOUNT: "false"
 CSI_ENABLE_ENCRYPTION: "false"
 CSI_ENABLE_OMAP_GENERATOR: "false"
 CSI_ENABLE_HOST_NETWORK: "true"
+ CSI_ENABLE_METADATA: "false"
+ CSI_ENABLE_VOLUME_GROUP_SNAPSHOT: "true"
 CSI_PLUGIN_PRIORITY_CLASSNAME: "system-node-critical"
 CSI_PROVISIONER_PRIORITY_CLASSNAME: "system-cluster-critical"
- CSI_RBD_FSGROUPPOLICY: "ReadWriteOnceWithFSType"
- CSI_CEPHFS_FSGROUPPOLICY: "ReadWriteOnceWithFSType"
- CSI_NFS_FSGROUPPOLICY: "ReadWriteOnceWithFSType"
- ROOK_CSI_ENABLE_GRPC_METRICS: "false"
- CSI_ENABLE_VOLUME_REPLICATION: "false"
+ CSI_RBD_FSGROUPPOLICY: "File"
+ CSI_CEPHFS_FSGROUPPOLICY: "File"
+ CSI_NFS_FSGROUPPOLICY: "File"
+ ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.13.0"
+ ROOK_CSI_REGISTRAR_IMAGE: "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.11.1"
+ ROOK_CSI_PROVISIONER_IMAGE: "registry.k8s.io/sig-storage/csi-provisioner:v5.0.1"
+ ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.k8s.io/sig-storage/csi-snapshotter:v8.0.1"
+ ROOK_CSI_ATTACHER_IMAGE: "registry.k8s.io/sig-storage/csi-attacher:v4.6.1"
+ ROOK_CSI_RESIZER_IMAGE: "registry.k8s.io/sig-storage/csi-resizer:v1.11.1"
+ ROOK_CSI_IMAGE_PULL_POLICY: "IfNotPresent"
 CSI_ENABLE_CSIADDONS: "false"
+ ROOK_CSIADDONS_IMAGE: "quay.io/csiaddons/k8s-sidecar:v0.11.0"
+ CSI_ENABLE_TOPOLOGY: "false"
 ROOK_CSI_ENABLE_NFS: "false"
 CSI_FORCE_CEPHFS_KERNEL_CLIENT: "true"
 CSI_GRPC_TIMEOUT_SECONDS: "150"
 CSI_PROVISIONER_REPLICAS: "2"
- CSI_RBD_PROVISIONER_RESOURCE: "- name : csi-provisioner\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n cpu: 200m\n- name : csi-resizer\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n cpu: 200m\n- name : csi-attacher\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n cpu: 200m\n- name : csi-snapshotter\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n cpu: 200m\n- name : csi-rbdplugin\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n cpu: 500m\n- name : csi-omap-generator\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n cpu: 500m\n- name : liveness-prometheus\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n cpu: 100m\n"
- CSI_RBD_PLUGIN_RESOURCE: "- name : driver-registrar\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n cpu: 100m\n- name : csi-rbdplugin\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n cpu: 500m\n- name : liveness-prometheus\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n cpu: 100m\n"
- CSI_CEPHFS_PROVISIONER_RESOURCE: "- name : csi-provisioner\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n cpu: 200m\n- name : csi-resizer\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n cpu: 200m\n- name : csi-attacher\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n cpu: 200m\n- name : csi-snapshotter\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n cpu: 200m\n- name : csi-cephfsplugin\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n cpu: 500m\n- name : liveness-prometheus\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n cpu: 100m\n"
- CSI_CEPHFS_PLUGIN_RESOURCE: "- name : driver-registrar\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n cpu: 100m\n- name : csi-cephfsplugin\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n cpu: 500m\n- name : liveness-prometheus\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n cpu: 100m\n"
- CSI_NFS_PROVISIONER_RESOURCE: "- name : csi-provisioner\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n cpu: 200m\n- name : csi-nfsplugin\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n cpu: 500m\n"
- CSI_NFS_PLUGIN_RESOURCE: "- name : driver-registrar\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n cpu: 100m\n- name : csi-nfsplugin\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n cpu: 500m\n"
+ CSI_RBD_PROVISIONER_RESOURCE: "- name : csi-provisioner\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n- name : csi-resizer\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n- name : csi-attacher\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n- name : csi-snapshotter\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n- name : csi-rbdplugin\n resource:\n requests:\n memory: 512Mi\n limits:\n memory: 1Gi\n- name : csi-omap-generator\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n- name : liveness-prometheus\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n"
+ CSI_RBD_PLUGIN_RESOURCE: "- name : driver-registrar\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n- name : csi-rbdplugin\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n- name : liveness-prometheus\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n"
+ CSI_CEPHFS_PROVISIONER_RESOURCE: "- name : csi-provisioner\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n- name : csi-resizer\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n- name : csi-attacher\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n- name : csi-snapshotter\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n- name : csi-cephfsplugin\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n- name : liveness-prometheus\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n"
+ CSI_CEPHFS_PLUGIN_RESOURCE: "- name : driver-registrar\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n- name : csi-cephfsplugin\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n- name : liveness-prometheus\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n"
+ CSI_NFS_PROVISIONER_RESOURCE: "- name : csi-provisioner\n resource:\n requests:\n memory: 128Mi\n cpu: 100m\n limits:\n memory: 256Mi\n- name : csi-nfsplugin\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n- name : csi-attacher\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n"
+ CSI_NFS_PLUGIN_RESOURCE: "- name : driver-registrar\n resource:\n requests:\n memory: 128Mi\n cpu: 50m\n limits:\n memory: 256Mi\n- name : csi-nfsplugin\n resource:\n requests:\n memory: 512Mi\n cpu: 250m\n limits:\n memory: 1Gi\n"
+ CSI_CEPHFS_ATTACH_REQUIRED: "true"
+ CSI_RBD_ATTACH_REQUIRED: "true"
+ CSI_NFS_ATTACH_REQUIRED: "true"
 ---
 # Source: rook-ceph/templates/clusterrole.yaml
 kind: ClusterRole
@@ -271,9 +233,24 @@
 - apiGroups: [""]
 resources: ["pods/exec"]
 verbs: ["create"]
- - apiGroups: ["admissionregistration.k8s.io"]
- resources: ["validatingwebhookconfigurations"]
- verbs: ["create", "get", "delete", "update"]
+ - apiGroups: ["csiaddons.openshift.io"]
+ resources: ["networkfences"]
+ verbs: ["create", "get", "update", "delete", "watch", "list", "deletecollection"]
+ - apiGroups: ["apiextensions.k8s.io"]
+ resources: ["customresourcedefinitions"]
+ verbs: ["get"]
+ - apiGroups: ["csi.ceph.io"]
+ resources: ["cephconnections"]
+ verbs: ["create", "delete", "get", "list", "update", "watch"]
+ - apiGroups: ["csi.ceph.io"]
+ resources: ["clientprofiles"]
+ verbs: ["create", "delete", "get", "list", "update", "watch"]
+ - apiGroups: ["csi.ceph.io"]
+ resources: ["operatorconfigs"]
+ verbs: ["create", "delete", "get", "list", "update", "watch"]
+ - apiGroups: ["csi.ceph.io"]
+ resources: ["drivers"]
+ verbs: ["create", "delete", "get", "list", "update", "watch"]
 ---
 # Source: rook-ceph/templates/clusterrole.yaml
 # The cluster role for managing all the cluster-specific resources in a namespace
@@ -332,9 +309,8 @@
 # Node access is needed for determining nodes where mons should run
 - nodes
 - nodes/proxy
- - services
 # Rook watches secrets which it uses to configure access to external resources.
- # e.g., external Ceph cluster; TLS certificates for the admission controller or object store
+ # e.g., external Ceph cluster or object store
 - secrets
 # Rook watches for changes to the rook-operator-config configmap
 - configmaps
@@ -352,6 +328,7 @@
 - persistentvolumeclaims
 # Rook creates endpoints for mgr and object store access
 - endpoints
+ - services
 verbs:
 - get
 - list
@@ -380,6 +357,7 @@
 - create
 - update
 - delete
+ - deletecollection
 # The Rook operator must be able to watch all ceph.rook.io resources to reconcile them.
 - apiGroups: ["ceph.rook.io"]
 resources:
@@ -399,6 +377,7 @@
 - cephfilesystemmirrors
 - cephfilesystemsubvolumegroups
 - cephblockpoolradosnamespaces
+ - cephcosidrivers
 verbs:
 - get
 - list
@@ -467,6 +446,14 @@
 - delete
 - deletecollection
 - apiGroups:
+ - apps
+ resources:
+ # This is to add osd deployment owner ref on key rotation
+ # cron jobs.
+ - deployments/finalizers
+ verbs:
+ - update
+ - apiGroups:
 - healthchecking.openshift.io
 resources:
 - machinedisruptionbudgets
@@ -651,19 +638,19 @@
 rules:
 - apiGroups: [""]
 resources: ["nodes"]
- verbs: ["get", "list", "watch"]
- - apiGroups: [""]
- resources: ["namespaces"]
- verbs: ["get", "list"]
+ verbs: ["get"]
 - apiGroups: [""]
- resources: ["persistentvolumes"]
- verbs: ["get", "list", "watch", "update"]
- - apiGroups: ["storage.k8s.io"]
- resources: ["volumeattachments"]
- verbs: ["get", "list", "watch", "update"]
+ resources: ["secrets"]
+ verbs: ["get"]
 - apiGroups: [""]
 resources: ["configmaps"]
- verbs: ["get", "list"]
+ verbs: ["get"]
+ - apiGroups: [""]
+ resources: ["serviceaccounts"]
+ verbs: ["get"]
+ - apiGroups: [""]
+ resources: ["serviceaccounts/token"]
+ verbs: ["create"]
 ---
 # Source: rook-ceph/templates/clusterrole.yaml
 kind: ClusterRole
@@ -675,11 +662,20 @@
 resources: ["secrets"]
 verbs: ["get", "list"]
 - apiGroups: [""]
+ resources: ["configmaps"]
+ verbs: ["get"]
+ - apiGroups: [""]
+ resources: ["nodes"]
+ verbs: ["get", "list", "watch"]
+ - apiGroups: ["storage.k8s.io"]
+ resources: ["csinodes"]
+ verbs: ["get", "list", "watch"]
+ - apiGroups: [""]
 resources: ["persistentvolumes"]
- verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
+ verbs: ["get", "list", "watch", "create", "update", "delete", "patch"]
 - apiGroups: [""]
 resources: ["persistentvolumeclaims"]
- verbs: ["get", "list", "watch", "update"]
+ verbs: ["get", "list", "watch", "patch", "update"]
 - apiGroups: ["storage.k8s.io"]
 resources: ["storageclasses"]
 verbs: ["get", "list", "watch"]
@@ -688,31 +684,40 @@
 verbs: ["list", "watch", "create", "update", "patch"]
 - apiGroups: ["storage.k8s.io"]
 resources: ["volumeattachments"]
- verbs: ["get", "list", "watch", "update", "patch"]
+ verbs: ["get", "list", "watch", "patch"]
 - apiGroups: ["storage.k8s.io"]
 resources: ["volumeattachments/status"]
 verbs: ["patch"]
 - apiGroups: [""]
- resources: ["nodes"]
- verbs: ["get", "list", "watch"]
- - apiGroups: [""]
 resources: ["persistentvolumeclaims/status"]
- verbs: ["update", "patch"]
+ verbs: ["patch"]
 - apiGroups: ["snapshot.storage.k8s.io"]
 resources: ["volumesnapshots"]
- verbs: ["get", "list", "watch", "update", "patch"]
- - apiGroups: ["snapshot.storage.k8s.io"]
- resources: ["volumesnapshotcontents"]
- verbs: ["create", "get", "list", "watch", "update", "delete", "patch"]
+ verbs: ["get", "list", "watch", "update", "patch", "create"]
 - apiGroups: ["snapshot.storage.k8s.io"]
 resources: ["volumesnapshotclasses"]
 verbs: ["get", "list", "watch"]
 - apiGroups: ["snapshot.storage.k8s.io"]
+ resources: ["volumesnapshotcontents"]
+ verbs: ["get", "list", "watch", "patch", "update", "create"]
+ - apiGroups: ["snapshot.storage.k8s.io"]
 resources: ["volumesnapshotcontents/status"]
 verbs: ["update", "patch"]
- - apiGroups: ["snapshot.storage.k8s.io"]
- resources: ["volumesnapshots/status"]
+ - apiGroups: ["groupsnapshot.storage.k8s.io"]
+ resources: ["volumegroupsnapshotclasses"]
+ verbs: ["get", "list", "watch"]
+ - apiGroups: ["groupsnapshot.storage.k8s.io"]
+ resources: ["volumegroupsnapshotcontents"]
+ verbs: ["get", "list", "watch", "update", "patch"]
+ - apiGroups: ["groupsnapshot.storage.k8s.io"]
+ resources: ["volumegroupsnapshotcontents/status"]
 verbs: ["update", "patch"]
+ - apiGroups: [""]
+ resources: ["serviceaccounts"]
+ verbs: ["get"]
+ - apiGroups: [""]
+ resources: ["serviceaccounts/token"]
+ verbs: ["create"]
 ---
 # Source: rook-ceph/templates/clusterrole.yaml
 kind: ClusterRole
@@ -730,26 +735,23 @@
 resources: ["secrets"]
 verbs: ["get", "list"]
 - apiGroups: [""]
- resources: ["nodes"]
- verbs: ["get", "list", "watch"]
- - apiGroups: [""]
- resources: ["namespaces"]
- verbs: ["get", "list"]
- - apiGroups: [""]
 resources: ["persistentvolumes"]
- verbs: ["get", "list", "watch", "update"]
+ verbs: ["get", "list"]
 - apiGroups: ["storage.k8s.io"]
 resources: ["volumeattachments"]
- verbs: ["get", "list", "watch", "update"]
+ verbs: ["get", "list"]
 - apiGroups: [""]
 resources: ["configmaps"]
- verbs: ["get", "list"]
+ verbs: ["get"]
 - apiGroups: [""]
 resources: ["serviceaccounts"]
 verbs: ["get"]
 - apiGroups: [""]
 resources: ["serviceaccounts/token"]
 verbs: ["create"]
+ - apiGroups: [""]
+ resources: ["nodes"]
+ verbs: ["get"]
 ---
 # Source: rook-ceph/templates/clusterrole.yaml
 kind: ClusterRole
@@ -762,13 +764,19 @@
 verbs: ["get", "list", "watch"]
 - apiGroups: [""]
 resources: ["persistentvolumes"]
- verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
+ verbs: ["get", "list", "watch", "create", "update", "delete", "patch"]
 - apiGroups: [""]
 resources: ["persistentvolumeclaims"]
 verbs: ["get", "list", "watch", "update"]
 - apiGroups: ["storage.k8s.io"]
+ resources: ["storageclasses"]
+ verbs: ["get", "list", "watch"]
+ - apiGroups: [""]
+ resources: ["events"]
+ verbs: ["list", "watch", "create", "update", "patch"]
+ - apiGroups: ["storage.k8s.io"]
 resources: ["volumeattachments"]
- verbs: ["get", "list", "watch", "update", "patch"]
+ verbs: ["get", "list", "watch", "patch"]
 - apiGroups: ["storage.k8s.io"]
 resources: ["volumeattachments/status"]
 verbs: ["patch"]
@@ -776,71 +784,64 @@
 resources: ["nodes"]
 verbs: ["get", "list", "watch"]
 - apiGroups: ["storage.k8s.io"]
- resources: ["storageclasses"]
+ resources: ["csinodes"]
 verbs: ["get", "list", "watch"]
 - apiGroups: [""]
- resources: ["events"]
- verbs: ["list", "watch", "create", "update", "patch"]
+ resources: ["persistentvolumeclaims/status"]
+ verbs: ["patch"]
 - apiGroups: ["snapshot.storage.k8s.io"]
 resources: ["volumesnapshots"]
- verbs: ["get", "list", "watch", "update", "patch"]
- - apiGroups: ["snapshot.storage.k8s.io"]
- resources: ["volumesnapshotcontents"]
- verbs: ["create", "get", "list", "watch", "update", "delete", "patch"]
+ verbs: ["get", "list", "watch", "update", "patch", "create"]
 - apiGroups: ["snapshot.storage.k8s.io"]
 resources: ["volumesnapshotclasses"]
 verbs: ["get", "list", "watch"]
 - apiGroups: ["snapshot.storage.k8s.io"]
- resources: ["volumesnapshotcontents/status"]
- verbs: ["update", "patch"]
+ resources: ["volumesnapshotcontents"]
+ verbs: ["get", "list", "watch", "patch", "update", "create"]
 - apiGroups: ["snapshot.storage.k8s.io"]
- resources: ["volumesnapshots/status"]
+ resources: ["volumesnapshotcontents/status"]
 verbs: ["update", "patch"]
- - apiGroups: [""]
- resources: ["persistentvolumeclaims/status"]
+ - apiGroups: ["groupsnapshot.storage.k8s.io"]
+ resources: ["volumegroupsnapshotclasses"]
+ verbs: ["get", "list", "watch"]
+ - apiGroups: ["groupsnapshot.storage.k8s.io"]
+ resources: ["volumegroupsnapshotcontents"]
+ verbs: ["get", "list", "watch", "update", "patch"]
+ - apiGroups: ["groupsnapshot.storage.k8s.io"]
+ resources: ["volumegroupsnapshotcontents/status"]
 verbs: ["update", "patch"]
 - apiGroups: [""]
 resources: ["configmaps"]
 verbs: ["get"]
- - apiGroups: ["replication.storage.openshift.io"]
- resources: ["volumereplications", "volumereplicationclasses"]
- verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
- - apiGroups: ["replication.storage.openshift.io"]
- resources: ["volumereplications/finalizers"]
- verbs: ["update"]
- - apiGroups: ["replication.storage.openshift.io"]
- resources: ["volumereplications/status"]
- verbs: ["get", "patch", "update"]
- - apiGroups: ["replication.storage.openshift.io"]
- resources: ["volumereplicationclasses/status"]
- verbs: ["get"]
 - apiGroups: [""]
 resources: ["serviceaccounts"]
 verbs: ["get"]
 - apiGroups: [""]
 resources: ["serviceaccounts/token"]
 verbs: ["create"]
+ - apiGroups: [""]
+ resources: ["nodes"]
+ verbs: ["get", "list", "watch"]
 ---
-# Source: rook-ceph/templates/psp.yaml
-apiVersion: rbac.authorization.k8s.io/v1
+# Source: rook-ceph/templates/clusterrole.yaml
 kind: ClusterRole
+apiVersion: rbac.authorization.k8s.io/v1
 metadata:
- name: 'psp:rook'
+ name: objectstorage-provisioner-role
 labels:
- operator: rook
- storage-backend: ceph
- app.kubernetes.io/part-of: rook-ceph-operator
- app.kubernetes.io/managed-by: Helm
- app.kubernetes.io/created-by: helm
-rules:
- - apiGroups:
- - policy
- resources:
- - podsecuritypolicies
- resourceNames:
- - 00-rook-privileged
- verbs:
- - use
+ app.kubernetes.io/part-of: container-object-storage-interface
+ app.kubernetes.io/component: driver-ceph
+ app.kubernetes.io/name: cosi-driver-ceph
+rules:
+ - apiGroups: ["objectstorage.k8s.io"]
+ resources: ["buckets", "bucketaccesses", "bucketclaims", "bucketaccessclasses", "buckets/status", "bucketaccesses/status", "bucketclaims/status", "bucketaccessclasses/status"]
+ verbs: ["get", "list", "watch", "update", "create", "delete"]
+ - apiGroups: ["coordination.k8s.io"]
+ resources: ["leases"]
+ verbs: ["get", "watch", "list", "delete", "update", "create"]
+ - apiGroups: [""]
+ resources: ["secrets", "events"]
+ verbs: ["get", "delete", "update", "create"]
 ---
 # Source: rook-ceph/templates/cluster-rbac.yaml
 # Allow the ceph mgr to access cluster-wide resources necessary for the mgr modules
@@ -946,28 +947,30 @@
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
- name: cephfs-csi-nodeplugin
+ name: cephfs-csi-provisioner-role
 subjects:
 - kind: ServiceAccount
- name: rook-csi-cephfs-plugin-sa
+ name: rook-csi-cephfs-provisioner-sa
 namespace: default # namespace:operator
 roleRef:
 kind: ClusterRole
- name: cephfs-csi-nodeplugin
+ name: cephfs-external-provisioner-runner
 apiGroup: rbac.authorization.k8s.io
 ---
 # Source: rook-ceph/templates/clusterrolebinding.yaml
+# This is required by operator-sdk to map the cluster/clusterrolebindings with SA
+# otherwise operator-sdk will create a individual file for these.
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
- name: cephfs-csi-provisioner-role
+ name: cephfs-csi-nodeplugin-role
 subjects:
 - kind: ServiceAccount
- name: rook-csi-cephfs-provisioner-sa
+ name: rook-csi-cephfs-plugin-sa
 namespace: default # namespace:operator
 roleRef:
 kind: ClusterRole
- name: cephfs-external-provisioner-runner
+ name: cephfs-csi-nodeplugin
 apiGroup: rbac.authorization.k8s.io
 ---
 # Source: rook-ceph/templates/clusterrolebinding.yaml
@@ -984,81 +987,24 @@
 name: rbd-external-provisioner-runner
 apiGroup: rbac.authorization.k8s.io
 ---
-# Source: rook-ceph/templates/psp.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRoleBinding
-metadata:
- name: rook-ceph-system-psp
- labels:
- operator: rook
- storage-backend: ceph
- app.kubernetes.io/part-of: rook-ceph-operator
- app.kubernetes.io/managed-by: Helm
- app.kubernetes.io/created-by: helm
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: 'psp:rook'
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-system
- namespace: default # namespace:operator
----
-# Source: rook-ceph/templates/psp.yaml
-apiVersion: rbac.authorization.k8s.io/v1
+# Source: rook-ceph/templates/clusterrolebinding.yaml
+# RBAC for ceph cosi driver service account
 kind: ClusterRoleBinding
-metadata:
- name: rook-csi-cephfs-provisioner-sa-psp
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: 'psp:rook'
-subjects:
- - kind: ServiceAccount
- name: rook-csi-cephfs-provisioner-sa
- namespace: default # namespace:operator
----
-# Source: rook-ceph/templates/psp.yaml
 apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRoleBinding
 metadata:
- name: rook-csi-cephfs-plugin-sa-psp
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: 'psp:rook'
+ name: objectstorage-provisioner-role-binding
+ labels:
+ app.kubernetes.io/part-of: container-object-storage-interface
+ app.kubernetes.io/component: driver-ceph
+ app.kubernetes.io/name: cosi-driver-ceph
 subjects:
 - kind: ServiceAccount
- name: rook-csi-cephfs-plugin-sa
+ name: objectstorage-provisioner
 namespace: default # namespace:operator
----
-# Source: rook-ceph/templates/psp.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRoleBinding
-metadata:
- name: rook-csi-rbd-plugin-sa-psp
 roleRef:
- apiGroup: rbac.authorization.k8s.io
 kind: ClusterRole
- name: 'psp:rook'
-subjects:
- - kind: ServiceAccount
- name: rook-csi-rbd-plugin-sa
- namespace: default # namespace:operator
----
-# Source: rook-ceph/templates/psp.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRoleBinding
-metadata:
- name: rook-csi-rbd-provisioner-sa-psp
-roleRef:
+ name: objectstorage-provisioner-role
 apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: 'psp:rook'
-subjects:
- - kind: ServiceAccount
- name: rook-csi-rbd-provisioner-sa
- namespace: default # namespace:operator
 ---
 # Source: rook-ceph/templates/cluster-rbac.yaml
 kind: Role
@@ -1068,10 +1014,10 @@
 namespace: default # namespace:cluster
 rules:
 # this is needed for rook's "key-management" CLI to fetch the vault token from the secret when
- # validating the connection details
+ # validating the connection details and for key rotation operations.
 - apiGroups: [""]
 resources: ["secrets"]
- verbs: ["get"]
+ verbs: ["get", "update"]
 - apiGroups: [""]
 resources: ["configmaps"]
 verbs: ["get", "list", "watch", "create", "update", "delete"]
@@ -1080,23 +1026,6 @@
 verbs: ["get", "list", "create", "update", "delete"]
 ---
 # Source: rook-ceph/templates/cluster-rbac.yaml
-kind: Role
-apiVersion: rbac.authorization.k8s.io/v1
-metadata:
- name: rook-ceph-rgw
- namespace: default # namespace:cluster
-rules:
- # Placeholder role so the rgw service account will
- # be generated in the csv. Remove this role and role binding
- # when fixing https://github.com/rook/rook/issues/10141.
- - apiGroups:
- - ""
- resources:
- - configmaps
- verbs:
- - get
----
-# Source: rook-ceph/templates/cluster-rbac.yaml
 # Aspects of ceph-mgr that operate within the cluster's namespace
 kind: Role
 apiVersion: rbac.authorization.k8s.io/v1
@@ -1131,9 +1060,31 @@
 - apiGroups:
 - ceph.rook.io
 resources:
- - "*"
+ - cephclients
+ - cephclusters
+ - cephblockpools
+ - cephfilesystems
+ - cephnfses
+ - cephobjectstores
+ - cephobjectstoreusers
+ - cephobjectrealms
+ - cephobjectzonegroups
+ - cephobjectzones
+ - cephbuckettopics
+ - cephbucketnotifications
+ - cephrbdmirrors
+ - cephfilesystemmirrors
+ - cephfilesystemsubvolumegroups
+ - cephblockpoolradosnamespaces
+ - cephcosidrivers
 verbs:
- - "*"
+ - get
+ - list
+ - watch
+ - create
+ - update
+ - delete
+ - patch
 - apiGroups:
 - apps
 resources:
@@ -1269,6 +1220,7 @@
 - create
 - update
 - delete
+ - deletecollection
 - apiGroups:
 - batch
 resources:
@@ -1284,6 +1236,13 @@
 - get
 - create
 - delete
+ - apiGroups:
+ - multicluster.x-k8s.io
+ resources:
+ - serviceexports
+ verbs:
+ - get
+ - create
 ---
 # Source: rook-ceph/templates/role.yaml
 kind: Role
@@ -1292,12 +1251,6 @@
 name: cephfs-external-provisioner-cfg
 namespace: default # namespace:operator
 rules:
- - apiGroups: [""]
- resources: ["endpoints"]
- verbs: ["get", "watch", "list", "delete", "update", "create"]
- - apiGroups: [""]
- resources: ["configmaps"]
- verbs: ["get", "list", "create", "delete"]
 - apiGroups: ["coordination.k8s.io"]
 resources: ["leases"]
 verbs: ["get", "watch", "list", "delete", "update", "create"]
@@ -1309,113 +1262,11 @@
 name: rbd-external-provisioner-cfg
 namespace: default # namespace:operator
 rules:
- - apiGroups: [""]
- resources: ["endpoints"]
- verbs: ["get", "watch", "list", "delete", "update", "create"]
- - apiGroups: [""]
- resources: ["configmaps"]
- verbs: ["get", "list", "watch", "create", "delete", "update"]
 - apiGroups: ["coordination.k8s.io"]
 resources: ["leases"]
 verbs: ["get", "watch", "list", "delete", "update", "create"]
 ---
 # Source: rook-ceph/templates/cluster-rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-default-psp
- namespace: default # namespace:cluster
- labels:
- operator: rook
- storage-backend: ceph
- app.kubernetes.io/part-of: rook-ceph-operator
- app.kubernetes.io/managed-by: Helm
- app.kubernetes.io/created-by: helm
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: default
- namespace: default # namespace:cluster
----
-# Source: rook-ceph/templates/cluster-rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-osd-psp
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-osd
- namespace: default # namespace:cluster
----
-# Source: rook-ceph/templates/cluster-rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-rgw-psp
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-rgw
- namespace: default # namespace:cluster
----
-# Source: rook-ceph/templates/cluster-rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-mgr-psp
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-mgr
- namespace: default # namespace:cluster
----
-# Source: rook-ceph/templates/cluster-rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-cmd-reporter-psp
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-cmd-reporter
- namespace: default # namespace:cluster
----
-# Source: rook-ceph/templates/cluster-rbac.yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
- name: rook-ceph-purge-osd-psp
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: ClusterRole
- name: psp:rook
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-purge-osd
- namespace: default # namespace:cluster
----
-# Source: rook-ceph/templates/cluster-rbac.yaml
 # Allow the operator to create resources in this cluster's namespace
 kind: RoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
@@ -1448,22 +1299,6 @@
 namespace: default # namespace:cluster
 ---
 # Source: rook-ceph/templates/cluster-rbac.yaml
-# Allow the rgw pods in this namespace to work with configmaps
-kind: RoleBinding
-apiVersion: rbac.authorization.k8s.io/v1
-metadata:
- name: rook-ceph-rgw
- namespace: default # namespace:cluster
-roleRef:
- apiGroup: rbac.authorization.k8s.io
- kind: Role
- name: rook-ceph-rgw
-subjects:
- - kind: ServiceAccount
- name: rook-ceph-rgw
- namespace: default # namespace:cluster
----
-# Source: rook-ceph/templates/cluster-rbac.yaml
 # Allow the ceph mgr to access resources scoped to the CephCluster namespace necessary for mgr modules
 kind: RoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
@@ -1615,6 +1450,7 @@
 kind: Deployment
 metadata:
 name: rook-ceph-operator
+ namespace: default # namespace:operator
 labels:
 operator: rook
 storage-backend: ceph
@@ -1633,39 +1469,37 @@
 labels:
 app: rook-ceph-operator
 spec:
+ tolerations:
+ - effect: NoExecute
+ key: node.kubernetes.io/unreachable
+ operator: Exists
+ tolerationSeconds: 5
 containers:
 - name: rook-ceph-operator
- image: "rook/ceph:v1.9.12"
+ image: "docker.io/rook/ceph:v1.16.0"
 imagePullPolicy: IfNotPresent
 args: ["ceph", "operator"]
 securityContext:
+ capabilities:
+ drop:
+ - ALL
+ runAsGroup: 2016
 runAsNonRoot: true
 runAsUser: 2016
- runAsGroup: 2016
 volumeMounts:
 - mountPath: /var/lib/rook
 name: rook-config
 - mountPath: /etc/ceph
 name: default-config-dir
- - mountPath: /etc/webhook
- name: webhook-cert
- ports:
- - containerPort: 9443
- name: https-webhook
- protocol: TCP
 env:
 - name: ROOK_CURRENT_NAMESPACE_ONLY
 value: "false"
 - name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED
 value: "false"
- - name: ROOK_ENABLE_SELINUX_RELABELING
- value: "true"
 - name: ROOK_DISABLE_DEVICE_HOTPLUG
 value: "false"
- - name: ROOK_ENABLE_DISCOVERY_DAEMON
- value: "false"
- - name: ROOK_DISABLE_ADMISSION_CONTROLLER
- value: "false"
+ - name: ROOK_DISCOVER_DEVICES_INTERVAL
+ value: "60m"
 - name: NODE_NAME
 valueFrom:
 fieldRef:
@@ -1680,7 +1514,6 @@
 fieldPath: metadata.namespace
 resources:
 limits:
- cpu: 500m
 memory: 256Mi
 requests:
 cpu: 10m
@@ -1691,5 +1524,7 @@
 emptyDir: {}
 - name: default-config-dir
 emptyDir: {}
- - name: webhook-cert
- emptyDir: {}
+# Source: rook-ceph/templates/securityContextConstraints.yaml
+# scc for the Rook and Ceph daemons
+# for creating cluster in openshift
+---

@chii-bot

chii-bot bot commented Aug 31, 2022

MegaLinter status: ❌ ERROR

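| Descriptor    | Linter     | Files | Fixed | Errors | Elapsed time |
| ------------- | ---------- | ----- | ----- | ------ | ------------ |
| ❌ COPYPASTE  | jscpd      | yes   |       | 2      | 1.01s        |
| ✅ REPOSITORY | git_diff   | yes   |       | no     | 0.02s        |
| ✅ REPOSITORY | secretlint | yes   |       | no     | 1.25s        |
| ✅ YAML       | prettier   | 4     |       | 0      | 0.66s        |
| ✅ YAML       | yamllint   | 4     |       | 0      | 0.23s        |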

See the error details in the MegaLinter reports artifact on the CI job page.
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff
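
As a rough illustration of the hint above, the setting could look like the fragment below. This is only a sketch: it assumes the configuration lives in a `.mega-linter.yml` file at the repository root, which this PR does not show, and that no other MegaLinter options need to change.

```yaml
# .mega-linter.yml (assumed location; adjust to wherever this repo keeps its MegaLinter config)
# Lint every file in the repository instead of only the files changed in the diff.
VALIDATE_ALL_CODEBASE: true
```

With this enabled, linters such as jscpd would likely report findings from files this PR does not touch, so the error count may differ from the table above.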

@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.10.0 (minor) feat(helm): update rook-ceph group to v1.10.1 (minor) Sep 9, 2022
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 4c696c0 to ecc5e00 Compare September 9, 2022 20:22
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.10.1 (minor) feat(helm): update rook-ceph group to v1.10.2 (minor) Sep 27, 2022
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from ecc5e00 to cb07759 Compare September 27, 2022 20:26
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.10.2 (minor) feat(helm): update rook-ceph group to v1.10.3 (minor) Oct 6, 2022
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from cb07759 to 40bd676 Compare October 6, 2022 21:20
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.10.3 (minor) feat(helm): update rook-ceph group to v1.10.4 (minor) Oct 20, 2022
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 40bd676 to 2b51770 Compare October 20, 2022 20:25
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.10.4 (minor) feat(helm): update rook-ceph group to v1.10.5 (minor) Nov 3, 2022
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 2b51770 to c1d7a2d Compare November 3, 2022 22:17
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.10.5 (minor) feat(helm): update rook-ceph group to v1.10.6 (minor) Nov 18, 2022
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from c1d7a2d to 9d0fb81 Compare November 18, 2022 01:41
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.10.6 (minor) feat(helm): update rook-ceph group to v1.10.7 (minor) Dec 6, 2022
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 9d0fb81 to efdb31b Compare December 6, 2022 22:16
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.10.7 (minor) feat(helm): update rook-ceph group to v1.10.8 (minor) Dec 21, 2022
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from efdb31b to 3e46cca Compare December 21, 2022 18:20
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.10.8 (minor) feat(helm): update rook-ceph group to v1.10.9 (minor) Jan 12, 2023
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 3e46cca to 221e89d Compare January 12, 2023 22:17
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.10.9 (minor) feat(helm): update rook-ceph group to v1.10.10 (minor) Jan 18, 2023
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 221e89d to 9422233 Compare January 18, 2023 18:21
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.10.10 (minor) feat(helm): update rook-ceph group to v1.10.11 (minor) Feb 10, 2023
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from b8fc3e0 to 0af6374 Compare May 30, 2024 21:18
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.14.4 (minor) feat(helm): update rook-ceph group to v1.14.5 (minor) May 30, 2024
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 0af6374 to c097aad Compare June 13, 2024 23:18
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.14.5 (minor) feat(helm): update rook-ceph group to v1.14.6 (minor) Jun 13, 2024
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from c097aad to ccdacd2 Compare June 21, 2024 18:22
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.14.6 (minor) feat(helm): update rook-ceph group to v1.14.7 (minor) Jun 21, 2024
@github-advanced-security

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from ccdacd2 to 6568064 Compare July 3, 2024 20:17
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.14.7 (minor) feat(helm): update rook-ceph group to v1.14.8 (minor) Jul 3, 2024
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 6568064 to 00cf479 Compare July 25, 2024 22:17
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.14.8 (minor) feat(helm): update rook-ceph group to v1.14.9 (minor) Jul 25, 2024
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 00cf479 to 04cc8d9 Compare August 20, 2024 23:19
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.14.9 (minor) feat(helm): update rook-ceph group to v1.14.10 (minor) Aug 20, 2024
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 04cc8d9 to 89e3bb5 Compare August 21, 2024 01:12
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.14.10 (minor) feat(helm): update rook-ceph group to v1.15.0 (minor) Aug 21, 2024
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 89e3bb5 to 61cf10e Compare September 4, 2024 22:17
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.15.0 (minor) feat(helm): update rook-ceph group to v1.15.1 (minor) Sep 4, 2024
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 61cf10e to b5a4b14 Compare September 19, 2024 21:18
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.15.1 (minor) feat(helm): update rook-ceph group to v1.15.2 (minor) Sep 19, 2024
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from b5a4b14 to 1a5b1d0 Compare October 3, 2024 22:19
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.15.2 (minor) feat(helm): update rook-ceph group to v1.15.3 (minor) Oct 3, 2024
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 1a5b1d0 to d7eba24 Compare October 17, 2024 21:19
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.15.3 (minor) feat(helm): update rook-ceph group to v1.15.4 (minor) Oct 17, 2024
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from d7eba24 to cafe080 Compare November 6, 2024 21:19
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.15.4 (minor) feat(helm): update rook-ceph group to v1.15.5 (minor) Nov 6, 2024
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from cafe080 to 3a20c80 Compare November 21, 2024 22:20
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.15.5 (minor) feat(helm): update rook-ceph group to v1.15.6 (minor) Nov 21, 2024
| datasource | package           | from    | to      |
| ---------- | ----------------- | ------- | ------- |
| helm       | rook-ceph         | v1.9.12 | v1.16.0 |
| helm       | rook-ceph         | v1.9.12 | v1.16.0 |
| helm       | rook-ceph         | v1.9.12 | v1.16.0 |
| helm       | rook-ceph-cluster | v1.9.12 | v1.16.0 |
| docker     | rook/ceph         | v1.9.13 | v1.16.0 |
| docker     | rook/ceph         | v1.9.13 | v1.16.0 |
@chii-bot chii-bot bot force-pushed the renovate/rook-ceph branch from 3a20c80 to 10ea576 Compare December 17, 2024 21:17
@chii-bot chii-bot bot changed the title feat(helm): update rook-ceph group to v1.15.6 (minor) feat(helm): update rook-ceph group to v1.16.0 (minor) Dec 17, 2024
Labels
area/cluster (changes made in the cluster directory), renovate/container, renovate/helm, size/XS (denotes a PR that changes 0-9 lines, ignoring generated files), type/minor