Describe the feature you'd like to have.
I was trying to use existing PrometheusRules to alert me if the volsync cache PVCs are not sized appropriately.
Then I noticed that the PVCs created by volsync are not visible within Prometheus. Perhaps this is just how volsync uses PVCs, and kubelet can't gather metrics on PVCs that are not actively mounted.
What is the value to the end user? (why is it a priority?)
The docs state: "This volume contains cached metadata from the backup repository. It must be large enough to hold the non-pruned repository metadata."
I do not know how much space is being used by Restic metadata or how that changes over time.
I would like to bump up the cache size before the volume fills up and volsync backups are impacted.
How will we know we have a good solution? (acceptance criteria)
I'm going to assume that volsync does not normally mount cache PVCs (and thus kubelet cannot report on them). If this is true, would it be possible for volsync to emit its own metric with the cache capacity when a trigger.schedule event happens? Perhaps percent free? Something like volsync_cache_capacity_available:
maybe "-1" if unknown (no event triggered), otherwise a number between 0 and 100 as the percentage of capacity left.
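To make the proposed semantics concrete, here is a minimal sketch of how such a value could be derived. The function name and its parameters are hypothetical and are not part of volsync; it only assumes that the total and available byte counts of the cache volume can be read somewhere (for example, inside the mover pod).

```python
from typing import Optional


def cache_capacity_available(
    capacity_bytes: Optional[int], available_bytes: Optional[int]
) -> float:
    """Return the percentage of cache volume capacity still free.

    Hypothetical helper illustrating the proposed metric value:
    -1 when the usage is unknown, otherwise a number from 0 to 100.
    """
    # Unknown: no sync has run yet, or the volume could not be inspected.
    if capacity_bytes is None or available_bytes is None or capacity_bytes <= 0:
        return -1.0
    pct = 100.0 * available_bytes / capacity_bytes
    # Clamp to guard against inconsistent readings.
    return max(0.0, min(100.0, pct))
```

With this convention, a value of -1 never satisfies a "less than 15 and greater than -1" condition, so an alert stays quiet until a real measurement exists.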
Then I can have an alert like:
- alert: VolSyncCacheVolumeCapacityLow
  annotations:
    summary: >-
      {{ $labels.obj_namespace }}/{{ $labels.obj_name }} cache volume space is almost full.
      Increase size of cacheCapacity value.
    description: >-
      {{ $labels.obj_namespace }}/{{ $labels.obj_name }} cache volume space is < 15%.
      VALUE = {{ $value }}
  expr: |
    volsync_cache_capacity_available > -1 and volsync_cache_capacity_available < 15
  for: 15m
  labels:
    severity: critical
The VolSync controller doesn't mount the restic cache PVC itself. However, the PVC is mounted in the mover pod created by the job that runs during a sync. Can you see stats while the mover job is running?
As such, I'm not sure we want to try to capture this usage data and have it sent back to the controller to emit as events.
The kubelet_volume_stats_* series of metrics contains the data I want, such as used_bytes or capacity_bytes, but none of the PVCs created by volsync are listed. Perhaps the mover pods have the cache volume mounted so briefly that the mount never coincides with kubelet fetching data?
kube_persistentvolume_capacity_bytes does include PVCs created by volsync, but only the total capacity of the volume is available. The kube_persistentvolume_* series of metrics does not contain any usage information.
I was unable to locate anything about "VolumeUsage" other than above.