This Grafana dashboard provides an in-depth view of the CSI Driver operations for Linode Block Storage, with real-time data on volume creation, deletion, publication, and expansion. It also tracks persistent volume claims and potential runtime errors. The data is sourced from Prometheus, making it ideal for monitoring and diagnosing issues with CSI Driver operations.
The dashboard is divided into several panels. Each panel focuses on a different aspect of CSI Driver operations, including Create/Delete/Publish Volume requests, runtime operation errors, and Persistent Volume (PV) and Persistent Volume Claim (PVC) events.
- The graphs below show metrics collected from the following components:
- sidecars
- csi-attacher
- csi-provisioner
- csi-resizer
- csi-linode-plugin
- controller-server
- node-server
- sidecars
- The current graphs show activity recorded during e2e tests and CSI sanity tests.
- In graphs that show time taken per request, the plotted value is cumulative, so the difference between two consecutive points on the y-axis is the time taken to process the request made in that interval. A flat line indicates no activity during that period.
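
The reading rule for the time-taken graphs can be sketched in Python. This is a minimal illustration, assuming the plotted series is a cumulative value sampled at regular intervals; the function name `per_interval_deltas` and the sample readings are invented for the example.

```python
def per_interval_deltas(samples):
    """Return the increase between each pair of consecutive samples.

    `samples` is a list of cumulative y-axis values (for example, seconds
    accumulated so far). A delta of 0 corresponds to a flat line, meaning
    no request was processed in that interval.
    """
    return [round(b - a, 6) for a, b in zip(samples, samples[1:])]


# Hypothetical cumulative readings taken from a "time taken" graph.
readings = [0.0, 1.5, 1.5, 1.5, 4.0]
print(per_interval_deltas(readings))  # [1.5, 0.0, 0.0, 2.5]
```

The two zeros in the output are the flat stretch of the line; the non-zero entries are the time attributed to the requests on either side of it.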
-
Create Volume Requests
- Description: This graph represents the total number of requests made to create volumes in the cluster.
- Query:
csi_sidecar_operations_seconds_count{method_name="/csi.v1.Controller/CreateVolume"}
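- X-axis: Time at which the requests were made.
- Y-axis: Cumulative number of volume creation requests.
- Graph:
- Explanation: The series is a counter, so the line rises each time a volume creation request is made. The slope of the line shows the rate of volume creation requests over time.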
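
As a rough sketch of how a request rate can be derived from a counter such as the one queried above, the following Python computes the increase across a window and divides by the elapsed time. It is a simplification of what PromQL's `rate()` does (no extrapolation to the window edges), and the sample values are invented.

```python
def counter_increase(samples):
    """Total increase of a counter across (timestamp, value) samples.

    A drop in value is treated as a counter reset (for example after a
    sidecar restart), so the post-reset value is counted from zero.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def requests_per_second(samples):
    """Average request rate over the sampled window, in requests/second."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed if elapsed > 0 else 0.0


# Hypothetical samples: (unix timestamp, cumulative CreateVolume count).
samples = [(0, 10), (30, 12), (60, 2), (90, 5)]
print(counter_increase(samples))  # 7.0
print(requests_per_second(samples))
```

The drop from 12 to 2 is counted as a restart followed by two new requests, which is why the total increase is 7 rather than a negative number.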
-
Total Time Taken to Create Volume
-
Delete Volume Requests
-
Total Time Taken to Delete Volume
-
Expand Volume Requests
-
Total Time Taken to Expand Volume
-
Publish Volume Requests
-
Total Time Taken to Publish Volume
-
Unpublish Volume Requests
- Description: Tracks the number of requests to unpublish volumes.
- Query:
csi_sidecar_operations_seconds_count{method_name="/csi.v1.Controller/ControllerUnpublishVolume"}
- X-axis: Time intervals of unpublish requests.
- Y-axis: Number of unpublish requests.
- Graph:
- Explanation: This graph shows how frequently volumes are unpublished (detached) from nodes.
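
The request-count queries differ only in the `method_name` label, so they can be generated and sent to the Prometheus HTTP API range-query endpoint (`/api/v1/query_range`). The sketch below only builds the URL. The base address `http://localhost:9090` and the helper names are assumptions for illustration, not part of the dashboard.

```python
from urllib.parse import urlencode

# Assumption: Prometheus is reachable at this address; adjust for your cluster.
PROMETHEUS_URL = "http://localhost:9090"


def sidecar_count_query(method_name):
    """PromQL selector for the request counter of one CSI gRPC method."""
    return f'csi_sidecar_operations_seconds_count{{method_name="{method_name}"}}'


def query_range_url(query, start, end, step="30s", base=PROMETHEUS_URL):
    """URL for the Prometheus HTTP API range-query endpoint."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base}/api/v1/query_range?{params}"


query = sidecar_count_query("/csi.v1.Controller/ControllerUnpublishVolume")
print(query)
print(query_range_url(query, start=1700000000, end=1700003600))
```

Passing the printed URL to any HTTP client returns the same series the panel plots, which can help when checking a panel against raw data.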
-
Total Time Taken to Unpublish Volume
-
- Description: Displays the total number of PV-related events that the CSI controller processed.
- Query:
workqueue_adds_total{name="volumes"}
- X-axis: Time interval for PV events.
- Y-axis: Cumulative number of PV events added to the volumes work queue.
- Graph:
- Description: Tracks the number of PVC-related events that the controller reconciles.
- Query:
workqueue_adds_total{name="claims"}
- X-axis: Time interval for PVC events.
- Y-axis: Cumulative number of PVC events added to the claims work queue.
- Graph:
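
To check the two work queue counters outside Grafana, the values can be read from metrics text in the Prometheus exposition format. The parser below is a simplified sketch: it handles only plain `name{labels} value` lines without trailing timestamps, and the scrape text is invented for the example.

```python
import re

# Matches lines such as: workqueue_adds_total{name="claims"} 42
SAMPLE_RE = re.compile(r'^(?P<metric>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def workqueue_adds(text):
    """Map each work queue name to its workqueue_adds_total value."""
    totals = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match or match["metric"] != "workqueue_adds_total":
            continue  # skips comments, blank lines, and other metrics
        labels = dict(LABEL_RE.findall(match["labels"]))
        if "name" in labels:
            totals[labels["name"]] = float(match["value"])
    return totals


# Hypothetical scrape output; the values are invented for the example.
scrape = """
# TYPE workqueue_adds_total counter
workqueue_adds_total{name="volumes"} 17
workqueue_adds_total{name="claims"} 23
workqueue_depth{name="claims"} 0
"""
print(workqueue_adds(scrape))  # {'volumes': 17.0, 'claims': 23.0}
```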
- Description: Visualizes runtime operation errors reported by the kubelet, which helps correlate node-level failures with CSI Driver operations.
- Query:
kubelet_runtime_operations_errors_total
- X-axis: Time interval of operations
- Y-axis: Number of errors
- Graph:
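
The error counter is reported per series, so it is often easier to read once grouped. The sketch below sums the values of a Prometheus instant-query response by operation. It assumes the series carry an `operation_type` label; the response body, node names, and counts are invented.

```python
import json
from collections import defaultdict


def errors_by_operation(response_text):
    """Sum error counts per operation_type from an instant-query response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    totals = defaultdict(float)
    for series in payload["data"]["result"]:
        # Assumption: the kubelet labels each series with operation_type.
        operation = series["metric"].get("operation_type", "unknown")
        _, value = series["value"]  # [timestamp, "value as string"]
        totals[operation] += float(value)
    return dict(totals)


# Hypothetical response body; node names and counts are invented.
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"operation_type": "create_container", "node": "a"},
         "value": [1700000000, "2"]},
        {"metric": {"operation_type": "create_container", "node": "b"},
         "value": [1700000000, "1"]},
        {"metric": {"operation_type": "exec_sync", "node": "a"},
         "value": [1700000000, "4"]},
    ]},
})
print(errors_by_operation(sample))  # {'create_container': 3.0, 'exec_sync': 4.0}
```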
- Description: Shows the cumulative time taken for operations handled by CSI sidecars (attacher, provisioner, etc.).
- Query:
csi_sidecar_operations_seconds_sum
- X-axis: Time interval of operations
- Y-axis: Cumulative time taken for each operation (in seconds).
- Graph:
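
A cumulative `_sum` series is easier to interpret alongside its matching `_count` series: the increase in total seconds divided by the increase in completed operations gives the mean time per operation. The following is a minimal sketch with invented readings; the function name is hypothetical.

```python
def average_duration(sum_start, sum_end, count_start, count_end):
    """Mean seconds per operation between two scrapes.

    Uses the increase of the `_sum` series divided by the increase of the
    matching `_count` series. Returns None when no operation completed.
    """
    operations = count_end - count_start
    if operations <= 0:
        return None
    return (sum_end - sum_start) / operations


# Hypothetical readings of csi_sidecar_operations_seconds_sum and _count.
print(average_duration(sum_start=12.0, sum_end=18.0,
                       count_start=40, count_end=44))  # 1.5
```

In PromQL the same idea is usually written as the rate of the `_sum` series divided by the rate of the `_count` series over the same window.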
- Description: Shows the time taken for the node expand volume operation.
- Query:
csi_node_expand_duration_seconds_count{functionStatus="true"}
- X-axis: Time intervals when the expand operations occurred.
- Y-axis: Time taken for each expand operation (in seconds).
- Graph:
- Description: Shows the time taken for the node stage volume operation.
- Query:
csi_node_stage_duration_seconds_count{functionStatus="true"}
- X-axis: Timeline of the stage operations.
- Y-axis: Time taken per stage request (in seconds).
- Graph:
- Description: Shows the time taken for the node unstage volume operation.
- Query:
csi_node_unstage_duration_seconds_count{functionStatus="true"}
- X-axis: Timeline of the unstage operations.
- Y-axis: Time taken per unstage operation (in seconds).
- Graph:
- Description: Shows the time taken for the node publish volume operation.
- Query:
csi_node_publish_duration_seconds_count{functionStatus="true"}
- X-axis: Timeline of the publish requests.
- Y-axis: Time taken for each publish request (in seconds).
- Graph:
- Description: Shows the time taken for the node unpublish volume operation.
- Query:
csi_node_unpublish_duration_seconds_count{functionStatus="true"}
- X-axis: Timeline of the unpublish requests.
- Y-axis: Time taken for each unpublish operation (in seconds).
- Graph:
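
The five node queries follow one naming pattern, so generating them from a single template avoids copy-and-paste slips between panels. This is a small sketch; the helper name `node_duration_query` is hypothetical, and the operation names are taken from the metric names in the queries above.

```python
# Operation names as they appear in the node metric names above.
NODE_OPERATIONS = ("expand", "stage", "unstage", "publish", "unpublish")


def node_duration_query(operation, status="true"):
    """PromQL selector for one node operation, filtered by functionStatus."""
    if operation not in NODE_OPERATIONS:
        raise ValueError(f"unknown node operation: {operation}")
    return (f"csi_node_{operation}_duration_seconds_count"
            f'{{functionStatus="{status}"}}')


for op in NODE_OPERATIONS:
    print(node_duration_query(op))
```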
- Description: This graph tracks the time taken for each "Create Volume" request handled by the CSI driver's controller server.
- Query:
csi_controller_create_volume_duration_seconds_count{functionStatus="true"}
- X-axis: Time, in minutes. The gap between two consecutive points is the interval between requests.
- Y-axis: Time taken for each operation, in seconds or minutes.
- Graph:
- Description: This graph tracks the time taken for "Delete Volume" operations.
- Query:
csi_controller_delete_volume_duration_seconds_count{functionStatus="true"}
- X-axis: Time at which the delete operations were executed.
- Y-axis: Time taken for each delete operation (in seconds).
- Graph:
- Description: This graph records the time taken to publish (attach) a volume to a node within the Kubernetes cluster.
- Query:
csi_controller_publish_volume_duration_seconds_count{functionStatus="true"}
- X-axis: Represents time intervals when the publish operations occurred.
- Y-axis: Displays the time taken for each publish operation.
- Graph:
- Description: This graph shows the time taken for each "Unpublish Volume" operation, where a volume is detached from the node.
- Query:
csi_controller_unpublish_volume_duration_seconds_count{functionStatus="true"}
- X-axis: Timeline of the unpublish operations.
- Y-axis: Time taken for each unpublish operation.
- Graph:
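
If the duration metrics are Prometheus histograms (an assumption; in that case they also expose `_bucket` series with an `le` label), a latency quantile can be estimated from the cumulative bucket counts. The sketch below mirrors the linear interpolation idea behind PromQL's `histogram_quantile()`; the bucket bounds and counts are invented.

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with float("inf"). Linear interpolation is used
    inside the bucket that contains the target rank.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                return lower_bound  # cannot interpolate into +Inf
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical buckets: (upper bound in seconds, cumulative request count).
buckets = [(0.5, 2), (1.0, 6), (2.5, 9), (float("inf"), 10)]
print(histogram_quantile(0.5, buckets))   # 0.875
print(histogram_quantile(0.95, buckets))  # 2.5
```

When the target rank falls in the last (`+Inf`) bucket, the sketch returns the highest finite bound, so results near the top of the range are only a lower estimate.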