Skip to content

Commit

Permalink
fixed typo in sys.host.disk.iopsinprogress
Browse files Browse the repository at this point in the history
  • Loading branch information
florence-crl committed Sep 29, 2023
1 parent a132122 commit 63a9943
Show file tree
Hide file tree
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion src/current/_includes/v23.1/essential-metrics.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ The **Usage** column explains why each metric is important to visualize in a cus
| sys.host.disk.write.count | {% if include.deployment == 'self-hosted' %}sys.host.disk.write |{% elsif include.deployment == 'dedicated' %}NOT AVAILABLE |{% endif %} Disk write operations across all disks since this process started | This metric reports the effective storage device write IOPS rate. To confirm that storage is sufficiently provisioned, assess the I/O performance rates (IOPS and MBPS) in the context of the sys.host.disk.iopsinprogress metric. |
| sys.host.disk.read.bytes | {% if include.deployment == 'self-hosted' %}sys.host.disk.read.bytes |{% elsif include.deployment == 'dedicated' %}NOT AVAILABLE |{% endif %} Bytes read from all disks since this process started | This metric reports the effective storage device read throughput (MB/s) rate. To confirm that storage is sufficiently provisioned, assess the I/O performance rates (IOPS and MBPS) in the context of the sys.host.disk.iopsinprogress metric. |
| sys.host.disk.read.count | {% if include.deployment == 'self-hosted' %}sys.host.disk.read |{% elsif include.deployment == 'dedicated' %}NOT AVAILABLE |{% endif %} Disk read operations across all disks since this process started | This metric reports the effective storage device read IOPS rate. To confirm that storage is sufficiently provisioned, assess the I/O performance rates (IOPS and MBPS) in the context of the sys.host.disk.iopsinprogress metric. |
| sys.host.disk.iopsinprogress | {% if include.deployment == 'self-hosted' %}ys.host.disk.iopsinprogress |{% elsif include.deployment == 'dedicated' %}NOT AVAILABLE |{% endif %} IO operations currently in progress on this host | This metric gives the average queue length of the storage device. It characterizes the storage device's performance capability. All I/O performance metrics are Linux counters and correspond to the `avgqu-sz` in the Linux `iostat` command output. You need to view the device queue graph in the context of the actual read/write IOPS and MBPS metrics that show the actual device utilization. If the device is not keeping up, the queue will grow. Values over 10 are bad. Values around 5 mean the device is working hard trying to keep up. For internal (on chassis) [NVMe](https://www.wikipedia.org/wiki/NVM_Express) devices, the queue values are typically 0. For network connected devices, such as [AWS EBS volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html), the normal operating range of values is 1 to 2. Spikes in values are OK. They indicate an I/O spike where the device fell behind and then caught up. End users may experience inconsistent response times, but there should be no cluster stability issues. If the queue is greater than 5 for an extended period of time and IOPS or MBPS are low, then the storage is most likely not provisioned per Cockroach Labs guidance. In AWS EBS, it is commonly an EBS type, such as gp2, not suitable as database primary storage. If I/O is low and the queue is low, the most likely scenario is that the CPU is lacking and not driving I/O. One such case is a cluster with nodes with only 2 vcpus which is not supported [sizing]({% link {{ page.version.version }}/recommended-production-settings.md %}#sizing) for production deployments. There are quite a few background processes in the database that take CPU away from the workload, so the workload is just not getting the CPU. Review [storage and disk I/O]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#storage-and-disk-i-o). |
| sys.host.disk.iopsinprogress | {% if include.deployment == 'self-hosted' %}sys.host.disk.iopsinprogress |{% elsif include.deployment == 'dedicated' %}NOT AVAILABLE |{% endif %} IO operations currently in progress on this host | This metric gives the average queue length of the storage device. It characterizes the storage device's performance capability. All I/O performance metrics are Linux counters and correspond to the `avgqu-sz` in the Linux `iostat` command output. You need to view the device queue graph in the context of the actual read/write IOPS and MBPS metrics that show the actual device utilization. If the device is not keeping up, the queue will grow. Values over 10 are bad. Values around 5 mean the device is working hard trying to keep up. For internal (on chassis) [NVMe](https://www.wikipedia.org/wiki/NVM_Express) devices, the queue values are typically 0. For network connected devices, such as [AWS EBS volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html), the normal operating range of values is 1 to 2. Spikes in values are OK. They indicate an I/O spike where the device fell behind and then caught up. End users may experience inconsistent response times, but there should be no cluster stability issues. If the queue is greater than 5 for an extended period of time and IOPS or MBPS are low, then the storage is most likely not provisioned per Cockroach Labs guidance. In AWS EBS, it is commonly an EBS type, such as gp2, not suitable as database primary storage. If I/O is low and the queue is low, the most likely scenario is that the CPU is lacking and not driving I/O. One such case is a cluster with nodes with only 2 vcpus which is not supported [sizing]({% link {{ page.version.version }}/recommended-production-settings.md %}#sizing) for production deployments. There are quite a few background processes in the database that take CPU away from the workload, so the workload is just not getting the CPU. Review [storage and disk I/O]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#storage-and-disk-i-o). |
| sys.host.net.recv.bytes | sys.host.net.recv.bytes | Bytes received on all network interfaces since this process started | This metric gives the node's ingress/egress network transfer rates for flat sections which may indicate insufficiently provisioned networking or high error rates. CockroachDB is using a reliable TCP/IP protocol, so errors result in delivery retries that create a "slow network" effect. |
| sys.host.net.send.bytes | sys.host.net.send.bytes | Bytes sent on all network interfaces since this process started | This metric gives the node's ingress/egress network transfer rates for flat sections which may indicate insufficiently provisioned networking or high error rates. CockroachDB is using a reliable TCP/IP protocol, so errors result in delivery retries that create a "slow network" effect. |
| clock-offset.meannanos | clock.offset.meannanos | Mean clock offset with other nodes | This metric gives the node's clock skew. In a well-configured environment, the actual clock skew would be in the sub-millisecond range. A skew exceeding 5 ms is likely due to a NTP service mis-configuration. Reducing the actual clock skew reduces the probability of uncertainty related conflicts and corresponding retires which has a positive impact on workload performance. Conversely, a larger actual clock skew increases the probability of retries due to uncertainty conflicts, with potentially measurable adverse effects on workload performance. |
Expand Down
2 changes: 1 addition & 1 deletion src/current/_includes/v23.2/essential-metrics.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ The **Usage** column explains why each metric is important to visualize in a cus
| sys.host.disk.write.count | {% if include.deployment == 'self-hosted' %}sys.host.disk.write |{% elsif include.deployment == 'dedicated' %}NOT AVAILABLE |{% endif %} Disk write operations across all disks since this process started | This metric reports the effective storage device write IOPS rate. To confirm that storage is sufficiently provisioned, assess the I/O performance rates (IOPS and MBPS) in the context of the sys.host.disk.iopsinprogress metric. |
| sys.host.disk.read.bytes | {% if include.deployment == 'self-hosted' %}sys.host.disk.read.bytes |{% elsif include.deployment == 'dedicated' %}NOT AVAILABLE |{% endif %} Bytes read from all disks since this process started | This metric reports the effective storage device read throughput (MB/s) rate. To confirm that storage is sufficiently provisioned, assess the I/O performance rates (IOPS and MBPS) in the context of the sys.host.disk.iopsinprogress metric. |
| sys.host.disk.read.count | {% if include.deployment == 'self-hosted' %}sys.host.disk.read |{% elsif include.deployment == 'dedicated' %}NOT AVAILABLE |{% endif %} Disk read operations across all disks since this process started | This metric reports the effective storage device read IOPS rate. To confirm that storage is sufficiently provisioned, assess the I/O performance rates (IOPS and MBPS) in the context of the sys.host.disk.iopsinprogress metric. |
| sys.host.disk.iopsinprogress | {% if include.deployment == 'self-hosted' %}ys.host.disk.iopsinprogress |{% elsif include.deployment == 'dedicated' %}NOT AVAILABLE |{% endif %} IO operations currently in progress on this host | This metric gives the average queue length of the storage device. It characterizes the storage device's performance capability. All I/O performance metrics are Linux counters and correspond to the `avgqu-sz` in the Linux `iostat` command output. You need to view the device queue graph in the context of the actual read/write IOPS and MBPS metrics that show the actual device utilization. If the device is not keeping up, the queue will grow. Values over 10 are bad. Values around 5 mean the device is working hard trying to keep up. For internal (on chassis) [NVMe](https://www.wikipedia.org/wiki/NVM_Express) devices, the queue values are typically 0. For network connected devices, such as [AWS EBS volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html), the normal operating range of values is 1 to 2. Spikes in values are OK. They indicate an I/O spike where the device fell behind and then caught up. End users may experience inconsistent response times, but there should be no cluster stability issues. If the queue is greater than 5 for an extended period of time and IOPS or MBPS are low, then the storage is most likely not provisioned per Cockroach Labs guidance. In AWS EBS, it is commonly an EBS type, such as gp2, not suitable as database primary storage. If I/O is low and the queue is low, the most likely scenario is that the CPU is lacking and not driving I/O. One such case is a cluster with nodes with only 2 vcpus which is not supported [sizing]({% link {{ page.version.version }}/recommended-production-settings.md %}#sizing) for production deployments. There are quite a few background processes in the database that take CPU away from the workload, so the workload is just not getting the CPU. Review [storage and disk I/O]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#storage-and-disk-i-o). |
| sys.host.disk.iopsinprogress | {% if include.deployment == 'self-hosted' %}sys.host.disk.iopsinprogress |{% elsif include.deployment == 'dedicated' %}NOT AVAILABLE |{% endif %} IO operations currently in progress on this host | This metric gives the average queue length of the storage device. It characterizes the storage device's performance capability. All I/O performance metrics are Linux counters and correspond to the `avgqu-sz` in the Linux `iostat` command output. You need to view the device queue graph in the context of the actual read/write IOPS and MBPS metrics that show the actual device utilization. If the device is not keeping up, the queue will grow. Values over 10 are bad. Values around 5 mean the device is working hard trying to keep up. For internal (on chassis) [NVMe](https://www.wikipedia.org/wiki/NVM_Express) devices, the queue values are typically 0. For network connected devices, such as [AWS EBS volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html), the normal operating range of values is 1 to 2. Spikes in values are OK. They indicate an I/O spike where the device fell behind and then caught up. End users may experience inconsistent response times, but there should be no cluster stability issues. If the queue is greater than 5 for an extended period of time and IOPS or MBPS are low, then the storage is most likely not provisioned per Cockroach Labs guidance. In AWS EBS, it is commonly an EBS type, such as gp2, not suitable as database primary storage. If I/O is low and the queue is low, the most likely scenario is that the CPU is lacking and not driving I/O. One such case is a cluster with nodes with only 2 vcpus which is not supported [sizing]({% link {{ page.version.version }}/recommended-production-settings.md %}#sizing) for production deployments. There are quite a few background processes in the database that take CPU away from the workload, so the workload is just not getting the CPU. Review [storage and disk I/O]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#storage-and-disk-i-o). |
| sys.host.net.recv.bytes | sys.host.net.recv.bytes | Bytes received on all network interfaces since this process started | This metric gives the node's ingress/egress network transfer rates for flat sections which may indicate insufficiently provisioned networking or high error rates. CockroachDB is using a reliable TCP/IP protocol, so errors result in delivery retries that create a "slow network" effect. |
| sys.host.net.send.bytes | sys.host.net.send.bytes | Bytes sent on all network interfaces since this process started | This metric gives the node's ingress/egress network transfer rates for flat sections which may indicate insufficiently provisioned networking or high error rates. CockroachDB is using a reliable TCP/IP protocol, so errors result in delivery retries that create a "slow network" effect. |
| clock-offset.meannanos | clock.offset.meannanos | Mean clock offset with other nodes | This metric gives the node's clock skew. In a well-configured environment, the actual clock skew would be in the sub-millisecond range. A skew exceeding 5 ms is likely due to a NTP service mis-configuration. Reducing the actual clock skew reduces the probability of uncertainty related conflicts and corresponding retires which has a positive impact on workload performance. Conversely, a larger actual clock skew increases the probability of retries due to uncertainty conflicts, with potentially measurable adverse effects on workload performance. |
Expand Down

0 comments on commit 63a9943

Please sign in to comment.