Harvest should create a KeyPerf collector for ONTAP REST performance counters #3008

Closed
cgrinds opened this issue Jun 21, 2024 · 4 comments · Fixed by #3078, #3194, #3215, #3219 or #3240
Labels: 25.02, feature (New feature or request), status/testme

Comments

@cgrinds
Collaborator

cgrinds commented Jun 21, 2024

This issue is about creating a collector for ONTAP objects whose REST responses include a statistics or metric field. This collector is distinct from the ZapiPerf and RestPerf collectors because the statistics and metric fields have a different shape from the ONTAP responses those collectors consume.

In general, the statistics and metric fields include performance metrics for IOPS, latency, and throughput. The statistics fields are raw performance counters, while the metric fields are samples taken over one of several predefined intervals (15 seconds, four minutes, five minutes, 30 minutes, two hours, one day).

The statistics field is more general and likely covers all of Harvest's use cases. If so, we may support only the statistics field and ignore the metric field. The metric field is also hard to use with Prometheus, because Prometheus, not Harvest, controls the timestamp applied to the metrics.

Background

The statistics and metric counters are aggregated across all nodes in the cluster. These counters have existed since ONTAP 9.6, and as of ONTAP 9.15.1 the statistics field is available on the 25 objects listed below. The /application/applications endpoint differs from the others: its response includes statistics but no metric, includes fields beyond IOPS, latency, and throughput, and uses a different naming convention.

cat 10.193.48.154-swagger.yaml | dasel -r yaml -w json | gron | rg -F '.properties.statistics = {};' | rg -v '.xc_'
json.definitions.aggregate.properties.statistics = {};
json.definitions.application.properties.statistics = {};
json.definitions.cifs_service.properties.statistics = {};
json.definitions.cluster.properties.nodes.items.properties.statistics = {};
json.definitions.cluster.properties.statistics = {};
json.definitions.consistency_group.properties.statistics = {};
json.definitions.consistency_group_response.properties.records.items.properties.statistics = {};
json.definitions.fc_interface.properties.statistics = {};
json.definitions.fc_port.properties.statistics = {};
json.definitions.fcp_service.properties.statistics = {};
json.definitions.ip_interface.properties.statistics = {};
json.definitions.iscsi_service.properties.statistics = {};
json.definitions.lun.properties.statistics = {};
json.definitions.monitored_file.properties.statistics = {};
json.definitions.nfs_service.properties.statistics = {};
json.definitions.node.properties.statistics = {};
json.definitions.node_response.properties.records.items.properties.statistics = {};
json.definitions.nvme_namespace.properties.statistics = {};
json.definitions.nvme_service.properties.statistics = {};
json.definitions.port.properties.statistics = {};
json.definitions.qtree.properties.statistics = {};
json.definitions.s3_service.properties.statistics = {};
json.definitions.svm_ip_interface.properties.statistics = {};
json.definitions.switch_port.properties.statistics = {};
json.definitions.volume.properties.statistics = {};
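The rg filter above can also be expressed in Go when the swagger scan needs to run inside a test or tool. This is a minimal sketch assuming gron output as input lines; `statisticsPaths` is a hypothetical helper. It matches `.xc_` literally, which is slightly stricter than rg's regex, where `.` matches any character.

```go
package main

import (
	"fmt"
	"strings"
)

// statisticsPaths keeps gron lines that end in ".properties.statistics = {};",
// drops lines containing ".xc_", and returns the definition path between
// "json.definitions." and the statistics suffix.
func statisticsPaths(gronLines []string) []string {
	const prefix = "json.definitions."
	const suffix = ".properties.statistics = {};"
	var paths []string
	for _, line := range gronLines {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, prefix) || !strings.HasSuffix(line, suffix) {
			continue
		}
		if strings.Contains(line, ".xc_") {
			continue
		}
		paths = append(paths, strings.TrimSuffix(strings.TrimPrefix(line, prefix), suffix))
	}
	return paths
}

func main() {
	lines := []string{
		"json.definitions.aggregate.properties.statistics = {};",
		"json.definitions.cluster.properties.nodes.items.properties.statistics = {};",
		"json.definitions.volume.properties.space = {};",
	}
	fmt.Println(statisticsPaths(lines)) // [aggregate cluster.properties.nodes.items]
}
```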

Status field

The collector needs to handle every value of the status enum:

  • ok
  • error
  • partial_no_data
  • partial_no_response
  • partial_other_error
  • negative_delta
  • not_found
  • backfilled_data
  • inconsistent_delta_time
  • inconsistent_old_data
  • partial_no_uuid
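The issue does not say how each status should be treated, so the following is only one possible policy, stated as an assumption: use samples whose status is `ok`, skip the other known values, and flag anything unrecognized so that a status added in a later ONTAP release is not silently dropped. `knownStatuses` and `classifyStatus` are hypothetical names.

```go
package main

import "fmt"

// knownStatuses lists the status enum values named in this issue.
var knownStatuses = map[string]bool{
	"ok": true, "error": true,
	"partial_no_data": true, "partial_no_response": true, "partial_other_error": true,
	"negative_delta": true, "not_found": true, "backfilled_data": true,
	"inconsistent_delta_time": true, "inconsistent_old_data": true, "partial_no_uuid": true,
}

// classifyStatus implements the assumed policy: "use" only for ok,
// "skip" for other known values, "unknown" for anything else.
func classifyStatus(status string) string {
	switch {
	case status == "ok":
		return "use"
	case knownStatuses[status]:
		return "skip"
	default:
		return "unknown"
	}
}

func main() {
	for _, s := range []string{"ok", "partial_no_data", "some_new_value"} {
		fmt.Println(s, classifyStatus(s))
	}
}
```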

Examples

curl -k 'https://10.193.48.154/api/cluster?fields=statistics'
{
  "statistics": {
    "timestamp": "2024-06-21T14:38:22Z",
    "status": "ok",
    "latency_raw": {
      "other": 1516853741,
      "total": 3104533452181,
      "read": 2895738710563,
      "write": 207277887877
    },
    "iops_raw": {
      "read": 7660818902,
      "write": 263263046,
      "other": 4993299,
      "total": 7929075247
    },
    "throughput_raw": {
      "read": 453439274550417,
      "write": 1978439829907,
      "other": 2812081937,
      "total": 455420526462261
    }
  }
}
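The cluster response above can be decoded with Go's standard `encoding/json` package. The struct names below are hypothetical; the JSON keys come from the sample. In that sample, each `total` equals the sum of `read`, `write`, and `other`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// counterSet mirrors the read/write/other/total keys of each *_raw object.
type counterSet struct {
	Read  uint64 `json:"read"`
	Write uint64 `json:"write"`
	Other uint64 `json:"other"`
	Total uint64 `json:"total"`
}

// statistics matches the shape of the "statistics" field in /api/cluster.
type statistics struct {
	Timestamp     time.Time  `json:"timestamp"` // RFC 3339, e.g. 2024-06-21T14:38:22Z
	Status        string     `json:"status"`
	LatencyRaw    counterSet `json:"latency_raw"`
	IOPSRaw       counterSet `json:"iops_raw"`
	ThroughputRaw counterSet `json:"throughput_raw"`
}

type clusterResponse struct {
	Statistics statistics `json:"statistics"`
}

// parseCluster decodes the body of GET /api/cluster?fields=statistics.
func parseCluster(body []byte) (statistics, error) {
	var resp clusterResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return statistics{}, err
	}
	return resp.Statistics, nil
}

func main() {
	body := []byte(`{"statistics": {"timestamp": "2024-06-21T14:38:22Z", "status": "ok",
		"iops_raw": {"read": 7660818902, "write": 263263046, "other": 4993299, "total": 7929075247}}}`)
	stats, err := parseCluster(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(stats.Status, stats.IOPSRaw.Total) // ok 7929075247
}
```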
curl -k 'https://10.193.48.154/api/application/applications?fields=statistics'
{
  "uuid": "dd2086bb-6289-11ee-868b-00a098d390f2",
  "name": "newvol",
  "statistics": {
    "shared_storage_pool": false,
    "space": {
      "provisioned": 22077440,
      "used": 2244608,
      "used_percent": 10,
      "used_excluding_reserves": 1142784,
      "logical_used": 2244608,
      "reserved_unused": 0,
      "available": 19832832,
      "savings": 0
    },
    "iops": {
      "total": 0,
      "per_tb": 0
    },
    "snapshot": {
      "reserve": 1101824,
      "used": 1945600
    },
    "latency": {
      "raw": 0,
      "average": 0
    },
    "components": [
      {
        "name": "newvol",
        "uuid": "dd30b964-6289-11ee-868b-00a098d390f2",
        "shared_storage_pool": false,
        "storage_service": {
          "name": "extreme",
          "uuid": "0743fa34-43b7-4a87-ba8f-96816a0590a0"
        },
        "space": {
          "provisioned": 22077440,
          "used": 2244608,
          "used_percent": 10,
          "used_excluding_reserves": 1142784,
          "logical_used": 2244608,
          "reserved_unused": 0,
          "available": 19832832,
          "savings": 0
        },
        "iops": {
          "total": 0,
          "per_tb": 0
        },
        "snapshot": {
          "reserve": 1101824,
          "used": 1945600
        },
        "latency": {
          "raw": 0,
          "average": 0
        }
      }
    ]
  }
}

Alternative names

  • AggregatedPerf
  • DataFlow
  • EfficiencyMetrics
  • IOFlow
  • IOPerf
  • IOProfiling
  • IOVelocityMetrics
  • KeyPerf
  • KeyPerfMetrics
  • KPM (key performance metrics)
  • MetricOps
  • ObjectMetrics
  • ObjectPerf
  • OperationalMetrics
  • OperationalPerformance
  • OpMetrics
  • PerfIO
  • PerfMetrics
  • PerformanceIndicators
  • PerformanceMetrics
  • PerformanceOverview
  • PerfStatistics
  • PerfStream
  • PerfTriad
  • RawPerf
  • SimpleStats
  • SummaryStats
  • SystemEfficiency
  • SystemMetrics
  • ThruLatIOPS
cgrinds added the feature (New feature or request) and 24.08 labels Jun 21, 2024
rahulguptajss self-assigned this Jun 26, 2024
@rahulguptajss
Contributor

rahulguptajss commented Jul 22, 2024

  • Infrastructure development
  • Unit tests
  • ASUP
  • Template development
  • Dedup logic with RestPerf (if required)
  • Documentation
  • Metric documentation
  • Support filtering
  • Object plugins, if any
    • Add Top Client/File support in KeyPerf
  • Enable any CI tests
  • Dashboard changes, if any
  • Remove unused templates
  • Add tags to dashboards

rahulguptajss linked a pull request Jul 31, 2024 that will close this issue
cgrinds changed the title from "Harvest should create a KeyPerfMetrics collector for ONTAP REST performance counters" to "Harvest should create a KeyPerf collector for ONTAP REST performance counters" Aug 2, 2024
cgrinds reopened this Aug 2, 2024
cgrinds removed the 24.08 label Aug 5, 2024
rahulguptajss removed their assignment Sep 10, 2024
rahulguptajss self-assigned this Sep 26, 2024
rahulguptajss linked a pull request Oct 4, 2024 that will close this issue
rahulguptajss reopened this Oct 8, 2024
rahulguptajss linked a pull request Oct 18, 2024 that will close this issue
rahulguptajss reopened this Oct 22, 2024
rahulguptajss linked a pull request Oct 23, 2024 that will close this issue
rahulguptajss reopened this Oct 23, 2024
rahulguptajss added the 25.02 label and removed the 24.11 label Nov 4, 2024
@rahulguptajss
Contributor

Moving the remaining work to the next release.

rahulguptajss linked a pull request Nov 6, 2024 that will close this issue
@rahulguptajss
Contributor

The following dashboards can be excluded from KeyPerf tagging:

cmode/external_service_op.json
cmode/headroom.json
cmode/lun.json
cmode/mcc_cluster.json
cmode/namespace.json
cmode/nfs4storePool.json
cmode/workload.json
cmode/vscan.json
cmode/smb.json
cmode/s3ObjectStorage.json
cmode/nfsTroubleshooting.json (only a few panels will work)
cmode/network.json

rahulguptajss linked a pull request Nov 21, 2024 that will close this issue
@cgrinds
Collaborator Author

cgrinds commented Dec 2, 2024

Dashboards will be handled in a separate PR. Closing.

cgrinds closed this as completed Dec 2, 2024