Investigate if Harvest should collect workload_queue_nblade and workload_queue_dblade #2427

Closed
cgrinds opened this issue Oct 13, 2023 · 5 comments
Labels
25.02 feature New feature or request

Comments

@cgrinds
Collaborator

cgrinds commented Oct 13, 2023

These objects are smaller and less work to collect than workload_detail_volume. Investigate if they can replace any of the workload templates.

bin/harvest zapi -p sar show counters --object workload_queue_dblade | dasel -r xml -w json
bin/harvest zapi -p sar show counters --object workload_queue_nblade | dasel -r xml -w json
@rahulguptajss
Contributor

rahulguptajss commented Nov 16, 2023

The workload_queue_dblade and workload_queue_nblade objects do not provide file-, qtree-, or LUN-level workload metrics; they cover volume workloads only. The workload_detail object provides both service time and wait time for each subsystem, whereas workload_queue_dblade and workload_queue_nblade mainly provide wait times at the various subsystems.

Collecting this object with ZapiPerf raises a further problem: each query may return metrics from a different node associated with the same instance. Because consecutive raw values can come from different nodes, they cannot be processed reliably (for example, turned into deltas between polls). For reference, I have attached a data profile of one instance that demonstrates the issue, showing a 1-minute gap between data points.
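To illustrate why a changing node breaks raw-value processing: assuming rate counters such as iops are reported as cumulative raw values, a rate is the difference between two polls divided by the elapsed time. The minimal Python sketch below uses a hypothetical `Sample` record (node, timestamp, raw value); it is not Harvest's ZapiPerf code. It only computes a rate when both polls were answered by the same node, so an interval whose endpoints come from different nodes has to be discarded, leaving a gap in the series.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One poll of a single counter for one workload instance (hypothetical shape)."""
    node: str         # node that answered this poll
    timestamp: float  # seconds
    raw: float        # cumulative raw counter value


def rate(prev: Sample, curr: Sample) -> Optional[float]:
    """Per-second rate between two polls, or None if the interval is unusable."""
    if prev.node != curr.node:
        # The raw values come from different nodes' counters,
        # so their difference is meaningless.
        return None
    elapsed = curr.timestamp - prev.timestamp
    delta = curr.raw - prev.raw
    if elapsed <= 0 or delta < 0:
        # Clock went backwards or the counter was reset.
        return None
    return delta / elapsed


def rates(samples: list[Sample]) -> list[Optional[float]]:
    """Rate for each consecutive pair of polls; None marks a gap."""
    return [rate(a, b) for a, b in zip(samples, samples[1:])]


polls = [
    Sample("node-01", 0.0, 1000.0),
    Sample("node-01", 60.0, 7000.0),
    Sample("node-02", 120.0, 500.0),   # a different node answered this poll
    Sample("node-02", 180.0, 6500.0),
]
print(rates(polls))  # [100.0, None, 100.0]
```

The node names and counter values are made up for illustration.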

output_nblade.xlsx

Below are the metrics provided by these objects.

NBlade

archive_uuid_hi
archive_uuid_lo
cpu_nblade_residence_time
cpu_nblade_util_service_time
delay_avscan_num_visits
delay_avscan_wait_time
delay_cluster_interconnect_wait_time
delay_network_wait_time
delay_qos_limit_wait_time
delay_qos_min_throughput_wait_time
instance_name
instance_uuid
iops
node_name
node_uuid
process_name
read_data
total_latency
write_data

DBlade

archive_uuid_hi
archive_uuid_lo
cache_miss_rate
cache_miss_rate_base
cpu_dblade_background_service_time
cpu_dblade_residence_time
cpu_dblade_util_service_time
delay_cloud_io_wait_time
delay_cop_num_wait
delay_cop_wait_time
delay_disk_io_wait_time
delay_flexcache_ral_wait_time
delay_flexcache_spinhi_wait_time
delay_nvlog_transfer_wait_time
delay_sync_repl_wait_time
delay_wafl_adm_ctrl_wait_time
delay_wafl_susp_cp_wait_time
delay_wafl_susp_other_wait_time
delay_wem_visits
delay_wem_wait_time
disk_hdd_background_service_time
disk_ssd_background_service_time
instance_name
instance_uuid
node_name
node_uuid
process_name
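Most of the per-subsystem counters above follow a `delay_<subsystem>_wait_time` naming pattern, which is why these objects are mainly a source of wait times. As a rough illustration (a Python sketch that assumes only the counter names listed here, with identifier fields omitted), the subsystems each object reports can be extracted from the names. The last line also shows that only NBlade exposes an `iops` counter.

```python
import re

# Counter names copied from the NBlade and DBlade lists above
# (identifier fields such as instance_uuid and node_name omitted).
NBLADE = """
cpu_nblade_residence_time cpu_nblade_util_service_time
delay_avscan_num_visits delay_avscan_wait_time
delay_cluster_interconnect_wait_time delay_network_wait_time
delay_qos_limit_wait_time delay_qos_min_throughput_wait_time
iops read_data total_latency write_data
""".split()

DBLADE = """
cache_miss_rate cache_miss_rate_base
cpu_dblade_background_service_time cpu_dblade_residence_time
cpu_dblade_util_service_time delay_cloud_io_wait_time
delay_cop_num_wait delay_cop_wait_time delay_disk_io_wait_time
delay_flexcache_ral_wait_time delay_flexcache_spinhi_wait_time
delay_nvlog_transfer_wait_time delay_sync_repl_wait_time
delay_wafl_adm_ctrl_wait_time delay_wafl_susp_cp_wait_time
delay_wafl_susp_other_wait_time delay_wem_visits delay_wem_wait_time
disk_hdd_background_service_time disk_ssd_background_service_time
""".split()

# Per-subsystem wait-time counters are named delay_<subsystem>_wait_time.
WAIT_TIME = re.compile(r"delay_(.+)_wait_time")


def wait_subsystems(counters: list[str]) -> list[str]:
    """Subsystems for which an object reports a wait-time counter."""
    found = []
    for name in counters:
        match = WAIT_TIME.fullmatch(name)
        if match:
            found.append(match.group(1))
    return sorted(found)


print(wait_subsystems(NBLADE))
# ['avscan', 'cluster_interconnect', 'network', 'qos_limit', 'qos_min_throughput']
print(wait_subsystems(DBLADE))
print("iops" in NBLADE, "iops" in DBLADE)  # True False
```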

@rahulguptajss rahulguptajss removed their assignment Nov 16, 2023
@cgrinds
Collaborator Author

cgrinds commented Nov 16, 2023

Given @rahulguptajss's investigation, this doesn't make sense for Harvest at the moment.

@cgrinds
Collaborator Author

cgrinds commented Dec 10, 2024

Reopening to investigate if nblade/dblade counters can be used to provide node-level workload metrics.

@rahulguptajss
Contributor

Below is a summary of the findings.

  1. The workload_queue_nblade and workload_queue_dblade objects return data for only one node per query. To collect data for all nodes, a node filter must be passed, which means issuing a separate query for each node.
  2. The dblade object has no IOPS counter, so its latency calculations must use IOPS from nblade. Combining counters from two separately collected objects introduces skew into the calculated latency.
  3. LUN, qtree, and file metrics are not available from nblade or dblade, because these objects are volume-scoped.
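Points 1 and 2 can be illustrated together. In the Python sketch below, `poll_object` is a hypothetical stand-in for one ZapiPerf request filtered to a single node (it is not a Harvest or ONTAP API), and all counter values are made up. It issues one query per object and node, then divides a dblade wait-time delta by an nblade op-count delta. Because the two objects are sampled at different moments, the ops in the denominator do not cover exactly the same interval as the wait time in the numerator.

```python
from typing import Callable

Counters = dict[str, float]
# Hypothetical: returns raw counters for one object on one node.
PollFn = Callable[[str, str], Counters]


def collect_per_node(objects: list[str], nodes: list[str],
                     poll_object: PollFn) -> dict[tuple[str, str], Counters]:
    """Each query covers a single node, so issue one per (object, node) pair."""
    return {(obj, node): poll_object(obj, node)
            for obj in objects for node in nodes}


def avg_wait_per_op(wait_delta: float, ops_delta: float) -> float:
    """Average wait time per operation over one interval (0.0 if no ops)."""
    return wait_delta / ops_delta if ops_delta > 0 else 0.0


# Record every request a full collection cycle makes.
calls: list[tuple[str, str]] = []


def fake_poll(obj: str, node: str) -> Counters:
    calls.append((obj, node))
    return {}


collect_per_node(["workload_queue_nblade", "workload_queue_dblade"],
                 ["node-01", "node-02", "node-03", "node-04"], fake_poll)
print(len(calls))  # 8 requests for 2 objects x 4 nodes

# Skew: dblade wait time covers [0 s, 60 s], but the nblade ops used as the
# denominator were sampled over [5 s, 65 s]. Made-up load: 10 ops/s until
# 60 s, then 40 ops/s; each op waits 5 time units.
dblade_wait_delta = 600 * 5   # 600 ops in [0, 60] x 5 = 3000
aligned_ops_delta = 600       # ops in [0, 60]
skewed_ops_delta = 550 + 200  # 550 ops in [5, 60] + 200 ops in [60, 65]
print(avg_wait_per_op(dblade_wait_delta, aligned_ops_delta))  # 5.0
print(avg_wait_per_op(dblade_wait_delta, skewed_ops_delta))   # 4.0
```

Returning 0.0 for an interval with no operations is a choice made for this sketch; a collector could equally skip such intervals.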

@rahulguptajss rahulguptajss removed their assignment Dec 17, 2024
@rahulguptajss rahulguptajss linked a pull request Dec 17, 2024 that will close this issue
@rahulguptajss
Contributor

This is no longer needed, as mentioned in the discussion here.
