invalid metric name or label names: mcac_hints_hint_delays #88

Open
mboyd1 opened this issue Feb 13, 2023 · 8 comments

Comments

@mboyd1

mboyd1 commented Feb 13, 2023

prometheus, version 2.42.0
cassandra 4.0.5
datastax-mcac-agent-0.3.4

prometheus is not ingesting data from the nodes in my cluster, showing error:

invalid metric name or label names: {__name__="mcac_hints_hint_delays.192.168.1.14.7000_total", cluster="Test Cluster", dc="datacenter1", instance="192.168.1.13", job="mcac", mcac="org.apache.cassandra.metrics.hints_service.hint_delays.192.168.1.14.7000", mcac_filtered="true", rack="rack1"}

I am using the provided prometheus.yml. I believe the relevant section is:

     #HintService Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
       target_label: peer_ip
       replacement: ${1}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
       target_label: __name__
       replacement: mcac_hints_hints_delays

some metrics from the mcac endpoint:
     collectd_mcac_histogram_count_total{mcac="org.apache.cassandra.metrics.hints_service.hint_delays",instance="192.168.1.12",mcac_filtered="true",cluster="Test Cluster",dc="datacenter1",rack="rack1"} 205 1676297608653
     collectd_mcac_histogram_count_total{mcac="org.apache.cassandra.metrics.hints_service.hint_delays.192.168.1.13.7000",instance="192.168.1.12",mcac_filtered="true",cluster="Test Cluster",dc="datacenter1",rack="rack1"} 61 1676297608711
     collectd_mcac_histogram_count_total{mcac="org.apache.cassandra.metrics.hints_service.hint_delays.192.168.1.14.7000",instance="192.168.1.12",mcac_filtered="true",cluster="Test Cluster",dc="datacenter1",rack="rack1"} 6 1676297608981
     collectd_mcac_histogram_count_total{mcac="org.apache.cassandra.metrics.hints_service.hint_delays.192.168.1.16.7000",instance="192.168.1.12",mcac_filtered="true",cluster="Test Cluster",dc="datacenter1",rack="rack1"} 59 1676297608700
     collectd_mcac_histogram_count_total{mcac="org.apache.cassandra.metrics.hints_service.hint_delays.192.168.1.17.7000",instance="192.168.1.12",mcac_filtered="true",cluster="Test Cluster",dc="datacenter1",rack="rack1"} 79 1676297609092

First I noticed that the metrics show 'hint_delays' while prometheus.yml shows 'hints_delays' (plural 'hints' vs. singular 'hint'), but changing 'hints' to 'hint' in prometheus.yml and restarting didn't change anything. I'm not sure how to alter the regex, or whatever else needs fixing, to get past this.
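
One way to see where the invalid name comes from is to replay the relevant regexes outside Prometheus. The Python sketch below is only an approximation: it assumes Prometheus anchors relabel regexes at both ends (imitated here with `re.fullmatch`), that metric names must match `[a-zA-Z_:][a-zA-Z0-9_:]*`, and that the generic `hints_service` rule from the full scrape config posted later in this thread is also present. Python's `re` is not RE2, although these particular patterns should behave the same in both. The constant names are illustrative only.

```python
import re

# One of the mcac label values from the scrape output above.
mcac = "org.apache.cassandra.metrics.hints_service.hint_delays.192.168.1.14.7000"

# Shipped per-peer rule: expects "hints_delays-<peer>".
SHIPPED = r"org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)"
# Same rule with only the plural fixed: still expects a dash separator.
SINGULAR = r"org\.apache\.cassandra\.metrics\.hints_service\.hint_delays\-(\w+)"
# Generic rule: captures everything after "hints_service." that is not a dash.
GENERIC = r"org\.apache\.cassandra\.metrics\.hints_service\.([^\-]+)"
# Assumed metric-name character set (no dots allowed).
VALID_NAME = r"[a-zA-Z_:][a-zA-Z0-9_:]*"

shipped_match = re.fullmatch(SHIPPED, mcac)
singular_match = re.fullmatch(SINGULAR, mcac)
generic_match = re.fullmatch(GENERIC, mcac)

# The generic rule builds the name from capture group 1, dots included.
name = "mcac_hints_" + generic_match.group(1)
name_is_valid = re.fullmatch(VALID_NAME, name) is not None

print(shipped_match, singular_match)  # neither per-peer variant fires
print(name)
print(name_is_valid)
```

If these assumptions hold, neither spelling of the per-peer rule fires, because the peer is separated by a dot rather than a dash, and the generic rule is what produces the dotted name seen in the error.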

@icellan

icellan commented Mar 1, 2023

+1

@RomainAnselin

Hi everyone, I hit this today with the DSE metrics collector. It appears this was also identified in the k8ssandra operator.
I was able to fix it on my end by applying the change shown in the last part of the commit below (to pkg/telemetry/prom_cass_servicemonitor.go) to my prometheus.yml:
k8ssandra/k8ssandra-operator@0217829

Namely, change this
regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
to this
regex: org\.apache\.cassandra\.metrics\.hints_service\.hint_delays[\-\.]([\w\.]+)

I would appreciate it if you could try this in your environment and let me know whether it fixes it for you too.
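
As a quick sanity check of the proposed regex, here is a small Python sketch that runs it against a few `mcac` label values. It assumes full-string matching (as `re.fullmatch` does) stands in for how Prometheus anchors relabel regexes. The first two samples come from the scrape output earlier in this thread; the dash-separated third one is a hypothetical value in the shape the old regex was written for.

```python
import re

PATTERN = r"org\.apache\.cassandra\.metrics\.hints_service\.hint_delays[\-\.]([\w\.]+)"

samples = [
    # Aggregate metric without a peer suffix (from the thread).
    "org.apache.cassandra.metrics.hints_service.hint_delays",
    # Per-peer metric, dot separated (from the thread).
    "org.apache.cassandra.metrics.hints_service.hint_delays.192.168.1.13.7000",
    # Hypothetical dash-separated form.
    "org.apache.cassandra.metrics.hints_service.hint_delays-192.168.1.13",
]

def peer_of(label):
    """Return the captured peer for a label value, or None if the rule would not fire."""
    match = re.fullmatch(PATTERN, label)
    return match.group(1) if match else None

peers = [peer_of(s) for s in samples]
print(peers)
```

Note that for the dot-separated form the capture includes the port, so `peer_ip` would end up as something like `192.168.1.13.7000`.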

@RomainAnselin

For information, the above issue appears to be very close to this one:
#52
https://github.com/datastax/metric-collector-for-apache-cassandra/pull/53/commits
Somehow the merge request never made it through.

On the DSE metrics collector (the parent of MCAC), I was able to fix hint_delays by re-ordering the rules in the Prometheus metrics relabeling section and changing the regex a bit.
For reference, the snippet below is the DSE change I got running. It combines the k8ssandra fix I mentioned above with the MCAC changes made in #52 for hints_created. It is not directly applicable to MCAC as written, but it should be after substituting mcac for the dse strings.

     #HintService Metrics
     - source_labels: ["dse"]
       regex: org\.apache\.cassandra\.metrics\.hints_service\.([^\-]+)
       target_label: __name__
       replacement: dse_hints_${1}
     - source_labels: ["dse"]
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hint_delays\.([\w\.]+)
       target_label: peer_ip
       replacement: ${1}
     - source_labels: ["dse"]
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hint_delays\.([\w\.]+)
       target_label: __name__
       replacement: dse_hints_hint_delays

Note that I haven't yet tested how this shows up on the graphs when nodes go down, which is one scenario where these panels would get data, since hints would start being created.

@zyxep

zyxep commented Oct 2, 2023

regex: org\.apache\.cassandra\.metrics\.hints_service\.hint_delays[\-\.]([\w\.]+)

I just tried that change.

     - source_labels: ["mcac"]
#       regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hint_delays[\-\.]([\w\.]+)
       target_label: peer_ip
       replacement: ${1}
     - source_labels: ["mcac"]
#       regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hint_delays[\-\.]([\w\.]+)
       target_label: __name__
       replacement: mcac_hints_hints_delays

and it did not fix it. I'm on Prometheus 2.45.0 and MCAC 0.3.4.
My full scrape job is this:

  - job_name: "mcac"
    scrape_interval: 15s
    scrape_timeout:  15s
    honor_labels: true
    file_sd_configs:
      - files:
        - 'cassandra_target.json'
    metric_relabel_configs:
     #drop metrics we can calculate from prometheus directly
     - source_labels: [__name__]
       regex: .*rate_(mean|1m|5m|15m)
       action: drop
     #save the original name for all metrics
     - source_labels: [__name__]
       regex: (collectd_mcac_.+)
       target_label: prom_name
       replacement: ${1}
     - source_labels: ["prom_name"]
       regex: .+_bucket_(\d+)
       target_label: le
       replacement: ${1}
     - source_labels: ["prom_name"]
       regex: .+_bucket_inf
       target_label: le
       replacement: +Inf
     - source_labels: ["prom_name"]
       regex: .*_histogram_p(\d+)
       target_label: quantile
       replacement: .${1}
     - source_labels: ["prom_name"]
       regex: .*_histogram_min
       target_label: quantile
       replacement: "0"
     - source_labels: ["prom_name"]
       regex: .*_histogram_max
       target_label: quantile
       replacement: "1"
     #Table Metrics *ALL* we can drop
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.table\.(\w+)
       action: drop
     #Table Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
       target_label: table
       replacement: ${3}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
       target_label: keyspace
       replacement: ${2}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
       target_label: __name__
       replacement: mcac_table_${1}
     #Keyspace Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.keyspace\.(\w+)\.(\w+)
       target_label: keyspace
       replacement: ${2}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.keyspace\.(\w+)\.(\w+)
       target_label: __name__
       replacement: mcac_keyspace_${1}
     #ThreadPool Metrics (one type is repair.task so we just ignore the second part)
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
       target_label: pool_type
       replacement: ${2}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
       target_label: pool_name
       replacement: ${3}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
       target_label: __name__
       replacement: mcac_thread_pools_${1}
     #ClientRequest Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)$
       target_label: request_type
       replacement: ${2}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)$
       target_label: __name__
       replacement: mcac_client_request_${1}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
       target_label: cl
       replacement: ${3}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
       target_label: request_type
       replacement: ${2}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
       target_label: __name__
       replacement: mcac_client_request_${1}_cl
     #Cache Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.cache\.(\w+)\.(\w+)
       target_label: cache_name
       replacement: ${2}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.cache\.(\w+)\.(\w+)
       target_label: __name__
       replacement: mcac_cache_${1}
     #CQL Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.cql\.(\w+)
       target_label: __name__
       replacement: mcac_cql_${1}
     #Dropped Message Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.dropped_message\.(\w+)\.(\w+)
       target_label: message_type
       replacement: ${2}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.dropped_message\.(\w+)\.(\w+)
       target_label: __name__
       replacement: mcac_dropped_message_${1}
     #Streaming Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)\.(.+)$
       target_label: peer_ip
       replacement: ${2}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)\.(.+)$
       target_label: __name__
       replacement: mcac_streaming_${1}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)$
       target_label: __name__
       replacement: mcac_streaming_${1}
     #CommitLog Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.commit_log\.(\w+)
       target_label: __name__
       replacement: mcac_commit_log_${1}
     #Compaction Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.compaction\.(\w+)
       target_label: __name__
       replacement: mcac_compaction_${1}
     #Storage Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.storage\.(\w+)
       target_label: __name__
       replacement: mcac_storage_${1}
     #Batch Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.batch\.(\w+)
       target_label: __name__
       replacement: mcac_batch_${1}
     #Client Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.client\.(\w+)
       target_label: __name__
       replacement: mcac_client_${1}
     #BufferPool Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.buffer_pool\.(\w+)
       target_label: __name__
       replacement: mcac_buffer_pool_${1}
     #Index Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.index\.(\w+)
       target_label: __name__
       replacement: mcac_sstable_index_${1}
     #HintService Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.hinted_hand_off_manager\.([^\-]+)-(\w+)
       target_label: peer_ip
       replacement: ${2}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.hinted_hand_off_manager\.([^\-]+)-(\w+)
       target_label: __name__
       replacement: mcac_hints_${1}
     #HintService Metrics
     - source_labels: ["mcac"]
#       regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hint_delays[\-\.]([\w\.]+)
       target_label: peer_ip
       replacement: ${1}
     - source_labels: ["mcac"]
#       regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hint_delays[\-\.]([\w\.]+)
       target_label: __name__
       replacement: mcac_hints_hints_delays
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.hints_service\.([^\-]+)
       target_label: __name__
       replacement: mcac_hints_${1}
     # Misc
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.memtable_pool\.(\w+)
       target_label: __name__
       replacement: mcac_memtable_pool_${1}
     - source_labels: ["mcac"]
       regex: com\.datastax\.bdp\.type\.performance_objects\.name\.cql_slow_log\.metrics\.queries_latency
       target_label: __name__
       replacement: mcac_cql_slow_log_query_latency
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.read_coordination\.(.*)
       target_label: read_type
       replacement: $1
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.read_coordination\.(.*)
       target_label: __name__
       replacement: mcac_read_coordination_requests
     #GC Metrics
     - source_labels: ["mcac"]
       regex: jvm\.gc\.(\w+)\.(\w+)
       target_label: collector_type
       replacement: ${1}
     - source_labels: ["mcac"]
       regex: jvm\.gc\.(\w+)\.(\w+)
       target_label: __name__
       replacement: mcac_jvm_gc_${2}
     #JVM Metrics
     - source_labels: ["mcac"]
       regex: jvm\.memory\.(\w+)\.(\w+)
       target_label: memory_type
       replacement: ${1}
     - source_labels: ["mcac"]
       regex: jvm\.memory\.(\w+)\.(\w+)
       target_label: __name__
       replacement: mcac_jvm_memory_${2}
     - source_labels: ["mcac"]
       regex: jvm\.memory\.pools\.(\w+)\.(\w+)
       target_label: pool_name
       replacement: ${2}
     - source_labels: ["mcac"]
       regex: jvm\.memory\.pools\.(\w+)\.(\w+)
       target_label: __name__
       replacement: mcac_jvm_memory_pool_${2}
     - source_labels: ["mcac"]
       regex: jvm\.fd\.usage
       target_label: __name__
       replacement: mcac_jvm_fd_usage
     - source_labels: ["mcac"]
       regex: jvm\.buffers\.(\w+)\.(\w+)
       target_label: buffer_type
       replacement: ${1}
     - source_labels: ["mcac"]
       regex: jvm\.buffers\.(\w+)\.(\w+)
       target_label: __name__
       replacement: mcac_jvm_buffer_${2}
     #Append the prom types back to formatted names
     - source_labels: [__name__, "prom_name"]
       regex: (mcac_.*);.*(_micros_bucket|_bucket|_micros_count_total|_count_total|_total|_micros_sum|_sum|_stddev).*
       separator: ;
       target_label: __name__
       replacement: ${1}${2}
     - regex: prom_name
       action: labeldrop
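
One possible reading of why this config still fails is rule order: the generic `hints_service` rule comes after the per-peer rules and overwrites `__name__` again. The Python sketch below replays only the three HintService rules with a deliberately simplified model of the `replace` action (single source label, full-string match, `${N}` expansion). The helper `apply_rules` and the tuple layout are hypothetical and are not Prometheus code; this interpretation has not been confirmed by anyone in the thread.

```python
import re

def apply_rules(labels, rules):
    """Apply simplified 'replace' rules in order; a later matching rule overwrites earlier ones."""
    labels = dict(labels)
    for regex, target, replacement in rules:
        match = re.fullmatch(regex, labels.get("mcac", ""))
        if match:
            # Expand ${N} references using the rule's capture groups.
            labels[target] = re.sub(
                r"\$\{(\d+)\}",
                lambda ref: match.group(int(ref.group(1))),
                replacement,
            )
    return labels

PREFIX = r"org\.apache\.cassandra\.metrics\.hints_service\."
peer_rule = (PREFIX + r"hint_delays[\-\.]([\w\.]+)", "peer_ip", "${1}")
name_rule = (PREFIX + r"hint_delays[\-\.]([\w\.]+)", "__name__", "mcac_hints_hints_delays")
generic_rule = (PREFIX + r"([^\-]+)", "__name__", "mcac_hints_${1}")

series = {
    "__name__": "collectd_mcac_histogram_count_total",
    "mcac": "org.apache.cassandra.metrics.hints_service.hint_delays.192.168.1.14.7000",
}

# Order as posted above: per-peer rules first, generic rule last.
posted_order = apply_rules(series, [peer_rule, name_rule, generic_rule])
# Alternative order: generic rule first, per-peer rules last.
generic_first = apply_rules(series, [generic_rule, peer_rule, name_rule])

print(posted_order["__name__"])
print(generic_first["__name__"])
print(generic_first["peer_ip"])
```

Under this model, the posted order ends with a dotted `__name__`, while putting the generic rule first (the order used in the DSE snippet earlier in the thread) leaves a dot-free name and a separate `peer_ip` label.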

@discostur

discostur commented Dec 7, 2023

Problem still exists ... I just tried changing the Prometheus relabel config per the pull request from @zyxep, but I'm still getting the error ...

invalid metric name or label names: {__name__="mcac_hints_hints_created.192.168.1.62.7000_total", cluster="Cassandra Cluster", dc="datacenter1", instance="192.168.1.61", job="cassandra_exporter", mcac="org.apache.cassandra.metrics.hints_service.hints_created.192.168.1.62.7000", mcac_filtered="true", peer_ip="192.168.1.62.7000", rack="rack1"}

@zyxep

zyxep commented Dec 7, 2023

Can you post your full config?
It might be an ordering problem.

@discostur

OK, got it working. My HintService section:

     #HintService Metrics
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hint_delays[\-\.]([\w\.]+)
       target_label: peer_ip
       replacement: ${1}
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hint_delays[\-\.]([\w\.]+)
       target_label: __name__
       replacement: mcac_hints_hints_delays
     - source_labels: [ "mcac" ]
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_created[\-\.]([\w\.]+)
       target_label: peer_ip
       replacement: ${1}
     - source_labels: [ "mcac" ]
       regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_created[\-\.]([\w\.]+)
       target_label: __name__
       replacement: mcac_hints_hint_created 
     - source_labels: ["mcac"]
       regex: org\.apache\.cassandra\.metrics\.hints_service\.([^\-]+)
       target_label: __name__
       replacement: mcac_hints_${1}

At the end you also have to add the drop rule from the k8ssandra commit:
k8ssandra/k8ssandra-operator@0217829#diff-44a79263bc2367e3210dc852d7ae78ccc16feea69d73dd0d0bc3f6e3e69a2d8aR325

     - source_labels: [__name__]
       action: drop
       regex: (.*)\.+(.*)
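
To illustrate what this last rule does, the Python sketch below tests the drop regex against a few metric names, again assuming full-string matching as a stand-in for Prometheus's anchoring. The first name is taken from the error message above; the other two are hypothetical examples of dot-free names.

```python
import re

DROP = r"(.*)\.+(.*)"

names = [
    # From the error message earlier in the thread.
    "mcac_hints_hints_created.192.168.1.62.7000_total",
    # Hypothetical dot-free names.
    "mcac_hints_hints_delays",
    "mcac_client_request_latency_total",
]

# A series is kept only if the drop regex does not match its name.
kept = [n for n in names if re.fullmatch(DROP, n) is None]
print(kept)
```

In other words, the rule discards any series whose final name still contains a dot; it does not rename it. If an earlier rule still rewrites the per-peer names into a dotted form, those series would presumably be dropped rather than ingested, which is worth checking in your own setup.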

@zyxep

zyxep commented Dec 11, 2023

It could also have been the order of your relabel rules. :) But nice that you got it working.
