Cassandra metric endpoint creates duplicates #1345
Comments
Same issue with k8ssandra-operator 1.16.0, default relabelling config and following telemetry config in k8ssandra-cluster custom resource:
Adding config:
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
name: cassandra
spec:
cassandra:
metadata:
pods:
annotations:
argocd.argoproj.io/tracking-id: cassandra:k8ssandra.io/K8ssandraCluster:test/cassandra
argocd.argoproj.io/compare-options: IgnoreExtraneous
serverVersion: "4.0.9"
serverImage: k8ssandra/cass-management-api:4.0.9
telemetry:
mcac:
enabled: false
cassandra:
endpoint:
address: 0.0.0.0
port: '9000'
prometheus:
enabled: true
commonLabels:
prometheus: main
resources:
requests:
memory: 6Gi
cpu: "2"
limits:
memory: 8Gi
storageConfig:
cassandraDataVolumeClaimSpec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 200Gi
config:
jvmOptions:
heapSize: 4096M
# Enable internode communication encryption
cassandraYaml:
server_encryption_options:
internode_encryption: all
audit_logging_options:
enabled: true
logger:
- class_name: FileAuditLogger
max_log_size: 1073741824
datacenters:
- metadata:
name: dc1
size: 3
racks:
- name: r1
nodeAffinityLabels:
topology.kubernetes.io/zone: node-a
- name: r2
nodeAffinityLabels:
topology.kubernetes.io/zone: node-b
- name: r3
nodeAffinityLabels:
topology.kubernetes.io/zone: node-c
mgmtAPIHeap: 128Mi
containers:
- name: cassandra
lifecycle:
postStart:
exec:
command:
- /cql-scripts/init-csql.sh
volumeMounts:
- name: cql-scripts
mountPath: /cql-scripts
env:
- name: CQLSH_HOST
value: cassandra-dc1-service
envFrom:
- secretRef:
name: cassandra-superuser
extraVolumes:
volumes:
- name: cql-scripts
configMap:
name: cassandra-schema
defaultMode: 0755
# Enable internode communication encryption
serverEncryptionStores:
# should contain cert and key
keystoreSecretRef:
name: cassandra-server-tls
key: keystore.jks
keystorePasswordSecretRef:
name: cassandra-server-keystore-password
key: keystorePassword
# should contain CA
truststoreSecretRef:
name: cassandra-server-tls
key: truststore.jks
truststorePasswordSecretRef:
name: cassandra-server-keystore-password
key: keystorePassword
reaper:
telemetry:
prometheus:
enabled: true
commonLabels:
prometheus: main
resources:
requests:
memory: 500Mi
cpu: 25m
limits:
memory: 700Mi
cpu: 300m
@burmanm, could you check this please?
Is something adding duplicate rewrite rules then? I can't replicate this on my own instances:
Hi @burmanm! On our instances, it's not the only metric concerned, and the set of metrics and samples varies per instance and cluster.
Here is the command I use:
curl -s http://localhost:9000/metrics | grep -v '^#' | sed 's/ [0-9E.-]*$//' | sort | uniq -c | grep -vE '^\s*1\s'
EDIT with some extracts:
Example 1
Example 2
Example 3 (truncated)
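For readers who prefer not to rely on the shell pipeline, the same check can be sketched in Java. This is only an illustration of the idea behind the command above: it skips `#` comment lines, strips the trailing sample value, and reports every series that appears more than once. The class name `DuplicateSeriesFinder` and its method are hypothetical, and the sketch assumes each sample line ends with a single value and no timestamp. Unlike the `sed` expression, it strips any last token, including `NaN`.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateSeriesFinder {

    // Returns every series (metric name plus labels) that occurs more than once,
    // mapped to its number of occurrences. Assumes "series value" lines without timestamps.
    static Map<String, Integer> findDuplicates(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and the "# HELP" / "# TYPE" comment lines.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // Drop the sample value, i.e. everything after the last space.
            int lastSpace = trimmed.lastIndexOf(' ');
            String series = lastSpace > 0 ? trimmed.substring(0, lastSpace) : trimmed;
            counts.merge(series, 1, Integer::sum);
        }
        // Keep only the series seen at least twice.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        // Reads the exposition text from standard input.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = in.lines().collect(Collectors.toList());
        findDuplicates(lines).forEach((series, count) -> System.out.println(count + " " + series));
    }
}
```

With a recent JDK this can be fed from the same endpoint, for example `curl -s http://localhost:9000/metrics | java DuplicateSeriesFinder.java`.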
Thanks for those examples. I'm struggling to reproduce this on my own instances, even after checking all the metrics there. Being able to reproduce it would make debugging much easier, since I could track where this happens. Did you also use 4.0.*?
We're using 4.1.2 currently (with k8ssandra-operator 1.16.0 & cass-operator 1.20.0), medusa & reaper are also enabled. We enable telemetry for both
It could help debugging if we could provide the raw samples before (or without) relabelling, I guess, but I'm not sure that's easily feasible?
It would require a separate build with quite a lot of modifications to keep both versions available. But I was able to find a node in our example clusters which behaves the same way, creating duplicates, so I have something to work with.
So far I've seen only 2 duplicates on my local instances, caused by something I can't explain (nor is it clear to me from the Cassandra source code, since they should have different ids). In any case, two different threads registered identical metrics for some reason, and that caused duplicates to be emitted as well. That doesn't necessarily explain why one of my instances has 888 copies of a single metric.
I'd like to understand why this happens to ensure I'm not missing some vital behavior, but as of now I'm assuming my PR for the management-api should fix this issue without breaking anything existing.
One bug would still remain after that, which is the regexp parsing of the metric name. Parsing tokens with (\w+) does not catch strings which have a dash in them, like some Cassandra metrics, so for example the following test will fail:
@Test
public void parseNameWithDash() {
  String dropWizardName =
      "org.apache.cassandra.metrics.ThreadPools.ActiveTasks.transport.Native-Transport-Requests";
  Configuration config = ConfigReader.readConfig();
  CassandraMetricNameParser parser =
      new CassandraMetricNameParser(Arrays.asList(""), Arrays.asList(""), config);
  CassandraMetricDefinition metricDefinition =
      parser.parseDropwizardMetric(dropWizardName, "", new ArrayList<>(), new ArrayList<>());
  assertEquals(
      "org_apache_cassandra_metrics_thread_pools_active_tasks", metricDefinition.getMetricName());
  Map<String, String> labels =
      labelsToMap(metricDefinition.getLabelNames(), metricDefinition.getLabelValues());
  assertEquals("Native-Transport-Requests", labels.get("pool_name"));
}
Since
- regex: "org\\.apache\\.cassandra\\.metrics\\.ThreadPools\\.(\\w+)\\.(\\w+)\\.(\\w+).*"
  replacement: $3
  sourceLabels:
    - __origname__
  targetLabel: pool_name
Perhaps the correct one would be
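To illustrate the `(\w+)` limitation described in this comment, here is a minimal, standalone Java sketch using only `java.util.regex`. It applies the pattern from the rule quoted above to the metric name from the failing test and prints what the third capture group yields. The `WITH_DASH` pattern is a hypothetical variant added purely for comparison; it is not the rule proposed in this thread. The sketch also assumes the rule is applied as a full match, which may differ from how the management-api evaluates it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PoolNameRegexDemo {

    // Pattern taken from the relabelling rule quoted in the comment.
    static final Pattern ORIGINAL = Pattern.compile(
        "org\\.apache\\.cassandra\\.metrics\\.ThreadPools\\.(\\w+)\\.(\\w+)\\.(\\w+).*");

    // Hypothetical variant: the third group also accepts dashes (illustration only).
    static final Pattern WITH_DASH = Pattern.compile(
        "org\\.apache\\.cassandra\\.metrics\\.ThreadPools\\.(\\w+)\\.(\\w+)\\.([\\w-]+).*");

    // Returns capture group 3 (the "$3" replacement), or null if the name does not match.
    static String poolName(Pattern pattern, String metricName) {
        Matcher matcher = pattern.matcher(metricName);
        return matcher.matches() ? matcher.group(3) : null;
    }

    public static void main(String[] args) {
        String name =
            "org.apache.cassandra.metrics.ThreadPools.ActiveTasks.transport.Native-Transport-Requests";
        // \w stops at the first dash, so the trailing ".*" swallows the rest of the name.
        System.out.println(poolName(ORIGINAL, name));   // prints "Native"
        System.out.println(poolName(WITH_DASH, name));  // prints "Native-Transport-Requests"
    }
}
```

Under these assumptions the original pattern still matches, but it truncates the label value at the first dash, which is consistent with the failing `pool_name` assertion in the test above.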
Hello,
We can see that the Cassandra metrics endpoint creates duplicate metrics.
Operator helm version 1.6.1