Default metrics filters for new metrics agent #834
Conversation
Codecov Report
@@ Coverage Diff @@
## main #834 +/- ##
==========================================
- Coverage 57.16% 56.94% -0.23%
==========================================
Files 95 95
Lines 9257 9242 -15
==========================================
- Hits 5292 5263 -29
- Misses 3507 3523 +16
+ Partials 458 456 -2
Force-pushed from 0aa528a to a3c1f0b
Force-pushed from 0f1b70d to dd8226a
Force-pushed from 632c746 to a90c121
I'm a bit concerned that we introduce a breaking change here. Apparently in the course of delivering the metrics agent work, the config field As a result, I have updated the CR's API so that filters is now I'm also shifting the
},
{
SourceLabels: []string{"__origname__"},
Regex: "org\\.apache\\.cassandra\\.metrics\\.table.*",
This metric name does not exist in Cassandra.
},
{
SourceLabels: []string{"__origname__"},
Regex: "org\\.apache\\.cassandra\\.metrics\\.table\\.live_ss_table_count",
This metric name does not exist in Cassandra.
I am building from the original MCAC filters, where that metric name is filtered. Perhaps this metric differs in name between Cassandra versions.
No, MCAC just modified the names to be non-original. The real documentation (as mentioned here: https://github.com/k8ssandra/management-api-for-apache-cassandra/blob/master/management-api-agent-common/src/main/resources/default-metric-settings.yaml#L2) is https://cassandra.apache.org/doc/latest/cassandra/operating/metrics.html
},
{
SourceLabels: []string{"__origname__"},
Regex: "org\\.apache\\.cassandra\\.metrics\\.table\\.live_disk_space_used",
This metric name does not exist in Cassandra.
},
{
SourceLabels: []string{"__origname__"},
Regex: "org\\.apache\\.cassandra\\.metrics\\.table\\.read",
This metric name does not exist in Cassandra.
},
{
SourceLabels: []string{"__origname__"},
Regex: "org\\.apache\\.cassandra\\.metrics\\.table\\.write",
This metric name does not exist in Cassandra.
},
{
SourceLabels: []string{"__origname__"},
Regex: "org\\.apache\\.cassandra\\.metrics\\.table\\.range",
This metric name does not exist in Cassandra.
},
{
SourceLabels: []string{"__origname__"},
Regex: "org\\.apache\\.cassandra\\.metrics\\.table\\.coordinator",
This metric name does not exist in Cassandra.
},
{
SourceLabels: []string{"__origname__"},
Regex: "org\\.apache\\.cassandra\\.metrics\\.table\\.dropped_mutations",
This metric name does not exist in Cassandra.
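As a side note on these patterns: the doubled backslashes in the Go string literals compile to regexes with literal dots, and the match is unanchored, so a pattern like `table\.read` also matches longer names that merely contain it. A quick sketch of what one of these patterns actually matches (the sample metric names here are illustrative, not taken from the PR):

```go
package main

import (
	"fmt"
	"regexp"
)

// tableReadPattern mirrors the filter's Go string literal
// "org\\.apache\\.cassandra\\.metrics\\.table\\.read": the doubled
// backslashes compile to literal dots in the regex.
var tableReadPattern = regexp.MustCompile("org\\.apache\\.cassandra\\.metrics\\.table\\.read")

// matchesTableRead reports whether a metric name matches the
// (unanchored) filter regex.
func matchesTableRead(name string) bool {
	return tableReadPattern.MatchString(name)
}

func main() {
	// Unanchored, so it also matches longer names that contain the pattern,
	// which matters if the modern name is e.g. a *_latency variant.
	fmt.Println(matchesTableRead("org.apache.cassandra.metrics.table.read_latency")) // true
	fmt.Println(matchesTableRead("org.apache.cassandra.metrics.table.write"))        // false
}
```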
cm, err := Cfg.GetTelemetryAgentConfigMap()
println(cm.Data)
assert.NoError(t, err)
assert.Equal(t, expectedCm.Data["metric-collector.yaml"], cm.Data["metric-collector.yaml"])
The correct filename is /configs/metrics-collector.yaml, note the s.
That isn't what is specified here?
That's MCAC?
Oops... I'm glad I mentioned this since I need to change some documentation back as I think it refers to MCAC, not the new collector.
Thanks for clearing this up for me anyway, didn't realise this had also changed from the original PR.
@@ -43,7 +131,8 @@ func (c Configurator) GetTelemetryAgentConfigMap() (*corev1.ConfigMap, error) {
var yamlData []byte
var err error
if c.TelemetrySpec.Cassandra != nil {
-	yamlData, err = yaml.Marshal(&c.TelemetrySpec.Cassandra)
+	mergedSpec := goalesce.MustDeepMerge(&defaultAgentConfig, c.TelemetrySpec.Cassandra)
@adejanovski and I have a slight problem with this. If the fields are merged, what's the ordering here? Can the user override everything that's set in the defaults? How are they merged, before or after, and what about duplicates?
I confirm that my expectation here is that any custom filter from users would override all the default filters. They shouldn't get merged with it.
And that's exactly how this works. MustDeepMerge uses an atomic merge for slices by default.
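A minimal sketch of the atomic-merge semantics being described here, i.e. a non-nil user slice replaces the defaults wholesale rather than being merged element-wise. This is an illustration of the behavior only, not goalesce's actual implementation, and the function name is hypothetical:

```go
package main

import "fmt"

// atomicMergeSlices sketches "atomic" slice merging: if the override
// slice is set (non-nil), it replaces the defaults entirely; elements
// are never combined across the two slices.
func atomicMergeSlices(defaults, override []string) []string {
	if override != nil {
		return override
	}
	return defaults
}

func main() {
	defaults := []string{"default-filter-a", "default-filter-b"}
	custom := []string{"user-filter"}

	// A user-supplied filter list wins wholesale over the defaults.
	fmt.Println(atomicMergeSlices(defaults, custom)) // [user-filter]

	// With no user filters, the defaults are used as-is.
	fmt.Println(atomicMergeSlices(defaults, nil)) // [default-filter-a default-filter-b]
}
```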
Awesome! I wasn't sure about this because Goalesce is very powerful and has a lot of different merge techniques.
Well, in this case I'd say "Merge" is a bit of a confusing (sorry, insanely confusing) name for the method (and perhaps this is why it would require a comment saying we don't ever want to merge these, despite calling a function named merge). It's not merging anything: it either returns a copy of defaultAgentConfig (if c.TelemetrySpec.Cassandra is nil) or a copy of c.TelemetrySpec.Cassandra (if it isn't nil); it never merges values from the two.
Finding out the intended behavior now requires going through 4-5 jumps between files in goalesce, working out which method to jump to next. And even then the next reader has no idea what the intention was; if a user asks "why isn't it merging", was the lack of merging a bug? The fact that the behavior depends on the type (maps and slices behave differently, so if someone ever changes the slices to maps for preprocessing reasons, the entire behavior will change) makes it even more difficult to remember.
Port: "9000",
Address: "127.0.0.1",
},
Relabels: []promapi.RelabelConfig{
@adejanovski can you please take a look at this to confirm that I've captured the intention behind the original filters?
Some metric names present quite differently (e.g. org.apache.cassandra.metrics.table.read doesn't appear directly anymore, now it seems to just be read_latency, similar with org.apache.cassandra.metrics.Table.LiveSSTableCount which now appears to be ... live_ss_table_count).
The behavior looks different than what was done by the original filters. Instead of dropping by default all org_apache_cassandra_metrics_table.* metrics, you're dropping everything that has a table label, and then allowing specific metrics by changing the should_drop label value.
Based on my testing, none of the metrics with a table label survive these filters, despite the following rules.
With the default filters on, I can only see 3 lines when looking for org_apache_cassandra_metrics_table_live_ss_table_count:
# HELP org_apache_cassandra_metrics_table_live_ss_table_count_all
# TYPE org_apache_cassandra_metrics_table_live_ss_table_count_all gauge
org_apache_cassandra_metrics_table_live_ss_table_count_all{host="67140d82-7ae5-45a0-b3c8-cac99530fde2",instance="172.24.0.4",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-1",node_name="k8ssandra-0-worker2",} 12.0
But without the filters, I see 45 such lines. Here's a sample:
# HELP org_apache_cassandra_metrics_table_live_ss_table_count
# TYPE org_apache_cassandra_metrics_table_live_ss_table_count gauge
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system",table="built_views",} 0.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system",table="view_builds_in_progress",} 0.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system_traces",table="sessions",} 0.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system_schema",table="aggregates",} 2.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system",table="available_ranges",} 0.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system",table="size_estimates",} 0.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system_schema",table="dropped_columns",} 2.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system_auth",table="role_members",} 0.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system",table="transferred_ranges_v2",} 0.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system",table="peers",} 2.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system_schema",table="tables",} 2.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system_auth",table="roles",} 3.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system",table="peer_events_v2",} 0.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system",table="peer_events",} 0.0
org_apache_cassandra_metrics_table_live_ss_table_count{host="1f8db44a-2fb4-4a65-8f60-82fdb75e3b07",instance="172.24.0.5",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker3",keyspace="system",table="local",} 1.0
The above metrics should survive the filters, given the name of the metrics, but they don't 🤔
So, too many metrics are being dropped.
Switching the first relabeling to:
- regex: org_apache_cassandra_metrics_table.*
replacement: "true"
sourceLabels:
- __name__
targetLabel: should_drop
generates even weirder behaviors, since I now lose all the live_ss_table_count metrics, but end up with metrics that have both should_drop=true and should_drop=false 😆
org_apache_cassandra_metrics_table_all_memtables_live_data_size_all{host="af660e57-38c4-4863-af8a-31c08a4314dc",instance="172.24.0.4",cluster="test",datacenter="dc1",rack="default",pod_name="test-dc1-default-sts-0",node_name="k8ssandra-0-worker2",should_drop="true",should_drop="false",} 125755.0
@burmanm, there seems to be something wrong with the relabelings here.
Could you tell us what you think?
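For reference, in Prometheus-style relabelling terms the should_drop pattern under discussion looks roughly like this. This is a sketch of the intent reconstructed from the comment above, not the PR's actual config; the allowlisted metric name and the rule ordering are assumptions:

```yaml
# Sketch of the intended flow (assumed):
# 1. mark every series carrying a `table` label as droppable,
# 2. un-mark an allowlisted metric by name,
# 3. drop whatever is still marked.
relabelConfigs:
  - sourceLabels:
      - table
    regex: .+
    targetLabel: should_drop
    replacement: "true"
  - sourceLabels:
      - __name__
    regex: org_apache_cassandra_metrics_table_live_ss_table_count
    targetLabel: should_drop
    replacement: "false"
  - sourceLabels:
      - should_drop
    regex: "true"
    action: drop
```

Note that with this ordering, each later rule overwrites the should_drop value set by an earlier one; if the agent instead appends labels rather than overwriting them, you would see exactly the duplicated should_drop="true",should_drop="false" labels reported below.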
I've done some manual testing and this is ready for merge, subject to my interpretation of the original metrics filters (which I've commented on above) being right.
Force-pushed from 22fbc85 to ff4650c
It looks like these docs were removed in a previous PR. I'm not sure that was a good idea, since some folks still need to use MCAC given that older patch versions aren't supported by the new metrics agent.
Just getting thoughts here, not insisting it be left in.
@@ -79,7 +162,7 @@ func (c Configurator) ReconcileTelemetryAgentConfig(dc *cassdcapi.CassandraDatac
recRes := reconciliation.ReconcileObject(c.Ctx, c.RemoteClient, c.RequeueDelay, *desiredCm)
switch {
case recRes.IsError():
fallthrough
@burmanm why did you change this to fallthrough? This will give you a Done result AFAIK, which is wrong if the ConfigMap reconciliation has failed.
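The concern can be seen in miniature: Go's fallthrough transfers control into the next case body unconditionally, so an error case can silently execute the success branch. The names below are hypothetical, sketching the pattern rather than the operator's actual reconciliation code:

```go
package main

import "fmt"

type result int

const (
	resultError result = iota
	resultDone
)

// handle sketches the switch under discussion: with fallthrough, the
// error case drops straight into the next clause's body, so even a
// failed reconciliation ends up reported as Done.
func handle(isError bool) result {
	switch {
	case isError:
		fallthrough // control continues into the default clause below
	default:
		return resultDone
	}
}

func main() {
	// Both paths report Done, which is the bug being pointed out.
	fmt.Println(handle(true) == resultDone)
	fmt.Println(handle(false) == resultDone)
}
```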
I have a few questions in my comments RE docs, and some changes that I didn't previously catch to the configmap reconciliation. Beyond that, I believe that this now works.
This content was intentionally moved to make this page a ToC, and spread the monitoring tasks over multiple pages: https://docs-staging.k8ssandra.io/tasks/monitor/
This doc is here now and has been updated to reflect the latest changes.
Ah, I'll delete it again, thanks!
The former part of the solution was @burmanm's suggestion; the latter part is required due to the differences in the way allow/deny rules work in MCAC (additive) vs the new metrics agent (subtractive). W.r.t. some metrics not appearing, that's exactly what I was hoping you could confirm, @adejanovski. I do see table related metrics appearing, however, having just re-tested I've realised that you're right - the ones I can see don't have the So it appears that something is indeed wrong.
… as it must be optional from the perspective of the CR so that it can have defaulting behaviour in the controller.
…rrectly parsed by regex.
Force-pushed from 9f59727 to 00b97dc
SonarCloud Quality Gate failed. 0 Bugs. No Coverage information.
Coming back to this now that this PR has been merged. The previous iteration of management api's new metrics agent had some issues where a capitalised metric name sometimes appeared, which meant that our default rules in this PR did not recognise the metric name via regex and failed to mark it with I have manually tested and found that:
These tests are run with I believe that makes this PR ready for merge.
What this PR does:
This PR hopefully replicates the previous metrics filters that applied to MCAC over to the new metrics agent's config file, which we've built and mounted via this PR.
I haven't been able to track down docs on how the metrics relabelling/filtering works for the new agent, so I'm flying slightly blind here (if anyone knows where docs are please feel free to chime in).
Having said that, if it works similarly to Prometheus then I think this will be pretty close to what we need.
Which issue(s) this PR fixes:
Fixes #816
Checklist