-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default metrics filters for new metrics agent #834
Changes from all commits
67eee3f
a823dd1
47ff077
e54491b
f1e13ee
5771b7e
982158f
adaa306
00b97dc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,12 +5,15 @@ import ( | |
"path/filepath" | ||
"time" | ||
|
||
"github.com/adutra/goalesce" | ||
|
||
cassdcapi "github.com/k8ssandra/cass-operator/apis/cassandra/v1beta1" | ||
k8ssandraapi "github.com/k8ssandra/k8ssandra-operator/apis/k8ssandra/v1alpha1" | ||
telemetryapi "github.com/k8ssandra/k8ssandra-operator/apis/telemetry/v1alpha1" | ||
"github.com/k8ssandra/k8ssandra-operator/pkg/labels" | ||
"github.com/k8ssandra/k8ssandra-operator/pkg/reconciliation" | ||
"github.com/k8ssandra/k8ssandra-operator/pkg/result" | ||
promapi "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1" | ||
corev1 "k8s.io/api/core/v1" | ||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
"k8s.io/apimachinery/pkg/types" | ||
|
@@ -21,10 +24,89 @@ import ( | |
var ( | ||
agentConfigLocation = "/opt/management-api/configs/metrics-collector.yaml" | ||
defaultAgentConfig = telemetryapi.CassandraAgentSpec{ | ||
Endpoint: telemetryapi.Endpoint{ | ||
Endpoint: &telemetryapi.Endpoint{ | ||
Port: "9000", | ||
Address: "127.0.0.1", | ||
}, | ||
Relabels: []promapi.RelabelConfig{ | ||
{ | ||
SourceLabels: []string{"table"}, | ||
Regex: ".+", | ||
TargetLabel: "should_drop", | ||
Replacement: "true", | ||
}, | ||
{ | ||
SourceLabels: []string{"__name__"}, | ||
Regex: "org_apache_cassandra_metrics_table_live_ss_table_count", | ||
TargetLabel: "should_drop", | ||
Replacement: "false", | ||
}, | ||
{ | ||
SourceLabels: []string{"__name__"}, | ||
Regex: "org_apache_cassandra_metrics_table_live_disk_space_used", | ||
TargetLabel: "should_drop", | ||
Replacement: "false", | ||
}, | ||
{ | ||
SourceLabels: []string{"__name__"}, | ||
Regex: "org_apache_cassandra_metrics_table_memtable.*", | ||
TargetLabel: "should_drop", | ||
Replacement: "false", | ||
}, | ||
{ | ||
SourceLabels: []string{"__name__"}, | ||
Regex: "org_apache_cassandra_metrics_table_all_memtables.*", | ||
TargetLabel: "should_drop", | ||
Replacement: "false", | ||
}, | ||
{ | ||
SourceLabels: []string{"__name__"}, | ||
Regex: "org_apache_cassandra_metrics_table_compaction_bytes_written", | ||
TargetLabel: "should_drop", | ||
Replacement: "false", | ||
}, | ||
{ | ||
SourceLabels: []string{"__name__"}, | ||
Regex: "org_apache_cassandra_metrics_table_pending_compactions", | ||
TargetLabel: "should_drop", | ||
Replacement: "false", | ||
}, | ||
{ | ||
SourceLabels: []string{"__name__"}, | ||
Regex: "org_apache_cassandra_metrics_table_read_.*", | ||
TargetLabel: "should_drop", | ||
Replacement: "false", | ||
}, | ||
{ | ||
SourceLabels: []string{"__name__"}, | ||
Regex: "org_apache_cassandra_metrics_table_write_.*", | ||
TargetLabel: "should_drop", | ||
Replacement: "false", | ||
}, | ||
{ | ||
SourceLabels: []string{"__name__"}, | ||
Regex: "org_apache_cassandra_metrics_table_range.*", | ||
TargetLabel: "should_drop", | ||
Replacement: "false", | ||
}, | ||
{ | ||
SourceLabels: []string{"__name__"}, | ||
Regex: "org_apache_cassandra_metrics_table_coordinator_.*", | ||
TargetLabel: "should_drop", | ||
Replacement: "false", | ||
}, | ||
{ | ||
SourceLabels: []string{"__name__"}, | ||
Regex: "org_apache_cassandra_metrics_table_dropped_mutations", | ||
TargetLabel: "should_drop", | ||
Replacement: "false", | ||
}, | ||
{ | ||
SourceLabels: []string{"should_drop"}, | ||
Regex: "true", | ||
Action: "drop", | ||
}, | ||
}, | ||
} | ||
) | ||
|
||
|
@@ -42,7 +124,8 @@ func (c Configurator) GetTelemetryAgentConfigMap() (*corev1.ConfigMap, error) { | |
var yamlData []byte | ||
var err error | ||
if c.TelemetrySpec.Cassandra != nil { | ||
yamlData, err = yaml.Marshal(&c.TelemetrySpec.Cassandra) | ||
mergedSpec := goalesce.MustDeepMerge(&defaultAgentConfig, c.TelemetrySpec.Cassandra) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @adejanovski and I have a slight problem with this. If the fields are merged, what's the ordering here? Can user override everything that's set in the defaults? How are they merged, before, after, what about duplicates? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I confirm that my expectation here is that any custom filter from users would override all the default filters. They shouldn't get merged with it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And that's exactly how this works. MustDeepMerge uses an atomic merge for slices by default. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Awesome! I wasn't sure about this because Goalesce is very powerful and has a lot of different merge techniques. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, in this case I'd say the "Merge" is a bit of a confusing (sorry, insanely confusing) name for a method (and perhaps this is why this would require a comment to say we don't ever want to merge these despite calling a function to merge). It's not merging anything. It either returns a copy of defaultAgentConfig (if c.TelemetrySpec.Cassandra is nil) or a copy of c.TelemetrySpec.Cassandra (if it isn't nil), it never merges values from those two. Now finding out the intended behavior requires to go through 4-5 jumps in files while reading which method to jump to next in the goalesce. And even then next one has no idea what's the intention - was it a bug that it didn't merge when one user asks "why isn't it merging". The fact that the behavior depends of the type (map or slice have different behaviors, so if for whatever reason someone changes the slices to be maps for some preprocessing requirements then the entire behavior will change) makes it even more difficult to remember these. |
||
yamlData, err = yaml.Marshal(&mergedSpec) | ||
if err != nil { | ||
return &corev1.ConfigMap{}, err | ||
} | ||
|
@@ -79,7 +162,7 @@ func (c Configurator) ReconcileTelemetryAgentConfig(dc *cassdcapi.CassandraDatac | |
recRes := reconciliation.ReconcileObject(c.Ctx, c.RemoteClient, c.RequeueDelay, *desiredCm) | ||
switch { | ||
case recRes.IsError(): | ||
fallthrough | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @burmanm why did you change this to fallthrough? This will give you a |
||
return recRes | ||
case recRes.IsRequeue(): | ||
return recRes | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adejanovski can you please take a look at this to confirm that I've captured the intention behind the original filters?
Some metric names present quite differently (e.g. org.apache.cassandra.metrics.table.read doesn't appear directly anymore, now it seems to just be read_latency, similar with org.apache.cassandra.metrics.Table.LiveSSTableCount which now appears to be ... live_ss_table_count).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior looks different than what was done by the original filters.
Instead of dropping by default all
org_apache_cassandra_metrics_table.*
metrics, you're dropping everything that has atable
label, and then allow specific metrics by changing theshould_drop
label value.Based on my testing, none of the metrics with a
table
label survives these filters, despite the following rules.With the default filters on, I can only see 3 lines when looking for for
org_apache_cassandra_metrics_table_live_ss_table_count
:But without the filters, I see 45 such lines. Here's a sample:
The above metrics should survive the filters, given the name of the metrics, but they don't 🤔
So, too many metrics are being dropped.
Switching the first relabeling to:
generates even weirder behaviors, since I now lose all the
live_ss_table_count
metrics, but end up with metrics that have bothshould_drop=true
andshould_drop=false
😆@burmanm, there seems to be something wrong with the relabelings here.
Could you tell us what you think?