Remove timestamp from the metrics #43

Open
prabinsh opened this issue Mar 30, 2021 · 13 comments

@prabinsh

The timestamp in the metrics is 2 hours behind the system time.

# HELP collectd_collectd_cache_size write_prometheus plugin: 'collectd' Type: 'cache_size', Dstype: 'gauge', Dsname: 'value'
# TYPE collectd_collectd_cache_size gauge
collectd_collectd_cache_size{collectd="cache",instance="10.0.1.1",cluster="CassCluster",dc="DAL",rack="rack1"} 11969 1617137796120

Here's the system time and the time the timestamp translates to:

$ date -d @1617137796
Tue Mar 30 13:56:36 GMT+7 2021
$ date
Tue Mar 30 15:39:22 GMT+7 2021
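
(Note that the trailing value in the exposition format is a Unix timestamp in milliseconds, so it has to be divided by 1000 before being passed to date. A quick sketch using the sample above:)

$ # strip the milliseconds from the metric's trailing timestamp, then convert to UTC
$ date -u -d @$((1617137796120 / 1000))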

The time reported in the metric is 2 hours behind, and I can't figure out how to disable the timestamp in the metrics.

This causes the following error when scraping in Prometheus:

msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=51908
@MattFellows

I've also got a similar issue...
Our k8ssandra nodes ran out of disk space. We fixed that, but ever since, we've had no Grafana metrics from k8ssandra. I've restarted, deleted, and recreated every pod and ServiceMonitor, and removed countless directories/caches, but it keeps happening. The timestamps are out by about 5 minutes immediately after deleting mcac_data and restarting, then get older and older until they are about 4 hours old, then start moving forwards again...

Any advice or help about what to grab for diagnosis would be great, but this really feels like a bug of some sort, induced by an unexpected state...

@kenjaix

kenjaix commented Aug 16, 2021

Same here.

I'm getting the following metrics with timestamps from 2 months ago:

collectd_tcpconns_tcp_connections{tcpconns="9999-local",type="SYN_SENT",instance="172.17.47.22",cluster="V2",dc="F1",rack="D1"} 0 1624932662653
collectd_uptime{instance="172.17.47.22",cluster="V2",dc="F1",rack="D1"} 10100889 1624932662650
collectd_vmem_vmpage_action_total{vmem="dirtied",instance="172.17.47.22",cluster="V2",dc="F1",rack="D1"} 20647291651 1624932662647

1624932662650
GMT: Tuesday, June 29, 2021 2:11:02.650 AM
Relative: 2 months ago

This causes Prometheus to drop those metrics. I'm not sure why MCAC doesn't update the timestamp.

Please advise.

@tah-mas

tah-mas commented Aug 20, 2021

We have the same issue. It was working fine, but after leaving it for a couple of days, MCAC is reporting the wrong time, causing Prometheus to fail:
level=warn ts=2021-08-20T15:29:17.688Z caller=scrape.go:1375 component="scrape manager" scrape_pool=k8ssandra/k8ssandra-prometheus-k8ssandra/0 target=http://xxxxx:9103/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=205

Please fix.

@adejanovski
Collaborator

I'm unable to reproduce the issue on GKE. I've let the cluster run for a few days and Prometheus isn't complaining about metrics that are too old.
Could you compare the clocks in the Prometheus container and in the Cassandra containers to see if there's any drift? Same question for the clocks on all the K8s worker nodes, to check that they're in sync.
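
For reference, a quick way to run that comparison (the namespace, pod, and container names below are placeholders; substitute whatever your deployment uses):

$ # clock inside the Prometheus container
$ kubectl exec -n monitoring prometheus-k8ssandra-0 -c prometheus -- date -u
$ # clock inside the Cassandra container running the MCAC agent
$ kubectl exec -n k8ssandra my-cluster-dc1-default-sts-0 -c cassandra -- date -u
$ # clock on the worker node you're logged into, for comparison
$ date -u

If these all agree, the drift is coming from the metrics pipeline rather than from the system clocks.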

@tah-mas

tah-mas commented Sep 27, 2021

Hi @adejanovski, there's no drift: both the Prometheus and Cassandra containers report the same time (UTC). I did notice that with the fix for 'out-of-order timestamps' (#969), I had no problems with the timestamps as long as I had a smaller number of tables (~100) in the DB. After our production upgrade, I now have 326 tables spread across keyspaces and the problem has reappeared. Our dev env also has a similar number of tables, so it appears this happens when you have a large number of tables in your DB, but that is just an observation...

@adejanovski
Collaborator

Hi @tah-mas,

That's an interesting observation. Each table comes with a large set of metrics, which could mean they take too long to process and end up being ingested only once they're outside the accepted timestamp range.
The solution would be to filter out some metrics so that we reduce the overall volume. I'm not even sure table-specific metrics are used in the current set of dashboards.
I'll investigate to see how easily this could be achieved.

@tah-mas

tah-mas commented Sep 28, 2021

Thank you @adejanovski! Much appreciated

@eriksw

eriksw commented Dec 31, 2021

@adejanovski Any ETA on making the default config usable?

We just switched from the instaclustr exporter to MCAC and are winding up with no metrics/blank dashboards from our main cluster due to this issue, despite it working fine on a smaller cluster with fewer tables.

@adejanovski
Collaborator

Hi @eriksw,

We actually merged the changes a while ago to let you filter metrics more easily. Check this commit for some examples.
Let me know how this works for you.
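
For reference, the rules take a policy/pattern/scope form; a minimal sketch that simply drops all table-level metrics (the patterns mirror the ones in the rule set posted in the next comment) looks like this:

filtering_rules:
  - policy: deny
    pattern: org.apache.cassandra.metrics.Table
    scope: global
  - policy: deny
    pattern: org.apache.cassandra.metrics.table
    scope: global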

@eriksw

eriksw commented Jan 3, 2022

@adejanovski Glad to see some rules documented here! I had looked around and found https://github.com/k8ssandra/k8ssandra/pull/1149/files and derived the following rule set:

filtering_rules:
  - policy: deny
    pattern: org.apache.cassandra.metrics.Table
    scope: global
  - policy: deny
    pattern: org.apache.cassandra.metrics.table
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.table.live_ss_table_count
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.Table.LiveSSTableCount
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.table.live_disk_space_used
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.table.LiveDiskSpaceUsed
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.Table.Pending
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.Table.Memtable
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.Table.Compaction
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.table.read
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.table.write
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.table.range
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.table.coordinator
    scope: global
  - policy: allow
    pattern: org.apache.cassandra.metrics.table.dropped_mutations
    scope: global

The bad news: with those rules, our main cluster still ran into wildly out-of-date metric timestamps and all the other issues from #39.

Has MCAC ever been used in actual production on a cluster with >300 tables on 60 nodes? If so, how?

@ducnm0711

ducnm0711 commented Feb 21, 2022

Hi everyone

The rate of Prometheus out-of-order-sample warnings did indeed decrease with the setup above.

Increasing metric_sampling_interval_in_seconds to 120 also helps a bit.

I went from getting a scrape warning every minute to one every 3-4 minutes.

I'm testing MCAC on a 3-node cluster with 100+ tables.
Prometheus/ServiceMonitor are deployed in the k8s cluster.
Cassandra runs on VM instances.
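
For reference, the two settings mentioned above side by side (a sketch; both are assumed to live in the MCAC collector config file, whose exact name and location depend on how MCAC is deployed):

# assumed layout of the MCAC collector config; adjust to your deployment
metric_sampling_interval_in_seconds: 120
filtering_rules:
  - policy: deny
    pattern: org.apache.cassandra.metrics.Table
    scope: global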

@raskar7

raskar7 commented May 6, 2022

Hi everyone
We're having the same issue.
The MCAC exporter metrics are timestamped 2 hours in the past compared to our current time in France (UTC+2). All servers are NTP-synced.
So I think the exporter gets the time from Cassandra rather than from the system.
If there's no way to configure it, would the simplest workaround be to change the Prometheus server's timezone to match UTC?
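
(Side note: Prometheus stores and evaluates everything in UTC internally, so changing the server's timezone is unlikely to affect ingestion. A scrape-side option that does exist is honor_timestamps, which tells Prometheus to discard exporter-supplied timestamps and use the scrape time instead. A sketch against a plain scrape config, reusing the target format from the logs above:)

scrape_configs:
  - job_name: 'mcac'
    honor_timestamps: false   # ignore the timestamps MCAC exposes; use the scrape time
    static_configs:
      - targets: ['xxxxx:9103']

(With the prometheus-operator, the equivalent field on a ServiceMonitor endpoint is honorTimestamps: false.)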

@jsanda
Contributor

jsanda commented Aug 7, 2022

@Miles-Garnsey can you investigate this? Could this be related to #73?

Miles-Garnsey self-assigned this Aug 10, 2022