Compaction is Not Running Properly Because of Out-of-order Chunk #5584

Open
LERUfic opened this issue Oct 3, 2023 · 4 comments

@LERUfic

LERUfic commented Oct 3, 2023

Describe the bug
Because of out-of-order chunks, the compactor fails to complete compaction. Although I have set skip_blocks_with_out_of_order_chunks_enabled: true in the configuration, the affected block is not being marked for no-compaction.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex 1.14.1
/ # cortex --version
Cortex, version 1.14.1 (branch: HEAD, revision: 984ac41)
  build user:
  build date:
  go version:       go1.19
  platform:         linux/amd64
  2. Run the compaction process

Expected behavior
I expect the compaction process to run smoothly, and even when out-of-order chunks occur, the affected blocks should be skipped because of the skip_blocks_with_out_of_order_chunks_enabled: true config.

Environment:

Additional Context
My compactor config

auth_enabled: true
tenant_federation:
  enabled: true
limits:
  enforce_metric_name: true
  reject_old_samples: true
  reject_old_samples_max_age: 365d
  max_label_name_length: 2048
  max_label_value_length: 4096
  max_label_names_per_series: 1024
  max_metadata_length: 2048
  max_query_lookback: 0
  compactor_blocks_retention_period: 365d
  max_series_per_user: 0
  max_series_per_metric: 0
  max_fetched_chunks_per_query: 0
  max_series_per_query: 10000000
  max_metadata_per_user: 0
  max_metadata_per_metric: 0
server:
  http_listen_port: 8080
  grpc_listen_port: 9095
  grpc_server_max_recv_msg_size: 100000000
  grpc_server_max_send_msg_size: 100000000
  grpc_server_max_concurrent_streams: 10000
  log_level: info
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 100000000
    max_send_msg_size: 100000000
storage:
  engine: blocks
blocks_storage:
  backend: gcs
  gcs:
    bucket_name: <redacted>
  tsdb:
    dir: /data/tsdb
    block_ranges_period:
      - 1h0m0s
    retention_period: 10h
  bucket_store:
    ignore_deletion_mark_delay: 1h
    sync_dir: /data/tsdb-sync
    max_concurrent: 1000
    bucket_index:
      enabled: true
      max_stale_period: 24h
store_gateway:
  sharding_enabled: true
  sharding_ring:
    kvstore:
      store: "memberlist"
    replication_factor: 2
memberlist:
  bind_port: 7946
  join_members:
    - '{{ include "cortex.fullname" $ }}-memberlist'
compactor:
  sharding_enabled: true
  sharding_ring:
    kvstore:
      store: "memberlist"
  skip_blocks_with_out_of_order_chunks_enabled: true
...

Runtime Config

runtime_config:
  overrides:
    prometheus-data-prd:
      max_query_lookback: 0
      compactor_blocks_retention_period: 365d
      max_series_per_metric: 0
      max_series_per_query: 20000000
      ingestion_rate: 20000000

Metrics

cortex_compactor_runs_failed_total 1
cortex_compactor_runs_started_total 1
cortex_compactor_runs_completed_total 0
cortex_compactor_runs_interrupted_total 0
cortex_bucket_blocks_marked_for_no_compaction_count{user="prometheus-data-prd"} 0
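
The counters above tell the story on their own: one run started, one failed, none completed, and no block was marked for no-compaction. A minimal sketch of checking that programmatically, assuming the plain `name{labels} value` text format shown above (no timestamps or `# HELP` handling beyond skipping comment lines); the function names are hypothetical:

```python
# Metric lines copied from the report above (Prometheus text exposition format).
METRICS_TEXT = """\
cortex_compactor_runs_failed_total 1
cortex_compactor_runs_started_total 1
cortex_compactor_runs_completed_total 0
cortex_compactor_runs_interrupted_total 0
cortex_bucket_blocks_marked_for_no_compaction_count{user="prometheus-data-prd"} 0
"""


def parse_metrics(text):
    """Parse simple 'series value' lines into a dict keyed by the full series name."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field; the rest is the series.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def summarize(samples):
    """Hypothetical helper: summarize the compactor run counters."""
    started = samples.get("cortex_compactor_runs_started_total", 0.0)
    completed = samples.get("cortex_compactor_runs_completed_total", 0.0)
    failed = samples.get("cortex_compactor_runs_failed_total", 0.0)
    marked = sum(
        value
        for series, value in samples.items()
        if series.startswith("cortex_bucket_blocks_marked_for_no_compaction_count")
    )
    return {
        "all_runs_failed": started > 0 and completed == 0 and failed >= started,
        "blocks_marked_no_compact": marked,
    }


if __name__ == "__main__":
    print(summarize(parse_metrics(METRICS_TEXT)))
```

With the skip option working as expected, `blocks_marked_no_compact` would presumably be non-zero after a run that hit out-of-order chunks.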

Error logs

{"blocks":"[data/compact/0@8032743924406704676/01GVSPKX0477E5R42A89AJ4C2K data/compact/0@8032743924406704676/01GVSPKY9YGRT3NC0GT0E6ANTZ data/compact/0@8032743924406704676/01GVSPJTT7AFDGS33Z4BPANNXA data/compact/0@8032743924406704676/01GVSPK01NDHJWSX9ZD46R24PK data/compact/0@8032743924406704676/01GVSPKPN74CV8RWQ6RNNHDHBJ data/compact/0@8032743924406704676/01GVSPJWFZ3SCRG2R0ZNG8EJSQ data/compact/0@8032743924406704676/01GVSPJH83QGFKT0ZVW23KBCM1 data/compact/0@8032743924406704676/01GVSPK8JGH1XYXV0SM07C7QKZ data/compact/0@8032743924406704676/01GVSPJWYJKZWNS6H2NB06MSG8 data/compact/0@8032743924406704676/01GVSPKJ3K3BYJCTWTX61NFV6X]","caller":"compact.go:1097","component":"compactor","duration":"31.589419502s","duration_ms":31589,"group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","level":"info","msg":"compacted blocks","new":"01HBSWWRTXT3BC29KZERZJ0ZYZ","org_id":"prometheus-data-prd","overlapping_blocks":true,"ts":"2023-10-03T04:32:15.520558663Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.77270071Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.83720268Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.837553736Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.837782653Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.838058837Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.8382072Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.838398942Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.83861866Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.838865968Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.83909492Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.839210632Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.839330004Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.839442933Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.839611222Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.839776342Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.862857914Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.863121857Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.863390645Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.863593061Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.863937026Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.864282218Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.864422168Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.864677397Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.864931805Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.865108675Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.865267327Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.865520627Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.865847171Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.866020439Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.866556481Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.866783036Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.866886127Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.867022838Z"}
{"caller":"index.go:331","component":"compactor","group":"0@{__org_id__=\"prometheus-data-prd\"}","groupKey":"0@8032743924406704676","labels":{"<redacted>"},"level":"debug","msg":"found out of order series","org_id":"prometheus-data-prd","ts":"2023-10-03T04:32:17.867160204Z"}
{"caller":"compactor.go:696","component":"compactor","err":"compaction: group 0@8032743924406704676: invalid result block data/compact/0@8032743924406704676/01HBSWWRTXT3BC29KZERZJ0ZYZ: 34/359219 series have an average of 1.000 out-of-order chunks: 0.000 of these are exact duplicates (in terms of data and time range)","level":"error","msg":"failed to compact user blocks","ts":"2023-10-03T04:32:18.308412624Z","user":"prometheus-data-prd"}
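
For triage, the two records that matter are the "compacted blocks" line, which lists the source blocks, and the final error, which names the rejected result block and the out-of-order counts. A minimal sketch for extracting both, assuming JSON-per-line logs like the ones above; the function names are hypothetical, and the sample lines are abbreviated copies of the originals (the block list is cut to two entries and unrelated fields are dropped):

```python
import json
import re

# Abbreviated copies of two log lines from the report above.
LOG_LINES = [
    '{"blocks":"[data/compact/0@8032743924406704676/01GVSPKX0477E5R42A89AJ4C2K '
    'data/compact/0@8032743924406704676/01GVSPKY9YGRT3NC0GT0E6ANTZ]",'
    '"level":"info","msg":"compacted blocks","new":"01HBSWWRTXT3BC29KZERZJ0ZYZ",'
    '"org_id":"prometheus-data-prd"}',
    '{"err":"compaction: group 0@8032743924406704676: invalid result block '
    'data/compact/0@8032743924406704676/01HBSWWRTXT3BC29KZERZJ0ZYZ: '
    '34/359219 series have an average of 1.000 out-of-order chunks: '
    '0.000 of these are exact duplicates (in terms of data and time range)",'
    '"level":"error","msg":"failed to compact user blocks",'
    '"user":"prometheus-data-prd"}',
]

# A ULID is 26 characters of Crockford base32 (no I, L, O, U).
ULID = r"[0-9A-HJKMNP-TV-Z]{26}"
ULID_RE = re.compile(ULID)
OOO_RE = re.compile(r"invalid result block \S+/(" + ULID + r"): (\d+)/(\d+) series")


def source_blocks(record):
    """Return the block ULIDs listed in a 'compacted blocks' record."""
    return ULID_RE.findall(record.get("blocks", ""))


def ooo_failure(record):
    """Return result block and series counts from an out-of-order error, else None."""
    match = OOO_RE.search(record.get("err", ""))
    if match is None:
        return None
    return {
        "result_block": match.group(1),
        "ooo_series": int(match.group(2)),
        "total_series": int(match.group(3)),
    }


if __name__ == "__main__":
    for line in LOG_LINES:
        record = json.loads(line)
        print(source_blocks(record), ooo_failure(record))
```

The error only names the result block; which of the source blocks actually carries the out-of-order chunks is not stated in the logs, so the list from `source_blocks` is a set of candidates rather than an answer.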
@yeya24
Contributor

yeya24 commented Oct 3, 2023

@LERUfic May I know which version of Prometheus you are using? I am wondering if it is the same issue as thanos-io/thanos#6723

@LERUfic
Author

LERUfic commented Oct 3, 2023

sure @yeya24
I use prometheus 2.42.0

root@vm# ./prometheus --version
prometheus, version 2.42.0 (branch: HEAD, revision: 225c61122d88b01d1f0eaaee0e05b6f3e0567ac0)
  build user:       root@c67d48967507
  build date:       20230201-07:53:32
  go version:       go1.19.5
  platform:         linux/amd64

PS: oops, wrong account

@yeya24
Contributor

yeya24 commented Oct 4, 2023

This seems like a bug in Thanos: https://github.com/thanos-io/thanos/blob/main/pkg/compact/compact.go#L1394.
The error-cause check does not correctly recognize that the original error is an out-of-order chunks error, so the OOO chunks issue is not ignored as configured.

I will create an issue on Thanos side.
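
The exact failure in Thanos is not spelled out in this thread. As a generic illustration only (this is not the Thanos code, and every name below is hypothetical), here is a sketch in Python of how a cause check can miss the original error when it inspects just the outermost layer of a wrapped error, compared with a check that walks the whole chain:

```python
class OutOfOrderChunksError(Exception):
    """Hypothetical stand-in for an 'out-of-order chunks' error type."""


class WrappedError(Exception):
    """Hypothetical wrapper that adds context around another error."""


def wrap(err, message):
    """Wrap err in a WrappedError, keeping the original as its cause."""
    wrapped = WrappedError(message)
    wrapped.__cause__ = err
    return wrapped


def is_ooo_shallow(err):
    """Looks only at the outermost error, so any wrapping defeats the check."""
    return isinstance(err, OutOfOrderChunksError)


def is_ooo_deep(err):
    """Walks the whole cause chain until the original error type is found."""
    seen = set()
    while err is not None and id(err) not in seen:
        if isinstance(err, OutOfOrderChunksError):
            return True
        seen.add(id(err))  # guard against accidental cycles
        err = err.__cause__
    return False


if __name__ == "__main__":
    inner = OutOfOrderChunksError("series have out-of-order chunks")
    outer = wrap(wrap(inner, "invalid result block"), "compaction")
    print(is_ooo_shallow(outer), is_ooo_deep(outer))
```

If the skip logic depends on a check like the shallow one, a wrapped out-of-order error is treated as an ordinary failure and the block is never marked, which would match the symptoms reported above.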

@LERUfic
Author

LERUfic commented Oct 4, 2023

I see, thank you for the response. For now I marked the blocks with no-compact-mark.json using thanos tools.

thanos.yaml:

type: GCS
config:
    bucket: <redacted>
prefix: prometheus-data-prd

Command:

thanos tools bucket mark --id=<block> --marker=no-compact-mark.json --objstore.config-file=thanos.yaml --details=OOO --log.level=debug

It seems there are no errors for now, but the compaction has not finished yet. We will monitor this for a while.
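
When several blocks need the marker, the command above has to be repeated once per block ID. A minimal sketch that only builds the command lines (it does not run them), reusing exactly the flags from the command in this comment; the helper name is hypothetical, and the two block IDs are source blocks taken from the log purely as placeholders, since the thread does not say which blocks were actually marked:

```python
import shlex


def mark_commands(block_ids, objstore_config="thanos.yaml", details="OOO"):
    """Build one 'thanos tools bucket mark' command line per block ID."""
    commands = []
    for block_id in block_ids:
        argv = [
            "thanos", "tools", "bucket", "mark",
            f"--id={block_id}",
            "--marker=no-compact-mark.json",
            f"--objstore.config-file={objstore_config}",
            f"--details={details}",
            "--log.level=debug",
        ]
        # shlex.join quotes arguments where needed for a POSIX shell.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Placeholder IDs: two of the source blocks listed in the log above.
    for command in mark_commands(
        ["01GVSPKX0477E5R42A89AJ4C2K", "01GVSPKY9YGRT3NC0GT0E6ANTZ"]
    ):
        print(command)
```

Printing the commands first makes it easy to review the block list before anything is written to the bucket.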
