Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data repair delayed #132

Closed
scottyeager opened this issue Nov 22, 2024 · 8 comments · Fixed by #140
Closed

Data repair delayed #132

scottyeager opened this issue Nov 22, 2024 · 8 comments · Fixed by #140
Assignees

Comments

@scottyeager
Copy link

scottyeager commented Nov 22, 2024

I did the following steps:

  1. Start zstor with an initial config and write some data (minimal shards 2 and expected shards 4)
  2. Replace the config file with a new one where 3/4 data backends and 3/4 metadata backends are the same but one is changed (one original backend of each type has been decommissioned so the namespace is deleted)
  3. Do a hot reload of the config with SIGUSR1

Then I waited for the repair subsystem to kick in and restore the expected shards count using the new empty backend. I understand from here:

const OBJECT_SWEEP_INTERVAL_SECONDS: u64 = 60 * 10;

that the repair cycle should run every ten minutes. However, it was exactly 40 minutes before any rebuilds were triggered:

2024-11-22 00:26:41 +00:00: INFO Reloading config after receiving SIGUSR1
2024-11-22 00:26:41 +00:00: DEBUG Config actor reloading running config from "/etc/zstor-default.toml"
2024-11-22 00:26:41 +00:00: DEBUG Config actor finished reloading running config
2024-11-22 00:26:41 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900
2024-11-22 00:26:41 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900
2024-11-22 00:26:41 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:586f:52ff:fe5b:8ac0]:9900
2024-11-22 00:26:41 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900
2024-11-22 00:26:41 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900
2024-11-22 00:26:41 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900
2024-11-22 00:26:41 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:586f:52ff:fe5b:8ac0]:9900
2024-11-22 00:26:41 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900
2024-11-22 00:27:05 +00:00: DEBUG Starting metastore key iteration with prefix /zstor-meta/meta/
2024-11-22 00:27:05 +00:00: DEBUG checking data dir size
2024-11-22 00:27:05 +00:00: DEBUG Terminating scan: No: more data
2024-11-22 00:27:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:28:05 +00:00: DEBUG checking data dir size
2024-11-22 00:28:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:29:05 +00:00: DEBUG checking data dir size
2024-11-22 00:29:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:30:05 +00:00: DEBUG checking data dir size
2024-11-22 00:30:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:31:05 +00:00: DEBUG checking data dir size
2024-11-22 00:31:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:32:05 +00:00: DEBUG checking data dir size
2024-11-22 00:32:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:33:05 +00:00: DEBUG checking data dir size
2024-11-22 00:33:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:34:05 +00:00: DEBUG checking data dir size
2024-11-22 00:34:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:35:05 +00:00: DEBUG checking data dir size
2024-11-22 00:35:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:36:05 +00:00: DEBUG checking data dir size
2024-11-22 00:36:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:37:05 +00:00: DEBUG Starting metastore key iteration with prefix /zstor-meta/meta/
2024-11-22 00:37:05 +00:00: DEBUG checking data dir size
2024-11-22 00:37:05 +00:00: DEBUG Terminating scan: No: more data
2024-11-22 00:37:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:38:05 +00:00: DEBUG checking data dir size
2024-11-22 00:38:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:39:05 +00:00: DEBUG checking data dir size
2024-11-22 00:39:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:40:05 +00:00: DEBUG checking data dir size
2024-11-22 00:40:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:41:05 +00:00: DEBUG checking data dir size
2024-11-22 00:41:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:42:05 +00:00: DEBUG checking data dir size
2024-11-22 00:42:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:43:05 +00:00: DEBUG checking data dir size
2024-11-22 00:43:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:44:05 +00:00: DEBUG checking data dir size
2024-11-22 00:44:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:45:05 +00:00: DEBUG checking data dir size
2024-11-22 00:45:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:46:05 +00:00: DEBUG checking data dir size
2024-11-22 00:46:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:47:05 +00:00: DEBUG Starting metastore key iteration with prefix /zstor-meta/meta/
2024-11-22 00:47:05 +00:00: DEBUG checking data dir size
2024-11-22 00:47:05 +00:00: DEBUG Terminating scan: No: more data
2024-11-22 00:47:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:48:05 +00:00: DEBUG checking data dir size
2024-11-22 00:48:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:49:05 +00:00: DEBUG checking data dir size
2024-11-22 00:49:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:50:05 +00:00: DEBUG checking data dir size
2024-11-22 00:50:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:51:05 +00:00: DEBUG checking data dir size
2024-11-22 00:51:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:52:05 +00:00: DEBUG checking data dir size
2024-11-22 00:52:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:53:05 +00:00: DEBUG checking data dir size
2024-11-22 00:53:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:54:05 +00:00: DEBUG checking data dir size
2024-11-22 00:54:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:55:05 +00:00: DEBUG checking data dir size
2024-11-22 00:55:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:56:05 +00:00: DEBUG checking data dir size
2024-11-22 00:56:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:57:05 +00:00: DEBUG Starting metastore key iteration with prefix /zstor-meta/meta/
2024-11-22 00:57:05 +00:00: DEBUG checking data dir size
2024-11-22 00:57:05 +00:00: DEBUG Terminating scan: No: more data
2024-11-22 00:57:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:58:05 +00:00: DEBUG checking data dir size
2024-11-22 00:58:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 00:59:05 +00:00: DEBUG checking data dir size
2024-11-22 00:59:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 01:00:05 +00:00: DEBUG checking data dir size
2024-11-22 01:00:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 01:01:05 +00:00: DEBUG checking data dir size
2024-11-22 01:01:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 01:02:05 +00:00: DEBUG checking data dir size
2024-11-22 01:02:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 01:03:05 +00:00: DEBUG checking data dir size
2024-11-22 01:03:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 01:04:05 +00:00: DEBUG checking data dir size
2024-11-22 01:04:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 01:05:05 +00:00: DEBUG checking data dir size
2024-11-22 01:05:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 01:06:05 +00:00: DEBUG checking data dir size
2024-11-22 01:06:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 01:06:13 +00:00: DEBUG Handling new client connection from (unnamed)
2024-11-22 01:06:13 +00:00: DEBUG scheduler received a command ZstorSchedulerMessage { cmd: ZstorCommand(Rebuild(Rebuild { file: Some("/data/data/zdbfs-data/d0"), key: None })), blocking: true, priority: 0, response: Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: true, is_tx_task_set: false } }) } } with prio 0
2024-11-22 01:06:13 +00:00: DEBUG the scheduler forwarding command Rebuild(Rebuild { file: Some("/data/data/zdbfs-data/d0"), key: None })
2024-11-22 01:06:13 +00:00: DEBUG Loading metadata for key /zstor-meta/meta/adc0536df10097fed133fab8aa35f308
2024-11-22 01:06:13 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/adc0536df10097fed133fab8aa35f308
2024-11-22 01:06:13 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900
2024-11-22 01:06:13 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:107d:6dff:fe38:c095]:9900
2024-11-22 01:06:13 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900
2024-11-22 01:06:13 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900
2024-11-22 01:06:13 +00:00: WARN could not download shard 1: error during storage: ZDB at [2a02:1802:5e:0:107d:6dff:fe38:c095]:9900 5545-708152-data13, error operation READ caused by Namespace: not found
2024-11-22 01:06:13 +00:00: DEBUG Finding backend config
2024-11-22 01:06:13 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:586f:52ff:fe5b:8ac0]:9900
2024-11-22 01:06:13 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900
2024-11-22 01:06:13 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900
2024-11-22 01:06:13 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900
2024-11-22 01:06:13 +00:00: DEBUG found valid backend configuration
2024-11-22 01:06:13 +00:00: INFO Rebuild file from [2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900,[2a02:1802:5e:0:107d:6dff:fe38:c095]:9900,[2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900,[2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900 to [2a02:1802:5e:0:586f:52ff:fe5b:8ac0]:9900,[2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900,[2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900,[2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900
2024-11-22 01:07:05 +00:00: DEBUG Starting metastore key iteration with prefix /zstor-meta/meta/
2024-11-22 01:07:05 +00:00: DEBUG checking data dir size
2024-11-22 01:07:05 +00:00: DEBUG Terminating scan: No: more data
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/10c95b07ca4a0e0966ed2e153167f816
2024-11-22 01:07:05 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992582 < 2684354560)
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/c2d50473cbb7358d452bffb284ac3ea1
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/404e8b1d07f77ed837dfc793660d80c5
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/f58da2621af1f25528b9109cc6d9f56f
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/adc0536df10097fed133fab8aa35f308
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/d4a42488842c95e976974d1e8e98c7dc
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/e84de87131d307b3679303d4fdc5a407
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/cb09536e96e41643a4a112bbd8119482
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/4027046ac821e5df93685c6b7c1b8ab2
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/a0eb6b9f558f54933bed0d9bf903a57a
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/82d142f1ebddd212d7133313d03bcd54
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/76330048b24beba3320ea59aef554d11
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/34f1fabbe9936a4626008c1836a6b3c0
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/d104a2fb60984e5081fb0b76fb077a8b
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/5a107ef186d4d26c099df33f61d271b4
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/2ea29feab8dbb939b618bc9e89c6f675
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/a1c4142a8b326a6e5e121c8bc999613c
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/35e1718bc3c857b595861d164e5fc49f
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/46d204eba02568271f9c74394aa2dfec
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/4287301f433c36be033edd1ca2a645e8
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/dfb79c97f5739c0aca29549535c76281
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/d4354933f80491ae58da09611f903f61
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/6b3de7dad787ac460ef1cd6386e0494a
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/1465e36182a66d30da559ba45f5f4d1c
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/45beffd02213cbaa09d4dd00a9dde499
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/0772b876b7c9030cbc1adc232671f3cd
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/fb66d28253d16949667da9d998a7c96b
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/ccbc631a80ecd7707663f29498da3988
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/0ae2584e9cd8a9d56da0253815ef3a64
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/6fcb5c7c0b088d8be025986362380c9e
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/3ea265903052015d542f5688c8744c72
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/97ca57c2949f38fa167c0327411050d9
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/a5a1494be9d7fc914cb829813f843ee2
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/9de91db9a9d5c92195f9a25dc3ea4b20
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/3a49b8691bdba31e7f7923b64fcd8403
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/e8c5f334c9f39f390bb52006538ef446
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/323f9ab2a341c7edb4a414eb60397255
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/483e3dddb31a73f276bacd8a27ef6950
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/26c7cee4bae9b66ab4c0396590460426
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/50286c7fc66c796a99d6523b1e4f9ae7
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/266f4e1d27e0d42628a5b8440c7da4e5
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/2c2411cee9d20b40180cd7d77bd9012b
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/81f79d46ed48549f39e96838f50c98d9
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/8388baf2bb6ca356ce84bce18a359c33
2024-11-22 01:07:05 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:107d:6dff:fe38:c095]:9900
2024-11-22 01:07:05 +00:00: DEBUG Loading metadata for key /zstor-meta/meta/10c95b07ca4a0e0966ed2e153167f816
2024-11-22 01:07:05 +00:00: DEBUG Reading data from zdb metastore for key /zstor-meta/meta/10c95b07ca4a0e0966ed2e153167f816
2024-11-22 01:07:05 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:107d:6dff:fe38:c095]:9900
2024-11-22 01:07:05 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900
2024-11-22 01:07:05 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900
2024-11-22 01:07:05 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900
2024-11-22 01:07:05 +00:00: WARN could not download shard 0: error during storage: ZDB at [2a02:1802:5e:0:107d:6dff:fe38:c095]:9900 5545-708152-data13, error operation READ caused by Namespace: not found
2024-11-22 01:07:05 +00:00: DEBUG Finding backend config
2024-11-22 01:07:05 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900
2024-11-22 01:07:05 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:586f:52ff:fe5b:8ac0]:9900
2024-11-22 01:07:05 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900
2024-11-22 01:07:05 +00:00: DEBUG connecting to zdb at [2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900
2024-11-22 01:07:05 +00:00: DEBUG found valid backend configuration
2024-11-22 01:07:05 +00:00: INFO Rebuild file from [2a02:1802:5e:0:107d:6dff:fe38:c095]:9900,[2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900,[2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900,[2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900 to [2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900,[2a02:1802:5e:0:586f:52ff:fe5b:8ac0]:9900,[2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900,[2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900

From here the logs contain a lot more similar rebuild messages. Actually it seems to continue on forever until the backends are full, but that's another issue.

@scottyeager
Copy link
Author

scottyeager commented Nov 23, 2024

I ran through this test again and observed slightly different behavior. This time the rebuilding never happened no matter how long I waited.

Basically it goes on like this forever:

2024-11-23 00:42:23 +00:00: DEBUG checking data dir size
2024-11-23 00:42:23 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992526 < 2684354560)
2024-11-23 00:43:23 +00:00: DEBUG Starting metastore key iteration with prefix /zstor-meta/meta/
2024-11-23 00:43:23 +00:00: DEBUG checking data dir size
2024-11-23 00:43:23 +00:00: DEBUG Terminating scan: No: more data
2024-11-23 00:43:23 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992526 < 2684354560)
2024-11-23 00:44:23 +00:00: DEBUG checking data dir size
2024-11-23 00:44:23 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992526 < 2684354560)
2024-11-23 00:45:23 +00:00: DEBUG checking data dir size
2024-11-23 00:45:23 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992526 < 2684354560)
2024-11-23 00:45:47 +00:00: INFO Triggering repair after receiving SIGUSR2
2024-11-23 00:45:47 +00:00: DEBUG Starting metastore key iteration with prefix /zstor-meta/meta/
2024-11-23 00:45:47 +00:00: DEBUG Terminating scan: No: more data
2024-11-23 00:46:23 +00:00: DEBUG checking data dir size
2024-11-23 00:46:23 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992526 < 2684354560)
2024-11-23 00:47:23 +00:00: DEBUG checking data dir size

While reading the logs I got an idea from Terminating scan: No: more data that maybe the issue is somehow linked to having an empty backend that's causing this error and interrupting the repair process.

So I changed the test to look like this:

  1. Start zstor with an initial config and write some data (minimal shards 2 and expected shards 4)
  2. Replace the config file with a new one where 3/4 data backends and 3/4 metadata backends are the same but one is changed (one original backend of each type has been decommissioned so the namespace is deleted)
  3. Do a hot reload of the config with SIGUSR1
  4. Write some new data

As soon as new data is written into the fresh backend, then the rebuilding kicks off as expected. The metadata also finally gets rebuilt at this point too.

I'm beginning to suspect that this issue is really not a separate issue from #131. So far I've always been replacing both a metadata and a data backend at the same time, so I can't say that this behavior is specific to data backend replacements.

@iwanbk
Copy link
Member

iwanbk commented Nov 25, 2024

I'm beginning to suspect that this issue is really not a separate issue from #131. So far I've always been replacing both a metadata and a data backend at the same time, so I can't say that this behavior is specific to data backend replacements.

yes, should be the same

@iwanbk
Copy link
Member

iwanbk commented Nov 29, 2024

I'm beginning to suspect that this issue is really not a separate issue from #131. So far I've always been replacing both a metadata and a data backend at the same time, so I can't say that this behavior is specific to data backend replacements.

yes, should be the same

After more checking and getting more familiar with the code, i found that it is different issue

@iwanbk iwanbk self-assigned this Dec 4, 2024
@iwanbk
Copy link
Member

iwanbk commented Dec 4, 2024

OK, i can reproduce it.

Please note that there are two scenarios:

  1. replaced data backend still up

based on this #131 (comment), no rebuild is expected

  1. replaced data backend is then dead

rebuild is expected

  1. data backend that still in the config is down

rebuild is expected, but we should be careful here to differentiate between temporary down vs dead

@iwanbk
Copy link
Member

iwanbk commented Dec 6, 2024

It should be fixed by #140

There is another possibility of delay because 0-stor cache the connection to 0-db, and there is always stale cache.
But the staleness shouldn't be too long.

@scottyeager
Copy link
Author

Verified this in my test. Data repair seemed to be triggered immediately after hot config reload actually, which is nice (unless that's a coincidence of lining up with the 10 minute cycle 🙂).

@iwanbk
Copy link
Member

iwanbk commented Dec 14, 2024

which is nice (unless that's a coincidence of lining up with the 10 minute cycle 🙂).

Actually i assume that it is the expected behavior for the data. (meta is different case).
And this issue doesn't expect to have immediate rebuild, CMIIW.

@scottyeager
Copy link
Author

Actually i assume that it is the expected behavior for the data. (meta is different case).
And this issue doesn't expect to have immediate rebuild, CMIIW.

Yeah, on a second test it looks like it was just luck the first time. Rebuild on the regular cycle is okay and I wasn't expecting immediate rebuild on hot reload to be there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants