Data repair delayed #132
I ran through this test again and observed slightly different behavior. This time the rebuilding never happened, no matter how long I waited. Basically it goes on like this forever:
2024-11-23 00:42:23 +00:00: DEBUG checking data dir size
2024-11-23 00:42:23 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992526 < 2684354560)
2024-11-23 00:43:23 +00:00: DEBUG Starting metastore key iteration with prefix /zstor-meta/meta/
2024-11-23 00:43:23 +00:00: DEBUG checking data dir size
2024-11-23 00:43:23 +00:00: DEBUG Terminating scan: No: more data
2024-11-23 00:43:23 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992526 < 2684354560)
2024-11-23 00:44:23 +00:00: DEBUG checking data dir size
2024-11-23 00:44:23 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992526 < 2684354560)
2024-11-23 00:45:23 +00:00: DEBUG checking data dir size
2024-11-23 00:45:23 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992526 < 2684354560)
2024-11-23 00:45:47 +00:00: INFO Triggering repair after receiving SIGUSR2
2024-11-23 00:45:47 +00:00: DEBUG Starting metastore key iteration with prefix /zstor-meta/meta/
2024-11-23 00:45:47 +00:00: DEBUG Terminating scan: No: more data
2024-11-23 00:46:23 +00:00: DEBUG checking data dir size
2024-11-23 00:46:23 +00:00: DEBUG Directory "/data/data/zdbfs-data/" is within size limits (1048992526 < 2684354560)
2024-11-23 00:47:23 +00:00: DEBUG checking data dir size

While reading the logs I got an idea, so I changed the test to look like this:
As soon as new data is written into the fresh backend, the rebuilding kicks off as expected. The metadata also finally gets rebuilt at this point. I'm beginning to suspect that this issue is not actually separate from #131. So far I've always been replacing both a metadata and a data backend at the same time, so I can't say that this behavior is specific to data backend replacements.
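For context on the manual trigger visible in the log above, here is a minimal sketch of how a SIGUSR2 handler can forward a repair command to an actor-style loop. This is only an illustration using tokio, not zstor's actual implementation; the `TriggerRepair` command and the loop body are assumptions.

```rust
use tokio::signal::unix::{signal, SignalKind};
use tokio::sync::mpsc;

// Hypothetical command type for the repair actor; names are illustrative.
struct TriggerRepair;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (tx, mut rx) = mpsc::channel::<TriggerRepair>(1);

    // Forward every SIGUSR2 to the repair loop as an explicit command.
    let mut usr2 = signal(SignalKind::user_defined2())?;
    tokio::spawn(async move {
        while usr2.recv().await.is_some() {
            println!("Triggering repair after receiving SIGUSR2");
            let _ = tx.send(TriggerRepair).await;
        }
    });

    // Repair loop: each command starts one scan over the metastore keys.
    while let Some(TriggerRepair) = rx.recv().await {
        println!("Starting metastore key iteration");
        // ... check shard health per object and rebuild where needed ...
    }
    Ok(())
}
```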
Yes, should be the same.
After more checking and getting more familiar with the code, I found that it is a different issue.
OK, I can reproduce it. Please note that there are two scenarios:
- based on this #131 (comment), no rebuild is expected
- rebuild is expected
- rebuild is expected, but we should be careful here to differentiate between a backend that is temporarily down and one that is dead (sketched below)
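As a purely illustrative sketch of that last distinction (not zstor's actual logic), one way is to require several consecutive failed health checks before treating a backend as dead and triggering a rebuild:

```rust
// Illustrative only: a backend is declared dead after N consecutive failed
// health checks, so a brief outage does not immediately cause a rebuild.
struct BackendHealth {
    consecutive_failures: u32,
    dead_after: u32,
}

impl BackendHealth {
    fn record(&mut self, reachable: bool) {
        if reachable {
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures += 1;
        }
    }

    fn is_dead(&self) -> bool {
        self.consecutive_failures >= self.dead_after
    }
}

fn main() {
    let mut health = BackendHealth { consecutive_failures: 0, dead_after: 3 };
    health.record(false); // unreachable once
    health.record(false); // unreachable twice
    assert!(!health.is_dead()); // still only "temporarily down"
    health.record(false);
    assert!(health.is_dead()); // now considered dead, rebuild expected
}
```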
It should be fixed by #140. There is another possible source of delay: 0-stor caches its connections to 0-db, and that cache can always contain stale entries.
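To illustrate why a cached connection can delay repair, here is a minimal sketch of such a cache. The `Connection` type, field names, and eviction policy are assumptions for illustration, not 0-stor's actual caching code.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical handle for a 0-db backend connection; illustrative only.
struct Connection {
    opened_at: Instant,
}

// A naive cache keyed by backend address. Entries live until evicted, so a
// connection to a replaced or dead backend can linger and delay repair until
// the next I/O error forces a reconnect.
struct ConnectionCache {
    conns: HashMap<String, Connection>,
    max_age: Duration,
}

impl ConnectionCache {
    fn get(&mut self, addr: &str) -> &Connection {
        self.conns
            .entry(addr.to_string())
            .or_insert_with(|| Connection { opened_at: Instant::now() })
    }

    // One way to bound staleness: periodically drop entries older than max_age.
    fn evict_stale(&mut self) {
        let max_age = self.max_age;
        self.conns.retain(|_, c| c.opened_at.elapsed() < max_age);
    }
}

fn main() {
    let mut cache = ConnectionCache {
        conns: HashMap::new(),
        max_age: Duration::from_secs(60),
    };
    cache.get("backend-1:9900"); // first use opens and caches the connection
    cache.evict_stale();         // periodic eviction limits how stale it can get
}
```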
Verified this in my test. Data repair actually seemed to be triggered immediately after the hot config reload, which is nice (unless that's a coincidence of lining up with the 10-minute cycle 🙂).
Actually I assume that it is the expected behavior for the
Yeah, on a second test it looks like it was just luck the first time. Rebuild on the regular cycle is fine, and I wasn't expecting an immediate rebuild on hot reload anyway.
I did the following steps:
Then I waited for the repair subsystem to kick in and restore the expected shard count using the new empty backend. I understand from here:
0-stor_v2/zstor/src/actors/repairer.rs, line 11 (commit cd24f42)
that the repair cycle should run every ten minutes. However, it was exactly 40 minutes before any rebuilds were triggered:
From here on, the logs contain many more similar rebuild messages. Actually, it seems to continue forever until the backends are full, but that's another issue.
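For reference, here is a minimal sketch of what a ten-minute repair cycle looks like with tokio. The constant name and the `repair_pass` body are assumptions; the real interval is defined in repairer.rs as referenced above.

```rust
use std::time::Duration;

// Assumed constant mirroring the ten-minute cycle described above; the real
// value lives in zstor's repairer module.
const REPAIR_INTERVAL: Duration = Duration::from_secs(10 * 60);

// Placeholder for one repair pass: scan metadata, find objects with missing
// or unreachable shards, and rebuild them onto healthy backends.
async fn repair_pass() {
    println!("running repair pass");
}

#[tokio::main]
async fn main() {
    let mut ticker = tokio::time::interval(REPAIR_INTERVAL);
    loop {
        // The first tick completes immediately; subsequent ticks wait the full
        // interval, so a freshly replaced backend may sit idle for up to ten
        // minutes before the next pass notices it.
        ticker.tick().await;
        repair_pass().await;
    }
}
```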