S3 Snapshots become very slow with more existing snapshots #174
Comments
Outcome of a quick discussion with @bleskes:
Both the short-term and long-term approaches would be doable with those requirements from what I can see. At the moment this issue is semi-critical for us, but I can see it becoming an issue in the mid-term. At that point I'll likely have a closer look at how to implement the long-term approach (if no one beats me to it :D).
@ankon sorry, I am only seeing this issue now... This issue is well known and is not specific to S3. There is an improvement about to be merged (#8969), and it is very similar to what you suggested here.
@tlrx so should we close the issue here?
Some explicit links:
Update from my side: I cannot update to 2.0.0, so I'll instead use a different strategy to manage snapshots, probably by rolling the repositories and then selectively wiping them directly in S3.
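For illustration, a rolling-repository scheme along those lines could be scripted against the snapshot REST API roughly as below. This is only a sketch: the monthly cadence, the repository names, the bucket, and the endpoint are my assumptions, not something settled in this thread.

```python
import datetime
import requests

ES = "http://localhost:9200"  # assumed address of a cluster node

def month_repo(day):
    # One repository per calendar month, e.g. "snapshots-2015-06".
    return "snapshots-%04d-%02d" % (day.year, day.month)

def ensure_repo(name):
    # Registering a repository is idempotent; a per-month base_path keeps
    # each month's metadata blobs isolated from all earlier snapshots.
    requests.put("%s/_snapshot/%s" % (ES, name), json={
        "type": "s3",
        "settings": {"bucket": "my-backups", "base_path": name},
    }).raise_for_status()

def hourly_snapshot(now):
    repo = month_repo(now.date())
    ensure_repo(repo)
    snap = now.strftime("snap-%Y%m%d-%H")
    requests.put(
        "%s/_snapshot/%s/%s" % (ES, repo, snap),
        params={"wait_for_completion": "true"},
    ).raise_for_status()

hourly_snapshot(datetime.datetime.utcnow())
```

Unregistering a repository does not delete its data, so an old month's base_path prefix can then be wiped directly in S3 (for example via a lifecycle rule), which stays cheap no matter how many snapshots accumulated inside it.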
I did some tests with the S3 snapshot facility to see how well it would work in a production environment. Initially things looked nice on our data set (about 500 shards, all quite small). However, after only about 2 days of hourly snapshots the time it took to complete a snapshot shot up from an initial ~160s to ~300s, and after 7 days it stood at ~1700s.
While this might be improvable by removing existing snapshots over time, it seriously limits the ability to use snapshots as backups over a long period.
Looking through the code, the issue seems to stem from the fact that the S3 snapshots use the blob storage format, which requires reading many small files to recover the full metadata. On a file system this might be "ok" for a long time, but on S3 every one of those reads is a separate API request over the network.
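To make that access pattern concrete, here is a schematic sketch; the DictContainer is my in-memory stand-in for the plugin's blob container, not its actual code, and in the real repository each list/read would be one S3 API call:

```python
import json

class DictContainer:
    """In-memory stand-in for the repository's blob container (hypothetical
    interface); against S3, every list/read/write is one API request."""
    def __init__(self, blobs=None):
        self.blobs = dict(blobs or {})
    def list_blobs(self, prefix=""):
        return [name for name in self.blobs if name.startswith(prefix)]
    def read_blob(self, name):
        return self.blobs[name]
    def write_blob(self, name, data):
        self.blobs[name] = data

def load_repository_state(container):
    # One LIST, then one GET per existing snapshot: with hourly snapshots
    # that is ~48 GETs after two days and ~168 after a week, and each GET
    # is a full HTTP round trip when the container is backed by S3.
    return {name: json.loads(container.read_blob(name))
            for name in container.list_blobs(prefix="snapshot-")}
```

Since each read is a full round trip, the total time grows linearly with the number of retained snapshots, which would fit the observed climb from ~160s to ~1700s over a week of hourly snapshots.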
I can see a short-term hack to improve this: in addition to the small metadata files, also keep an aggregated version, which gets updated when new snapshots are created. When accessing a single blob, the aggregate could be checked first, and if the entry exists there, the information is taken from it.
This should be reasonably backwards compatible, but it will likely be very ugly and could run into problems when the blob storage format introduces new metadata files.
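As a sketch of that hack, reusing the hypothetical DictContainer interface from above (the aggregate blob name is made up):

```python
import json

AGGREGATE = "meta-aggregate"  # hypothetical name for the rolled-up blob

def _load_aggregate(container):
    # A missing aggregate (older repository) just means an empty index.
    if AGGREGATE in container.list_blobs(prefix=AGGREGATE):
        return json.loads(container.read_blob(AGGREGATE))
    return {}

def add_snapshot(container, snap, meta):
    # Keep writing the small per-snapshot blob so existing readers work...
    container.write_blob("snapshot-" + snap, json.dumps(meta))
    # ...and fold the same entry into one aggregate blob, so the common
    # read path needs a single GET instead of one GET per snapshot.
    agg = _load_aggregate(container)
    agg[snap] = meta
    container.write_blob(AGGREGATE, json.dumps(agg))

def read_snapshot(container, snap):
    # Check the aggregate first; fall back to the per-snapshot blob for
    # entries written before the aggregate existed.
    meta = _load_aggregate(container).get(snap)
    if meta is not None:
        return meta
    return json.loads(container.read_blob("snapshot-" + snap))
```

One of the ugly parts: S3 offers no atomic read-modify-write, so two concurrent snapshot creations could lose an aggregate entry, which is why the per-snapshot blobs would have to remain the source of truth.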
In the long run it seems it would make more sense to stop using the blob storage format and build something that plays to S3's strengths and avoids its weaknesses. I'm not sure yet what this could look like, though :)
So, some questions: