S3 Snapshots become very slow with more existing snapshots #174

Open
ankon opened this issue Feb 17, 2015 · 4 comments

Comments

@ankon
Contributor

ankon commented Feb 17, 2015

I ran some tests with the S3 snapshot facility to see how well it would work in a production environment. Initially things looked good on our data set (about 500 shards, all quite small). However, after only about 2 days of hourly snapshots, the time to complete a snapshot rose from ~160s initially to ~300s, and after 7 days it was at 1700s.

While this could probably be improved by removing old snapshots over time, it seriously limits the ability to use snapshots as long-term backups.

Looking through the code: the issue seems to stem from the fact that the S3 snapshots use the Blob storage format, which requires being able to read many files to recover the full meta data. On a file system this might be "ok" for a long time, but on S3 accessing each of those files means API requests and network use.
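A rough back-of-the-envelope model of why this scales badly on S3. It assumes, purely for illustration, that each new snapshot has to read one meta data file per shard per existing snapshot; the real access pattern may differ, and `metadata_reads` is a made-up helper, not anything from the code base.

```python
def metadata_reads(shards: int, existing_snapshots: int) -> int:
    """Estimated meta data reads needed to create one new snapshot.

    Assumption (not verified against the real implementation):
    one read per shard per existing snapshot.
    """
    return shards * existing_snapshots


SHARDS = 500  # roughly the size of the data set described above

for days in (0, 2, 7):
    existing = days * 24  # hourly snapshots
    reads = metadata_reads(SHARDS, existing)
    print(f"after {days} days: {existing} existing snapshots, {reads} reads")
```

Under that assumption, 7 days of hourly snapshots means 168 existing snapshots and 84,000 reads per snapshot run. On a local file system that is cheap; on S3 every one of those reads is a separate API request.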

I can see a short-term hack to improve this: in addition to the small meta data files, also keep an aggregated version that gets updated when new snapshots are created. When accessing a single blob, the aggregate could be checked first, and if it exists the information is taken from there.
This should be reasonably backwards-compatible, but will likely be very ugly and potentially have issues when the blob storage format introduces new meta data files.
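A minimal sketch of that aggregate-first lookup, using an in-memory stand-in for the blob store so the number of read requests can be counted. Everything here is hypothetical (`CountingBlobStore`, `AGGREGATE_KEY`, the JSON encoding); it only illustrates the idea and is not the actual repository format.

```python
import json
from typing import Dict, Iterable, Optional

AGGREGATE_KEY = "metadata-aggregate.json"  # hypothetical blob name


class CountingBlobStore:
    """In-memory stand-in for a remote blob store; counts read requests."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.reads = 0

    def read(self, key: str) -> Optional[bytes]:
        self.reads += 1  # on S3 each of these would be an API request
        return self.blobs.get(key)

    def write(self, key: str, data: bytes) -> None:
        self.blobs[key] = data


def write_metadata(store: CountingBlobStore, key: str, meta: dict) -> None:
    # Always write the small per-blob file so existing readers keep working.
    store.write(key, json.dumps(meta).encode())
    # Then fold the entry into the aggregate (plain read-modify-write;
    # this sketch ignores concurrent writers).
    raw = store.read(AGGREGATE_KEY)
    aggregate = json.loads(raw) if raw else {}
    aggregate[key] = meta
    store.write(AGGREGATE_KEY, json.dumps(aggregate).encode())


def load_all_metadata(store: CountingBlobStore, keys: Iterable[str]) -> Dict[str, dict]:
    # One read for the aggregate, then fall back to the small files
    # only for entries the aggregate does not know about.
    raw = store.read(AGGREGATE_KEY)
    aggregate = json.loads(raw) if raw else {}
    result: Dict[str, dict] = {}
    for key in keys:
        if key in aggregate:
            result[key] = aggregate[key]
        else:
            small = store.read(key)
            if small is not None:
                result[key] = json.loads(small)
    return result
```

With the aggregate present, loading the meta data for many blobs costs one read plus one read per entry that predates the aggregate, which is also why old repositories would remain readable.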

In the long run it seems it would make more sense to stop using the blob storage format and build something that makes better use of S3's strengths and avoids its weaknesses. I'm not sure yet what this could look like though :)

So, some questions:

  1. What do you think? Has this appeared before? Is any of this on the roadmap?
  2. Would it make sense to try coming up with patches for both the short-term and the long-term ideas?
  3. How are S3 snapshots intended to be used?
@ankon
Contributor Author

ankon commented Apr 14, 2015

Outcome of a quick discussion with @bleskes:

  1. Being able to do a 1:1 copy from a local snapshot to an S3 snapshot isn't necessarily needed, so the formats could differ.
  2. Being able to read "old" snapshots is required.
  3. Being able to continue writing "old" snapshots is very much desirable.
  4. Being able to do an in-place upgrade of a repository to a different storage format is nice-to-have, but not required.
  5. Changing the internal API of ES should be avoided, although backwards compatible changes (like pulling in another layer) could be ok.

Both the short-term and long-term approaches would be doable with those requirements, from what I can see. At the moment this issue is semi-critical for us, but I can see it becoming an issue in the mid-term. At that point I'll likely take a closer look at how to implement the long-term approach (if no one beats me to it :D).

@tlrx
Member

tlrx commented Apr 15, 2015

@ankon sorry, I only just saw this issue... It is well known and not specific to S3. There is an improvement about to be merged, #8969, and it is very similar to what you suggested here.

@dadoonet
Member

@tlrx so should we close the issue here?

@ankon
Contributor Author

ankon commented Sep 30, 2015

Some explicit links:

Update from my side: I cannot update to 2.0.0, so I'll instead use a different strategy to manage snapshots -- probably by actually rolling the repositories, and then selectively wiping those directly.
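One way the rolling could be sketched, assuming a new repository per ISO week and a fixed number of weeks to keep. The weekly granularity, the `snapshots-YYYY-wNN` naming scheme and both helper functions are assumptions for illustration; actually registering or wiping a repository is left out.

```python
from datetime import date, timedelta
from typing import List


def repository_for(day: date, prefix: str = "snapshots") -> str:
    """Name of the repository that snapshots taken on `day` go into."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}-{iso_year}-w{iso_week:02d}"


def expired_repositories(existing: List[str], today: date,
                         keep_weeks: int, prefix: str = "snapshots") -> List[str]:
    """Repositories older than the retention window, i.e. safe to wipe."""
    keep = {
        repository_for(today - timedelta(weeks=i), prefix)
        for i in range(keep_weeks)
    }
    return sorted(
        name for name in existing
        if name.startswith(prefix + "-") and name not in keep
    )
```

Because each repository only ever holds one week of snapshots, the number of existing snapshots per repository stays bounded, and dropping old data means wiping a whole repository rather than deleting snapshots one by one.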
