Super fast checking of cloud targets #1181
This PR does have a couple side effects that may require an adjustment:
|
Hmm, so it sounds like my in-development |
Yeah, the
If a |
Prework
Related GitHub issues and pull requests
Summary
Before today, `targets` was slow to check the status of cloud targets. Each target needed its own individual HEAD request to check whether the object in the bucket was up to date. As of this PR, `targets` uses a LIST request to get all the hashes in the prefix ahead of time, then looks up each target's hash in that list.

The benchmark results are exciting! I tested performance in a pipeline with 1000 file targets and about 1000 regular targets.

When this pipeline is up to date, `tar_outdated()` should take almost no time at all. Previously, it took 4 minutes and 6.775 seconds. With this PR, it took only 3.805 seconds! That's a more than 64-fold speedup. For comparison, the unwise shortcut version with `cue = tar_cue(file = FALSE)` took 1.162 seconds, which is not much faster.

Everything is ready as far as AWS is concerned. I just need to finish benchmarking on Google Cloud.
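
To illustrate why one LIST request beats a HEAD request per target, here is a rough Python sketch of the idea. It is not the code in this PR: `FakeBucket`, `outdated_with_head`, `outdated_with_list`, the key names, and the page size are all hypothetical, and the "bucket" is an in-memory dictionary that only counts how many requests each strategy would make.

```python
from dataclasses import dataclass


@dataclass
class FakeBucket:
    """Hypothetical in-memory stand-in for a cloud bucket that counts requests."""
    objects: dict[str, str]  # key -> content hash stored with the object
    head_requests: int = 0
    list_requests: int = 0
    page_size: int = 1000  # assumed number of keys returned per LIST page

    def head(self, key):
        """Simulate one HEAD request: return the hash of one object, or None."""
        self.head_requests += 1
        return self.objects.get(key)

    def list_prefix(self, prefix):
        """Simulate LIST: return {key: hash} under a prefix, one request per page."""
        matches = {k: h for k, h in self.objects.items() if k.startswith(prefix)}
        pages = max(1, -(-len(matches) // self.page_size))  # ceiling division
        self.list_requests += pages
        return matches


def outdated_with_head(bucket, expected):
    """Old strategy: one HEAD request for every target."""
    return [k for k, h in expected.items() if bucket.head(k) != h]


def outdated_with_list(bucket, expected, prefix):
    """New strategy: fetch all hashes up front, then look each target up locally."""
    remote = bucket.list_prefix(prefix)
    return [k for k, h in expected.items() if remote.get(k) != h]


# Hashes recorded locally for 1000 targets (illustrative keys and hashes).
expected = {f"_targets/objects/target_{i}": f"hash_{i}" for i in range(1000)}

# The bucket matches, except that one object has changed.
bucket = FakeBucket(objects=dict(expected))
bucket.objects["_targets/objects/target_7"] = "changed"

stale_head = outdated_with_head(bucket, expected)
stale_list = outdated_with_list(bucket, expected, "_targets/objects/")

print(stale_head == stale_list)  # both strategies flag the same target
print(bucket.head_requests, bucket.list_requests)
```

Both strategies flag the same outdated target, but the per-target approach issues one request per target while the listing approach issues one request per page of results. With real network latency on each request, that difference in request count is where the time savings come from.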
FYI @noamross