[PERF] Native globbing early stopping #1452
Merged
jaychia force-pushed the jay/native-globbing-early-stopping branch 3 times, most recently from 33cf586 to ef1e6e1 (October 2, 2023 17:27)
jaychia force-pushed the jay/native-globbing branch from 887e30b to 428f1b3 (October 2, 2023 17:28)
jaychia force-pushed the jay/native-globbing-early-stopping branch from ef1e6e1 to 0ef0609 (October 2, 2023 17:34)
Benchmarks on my M1 mac + minio:
jaychia force-pushed the jay/native-globbing-early-stopping branch from ae81313 to 2706650 (October 3, 2023 00:41)
Codecov Report
Additional details and impacted files
@@            Coverage Diff            @@
##             main    #1452    +/-    ##
==========================================
+ Coverage   74.64%   74.66%   +0.01%
==========================================
  Files          60       60
  Lines        6042     6042
==========================================
+ Hits         4510     4511       +1
+ Misses       1532     1531       -1
Benchmarks running on AWS S3 and in EC2:
jaychia commented Oct 3, 2023
Adds early stopping capabilities to our native globbing to prevent runaway parallelism when listing adversarial cases (one file per folder).
Other changes in this PR:
- `posix` argument to `ls` and `iter_dir`: this controls whether the operation will perform a posix-like list (of just the next level), or an object-store-like prefix list (of all files starting with the provided prefix)
- `delimiter` is no longer an `Option` argument and is required by all the ObjectSources: it could actually be argued that this should be automatically determined by the ObjectSource itself?

Not in scope in this PR:
How it works
- At a given depth `d`, we define the number of nodes at that depth as `fanout`.
- We use a `fanout_limit` to limit the fanout of our globbing. When we perform a recursive list and see that entering depth `d+1` will increase `fanout` by more than `fanout_limit`, we will instead fall back on "flat listing", which is a listing of all files starting with the current path as their prefix.

Benchmarks
These were run locally against a minio Docker container.
We see that limiting the parallelism helps a lot in this adversarial example, where instead of traversing the entire directory structure recursively we will stop after a certain depth and fall back onto prefix listing.
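The fallback described under "How it works" can be illustrated with a small, self-contained sketch. This is not the code from this PR: the in-memory key list `KEYS` and the helpers `list_posix`, `list_prefix`, and `glob_all_files` are hypothetical names invented for illustration, and the exact way the fanout increase is compared against `fanout_limit` is an assumption based on the description above.

```python
# Hypothetical in-memory stand-in for an object store: a flat list of keys.
# This is the adversarial layout from the description (one file per folder).
KEYS = [f"root/dir{i}/sub/file.parquet" for i in range(8)]


def list_posix(keys, prefix):
    """Posix-like list: only the next level under `prefix` (dirs end with '/')."""
    children = set()
    for key in keys:
        if key.startswith(prefix):
            head, sep, _ = key[len(prefix):].partition("/")
            children.add(prefix + head + sep)
    return sorted(children)


def list_prefix(keys, prefix):
    """Object-store-like prefix list: every key starting with `prefix`."""
    return sorted(k for k in keys if k.startswith(prefix))


def glob_all_files(keys, root, fanout_limit):
    """Level-by-level listing with early stopping.

    Assumption: "fanout" is the number of directories at the current depth,
    and we fall back to flat listing when the next depth would add more than
    `fanout_limit` directories. Returns (files, number_of_list_calls).
    """
    files, calls = [], 0
    frontier = [root]
    while frontier:
        level_files, next_frontier = [], []
        for directory in frontier:
            calls += 1
            for child in list_posix(keys, directory):
                if child.endswith("/"):
                    next_frontier.append(child)
                else:
                    level_files.append(child)
        if len(next_frontier) - len(frontier) > fanout_limit:
            # Early stop: flat-list each current path instead of descending.
            for directory in frontier:
                calls += 1
                files.extend(list_prefix(keys, directory))
            break
        files.extend(level_files)
        frontier = next_frontier
    return sorted(files), calls


if __name__ == "__main__":
    print(glob_all_files(KEYS, "root/", fanout_limit=4)[1])    # early stop
    print(glob_all_files(KEYS, "root/", fanout_limit=100)[1])  # full recursion
```

In this toy layout both settings return the same eight files, but the early-stopping run issues 2 list calls while the fully recursive run issues 17, which mirrors why limiting the fanout helps in the one-file-per-folder case.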