[PERF] Native globbing early stopping #1452

Merged: 16 commits, Oct 3, 2023

@jaychia jaychia commented Sep 30, 2023

Adds early-stopping capabilities to our native globbing to prevent runaway parallelism when listing adversarial cases (one file per folder).

Other changes in this PR:

  1. Adds benchmarks on 10k local files
  2. Adds a new posix argument to ls and iter_dir: this controls whether the operation will perform a posix-like list (of just the next level), or an object-store-like prefix list (of all files starting with the provided prefix)
  3. delimiter is no longer an Option argument and is now required by all ObjectSources. Arguably, this should be determined automatically by the ObjectSource itself?
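
To make the difference between the two listing modes in item 2 concrete, here is a minimal sketch over an in-memory set of keys. The function `list_keys` and its signature are hypothetical and only illustrate the behaviour described above; they are not the actual `ls` / `iter_dir` API in this PR.

```python
def list_keys(keys, prefix, delimiter="/", posix=True):
    """List entries under `prefix` from an in-memory collection of object keys.

    Hypothetical helper for illustration only, not the API added in this PR.
    """
    matches = sorted(k for k in keys if k.startswith(prefix))
    if not posix:
        # Object-store-like prefix list: every key that starts with the prefix.
        return matches
    # POSIX-like list: only the next level; deeper keys collapse into one
    # "directory" entry that ends with the delimiter.
    entries = []
    for key in matches:
        head, sep, _ = key[len(prefix):].partition(delimiter)
        entry = prefix + head + sep
        if entry not in entries:
            entries.append(entry)
    return entries


keys = {
    "data/a/1.parquet",
    "data/a/2.parquet",
    "data/b/3.parquet",
    "data/top.parquet",
}

print(list_keys(keys, "data/", posix=True))
# ['data/a/', 'data/b/', 'data/top.parquet']
print(list_keys(keys, "data/", posix=False))
# ['data/a/1.parquet', 'data/a/2.parquet', 'data/b/3.parquet', 'data/top.parquet']
```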

Not in scope in this PR:

  1. Prefix listing for GCS/Azure/Local/HTTP is not implemented in this PR, so the globbing algorithm will not work for those sources
  2. We may need a different globbing implementation for Local and HTTP given that they do not support prefix listing

How it works

  1. We can think of globbing a path as a tree search.
  2. At tree depth d, we define the number of nodes at the depth as fanout.
  3. This PR adds a fanout_limit to limit the fanout of our globbing. When we perform a recursive list and see that entering depth d+1 will increase fanout by more than fanout_limit, we will instead fall back on "flat listing" which is a listing of all files starting with the current path as their prefix.
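
The three steps above can be sketched roughly as follows. This is a simplified, single-threaded illustration over an in-memory key set, not the implementation in this PR: the names `glob` and `matches`, the segment-wise matching (no `**` support), and the exact fanout comparison are all assumptions made for the example. It counts list calls as a stand-in for requests to the object store.

```python
from fnmatch import fnmatchcase


def matches(key, pattern, delimiter="/"):
    """Segment-wise glob match; `*` never crosses the delimiter, `**` unsupported."""
    k, p = key.split(delimiter), pattern.split(delimiter)
    return len(k) == len(p) and all(fnmatchcase(a, b) for a, b in zip(k, p))


def glob(keys, pattern, fanout_limit, delimiter="/"):
    """Return (sorted matching keys, number of list calls issued)."""
    segments = pattern.split(delimiter)
    frontier = [""]  # directory prefixes at the current tree depth
    list_calls = 0
    results = []
    for depth, segment in enumerate(segments):
        last = depth == len(segments) - 1
        next_frontier = []
        for prefix in frontier:
            list_calls += 1  # one posix-like list per directory
            children = set()
            for key in keys:
                if key.startswith(prefix):
                    head, sep, _ = key[len(prefix):].partition(delimiter)
                    children.add((head, bool(sep)))  # (name, is_dir)
            for name, is_dir in sorted(children):
                if not fnmatchcase(name, segment):
                    continue
                if last and not is_dir:
                    results.append(prefix + name)
                elif not last and is_dir:
                    next_frontier.append(prefix + name + delimiter)
        if not last and len(next_frontier) - len(frontier) > fanout_limit:
            # Entering the next depth would grow the fanout too much:
            # fall back to a flat prefix listing from the current depth.
            for prefix in frontier:
                list_calls += 1
                results.extend(
                    k for k in keys
                    if k.startswith(prefix) and matches(k, pattern, delimiter)
                )
            return sorted(results), list_calls
        frontier = next_frontier
    return sorted(results), list_calls


# Adversarial layout: one file per directory.
keys = {f"root/dir{i}/file.txt" for i in range(100)}

found, calls = glob(keys, "root/*/file.txt", fanout_limit=1000)
print(len(found), calls)  # 100 102
found, calls = glob(keys, "root/*/file.txt", fanout_limit=8)
print(len(found), calls)  # 100 3
```

In this toy layout both runs return the same 100 files, but the limited run replaces 100 per-directory lists with a single prefix list.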

Benchmarks

These were run locally on a minio Docker container

[image: benchmark results]

We see that limiting the parallelism helps a lot in this adversarial example, where instead of traversing the entire directory structure recursively we will stop after a certain depth and fall back onto prefix listing.

@jaychia jaychia force-pushed the jay/native-globbing-early-stopping branch 3 times, most recently from 33cf586 to ef1e6e1 Compare October 2, 2023 17:27
@jaychia jaychia force-pushed the jay/native-globbing branch from 887e30b to 428f1b3 Compare October 2, 2023 17:28
@jaychia jaychia force-pushed the jay/native-globbing-early-stopping branch from ef1e6e1 to 0ef0609 Compare October 2, 2023 17:34

jaychia commented Oct 2, 2023

Benchmarks on my M1 mac + minio:

----------------------------------------------------------------------------------------------------------- benchmark 'setup_bucket=one-file-per-dir': 14 tests -----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                                                   Min                   Max                  Mean              StdDev                Median                 IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[setup_bucket:one-file-per-dir-page_size:100-fanout_limit:512]         234.9010 (1.0)        245.1309 (1.0)        241.0674 (1.0)        3.7631 (1.63)       241.7193 (1.0)        3.3977 (1.77)          2;0  4.1482 (1.0)           5           1
test_benchmark_glob_daft[setup_bucket:one-file-per-dir-page_size:100-fanout_limit:256]         236.6957 (1.01)       246.7250 (1.01)       242.1410 (1.00)       4.1976 (1.82)       242.9567 (1.01)       7.1486 (3.72)          2;0  4.1298 (1.00)          5           1
test_benchmark_glob_daft[setup_bucket:one-file-per-dir-page_size:100-fanout_limit:128]         238.3742 (1.01)       255.8198 (1.04)       247.0925 (1.02)       7.1096 (3.08)       247.2936 (1.02)      11.8574 (6.18)          2;0  4.0471 (0.98)          5           1
test_benchmark_glob_daft[setup_bucket:one-file-per-dir-page_size:1_000-fanout_limit:128]       241.5293 (1.03)       255.1591 (1.04)       247.0737 (1.02)       6.1623 (2.67)       243.6697 (1.01)      10.4290 (5.43)          1;0  4.0474 (0.98)          5           1
test_benchmark_glob_daft[setup_bucket:one-file-per-dir-page_size:1_000-fanout_limit:256]       249.3034 (1.06)       723.8448 (2.95)       357.0079 (1.48)     206.7898 (89.65)      250.5372 (1.04)     164.7438 (85.83)         1;1  2.8011 (0.68)          5           1
test_benchmark_glob_daft[setup_bucket:one-file-per-dir-page_size:1_000-fanout_limit:512]       253.1211 (1.08)       337.0833 (1.38)       288.6682 (1.20)      35.4367 (15.36)      289.5584 (1.20)      59.5260 (31.01)         2;0  3.4642 (0.84)          5           1
test_benchmark_glob_daft[setup_bucket:one-file-per-dir-page_size:1_000-fanout_limit:64]        254.6384 (1.08)       553.2939 (2.26)       325.4490 (1.35)     127.8517 (55.43)      276.4690 (1.14)      90.3918 (47.09)         1;1  3.0727 (0.74)          5           1
test_benchmark_glob_daft[setup_bucket:one-file-per-dir-page_size:100-fanout_limit:64]          430.9935 (1.83)       457.6041 (1.87)       440.2964 (1.83)      10.7653 (4.67)       439.0003 (1.82)      14.3953 (7.50)          1;0  2.2712 (0.55)          5           1
test_benchmark_glob_daft[setup_bucket:one-file-per-dir-page_size:1_000-fanout_limit:8]         574.5199 (2.45)       607.6646 (2.48)       589.0032 (2.44)      15.6114 (6.77)       581.1158 (2.40)      28.0656 (14.62)         1;0  1.6978 (0.41)          5           1
test_benchmark_glob_boto3_list[setup_bucket:one-file-per-dir-page_size:1_000]                  596.3994 (2.54)     1,056.8779 (4.31)       763.7985 (3.17)     218.9530 (94.92)      613.5084 (2.54)     363.7618 (189.52)        1;0  1.3092 (0.32)          5           1
test_benchmark_glob_s3fs[setup_bucket:one-file-per-dir]                                        651.0911 (2.77)       765.5339 (3.12)       706.3358 (2.93)      47.6301 (20.65)      685.2028 (2.83)      75.1342 (39.15)         2;0  1.4158 (0.34)          5           1
test_benchmark_glob_daft[setup_bucket:one-file-per-dir-page_size:100-fanout_limit:8]           681.0068 (2.90)       698.5435 (2.85)       689.5254 (2.86)       8.0227 (3.48)       691.8908 (2.86)      14.6602 (7.64)          3;0  1.4503 (0.35)          5           1
test_benchmark_glob_boto3_list[setup_bucket:one-file-per-dir-page_size:100]                  1,045.8968 (4.45)     1,052.3051 (4.29)     1,049.5455 (4.35)       2.3066 (1.0)      1,049.8371 (4.34)       1.9194 (1.0)           2;0  0.9528 (0.23)          5           1
test_benchmark_io_list_recursive_daft[setup_bucket:one-file-per-dir]                         1,769.2465 (7.53)     2,136.9073 (8.72)     1,936.0943 (8.03)     133.5847 (57.91)    1,909.7637 (7.90)     144.8743 (75.48)         2;0  0.5165 (0.12)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------- benchmark 'setup_bucket=partitioned-data-balanced': 14 tests -------------------------------------------------------------------------------------------------------------
Name (time in ms)                                                                                          Min                    Max                   Mean                 StdDev              Median                    IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[setup_bucket:partitioned-data-balanced-page_size:1_000-fanout_limit:64]      122.2108 (1.0)         135.6452 (1.0)         129.9232 (1.0)           4.4044 (1.23)     130.4486 (1.0)           5.2672 (1.0)           2;0  7.6969 (1.0)           7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-balanced-page_size:100-fanout_limit:512]       139.1545 (1.14)        154.2715 (1.14)        147.1317 (1.13)          5.6699 (1.58)     145.7969 (1.12)          8.4615 (1.61)          3;0  6.7966 (0.88)          6           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-balanced-page_size:1_000-fanout_limit:128]     140.5694 (1.15)        159.8324 (1.18)        150.3679 (1.16)          7.0939 (1.98)     150.2839 (1.15)         11.4939 (2.18)          2;0  6.6504 (0.86)          7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-balanced-page_size:1_000-fanout_limit:256]     142.3119 (1.16)        168.3845 (1.24)        151.4110 (1.17)          8.8785 (2.47)     147.5623 (1.13)          9.8568 (1.87)          2;0  6.6045 (0.86)          7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-balanced-page_size:100-fanout_limit:128]       143.0764 (1.17)        165.5693 (1.22)        150.1923 (1.16)          7.4232 (2.07)     149.8735 (1.15)          5.5138 (1.05)          1;1  6.6581 (0.87)          7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-balanced-page_size:1_000-fanout_limit:512]     144.9715 (1.19)        154.5524 (1.14)        150.0353 (1.15)          3.5884 (1.0)      151.2558 (1.16)          5.7393 (1.09)          2;0  6.6651 (0.87)          7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-balanced-page_size:100-fanout_limit:256]       145.5017 (1.19)        159.1665 (1.17)        150.5677 (1.16)          4.9662 (1.38)     149.3066 (1.14)          7.0808 (1.34)          2;0  6.6415 (0.86)          7           1
test_benchmark_io_list_recursive_daft[setup_bucket:partitioned-data-balanced]                         253.2256 (2.07)        289.9312 (2.14)        275.0586 (2.12)         17.9586 (5.00)     285.1925 (2.19)         32.5217 (6.17)          1;0  3.6356 (0.47)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-balanced-page_size:1_000-fanout_limit:8]       259.0277 (2.12)        273.6566 (2.02)        266.0233 (2.05)          5.4274 (1.51)     265.9544 (2.04)          7.0869 (1.35)          2;0  3.7591 (0.49)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-balanced-page_size:100-fanout_limit:64]        407.3057 (3.33)        439.9604 (3.24)        417.3086 (3.21)         13.3017 (3.71)     413.6960 (3.17)         14.9304 (2.83)          1;0  2.3963 (0.31)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-balanced-page_size:100-fanout_limit:8]         430.3860 (3.52)        494.7791 (3.65)        449.9261 (3.46)         25.5724 (7.13)     440.9530 (3.38)         19.1313 (3.63)          1;1  2.2226 (0.29)          5           1
test_benchmark_glob_boto3_list[setup_bucket:partitioned-data-balanced-page_size:1_000]                541.3400 (4.43)        630.9887 (4.65)        571.7699 (4.40)         36.8388 (10.27)    552.9752 (4.24)         48.2272 (9.16)          1;0  1.7490 (0.23)          5           1
test_benchmark_glob_s3fs[setup_bucket:partitioned-data-balanced]                                      574.1520 (4.70)        681.6861 (5.03)        599.9184 (4.62)         45.9656 (12.81)    580.9244 (4.45)         34.8155 (6.61)          1;1  1.6669 (0.22)          5           1
test_benchmark_glob_boto3_list[setup_bucket:partitioned-data-balanced-page_size:100]                  776.3258 (6.35)     61,531.4728 (453.62)   25,083.2981 (193.06)   33,263.9043 (>1000.0)  816.4429 (6.26)     60,738.6150 (>1000.0)       2;0  0.0399 (0.01)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------- benchmark 'setup_bucket=partitioned-data-left-skew-dirs': 14 tests -------------------------------------------------------------------------------------------------------------
Name (time in ms)                                                                                                Min                    Max                   Mean                 StdDev              Median                    IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-dirs-page_size:1_000-fanout_limit:256]     134.7646 (1.0)         186.9283 (1.17)        151.9002 (1.02)         20.7400 (2.73)     140.3700 (1.0)          33.6649 (3.23)          2;0  6.5833 (0.98)          8           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-dirs-page_size:1_000-fanout_limit:512]     136.1092 (1.01)        159.8688 (1.0)         149.4955 (1.00)          8.3767 (1.10)     147.3587 (1.05)         12.1757 (1.17)          3;0  6.6892 (1.00)          7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-dirs-page_size:100-fanout_limit:256]       139.6590 (1.04)        161.8424 (1.01)        148.8099 (1.0)           7.6099 (1.0)      149.2558 (1.06)         10.4165 (1.0)           2;0  6.7200 (1.0)           7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-dirs-page_size:100-fanout_limit:512]       141.1836 (1.05)        193.5507 (1.21)        161.9213 (1.09)         19.7722 (2.60)     152.2920 (1.08)         30.8197 (2.96)          3;0  6.1758 (0.92)          7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-dirs-page_size:1_000-fanout_limit:64]      159.2753 (1.18)        234.5834 (1.47)        176.8461 (1.19)         28.7204 (3.77)     166.3616 (1.19)         12.3961 (1.19)          1;1  5.6546 (0.84)          6           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-dirs-page_size:1_000-fanout_limit:128]     201.4852 (1.50)        295.8799 (1.85)        239.8800 (1.61)         38.1874 (5.02)     241.7812 (1.72)         57.6243 (5.53)          2;0  4.1688 (0.62)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-dirs-page_size:100-fanout_limit:64]        243.1403 (1.80)        535.2877 (3.35)        463.8081 (3.12)        124.5574 (16.37)    518.4520 (3.69)        103.0243 (9.89)          1;1  2.1561 (0.32)          5           1
test_benchmark_io_list_recursive_daft[setup_bucket:partitioned-data-left-skew-dirs]                         250.8679 (1.86)        284.7067 (1.78)        263.8107 (1.77)         13.7608 (1.81)     256.6717 (1.83)         19.1706 (1.84)          1;0  3.7906 (0.56)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-dirs-page_size:100-fanout_limit:128]       275.0953 (2.04)        531.2042 (3.32)        334.6774 (2.25)        110.3208 (14.50)    284.5897 (2.03)         80.0789 (7.69)          1;1  2.9880 (0.44)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-dirs-page_size:1_000-fanout_limit:8]       276.2865 (2.05)        301.3190 (1.88)        285.3326 (1.92)          9.9727 (1.31)     283.4705 (2.02)         13.2982 (1.28)          1;0  3.5047 (0.52)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-dirs-page_size:100-fanout_limit:8]         403.0369 (2.99)     11,148.4390 (69.73)     2,554.7771 (17.17)     4,804.0040 (631.29)   406.0586 (2.89)      2,691.0323 (258.34)        1;1  0.3914 (0.06)          5           1
test_benchmark_glob_boto3_list[setup_bucket:partitioned-data-left-skew-dirs-page_size:1_000]                498.6411 (3.70)        622.9111 (3.90)        526.4255 (3.54)         54.0454 (7.10)     502.9483 (3.58)         36.7207 (3.53)          1;1  1.8996 (0.28)          5           1
test_benchmark_glob_s3fs[setup_bucket:partitioned-data-left-skew-dirs]                                      519.7612 (3.86)        636.5656 (3.98)        569.3776 (3.83)         56.5931 (7.44)     533.9758 (3.80)         99.7397 (9.58)          1;0  1.7563 (0.26)          5           1
test_benchmark_glob_boto3_list[setup_bucket:partitioned-data-left-skew-dirs-page_size:100]                  773.7305 (5.74)     60,982.3894 (381.45)   12,825.8703 (86.19)    26,920.3185 (>1000.0)  781.1053 (5.56)     15,083.8829 (>1000.0)       1;1  0.0780 (0.01)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------- benchmark 'setup_bucket=partitioned-data-left-skew-files': 14 tests -------------------------------------------------------------------------------------------------------------
Name (time in ms)                                                                                                 Min                    Max                   Mean                 StdDev              Median                    IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-files-page_size:1_000-fanout_limit:64]      138.5641 (1.0)         243.2327 (1.37)        194.6797 (1.12)         31.4781 (9.80)     199.4270 (1.14)         35.9167 (6.02)          2;0  5.1366 (0.89)          8           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-files-page_size:1_000-fanout_limit:512]     165.6827 (1.20)        177.6910 (1.0)         173.1174 (1.0)           4.5386 (1.41)     174.2640 (1.0)           6.9038 (1.16)          2;0  5.7764 (1.0)           6           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-files-page_size:1_000-fanout_limit:256]     170.9759 (1.23)        188.7314 (1.06)        178.9268 (1.03)          8.5573 (2.67)     176.2209 (1.01)         16.3389 (2.74)          1;0  5.5889 (0.97)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-files-page_size:100-fanout_limit:256]       189.2530 (1.37)        243.2769 (1.37)        214.2520 (1.24)         20.6074 (6.42)     218.8988 (1.26)         27.4193 (4.59)          2;0  4.6674 (0.81)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-files-page_size:100-fanout_limit:512]       190.1659 (1.37)        462.8728 (2.60)        271.9665 (1.57)        109.3536 (34.06)    225.6389 (1.29)         93.7103 (15.70)         1;1  3.6769 (0.64)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-files-page_size:100-fanout_limit:128]       198.1573 (1.43)        228.3224 (1.28)        209.7147 (1.21)         13.0677 (4.07)     202.1975 (1.16)         20.3844 (3.42)          1;0  4.7684 (0.83)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-files-page_size:1_000-fanout_limit:128]     213.9063 (1.54)        338.8295 (1.91)        266.6174 (1.54)         47.4164 (14.77)    248.4503 (1.43)         58.6701 (9.83)          2;0  3.7507 (0.65)          5           1
test_benchmark_io_list_recursive_daft[setup_bucket:partitioned-data-left-skew-files]                         270.2626 (1.95)        299.4954 (1.69)        284.9257 (1.65)         10.9275 (3.40)     282.1723 (1.62)         13.9084 (2.33)          2;0  3.5097 (0.61)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-files-page_size:1_000-fanout_limit:8]       284.4802 (2.05)        312.8341 (1.76)        295.5863 (1.71)         12.2095 (3.80)     293.4016 (1.68)         20.6241 (3.46)          1;0  3.3831 (0.59)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-files-page_size:100-fanout_limit:64]        408.4270 (2.95)        481.6110 (2.71)        434.8469 (2.51)         28.4953 (8.88)     425.2613 (2.44)         33.7858 (5.66)          1;0  2.2997 (0.40)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-left-skew-files-page_size:100-fanout_limit:8]         443.0699 (3.20)        449.8287 (2.53)        446.0205 (2.58)          3.2106 (1.0)      444.7324 (2.55)          5.9683 (1.0)           1;0  2.2420 (0.39)          5           1
test_benchmark_glob_boto3_list[setup_bucket:partitioned-data-left-skew-files-page_size:1_000]                545.8004 (3.94)        669.8062 (3.77)        595.9021 (3.44)         62.7771 (19.55)    552.8211 (3.17)        111.4303 (18.67)         2;0  1.6781 (0.29)          5           1
test_benchmark_glob_s3fs[setup_bucket:partitioned-data-left-skew-files]                                      571.3430 (4.12)      1,018.3662 (5.73)        731.2774 (4.22)        177.8392 (55.39)    652.8226 (3.75)        223.8406 (37.51)         1;0  1.3675 (0.24)          5           1
test_benchmark_glob_boto3_list[setup_bucket:partitioned-data-left-skew-files-page_size:100]                  860.1503 (6.21)     61,587.4814 (346.60)   13,009.5538 (75.15)    27,155.8873 (>1000.0)  867.2178 (4.98)     15,185.2729 (>1000.0)       1;1  0.0769 (0.01)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------- benchmark 'setup_bucket=partitioned-data-right-skew-dirs': 14 tests -------------------------------------------------------------------------------------------------------------
Name (time in ms)                                                                                                 Min                    Max                   Mean                 StdDev              Median                    IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-dirs-page_size:100-fanout_limit:512]       134.8738 (1.0)         161.9070 (1.0)         147.3364 (1.0)           8.9912 (2.03)     146.7323 (1.0)          11.5473 (4.11)          2;0  6.7872 (1.0)           7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-dirs-page_size:1_000-fanout_limit:256]     135.9932 (1.01)        182.0130 (1.12)        156.3985 (1.06)         19.6000 (4.43)     151.1282 (1.03)         36.0945 (12.84)         3;0  6.3939 (0.94)          8           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-dirs-page_size:1_000-fanout_limit:512]     137.9243 (1.02)        169.0678 (1.04)        149.9035 (1.02)         11.0087 (2.49)     148.6662 (1.01)         14.3189 (5.09)          4;0  6.6710 (0.98)          8           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-dirs-page_size:100-fanout_limit:256]       139.2581 (1.03)        176.8020 (1.09)        157.7510 (1.07)         13.4480 (3.04)     156.2790 (1.07)         21.5269 (7.66)          3;0  6.3391 (0.93)          7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-dirs-page_size:1_000-fanout_limit:64]      165.2625 (1.23)        180.0114 (1.11)        171.8706 (1.17)          4.7947 (1.08)     171.3480 (1.17)          2.8115 (1.0)           2;2  5.8183 (0.86)          6           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-dirs-page_size:1_000-fanout_limit:128]     229.1155 (1.70)        251.6386 (1.55)        240.7588 (1.63)          8.5453 (1.93)     240.7843 (1.64)         12.1587 (4.32)          2;0  4.1535 (0.61)          5           1
test_benchmark_io_list_recursive_daft[setup_bucket:partitioned-data-right-skew-dirs]                         246.5664 (1.83)        349.9480 (2.16)        272.2951 (1.85)         43.7490 (9.88)     254.4813 (1.73)         34.3877 (12.23)         1;1  3.6725 (0.54)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-dirs-page_size:100-fanout_limit:64]        248.4687 (1.84)        532.7757 (3.29)        466.5687 (3.17)        122.4014 (27.64)    521.2408 (3.55)         88.2590 (31.39)         1;1  2.1433 (0.32)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-dirs-page_size:1_000-fanout_limit:8]       273.7005 (2.03)        283.7165 (1.75)        277.8555 (1.89)          5.1845 (1.17)     274.4072 (1.87)          9.4181 (3.35)          2;0  3.5990 (0.53)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-dirs-page_size:100-fanout_limit:128]       285.0055 (2.11)        295.9085 (1.83)        289.8968 (1.97)          4.4282 (1.0)      290.7411 (1.98)          6.9323 (2.47)          2;0  3.4495 (0.51)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-dirs-page_size:100-fanout_limit:8]         401.6999 (2.98)     10,885.8273 (67.24)     2,508.1816 (17.02)     4,683.2785 (>1000.0)  407.2993 (2.78)      2,651.8836 (943.23)        1;1  0.3987 (0.06)          5           1
test_benchmark_glob_boto3_list[setup_bucket:partitioned-data-right-skew-dirs-page_size:1_000]                496.1212 (3.68)        591.8494 (3.66)        517.7779 (3.51)         41.4981 (9.37)     501.5133 (3.42)         28.0139 (9.96)          1;1  1.9313 (0.28)          5           1
test_benchmark_glob_s3fs[setup_bucket:partitioned-data-right-skew-dirs]                                      522.2731 (3.87)        629.6462 (3.89)        555.5160 (3.77)         43.2903 (9.78)     545.7491 (3.72)         45.7038 (16.26)         1;0  1.8001 (0.27)          5           1
test_benchmark_glob_boto3_list[setup_bucket:partitioned-data-right-skew-dirs-page_size:100]                  715.1889 (5.30)     61,194.2763 (377.96)   12,829.9383 (87.08)    27,036.5085 (>1000.0)  723.6155 (4.93)     15,179.0604 (>1000.0)       1;1  0.0779 (0.01)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------ benchmark 'setup_bucket=partitioned-data-right-skew-files': 14 tests -----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                                                                  Min                    Max                  Mean                StdDev              Median                   IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-files-page_size:1_000-fanout_limit:64]      123.3110 (1.0)         151.4593 (1.0)        136.3031 (1.0)          9.9157 (2.25)     137.4372 (1.0)         16.0701 (3.13)          2;0  7.3366 (1.0)           8           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-files-page_size:1_000-fanout_limit:128]     155.0425 (1.26)        182.8814 (1.21)       167.2268 (1.23)         9.2181 (2.09)     167.7227 (1.22)        11.6201 (2.26)          2;0  5.9799 (0.82)          7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-files-page_size:1_000-fanout_limit:512]     157.1675 (1.27)        177.4776 (1.17)       166.8764 (1.22)         8.2186 (1.87)     168.1872 (1.22)        13.9531 (2.72)          3;0  5.9925 (0.82)          6           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-files-page_size:1_000-fanout_limit:256]     159.7433 (1.30)        177.2750 (1.17)       165.0864 (1.21)         5.9178 (1.34)     163.2278 (1.19)         5.1313 (1.0)           1;1  6.0574 (0.83)          7           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-files-page_size:100-fanout_limit:512]       184.6212 (1.50)        226.7511 (1.50)       214.2303 (1.57)        17.2975 (3.93)     216.8075 (1.58)        18.1381 (3.53)          1;0  4.6679 (0.64)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-files-page_size:100-fanout_limit:256]       196.2460 (1.59)        222.9414 (1.47)       214.5209 (1.57)        10.9348 (2.49)     219.9118 (1.60)        12.8928 (2.51)          1;0  4.6616 (0.64)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-files-page_size:100-fanout_limit:128]       196.3901 (1.59)        231.2313 (1.53)       218.4320 (1.60)        14.2064 (3.23)     223.6417 (1.63)        20.4423 (3.98)          1;0  4.5781 (0.62)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-files-page_size:1_000-fanout_limit:8]       262.9494 (2.13)        274.4871 (1.81)       267.8134 (1.96)         4.4001 (1.0)      268.3080 (1.95)         5.5565 (1.08)          2;0  3.7339 (0.51)          5           1
test_benchmark_io_list_recursive_daft[setup_bucket:partitioned-data-right-skew-files]                         274.8266 (2.23)        331.2182 (2.19)       293.0396 (2.15)        22.4073 (5.09)     287.0752 (2.09)        23.9019 (4.66)          1;0  3.4125 (0.47)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-files-page_size:100-fanout_limit:64]        392.8144 (3.19)        448.4930 (2.96)       430.1389 (3.16)        22.4789 (5.11)     438.4684 (3.19)        27.5529 (5.37)          1;0  2.3248 (0.32)          5           1
test_benchmark_glob_daft[setup_bucket:partitioned-data-right-skew-files-page_size:100-fanout_limit:8]         443.9558 (3.60)     10,759.3326 (71.04)    2,513.0559 (18.44)    4,609.8136 (>1000.0)  455.4045 (3.31)     2,589.2998 (504.60)        1;1  0.3979 (0.05)          5           1
test_benchmark_glob_boto3_list[setup_bucket:partitioned-data-right-skew-files-page_size:1_000]                547.0223 (4.44)        652.6025 (4.31)       576.6361 (4.23)        42.9441 (9.76)     560.6614 (4.08)        30.1182 (5.87)          1;1  1.7342 (0.24)          5           1
test_benchmark_glob_s3fs[setup_bucket:partitioned-data-right-skew-files]                                      570.8940 (4.63)        673.1797 (4.44)       614.8166 (4.51)        50.4939 (11.48)    586.4322 (4.27)        92.3190 (17.99)         2;0  1.6265 (0.22)          5           1
test_benchmark_glob_boto3_list[setup_bucket:partitioned-data-right-skew-files-page_size:100]                  808.1026 (6.55)        852.2430 (5.63)       822.5225 (6.03)        18.5667 (4.22)     814.7859 (5.93)        26.0045 (5.07)          1;0  1.2158 (0.17)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
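
The `fanout_limit` values in the benchmark IDs above control when globbing stops recursing and falls back to a flat prefix listing. A minimal sketch of that idea over an in-memory key list follows. It is not the Rust implementation in this PR: the helper names (`list_level`, `list_prefix`, `glob_all`) and the exact comparison against the limit are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def list_level(keys: List[str], prefix: str, delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """POSIX-like list: files and sub-directories directly under `prefix`."""
    files, dirs = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        head, sep, _ = key[len(prefix):].partition(delimiter)
        if sep:
            dirs.add(prefix + head + delimiter)  # key lives in a deeper directory
        else:
            files.append(key)  # key sits directly under `prefix`
    return sorted(files), sorted(dirs)


def list_prefix(keys: List[str], prefix: str) -> List[str]:
    """Object-store-like list: every key that starts with `prefix`."""
    return sorted(k for k in keys if k.startswith(prefix))


def glob_all(keys: List[str], root: str, fanout_limit: int) -> Tuple[List[str], Dict[str, int]]:
    """Walk the tree level by level, falling back to flat listing when too wide."""
    results: List[str] = []
    calls = {"level": 0, "flat": 0}
    frontier = [root]
    while frontier:
        next_frontier: List[str] = []
        for prefix in frontier:
            files, dirs = list_level(keys, prefix)
            calls["level"] += 1
            if len(next_frontier) + len(dirs) > fanout_limit:
                # Assumed rule: descending would exceed the limit, so issue one
                # flat listing under the current prefix instead of recursing.
                results.extend(list_prefix(keys, prefix))
                calls["flat"] += 1
            else:
                results.extend(files)
                next_frontier.extend(dirs)
        frontier = next_frontier
    return sorted(results), calls


# Adversarial layout: one file per directory.
keys = [f"data/d{i}/f.txt" for i in range(20)]
print(glob_all(keys, "data/", fanout_limit=8)[1])
print(glob_all(keys, "data/", fanout_limit=64)[1])
```

With the low limit the walk stops after the first level and issues a single flat listing. With the high limit it issues one list call per directory, which is the pattern that becomes expensive on a one-file-per-folder layout.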

Base automatically changed from jay/native-globbing to main October 3, 2023 00:41
@jaychia jaychia force-pushed the jay/native-globbing-early-stopping branch from ae81313 to 2706650 Compare October 3, 2023 00:41
@codecov

codecov bot commented Oct 3, 2023

Codecov Report

Merging #1452 (e4dbd13) into main (9c32d73) will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1452      +/-   ##
==========================================
+ Coverage   74.64%   74.66%   +0.01%     
==========================================
  Files          60       60              
  Lines        6042     6042              
==========================================
+ Hits         4510     4511       +1     
+ Misses       1532     1531       -1     

see 1 file with indirect coverage changes

@jaychia
Contributor Author

jaychia commented Oct 3, 2023

Benchmarks run against AWS S3 from within EC2:

---------------------------------------------------------------------------------------------- benchmark 'setup_bucket=one-file-per-dir': 8 tests ----------------------------------------------------------------------------------------------
Name (time in ms)                                                   Min                    Max                   Mean              StdDev                 Median                   IQR            Outliers     OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[one-file-per-dir-100-256]             922.8166 (1.0)         943.2092 (1.0)         932.0248 (1.0)        8.0528 (1.58)        929.5299 (1.0)         12.0656 (1.92)          2;0  1.0729 (1.0)           5           1
test_benchmark_glob_daft[one-file-per-dir-100-128]             934.7766 (1.01)        948.5476 (1.01)        942.1334 (1.01)       5.0828 (1.0)         943.1045 (1.01)         6.2755 (1.0)           2;0  1.0614 (0.99)          5           1
test_benchmark_glob_daft[one-file-per-dir-1000-256]            947.2192 (1.03)        977.1858 (1.04)        963.1141 (1.03)      12.5968 (2.48)        959.7690 (1.03)        20.8876 (3.33)          2;0  1.0383 (0.97)          5           1
test_benchmark_glob_daft[one-file-per-dir-1000-128]            950.5481 (1.03)      1,060.0713 (1.12)        980.8208 (1.05)      44.9363 (8.84)        966.8742 (1.04)        36.3742 (5.80)          1;1  1.0196 (0.95)          5           1
test_benchmark_glob_boto3_list[one-file-per-dir-1000]        2,139.8908 (2.32)      2,243.2323 (2.38)      2,199.2541 (2.36)      45.1138 (8.88)      2,221.0372 (2.39)        75.4486 (12.02)         1;0  0.4547 (0.42)          5           1
test_benchmark_glob_s3fs[one-file-per-dir]                   2,220.1224 (2.41)      2,261.5436 (2.40)      2,245.8032 (2.41)      16.5399 (3.25)      2,249.7271 (2.42)        23.4971 (3.74)          1;0  0.4453 (0.42)          5           1
test_benchmark_glob_boto3_list[one-file-per-dir-100]         5,429.8453 (5.88)      5,944.0960 (6.30)      5,732.3480 (6.15)     216.8590 (42.66)     5,849.3330 (6.29)       334.6059 (53.32)         1;0  0.1744 (0.16)          5           1
test_benchmark_io_list_recursive_daft[one-file-per-dir]     45,511.4757 (49.32)    47,005.9711 (49.84)    46,132.7139 (49.50)    624.1900 (122.80)   46,068.1810 (49.56)    1,034.4738 (164.84)        1;0  0.0217 (0.02)          5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
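
As a reading aid for these tables: pytest-benchmark prints, in parentheses, each value's ratio to the best value in its column. The sketch below recomputes the Mean column ratios for a few rows of the one-file-per-dir table above. The shortened dictionary keys are labels chosen here, not real test IDs.

```python
# Mean times in ms, copied from the one-file-per-dir table.
means_ms = {
    "glob_daft[100-256]": 932.0248,
    "glob_boto3_list[1000]": 2199.2541,
    "glob_s3fs": 2245.8032,
    "io_list_recursive_daft": 46132.7139,
}

# The fastest mean is the baseline (shown as 1.0 in the table).
best = min(means_ms.values())
ratios = {name: round(value / best, 2) for name, value in means_ms.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```

The recursive listing comes out at roughly 49.5 times the fastest glob mean, matching the `(49.50)` printed in the table.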

------------------------------------------------------------------------------------------- benchmark 'setup_bucket=partitioned-data-balanced': 8 tests -------------------------------------------------------------------------------------------
Name (time in ms)                                                           Min                   Max                  Mean              StdDev                Median                 IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[partitioned-data-balanced-1000-256]           924.4737 (1.0)        986.2079 (1.01)       954.2737 (1.01)      25.5804 (2.48)       956.8314 (1.02)      43.3363 (4.55)          2;0  1.0479 (0.99)          5           1
test_benchmark_glob_daft[partitioned-data-balanced-1000-128]           929.6083 (1.01)       992.7560 (1.02)       944.6160 (1.0)       26.9687 (2.62)       933.7993 (1.0)       16.5835 (1.74)          1;1  1.0586 (1.0)           5           1
test_benchmark_glob_daft[partitioned-data-balanced-100-128]            943.3698 (1.02)     1,400.7239 (1.43)     1,048.1702 (1.11)     197.5532 (19.16)      961.6159 (1.03)     134.1142 (14.07)         1;1  0.9540 (0.90)          5           1
test_benchmark_glob_daft[partitioned-data-balanced-100-256]            950.2330 (1.03)       977.1553 (1.0)        967.7476 (1.02)      10.3114 (1.0)        969.5682 (1.04)       9.5290 (1.0)           1;0  1.0333 (0.98)          5           1
test_benchmark_glob_boto3_list[partitioned-data-balanced-1000]       2,146.5787 (2.32)     2,405.7797 (2.46)     2,238.0434 (2.37)      98.2898 (9.53)     2,210.7653 (2.37)      79.7248 (8.37)          1;1  0.4468 (0.42)          5           1
test_benchmark_glob_s3fs[partitioned-data-balanced]                  2,167.3272 (2.34)     2,331.4793 (2.39)     2,258.2162 (2.39)      61.8661 (6.00)     2,256.2980 (2.42)      81.9348 (8.60)          2;0  0.4428 (0.42)          5           1
test_benchmark_io_list_recursive_daft[partitioned-data-balanced]     4,588.5144 (4.96)     4,748.5673 (4.86)     4,703.9149 (4.98)      65.2151 (6.32)     4,728.4979 (5.06)      45.9187 (4.82)          1;1  0.2126 (0.20)          5           1
test_benchmark_glob_boto3_list[partitioned-data-balanced-100]        5,711.0636 (6.18)     5,918.6253 (6.06)     5,843.6531 (6.19)      79.1204 (7.67)     5,852.0659 (6.27)      76.9459 (8.07)          1;0  0.1711 (0.16)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------- benchmark 'setup_bucket=partitioned-data-bushy-early': 8 tests ---------------------------------------------------------------------------------------------
Name (time in ms)                                                               Min                    Max                   Mean              StdDev                 Median                 IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[partitioned-data-bushy-early-1000-128]            903.9532 (1.0)       1,146.2781 (1.0)         991.0503 (1.0)       94.6051 (2.57)        979.2657 (1.0)      112.4624 (2.19)          1;0  1.0090 (1.0)           5           1
test_benchmark_glob_daft[partitioned-data-bushy-early-1000-256]          1,056.5512 (1.17)      1,152.0701 (1.01)      1,097.5205 (1.11)      36.8668 (1.0)       1,087.1650 (1.11)      51.3355 (1.0)           2;0  0.9111 (0.90)          5           1
test_benchmark_glob_boto3_list[partitioned-data-bushy-early-1000]        1,960.4860 (2.17)      2,110.7342 (1.84)      2,038.4531 (2.06)      68.0327 (1.85)      2,052.8846 (2.10)     126.0740 (2.46)          2;0  0.4906 (0.49)          5           1
test_benchmark_glob_s3fs[partitioned-data-bushy-early]                   2,031.1216 (2.25)      2,154.1490 (1.88)      2,106.4585 (2.13)      47.3566 (1.28)      2,105.7340 (2.15)      57.1154 (1.11)          2;0  0.4747 (0.47)          5           1
test_benchmark_glob_daft[partitioned-data-bushy-early-100-256]           4,706.3957 (5.21)      4,883.0649 (4.26)      4,808.7575 (4.85)      72.0657 (1.95)      4,842.3254 (4.94)     108.0922 (2.11)          2;0  0.2080 (0.21)          5           1
test_benchmark_glob_daft[partitioned-data-bushy-early-100-128]           4,842.3956 (5.36)      5,123.8571 (4.47)      4,955.4958 (5.00)     117.8701 (3.20)      4,969.4470 (5.07)     185.8090 (3.62)          1;0  0.2018 (0.20)          5           1
test_benchmark_glob_boto3_list[partitioned-data-bushy-early-100]         5,620.9676 (6.22)      5,945.9131 (5.19)      5,750.9109 (5.80)     138.6722 (3.76)      5,735.4237 (5.86)     233.2490 (4.54)          1;0  0.1739 (0.17)          5           1
test_benchmark_io_list_recursive_daft[partitioned-data-bushy-early]     12,210.4545 (13.51)    12,442.8573 (10.86)    12,301.3519 (12.41)     99.7145 (2.70)     12,269.8255 (12.53)    165.7973 (3.23)          1;0  0.0813 (0.08)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------- benchmark 'setup_bucket=partitioned-data-bushy-late': 8 tests -------------------------------------------------------------------------------------------
Name (time in ms)                                                             Min                   Max                  Mean              StdDev                Median                 IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[partitioned-data-bushy-late-1000-256]           996.0310 (1.0)      1,248.4970 (1.0)      1,072.3512 (1.0)      100.9137 (2.01)     1,047.7031 (1.0)       85.5866 (1.64)          1;1  0.9325 (1.0)           5           1
test_benchmark_glob_daft[partitioned-data-bushy-late-1000-128]         1,031.0781 (1.04)     1,278.2812 (1.02)     1,122.8120 (1.05)     102.6160 (2.05)     1,099.8839 (1.05)     157.7075 (3.03)          1;0  0.8906 (0.96)          5           1
test_benchmark_glob_s3fs[partitioned-data-bushy-late]                  1,976.9866 (1.98)     2,181.5205 (1.75)     2,097.5513 (1.96)      85.7277 (1.71)     2,121.4635 (2.02)     140.8707 (2.70)          1;0  0.4767 (0.51)          5           1
test_benchmark_glob_boto3_list[partitioned-data-bushy-late-1000]       2,105.0167 (2.11)     2,255.0866 (1.81)     2,187.3279 (2.04)      60.2273 (1.20)     2,193.2562 (2.09)      96.2919 (1.85)          2;0  0.4572 (0.49)          5           1
test_benchmark_io_list_recursive_daft[partitioned-data-bushy-late]     4,254.4595 (4.27)     4,428.5011 (3.55)     4,322.6489 (4.03)      78.5986 (1.57)     4,274.5772 (4.08)     128.9670 (2.47)          1;0  0.2313 (0.25)          5           1
test_benchmark_glob_daft[partitioned-data-bushy-late-100-128]          4,747.9671 (4.77)     5,032.5390 (4.03)     4,876.0880 (4.55)     113.9584 (2.28)     4,898.4336 (4.68)     173.2691 (3.32)          2;0  0.2051 (0.22)          5           1
test_benchmark_glob_daft[partitioned-data-bushy-late-100-256]          4,787.2884 (4.81)     4,919.7063 (3.94)     4,869.5510 (4.54)      50.0836 (1.0)      4,872.6894 (4.65)      52.1180 (1.0)           2;0  0.2054 (0.22)          5           1
test_benchmark_glob_boto3_list[partitioned-data-bushy-late-100]        5,577.2037 (5.60)     5,822.8991 (4.66)     5,703.8201 (5.32)     113.2413 (2.26)     5,673.4757 (5.42)     209.3263 (4.02)          3;0  0.1753 (0.19)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------- benchmark 'setup_bucket=partitioned-data-left-skew-dirs': 8 tests -------------------------------------------------------------------------------------------
Name (time in ms)                                                                 Min                   Max                  Mean              StdDev                Median                 IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[partitioned-data-left-skew-dirs-1000-256]           911.9300 (1.0)        946.9144 (1.0)        921.3870 (1.0)       14.8002 (1.37)       914.1802 (1.0)       15.7721 (1.27)          1;0  1.0853 (1.0)           5           1
test_benchmark_glob_daft[partitioned-data-left-skew-dirs-100-256]            950.8700 (1.04)       977.4617 (1.03)       959.2754 (1.04)      10.7723 (1.0)        955.0574 (1.04)      12.4100 (1.0)           1;0  1.0425 (0.96)          5           1
test_benchmark_glob_daft[partitioned-data-left-skew-dirs-1000-128]           971.7375 (1.07)     1,053.3885 (1.11)     1,016.9940 (1.10)      29.4575 (2.73)     1,020.6215 (1.12)      29.3753 (2.37)          2;0  0.9833 (0.91)          5           1
test_benchmark_glob_boto3_list[partitioned-data-left-skew-dirs-1000]       2,065.4774 (2.26)     2,308.3819 (2.44)     2,183.3584 (2.37)      97.1508 (9.02)     2,197.9975 (2.40)     154.2497 (12.43)         2;0  0.4580 (0.42)          5           1
test_benchmark_glob_s3fs[partitioned-data-left-skew-dirs]                  2,226.3009 (2.44)     2,321.4736 (2.45)     2,275.2653 (2.47)      37.2787 (3.46)     2,272.9340 (2.49)      57.4631 (4.63)          2;0  0.4395 (0.40)          5           1
test_benchmark_glob_daft[partitioned-data-left-skew-dirs-100-128]          2,387.0591 (2.62)     2,575.1336 (2.72)     2,466.5299 (2.68)      76.7335 (7.12)     2,436.4807 (2.67)     118.4726 (9.55)          2;0  0.4054 (0.37)          5           1
test_benchmark_io_list_recursive_daft[partitioned-data-left-skew-dirs]     4,443.9865 (4.87)     4,623.2038 (4.88)     4,555.3393 (4.94)      77.4926 (7.19)     4,577.2296 (5.01)     129.0114 (10.40)         1;0  0.2195 (0.20)          5           1
test_benchmark_glob_boto3_list[partitioned-data-left-skew-dirs-100]        5,414.7249 (5.94)     5,823.1391 (6.15)     5,681.8101 (6.17)     170.9285 (15.87)    5,740.7790 (6.28)     251.5411 (20.27)         1;0  0.1760 (0.16)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------- benchmark 'setup_bucket=partitioned-data-left-skew-files': 8 tests -------------------------------------------------------------------------------------------
Name (time in ms)                                                                  Min                   Max                  Mean              StdDev                Median                 IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[partitioned-data-left-skew-files-1000-128]           973.1317 (1.0)        991.9393 (1.0)        983.6127 (1.0)        7.1482 (1.0)        984.1016 (1.0)        9.8523 (1.0)           2;0  1.0167 (1.0)           5           1
test_benchmark_glob_daft[partitioned-data-left-skew-files-1000-256]           980.9880 (1.01)     1,020.8125 (1.03)       992.8197 (1.01)      16.3077 (2.28)       988.7917 (1.00)      17.4818 (1.77)          1;0  1.0072 (0.99)          5           1
test_benchmark_glob_daft[partitioned-data-left-skew-files-100-128]          1,337.9062 (1.37)     1,419.9755 (1.43)     1,379.0752 (1.40)      29.5899 (4.14)     1,374.7367 (1.40)      31.3909 (3.19)          2;0  0.7251 (0.71)          5           1
test_benchmark_glob_daft[partitioned-data-left-skew-files-100-256]          1,346.9604 (1.38)     1,411.2739 (1.42)     1,380.3212 (1.40)      29.2470 (4.09)     1,379.2420 (1.40)      54.9526 (5.58)          2;0  0.7245 (0.71)          5           1
test_benchmark_glob_boto3_list[partitioned-data-left-skew-files-1000]       2,166.8689 (2.23)     2,238.8988 (2.26)     2,210.2283 (2.25)      29.6302 (4.15)     2,223.3895 (2.26)      44.7430 (4.54)          1;0  0.4524 (0.45)          5           1
test_benchmark_glob_s3fs[partitioned-data-left-skew-files]                  2,290.5829 (2.35)     2,379.8530 (2.40)     2,346.5211 (2.39)      37.6512 (5.27)     2,366.2794 (2.40)      56.4555 (5.73)          1;0  0.4262 (0.42)          5           1
test_benchmark_io_list_recursive_daft[partitioned-data-left-skew-files]     4,575.4927 (4.70)     4,772.7562 (4.81)     4,640.2457 (4.72)      81.0660 (11.34)    4,613.5328 (4.69)     107.2055 (10.88)         1;0  0.2155 (0.21)          5           1
test_benchmark_glob_boto3_list[partitioned-data-left-skew-files-100]        5,533.2437 (5.69)     5,897.6575 (5.95)     5,723.0835 (5.82)     141.1148 (19.74)    5,713.9669 (5.81)     210.9062 (21.41)         2;0  0.1747 (0.17)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------- benchmark 'setup_bucket=partitioned-data-right-skew-dirs': 8 tests -------------------------------------------------------------------------------------------
Name (time in ms)                                                                  Min                   Max                  Mean              StdDev                Median                 IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[partitioned-data-right-skew-dirs-1000-256]           848.6907 (1.0)        905.1816 (1.0)        872.3632 (1.0)       21.2755 (1.0)        864.9555 (1.0)       25.5842 (1.0)           2;0  1.1463 (1.0)           5           1
test_benchmark_glob_daft[partitioned-data-right-skew-dirs-100-256]            939.4638 (1.11)       993.9926 (1.10)       965.2551 (1.11)      21.8175 (1.03)       960.9926 (1.11)      34.6841 (1.36)          2;0  1.0360 (0.90)          5           1
test_benchmark_glob_daft[partitioned-data-right-skew-dirs-1000-128]           974.4231 (1.15)     1,261.4406 (1.39)     1,071.9302 (1.23)     112.5898 (5.29)     1,030.3310 (1.19)     122.4367 (4.79)          1;0  0.9329 (0.81)          5           1
test_benchmark_glob_boto3_list[partitioned-data-right-skew-dirs-1000]       2,049.8630 (2.42)     2,336.5646 (2.58)     2,155.3547 (2.47)     107.1107 (5.03)     2,131.5849 (2.46)      75.1801 (2.94)          1;1  0.4640 (0.40)          5           1
test_benchmark_glob_s3fs[partitioned-data-right-skew-dirs]                  2,153.2300 (2.54)     2,247.0984 (2.48)     2,217.1506 (2.54)      37.3730 (1.76)     2,224.6345 (2.57)      38.3691 (1.50)          1;0  0.4510 (0.39)          5           1
test_benchmark_glob_daft[partitioned-data-right-skew-dirs-100-128]          2,373.0427 (2.80)     2,573.3823 (2.84)     2,495.9858 (2.86)      81.2354 (3.82)     2,521.3445 (2.91)     120.7975 (4.72)          1;0  0.4006 (0.35)          5           1
test_benchmark_io_list_recursive_daft[partitioned-data-right-skew-dirs]     4,475.1268 (5.27)     4,658.8421 (5.15)     4,586.2772 (5.26)      72.0654 (3.39)     4,615.7886 (5.34)      96.1454 (3.76)          2;0  0.2180 (0.19)          5           1
test_benchmark_glob_boto3_list[partitioned-data-right-skew-dirs-100]        5,487.5937 (6.47)     5,825.6772 (6.44)     5,670.1211 (6.50)     123.3416 (5.80)     5,665.0538 (6.55)     140.6435 (5.50)          2;0  0.1764 (0.15)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------- benchmark 'setup_bucket=partitioned-data-right-skew-files': 8 tests -------------------------------------------------------------------------------------------
Name (time in ms)                                                                   Min                   Max                  Mean              StdDev                Median                 IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_glob_daft[partitioned-data-right-skew-files-1000-128]           986.5435 (1.0)      1,016.1466 (1.0)        998.5531 (1.0)       15.1840 (1.0)        989.2758 (1.0)       27.9558 (1.33)          2;0  1.0014 (1.0)           5           1
test_benchmark_glob_daft[partitioned-data-right-skew-files-1000-256]           999.2719 (1.01)     1,111.9583 (1.09)     1,029.9544 (1.03)      46.8773 (3.09)     1,010.2431 (1.02)      44.4565 (2.11)          1;0  0.9709 (0.97)          5           1
test_benchmark_glob_daft[partitioned-data-right-skew-files-100-128]          1,307.2191 (1.33)     1,414.3743 (1.39)     1,355.7786 (1.36)      40.1731 (2.65)     1,354.8713 (1.37)      53.2315 (2.53)          2;0  0.7376 (0.74)          5           1
test_benchmark_glob_daft[partitioned-data-right-skew-files-100-256]          1,348.7567 (1.37)     1,407.3953 (1.39)     1,368.0153 (1.37)      22.9341 (1.51)     1,364.0508 (1.38)      21.0211 (1.0)           1;1  0.7310 (0.73)          5           1
test_benchmark_glob_boto3_list[partitioned-data-right-skew-files-1000]       2,035.6103 (2.06)     2,181.1075 (2.15)     2,113.7821 (2.12)      53.5129 (3.52)     2,116.9437 (2.14)      65.8340 (3.13)          2;0  0.4731 (0.47)          5           1
test_benchmark_glob_s3fs[partitioned-data-right-skew-files]                  2,171.9215 (2.20)     2,275.2739 (2.24)     2,221.5856 (2.22)      39.6106 (2.61)     2,227.8391 (2.25)      56.4894 (2.69)          2;0  0.4501 (0.45)          5           1
test_benchmark_io_list_recursive_daft[partitioned-data-right-skew-files]     4,580.4426 (4.64)     4,781.8909 (4.71)     4,723.2389 (4.73)      83.3848 (5.49)     4,766.9632 (4.82)      87.9002 (4.18)          1;0  0.2117 (0.21)          5           1
test_benchmark_glob_boto3_list[partitioned-data-right-skew-files-100]        5,630.5489 (5.71)     5,993.0963 (5.90)     5,836.1035 (5.84)     148.7742 (9.80)     5,864.8464 (5.93)     242.9187 (11.56)         2;0  0.1713 (0.17)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Review thread on src/daft-io/src/object_io.rs (outdated, resolved)
@jaychia jaychia merged commit a317dd7 into main Oct 3, 2023
@jaychia jaychia deleted the jay/native-globbing-early-stopping branch October 3, 2023 23:10