Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[PERF] Improve hash table probe side decisions for Swordfish (#3327)
This PR lifts statistics into optimized logical plans so that they're available for local execution plans. It then uses these newly available statistics to make better decisions on whether to build the probe table of a hash join on the left or right side. ## Benchmark results For TPC-H, this gives us some notable speedups with Q5, Q8, and Q19. - Q5: ~2.2x (in memory), ~1.5 (parquet) - Q8: ~5.9x (in memory), ~2.2x (parquet) - Q19: ~7x (in memory), ~4.5x (parquet) Crucially, with this change, our native runner is now faster (or within some small deviation) than our previous python runner for all 22 TPC-H queries. For more detailed results, we have: ### Q5 #### Before ``` --------------------------------------------------------------------------------- benchmark 'q5-parts-1': 4 tests --------------------------------------------------------------------------------- Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- test_tpch[1-in-memory-python-5] 28.7213 (1.0) 30.3664 (1.0) 29.4645 (1.0) 0.3849 (1.0) 29.4894 (1.0) 0.5529 (1.0) 10;0 33.9391 (1.0) 33 1 test_tpch[1-in-memory-native-5] 30.9980 (1.08) 34.1489 (1.12) 32.2151 (1.09) 0.7150 (1.86) 32.2903 (1.09) 1.1586 (2.10) 10;0 31.0413 (0.91) 31 1 test_tpch[1-parquet-python-5] 48.8010 (1.70) 51.8535 (1.71) 50.0985 (1.70) 0.8342 (2.17) 50.0400 (1.70) 1.4193 (2.57) 9;0 19.9607 (0.59) 20 1 test_tpch[1-parquet-native-5] 51.1122 (1.78) 54.0755 (1.78) 52.3799 (1.78) 0.8317 (2.16) 52.4268 (1.78) 1.2526 (2.27) 8;0 19.0913 (0.56) 20 1 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` #### After ``` --------------------------------------------------------------------------------- benchmark 'q5-parts-1': 4 tests --------------------------------------------------------------------------------- Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- test_tpch[1-in-memory-native-5] 13.8393 (1.0) 16.2712 (1.0) 14.4352 (1.0) 0.5322 (2.13) 14.3212 (1.0) 0.5084 (1.71) 9;5 69.2750 (1.0) 62 1 test_tpch[1-in-memory-python-5] 28.3915 (2.05) 29.5304 (1.81) 28.8477 (2.00) 0.2501 (1.0) 28.8299 (2.01) 0.2971 (1.0) 10;1 34.6649 (0.50) 35 1 test_tpch[1-parquet-native-5] 34.3952 (2.49) 36.4230 (2.24) 35.4314 (2.45) 0.4543 (1.82) 35.3678 (2.47) 0.5042 (1.70) 8;1 28.2235 (0.41) 28 1 test_tpch[1-parquet-python-5] 55.7339 (4.03) 57.3051 (3.52) 56.4564 (3.91) 0.5101 (2.04) 56.2507 (3.93) 0.7570 (2.55) 4;0 17.7128 (0.26) 18 1 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` ### Q8 #### Before ``` --------------------------------------------------------------------------------- benchmark 'q8-parts-1': 4 tests --------------------------------------------------------------------------------- Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- test_tpch[1-in-memory-python-8] 14.9532 (1.0) 17.2992 (1.0) 16.1906 (1.0) 0.5962 (1.22) 16.1370 (1.0) 0.8323 (1.13) 23;0 61.7642 (1.0) 60 1 test_tpch[1-parquet-python-8] 34.7310 (2.32) 52.5183 (3.04) 37.1660 (2.30) 3.2354 (6.61) 36.5454 (2.26) 2.2469 (3.04) 1;1 26.9063 (0.44) 28 1 test_tpch[1-in-memory-native-8] 44.0259 (2.94) 46.0576 (2.66) 45.0905 (2.78) 0.4898 (1.0) 45.0528 (2.79) 0.7380 (1.0) 5;0 22.1776 (0.36) 22 1 test_tpch[1-parquet-native-8] 69.8245 (4.67) 73.1332 (4.23) 71.0333 (4.39) 0.8421 (1.72) 70.8515 (4.39) 0.9827 (1.33) 3;1 14.0779 (0.23) 14 1 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` #### After ``` --------------------------------------------------------------------------------- benchmark 'q8-parts-1': 4 tests ---------------------------------------------------------------------------------- Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- test_tpch[1-in-memory-native-8] 7.2145 (1.0) 8.5953 (1.0) 7.7980 (1.0) 0.3170 (1.0) 7.8074 (1.0) 0.4959 (1.13) 44;0 128.2373 (1.0) 118 1 test_tpch[1-in-memory-python-8] 15.3458 (2.13) 17.6777 (2.06) 16.3485 (2.10) 0.5217 (1.65) 16.5234 (2.12) 0.7077 (1.62) 17;0 61.1678 (0.48) 57 1 test_tpch[1-parquet-native-8] 31.8017 (4.41) 33.5568 (3.90) 32.4955 (4.17) 0.4402 (1.39) 32.4145 (4.15) 0.4376 (1.0) 9;2 30.7735 (0.24) 30 1 test_tpch[1-parquet-python-8] 46.2703 (6.41) 48.3345 (5.62) 47.6753 (6.11) 0.5142 (1.62) 47.7092 (6.11) 0.4509 (1.03) 6;2 20.9752 (0.16) 21 1 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` ### Q19 #### Before ``` ------------------------------------------------------------------------------------ benchmark 'q19-parts-1': 4 tests ----------------------------------------------------------------------------------- Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- test_tpch[1-in-memory-python-19] 277.9533 (1.0) 281.5647 (1.0) 279.8556 (1.0) 1.4607 (1.0) 279.4186 (1.0) 2.2466 (1.0) 2;0 3.5733 (1.0) 5 1 test_tpch[1-parquet-python-19] 311.1196 (1.12) 317.8069 (1.13) 315.6200 (1.13) 2.6611 (1.82) 316.7545 (1.13) 2.8849 (1.28) 1;0 3.1684 (0.89) 5 1 test_tpch[1-in-memory-native-19] 431.2738 (1.55) 464.2194 (1.65) 442.1488 (1.58) 13.3136 (9.11) 436.8320 (1.56) 16.3197 (7.26) 1;0 2.2617 (0.63) 5 1 test_tpch[1-parquet-native-19] 455.3492 (1.64) 460.8460 (1.64) 458.0333 (1.64) 2.1169 (1.45) 457.4410 (1.64) 3.0005 (1.34) 2;0 2.1832 (0.61) 5 1 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` #### After ``` ------------------------------------------------------------------------------------ benchmark 'q19-parts-1': 4 tests ----------------------------------------------------------------------------------- Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- test_tpch[1-in-memory-native-19] 62.6100 (1.0) 71.2192 (1.0) 66.0757 (1.0) 2.5634 (1.0) 65.6719 (1.0) 1.5398 (1.0) 6;4 15.1342 (1.0) 15 1 test_tpch[1-parquet-native-19] 94.8984 (1.52) 134.7584 (1.89) 103.1099 (1.56) 12.5712 (4.90) 97.7583 (1.49) 7.3370 (4.76) 1;1 9.6984 (0.64) 9 1 test_tpch[1-in-memory-python-19] 284.6653 (4.55) 295.5558 (4.15) 289.7268 (4.38) 3.9986 (1.56) 288.6982 (4.40) 4.6399 (3.01) 2;0 3.4515 (0.23) 5 1 test_tpch[1-parquet-python-19] 308.9599 (4.93) 319.0440 (4.48) 314.7801 (4.76) 4.2088 (1.64) 315.3274 (4.80) 7.0198 (4.56) 2;0 3.1768 (0.21) 5 1 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` --------- Co-authored-by: Sammy Sidhu <[email protected]>
- Loading branch information