This release consists of 403 commits from 96 contributors. See credits at the end of this changelog for more information.
Breaking changes:
- Remove Arc wrapping from create_udf's return_type #12489 (findepi)
- Make make_scalar_function() result candidate for inlining, by removing the `Arc` #12477 (findepi)
- Bump MSRV to 1.78 #12398 (comphead)
- fix: DataFusion panics with "No candidates provided" #12469 (Weijun-H)
- Implement PartialOrd for Expr and sub fields/structs without using hash values #12481 (ngli-me)
- Add `field` trait method to `WindowUDFImpl`, remove `return_type`/`nullable` #12374 (jcsherin)
- parquet: Make page_index/pushdown metrics consistent with row_group metrics #12545 (progval)
- Make SessionContext::enable_url_table consume self #12573 (alamb)
- LexRequirement as a struct, instead of a type #12583 (ngli-me)
- Require `Debug` for `AnalyzerRule`, `FunctionRewriter`, and `OptimizerRule` #12556 (alamb)
- Require `Debug` for `TableProvider`, `TableProviderFactory` and `PartitionStream` #12557 (alamb)
- Require `Debug` for `PhysicalOptimizerRule` #12624 (AnthonyZhOon)
- Rename aggregation modules, GroupColumn #12619 (alamb)
- Update `register_table` functions args to take `Into<TableReference>` #12630 (JasonLi-cn)
- Derive `Debug` for `SessionStateBuilder`, adding `Debug` requirements to fields #12632 (AnthonyZhOon)
- Support REPLACE INTO for INSERT statements #12516 (fmeringdal)
- Add `PartitionEvaluatorArgs` to `WindowUDFImpl::partition_evaluator` #12804 (jcsherin)
- Convert `rank`/`dense_rank` and `percent_rank` builtin functions to UDWF #12718 (jatin510)
- Bug-fix: MemoryExec sort expressions do NOT refer to the projected schema #12876 (berkaysynnada)
- Minor: add flags for temporary ddl #12561 (hailelagi)
- Convert `BuiltInWindowFunction::{Lead, Lag}` to a user defined window function #12857 (jcsherin)
- Improve performance for physical plan creation with many columns #12950 (askalt)
- Improve recursive `unnest` options API #12836 (duongcongtoai)
- fix(substrait): disallow union with a single input #13023 (tokoko)
- feat: support arbitrary expressions in `LIMIT` plan #13028 (jonahgao)
- Remove unused `LogicalPlan::CrossJoin` as it is unused #13076 (buraksenn)
- Minor: make `Expr::volatile` infallible #13206 (alamb)
- Convert LexOrdering `type` to `struct`. #13146 (ngli-me)
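
Several of the breaking changes above add a `Debug` requirement to extension traits. For most implementors, deriving `Debug` on the implementing type is probably all that is needed. The sketch below is only an illustration: `Rule`, `MyRule`, and `describe` are hypothetical stand-ins, not DataFusion APIs, and the real traits have more methods than shown here.

```rust
use std::fmt::Debug;

// Hypothetical stand-in for an extension trait such as `OptimizerRule`.
// Only the `Debug` supertrait bound is relevant to this sketch.
trait Rule: Debug {
    fn name(&self) -> &str;
}

// Deriving `Debug` satisfies the supertrait bound.
#[derive(Debug)]
struct MyRule {
    name: String,
}

impl Rule for MyRule {
    fn name(&self) -> &str {
        &self.name
    }
}

// Because `Debug` is a supertrait, boxed trait objects can be formatted directly.
fn describe(rules: &[Box<dyn Rule>]) -> String {
    format!("{:?}", rules)
}

fn main() {
    let rules: Vec<Box<dyn Rule>> = vec![Box::new(MyRule {
        name: "my_rule".to_string(),
    })];
    println!("{} -> {}", rules[0].name(), describe(&rules));
}
```

A type that cannot derive `Debug` (for example, one holding a closure) would instead need a manual `impl Debug`.
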
Implemented enhancements:
- feat(unparser): adding alias for table scan filter in sql unparser #12453 (Lordworms)
- feat(substrait): set ProjectRel output_mapping in producer #12495 (vbarua)
- feat: Support applying parquet bloom filters to StringView columns #12503 (my-vegetable-has-exploded)
- feat: Support adding a single new table factory to SessionStateBuilder #12563 (Weijun-H)
- feat(planner): Allowing setting sort order of parquet files without specifying the schema #12466 (devanbenz)
- feat: add support for Substrait ExtendedExpression #12728 (westonpace)
- feat(substrait): add intersect support to consumer #12830 (tokoko)
- feat: Implement grouping function using grouping id #12704 (eejbyfeldt)
- feat(substrait): add set operations to consumer, update substrait to `0.45.0` #12863 (tokoko)
- feat(substrait): add wildcard handling to producer #12987 (tokoko)
- feat: Add regexp_count function #12970 (Omega359)
- feat: Decorrelate more predicate subqueries #12945 (eejbyfeldt)
- feat: Run (logical) optimizers on subqueries #13066 (eejbyfeldt)
- feat: Convert CumeDist to UDWF #13051 (jonathanc-n)
- feat: Migrate Map Functions #13047 (jonathanc-n)
- feat: improve type inference for WindowFrame #13059 (notfilippo)
- feat: Move subquery check from analyzer to PullUpCorrelatedExpr (Fix TPC-DS q41) #13091 (eejbyfeldt)
- feat: Add `Date32`/`Date64` in aggregate fuzz testing #13041 (LeslieKid)
- feat(substrait): support order_by in aggregate functions #13114 (bvolpato)
- feat: Support Substrait's IntervalCompound type/literal instead of interval-month-day-nano UDT #12112 (Blizzara)
- feat: Implement LeftMark join to fix subquery correctness issue #13134 (eejbyfeldt)
- feat: support logical plan for `EXECUTE` statement #13194 (jonahgao)
- feat(substrait): handle emit_kind when consuming Substrait plans #13127 (vbarua)
- feat(substrait): AggregateRel grouping_expressions support #13173 (akoshchiy)
Fixed bugs:
- fix: Panic/correctness issue in variance GroupsAccumulator #12615 (eejbyfeldt)
- fix: coalesce schema issues #12308 (mesejo)
- fix: Correct results for grouping sets when columns contain nulls #12571 (eejbyfeldt)
- fix(substrait): remove optimize calls from substrait consumer #12800 (tokoko)
- fix(substrait): consuming AggregateRel as last node #12875 (tokoko)
- fix: Update TO_DATE, TO_TIMESTAMP scalar functions to support LargeUtf8, Utf8View #12929 (Omega359)
- fix: Add Int32 type override for Dialects #12916 (peasee)
- fix: using simple string match replace regex match for contains udf #12931 (zhuliquan)
- fix: Dialect requires derived table alias #12994 (peasee)
- fix: join swap for projected semi/anti joins #13022 (korowa)
- fix: Verify supported type for Unary::Plus in sql planner #13019 (eejbyfeldt)
- fix: Do NOT preserve names (aliases) of Exprs for simplification in TableScan filters #13048 (eejbyfeldt)
- fix: planning of prepare statement with limit clause #13088 (jonahgao)
- fix: add missing `NotExpr::evaluate_bounds` #13082 (crepererum)
- fix: Order by mentioning missing column multiple times #13158 (eejbyfeldt)
- fix: import JoinTestType without triggering unused_qualifications lint #13170 (smarticen)
- fix: default UDWFImpl::expressions returns all expressions #13169 (Michael-J-Ward)
- fix: date_bin() on timestamps before 1970 #13204 (mhilton)
- fix: array_resize null fix #13209 (jonathanc-n)
- fix: CSV Infer Schema now properly supports escaped characters. #13214 (mnorfolk03)
Documentation updates:
- chore: Prepare 42.0.0 Release #12465 (andygrove)
- Minor: improve ParquetOpener docs #12456 (alamb)
- Improve doc wording around scalar authoring #12478 (findepi)
- Minor: improve `GroupsAccumulator` docs #12501 (alamb)
- Minor: improve `GroupsAccumulatorAdapter` docs #12502 (alamb)
- Improve flamegraph profiling instructions #12521 (alamb)
- docs: 📝 Add expected answers to `DataFrame` method examples #12564 (Eason0729)
- parquet: Add finer metrics on operations covered by `time_elapsed_opening` #12585 (progval)
- Update scalar_functions.md #12627 (Abdullahsab3)
- Move `kurtosis_pop` to datafusion-functions-extra and out of core #12647 (dharanad)
- Update introduction.md for `blaze` project #12577 (liyuance)
- docs: improve the documentation for Aggregate code #12617 (alamb)
- doc: Fix malformed hex string literal in user guide #12708 (kawadakk)
- docs: Update DataFusion introduction to clarify that DataFusion does provide an "out of the box" query engine #12666 (andygrove)
- Framework for generating function docs from embedded code documentation #12668 (Omega359)
- Fix misformatted links on project index page #12750 (amoeba)
- Add `DocumentationBuilder::with_standard_argument` to reduce copy/paste #12747 (alamb)
- Minor: doc how field name is to be set for `WindowUDF` #12757 (jcsherin)
- Port / Add Documentation for `VarianceSample` and `VariancePopulation` #12742 (alamb)
- Transformed::new_transformed: Fix documentation formatting #12787 (progval)
- Migrate documentation for all string functions from scalar_functions.md to code #12775 (Omega359)
- Minor: add README to Catalog Folder #12797 (jonathanc-n)
- Remove redundant aggregate/window/scalar function documentation #12745 (alamb)
- Improve description of function migration #12743 (alamb)
- Crypto Function Migration #12840 (jonathanc-n)
- Minor: more doc to `MemoryPool` module #12849 (2010YOUY01)
- Migrate documentation for all core functions from scalar_functions.md to code #12854 (Omega359)
- Migrate documentation for Aggregate Functions to code #12861 (jonathanc-n)
- Wordsmith project description #12778 (matthewmturner)
- Migrate Regex Functions from static docs #12886 (jonathanc-n)
- Migrate documentation for all math functions from scalar_functions.md to code #12908 (juroberttyb)
- Combine the logic of rank, dense_rank and percent_rank udwf to reduce duplications #12893 (jatin510)
- Migrate Array function Documentation to code #12948 (jonathanc-n)
- Minor: fix Aggregation Docs from review #12880 (jonathanc-n)
- Minor: expr-doc small fixes #12960 (jonathanc-n)
- docs: Add documentation about conventional commits #12971 (andygrove)
- Migrate datetime documentation to code #12966 (jatin510)
- Fix CI on main (regenerate function docs) #12991 (alamb)
- Split output batches of joins that do not respect batch size #12969 (alihan-synnada)
- Minor: Fixed regexpr_match docs #13008 (jonathanc-n)
- Minor: Fix spelling in regexpr_count docs #13014 (jonathanc-n)
- Update version to 42.1.0, add CHANGELOG (#12986) #12989 (alamb)
- Added expression to "with_standard_argument" #12926 (jonathanc-n)
- Migrate documentation for `regr*` aggregate functions to code #12871 (alamb)
- Minor: Add documentation for `cot` #13069 (alamb)
- Documentation: Add API deprecation policy #13083 (comphead)
- docs: Fixed generate_series docs #13097 (jonathanc-n)
- [docs]: migrate lead/lag window function docs to new docs #13095 (buraksenn)
- minor: Add deprecated policy to the contributor guide contents #13100 (comphead)
- Introduce `binary_as_string` parquet option, upgrade to arrow/parquet `53.2.0` #12816 (goldmedal)
- Convert `ntile` builtIn function to UDWF #13040 (jatin510)
- docs: Added Special Functions Page #13102 (jonathanc-n)
- [docs]: added `alternative_syntax` function for docs #13140 (jonathanc-n)
- Minor: Delete old cume_dist and percent_rank docs #13137 (jonathanc-n)
- docs: Add alternative syntax for extract, trim and substring. #13143 (Omega359)
- docs: switch completely to generated docs for scalar and aggregate functions #13161 (Omega359)
- Minor: improve testing docs, mention `cargo nextest` #13160 (alamb)
- minor: Update HOWTO to help with updating new docs #13172 (jonathanc-n)
- Add config option `skip_physical_aggregate_schema_check` #13176 (alamb)
- Enable reading `StringViewArray` by default from Parquet (8% improvement for entire ClickBench suite) #13101 (alamb)
- Forward port changes for `42.2.0` release (#13191) #13193 (alamb)
- [minor] overload from_unixtime func to have optional timezone parameter #13130 (buraksenn)
Other:
- Impl `convert_to_state` for `GroupsAccumulatorAdapter` (faster median for high cardinality aggregates) #11827 (Rachelint)
- Upgrade sqlparser-rs to 0.51.0, support new interval logic from `sqlparse-rs` #12222 (samuelcolvin)
- Do not silently ignore unsupported `CREATE TABLE` and `CREATE VIEW` syntax #12450 (alamb)
- use FileFormat::get_ext as the default file extension filter #12417 (waruto210)
- fix interval units parsing #12448 (samuelcolvin)
- test(substrait): update TPCH tests #12462 (vbarua)
- Add "Extended Clickbench" benchmark for median and approx_median for high cardinality aggregates #12438 (alamb)
- date_trunc small update for readability #12479 (findepi)
- cleanup `array_has` #12460 (samuelcolvin)
- chore: bump chrono to 0.4.38 #12485 (my-vegetable-has-exploded)
- Remove deprecated ScalarUDF::new #12487 (findepi)
- Remove deprecated config setup functions #12486 (findepi)
- Remove unnecessary shifts in gcd() #12480 (findepi)
- Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled #12135 (itsjunetime)
- Update substrait requirement from 0.41 to 0.42, `prost-build` to `0.13.2` #12483 (dependabot[bot])
- Faster strpos() string function for ASCII-only case #12401 (goldmedal)
- Specialize ASCII case for substr() #12444 (2010YOUY01)
- Improve SQLite subquery tables aliasing unparsing #12482 (sgrebnov)
- Minor: use Option rather than Result for not found suggestion #12512 (alamb)
- Remove deprecated datafusion_physical_expr::functions module #12505 (findepi)
- Remove deprecated AggregateUDF::new #12508 (findepi)
- Make `required_guarantees` output to be deterministic #12484 (austin362667)
- Deprecate unused ScalarUDF::fun #12506 (findepi)
- Remove deprecated WindowUDF::new #12507 (findepi)
- Preserve the order of right table in NestedLoopJoinExec #12504 (alihan-synnada)
- Improve benchmark for ltrim #12513 (Rachelint)
- Fix: check ambiguous column reference #12467 (HuSen8891)
- Minor: move imports to top in `row_hash.rs` #12530 (Rachelint)
- tests: Fix typo in config setting name #12535 (progval)
- Expose DataFrame select_exprs method #12520 (milenkovicm)
- Replace some usages of `Expr::to_field` with `Expr::qualified_name` #12522 (jonahgao)
- Bump aws-sdk-sso to 1.43.0, aws-sdk-sts to 1.43.0 and aws-sdk-ssooidc from 1.40.0 to 1.44.0 in /datafusion-cli #12409 (dependabot[bot])
- Fix NestedLoopJoin performance regression #12531 (alihan-synnada)
- Produce informative error message on insert plan type mismatch #12540 (findepi)
- Fix unparse table scan with the projection pushdown #12534 (goldmedal)
- Automate sqllogictest for String, LargeString and StringView behavior #12525 (goldmedal)
- Fix unparsing offset #12539 (Stazer)
- support EXTRACT on intervals and durations #12514 (nrc)
- Support List type coercion for CASE-WHEN-THEN expression #12490 (goldmedal)
- Sort metrics alphabetically in EXPLAIN ANALYZE output #12568 (progval)
- Add `RuntimeEnv::try_new` and deprecate `RuntimeEnv::new` #12566 (OussamaSaoudi)
- Reorganize the StringView tests in sqllogictests #12572 (goldmedal)
- fix parquet infer statistics for BinaryView types #12575 (XiangpengHao)
- Minor: add example of assert_batches_eq #12580 (alamb)
- Use qualified aliases to simplify searching DFSchema #12546 (jonahgao)
- return absent stats when filters are pushed down #12471 (waruto210)
- Minor: add new() function for ParquetReadOptions #12579 (Smith-Cruise)
- make `Debug` for `MemoryExec` prettier #12582 (samuelcolvin)
- Add `SessionStateBuilder::with_object_store` method #12578 (OussamaSaoudi)
- Fix and Improve Sort Pushdown for Nested Loop and Hash Join #12559 (berkaysynnada)
- Add Docs and Examples and helper methods to `PhysicalSortExpr` #12589 (alamb)
- Warn instead of error for unused imports #12588 (samuelcolvin)
- Update prost-build requirement from =0.13.2 to =0.13.3 #12587 (dependabot[bot])
- Add JOB benchmark dataset [1/N] (imdb dataset) #12497 (doupache)
- Improve documentation and add `Display` impl to `EquivalenceProperties` #12590 (alamb)
- physical-plan: Cast nested group values back to dictionary if necessary #12586 (brancz)
- Support `Date32` for `date_trunc` function #12603 (goldmedal)
- Avoid RowConverter for multi column grouping (10% faster clickbench queries) #12269 (jayzhan211)
- Refactor to support recursive unnest in physical plan #11577 (duongcongtoai)
- Use original value when comparing with dictionary column in unparser #12610 (Sevenannn)
- Fix to unparse the plan with multiple UNION statements into an SQL string #12605 (goldmedal)
- Keep the float information in scalar_to_sql #12609 (Sevenannn)
- Add Dictionary String (UTF8) type to String sqllogictests #12621 (goldmedal)
- Improve SanityChecker error message #12595 (alamb)
- Improve performance of `trim` for string view (10%) #12395 (Rachelint)
- Simplify `update_skip_aggregation_probe` method #12332 (lewiszlw)
- Minor: Encapsulate type check in GroupValuesColumn, avoid panic #12620 (alamb)
- Fix sort node deserialization from proto #12626 (palaska)
- Minor: improve documentation to StringView trim #12629 (alamb)
- [MINOR]: Simplifications Sort Operator #12639 (akurmustafa)
- [Minor] Remove redundant member from RepartitionExec #12638 (akurmustafa)
- implement nested identifier access #12614 (Lordworms)
- [MINOR]: Rename get_arrayref_at_indices to take_arrays #12654 (akurmustafa)
- [MINOR]: Use take_arrays in repartition , fix build #12657 (doupache)
- Add binary_view to string_view coercion #12643 (doupache)
- [Minor] Improve error message when bitwise_* operator takes wrong unsupported type #12646 (dharanad)
- Minor: Add github link to code that was upstreamed #12660 (alamb)
- Minor: Improve documentation on execution error handling #12651 (alamb)
- Adds `WindowUDFImpl::reverse_expr` trait method + Support for `IGNORE NULLS` #12662 (jcsherin)
- Fill in missing `Debug` fields for `SessionState` #12663 (AnthonyZhOon)
- Minor: add partial assertion for skip aggregation probe #12640 (Rachelint)
- Add more functions for string sqllogictests #12665 (goldmedal)
- Update rstest requirement from 0.22.0 to 0.23.0 #12678 (dependabot[bot])
- Minor: Change LiteralGuarantee try_new to new #12669 (pgwhalen)
- Refactor PrimitiveGroupValueBuilder to use `MaybeNullBufferBuilder` #12623 (alamb)
- Add `value_from_statisics` to AggregateUDFImpl, remove special case for min/max/count aggregate statistics #12296 (edmondop)
- Provide field and schema metadata missing on distinct aggregations. #12691 (wiedld)
- [MINOR]: Simplify required_input_ordering of BoundedWindowAggExec #12656 (akurmustafa)
- handle 0 and NULL value of NTH_VALUE function #12676 (thinh2)
- Improve documentation for AggregateUDFImpl::value_from_stats #12689 (alamb)
- Add support for external tables with qualified names #12645 (OussamaSaoudi)
- Fix Regex signature types #12690 (blaginin)
- Refactor `ByteGroupValueBuilder` to use `MaybeNullBufferBuilder` #12681 (alamb)
- Simplify match patterns in coercion rules #12711 (findepi)
- Remove aggregate functions dependency on frontend #12715 (findepi)
- Minor: Remove clone in `transform_to_states` #12707 (jayzhan211)
- Refactor tests for union sorting properties, add tests for unions and constants #12702 (alamb)
- Fix: support Qualified Wildcard in count aggregate function #12673 (HuSen8891)
- Reduce code duplication in `PrimitiveGroupValueBuilder` with const generics #12703 (alamb)
- Disallow duplicated qualified field names #12608 (eejbyfeldt)
- Optimize base64/hex decoding by pre-allocating output buffers (~2x faster) #12675 (simonvandel)
- Allow DynamicFileCatalog support to query partitioned file #12683 (goldmedal)
- Support `LIMIT` Push-down logical plan optimization for `Extension` nodes #12685 (austin362667)
- Fix AvroReader: Add union resolving for nested struct arrays #12686 (JonasDev1)
- Adds macros for creating `WindowUDF` and `WindowFunction` expression #12693 (jcsherin)
- Support unparsing plans with both Aggregation and Window functions #12705 (sgrebnov)
- Fix strpos invocation with dictionary and null #12712 (findepi)
- Add IMDB(JOB) Benchmark [2/N] (imdb queries) #12529 (austin362667)
- Minor: avoid clone while calculating union equivalence properties #12722 (alamb)
- Simplify streaming_merge function parameters #12719 (mertak-synnada)
- Provide field and schema metadata missing on cross joins, and union with null fields. #12729 (wiedld)
- Minor: Update string tests for strpos #12739 (alamb)
- Apply `type_union_resolution` to array and values #12753 (jayzhan211)
- fix `equal_to` in `PrimitiveGroupValueBuilder` #12758 (Rachelint)
- Fix `equal_to` in `ByteGroupValueBuilder` #12770 (alamb)
- Allow boolean Expr simplification even when nullable #12746 (eejbyfeldt)
- Fix unnest conjunction with selecting wildcard expression #12760 (goldmedal)
- Improve `round` scalar function unparsing for Postgres #12744 (sgrebnov)
- Fix stack overflow calculating projected orderings #12759 (alamb)
- Upgrade arrow/parquet to `53.1.0` / fix clippy #12724 (alamb)
- Account for constant equivalence properties in union, tests #12562 (alamb)
- Minor: clarify comment about empty dependencies #12786 (alamb)
- Introduce Signature::String and return error if input of `strpos` is integer #12751 (jayzhan211)
- Minor: improve docs on MovingMin/MovingMax #12790 (alamb)
- Add union sorting equivalence end to end tests #12721 (alamb)
- Fix bug in TopK aggregates #12766 (avantgardnerio)
- Minor: clean up TODO comments in unnest.slt #12795 (goldmedal)
- Refactor `DependencyMap` and `Dependencies` into structs #12761 (alamb)
- Remove unnecessary `DFSchema::check_ambiguous_name` #12805 (jonahgao)
- API from `ParquetExec` to `ParquetExecBuilder` #12799 (alamb)
- Minor: add documentation note about `NullState` #12791 (alamb)
- Chore: Move `aggregate statistics` optimizer test from core to optimizer crate #12783 (jayzhan211)
- Clarify documentation on ArrowBytesMap and ArrowBytesViewMap #12789 (alamb)
- Bump cookie and express in /datafusion/wasmtest/datafusion-wasm-app #12825 (dependabot[bot])
- Remove unused dependencies and features #12808 (jonahgao)
- Add Aggregation fuzzer framework #12667 (Rachelint)
- Retry apt-get and rustup on CI #12714 (findepi)
- Support creating tables via SQL with `FixedSizeList` column (e.g. `a int[3]`) #12810 (jandremarais)
- Make HashJoinExec::join_schema public #12807 (progval)
- Fix convert_to_state bug in `GroupsAccumulatorAdapter` #12834 (alamb)
- Fix: approx_percentile_cont_with_weight Panic #12823 (jonathanc-n)
- Fix clippy error on wasmtest #12844 (jonahgao)
- Fix panic on wrong number of arguments to substr #12837 (eejbyfeldt)
- Fix Bug in Display for ScalarValue::Struct #12856 (avantgardnerio)
- Support DictionaryString for Regex matching operators #12768 (blaginin)
- Minor: Small comment changes in sql folder #12838 (jonathanc-n)
- Add DuckDB struct test and row as alias #12841 (jayzhan211)
- Support struct coercion in `type_union_resolution` #12839 (jayzhan211)
- Added check for aggregate functions in optimizer rules #12860 (jonathanc-n)
- Optimize `iszero` function (3-5x faster) #12881 (simonvandel)
- Macro for creating record batch from literal slice #12846 (timsaucer)
- Implement special min/max accumulator for Strings and Binary (10% faster for Clickbench Q28) #12792 (alamb)
- Make PruningPredicate's rewrite public #12850 (adriangb)
- octet_length + string view == ❤️ #12900 (Omega359)
- Remove Expr clones in `select_to_plan` #12887 (jonahgao)
- Minor: added to docs in expr folder #12882 (jonathanc-n)
- Print undocumented functions to console while generating docs #12874 (alamb)
- Fix: handle NULL offset of NTH_VALUE window function #12851 (HuSen8891)
- Optimize `signum` function (3-25x faster) #12890 (simonvandel)
- re-export PartitionEvaluatorArgs from datafusion_expr::function #12878 (Michael-J-Ward)
- Unparse Sort with pushdown limit to SQL string #12873 (goldmedal)
- Add spilling related metrics for aggregation #12888 (2010YOUY01)
- Move equivalence fuzz testing to fuzz test binary #12767 (alamb)
- Remove unused `math_expressions.rs` #12917 (jonahgao)
- Improve AggregationFuzzer error reporting #12832 (alamb)
- Import Arc consistently #12899 (findepi)
- Optimize `isnan` (2-5x faster) #12889 (simonvandel)
- Minor: Move StringArrayType, StringViewArrayBuilder, etc outside of string module #12912 (Omega359)
- Remove redundant unsafe in test #12914 (findepi)
- Ensure that math functions fulfil the ColumnarValue contract #12922 (joroKr21)
- Optimization: support push down limit when full join #12963 (JasonLi-cn)
- Implement `GroupColumn` support for `StringView`/`ByteView` (faster grouping performance) #12809 (Rachelint)
- Implement native support StringView for `REGEXP_LIKE` #12897 (tlm365)
- Minor: Refactor benchmark imports to use `util` module #12885 (loloxwg)
- Fix zero data type in `expr % 1` simplification #12913 (eejbyfeldt)
- Optimize performance of `math::cot` (~2x faster) #12910 (tlm365)
- Expand wildcard expressions in distinct on #12941 (epsio-banay)
- chores: remove redundant clone #12964 (JasonLi-cn)
- Fix: handle NULL input in lead/lag window function #12811 (HuSen8891)
- Fix logical vs physical schema mismatch for aliased `now()` #12951 (wiedld)
- Optimize performance of `math::trunc` (~2.5x faster) #12909 (tlm365)
- Minor: Add slt test for `DISTINCT ON` with wildcard #12968 (alamb)
- Fix 'Too many open files' on fuzz test. #12961 (dhegberg)
- Increase minimum supported Rust version (MSRV) to 1.79 #12962 (findepi)
- Unparse `SubqueryAlias` without projections to SQL #12896 (goldmedal)
- Fix 2 bugs related to push down partition filters #12902 (eejbyfeldt)
- Move TableConstraint to Constraints conversion #12953 (findepi)
- Added current_timestamp alias #12958 (jonathanc-n)
- Improve unparsing for `ORDER BY`, `UNION`, Windows functions with Aggregation #12946 (sgrebnov)
- Handle one-element array return value in ScalarFunctionExpr #12965 (joroKr21)
- Add links to new_constraint_from_table_constraints doc #12995 (findepi)
- Fix:fix HashJoin projection swap #12967 (my-vegetable-has-exploded)
- refactor(substrait): refactor ReadRel consumer #12983 (tokoko)
- Move SMJ join filtered part out of join_output stage. LeftOuter, LeftSemi #12764 (comphead)
- Remove logical cross join in planning #12985 (Dandandan)
- [MINOR]: Use arrow take_arrays, remove datafusion take_arrays #13013 (akurmustafa)
- Don't preserve functional dependency when generating UNION logical plan #12979 (Sevenannn)
- [Minor]: Add data based sort expression test #12992 (akurmustafa)
- Removed last usages of scalar_inputs, scalar_input_types and inputs2 to use arrow unary/binary for performance #12972 (buraksenn)
- Minor: Update release instructions to include new crates #13024 (alamb)
- Extract CSE logic to `datafusion_common` #13002 (peter-toth)
- Enhance table scan unparsing to avoid unnamed subqueries. #13006 (goldmedal)
- Fix count on all null `VALUES` clause #13029 (findepi)
- Support filter in cross join elimination #13025 (Dandandan)
- [minor]: remove same util functions from the code base. #13026 (akurmustafa)
- Improve `AggregateFuzz` testing: generate random queries #12847 (alamb)
- Fix functions with Volatility::Volatile and parameters #13001 (agscpp)
- refactor: Incorporate RewriteDisjunctivePredicate rule into SimplifyExpressions #13032 (eejbyfeldt)
- Move filtered SMJ right join out of `join_partial` phase #13053 (comphead)
- Remove functions and types deprecated since 37 #13056 (findepi)
- Minor: Cleaned physical-plan Comments #13055 (jonathanc-n)
- improve the condition checking for unparsing table_scan #13062 (goldmedal)
- minor: simplify associated item bound of `hash_array_primitive` #13070 (jonahgao)
- extended log.rs tests for unary/binary and f32/f64 casting #13034 (buraksenn)
- Fix check_not_null_constraints null detection #13033 (findepi)
- [Minor] Update info/list of TPC-DS queries #13075 (Dandandan)
- Fix logical vs physical schema mismatch for UNION where some inputs are constants #12954 (wiedld)
- Improve CSE stats #13080 (peter-toth)
- Infer data type from schema for `Values` and add struct coercion to `coalesce` #12864 (jayzhan211)
- [minor]: use arrow take_batch instead of get_record_batch_indices #13084 (akurmustafa)
- chore: Added a number of physical planning join benchmarks #13085 (mnorfolk03)
- Fix more instances of schema missing metadata #13068 (itsjunetime)
- Bug-fix / Limit with_new_exprs() #13109 (berkaysynnada)
- Minor: doc IMDB in benchmark README #13107 (2010YOUY01)
- removed --prefer_hash_join option from parquet_filter command. #13106 (neyama)
- Make CI error if a function has no documentation #12938 (alamb)
- Allow using `cargo nextest` for running tests #13045 (alamb)
- Add benchmark for memory-limited aggregation #13090 (2010YOUY01)
- Add clickbench parquet based queries to sql_planner benchmark #13103 (Omega359)
- Improve documentation and examples for `SchemaAdapterFactory`, make `record_batch` "hygienic" #13063 (alamb)
- Move filtered SMJ Left Anti filtered join out of `join_partial` phase #13111 (comphead)
- Improve TableScan with filters pushdown unparsing (multiple filters) #13131 (sgrebnov)
- Raise a plan error on union if column count is not the same between plans #13117 (Omega359)
- Add basic support for `unnest` unparsing #13129 (sgrebnov)
- Improve TableScan with filters pushdown unparsing (joins) #13132 (sgrebnov)
- Report offending plan node when In/Exist subquery misused #13155 (findepi)
- Remove unused assert_analyzed_plan_ne test helper #13121 (findepi)
- Fix Utf8View as Join Key #13115 (demetribu)
- Add Support for `modulus` operation in substrait #13108 (LatrecheYasser)
- unify cast_to function of ScalarValue #13122 (JasonLi-cn)
- Add unused_qualifications rustc lint with deny lint level. #13086 (dhegberg)
- [Optimization] Infer predicate under all JoinTypes #13081 (JasonLi-cn)
- Support `negate` arithmetic expression in substrait #13112 (LatrecheYasser)
- Fix to_char signature ordering #13126 (Omega359)
- chore: re-export functions_window_common::ExpressionArgs #13149 (Michael-J-Ward)
- minor: Fix build on main #13159 (eejbyfeldt)
- minor: Update test case for issue #5771 showing it is resolved #13180 (eejbyfeldt)
- Test LIKE with dynamic pattern #13141 (findepi)
- Increase fuzz testing of streaming group by / low cardinality columns #12990 (alamb)
- FFI initial implementation #12920 (timsaucer)
- Report file location and offset when CSV schema mismatch #13185 (findepi)
- Round robin polling between tied winners in sort preserving merge #13133 (jayzhan211)
- Fix rendering of dictionary empty string values in SLT tests #13198 (findepi)
- Improve push down filter of join #13184 (JasonLi-cn)
- Minor: Reduce indirection for finding changelog #13199 (alamb)
- Support `DictionaryArray` in `OVER` clause #13153 (adriangb)
- Allow testing records with sibling whitespace in SLT tests and add more string tests #13197 (findepi)
- Use single file write when an extension is present in the path. #13079 (dhegberg)
- Deprecate ScalarUDF::invoke and invoke_no_args for invoke_batch #13179 (findepi)
- consider volatile function in simply_expression #13128 (Lordworms)
- Fix CI compile failure due to merge conflict #13219 (alamb)
- Revert "Improve push down filter of join (#13184)" #13229 (eejbyfeldt)
- Derive `Clone` for more ExecutionPlans #13203 (alamb)
- feat(logical-types): add NativeType and LogicalType #12853 (notfilippo)
- Apply projection to `Statistics` in `FilterExec` #13187 (alamb)
- Minor: make LeftJoinData into a struct in CrossJoinExec #13227 (alamb)
- Deprecate invoke and invoke_no_args in favor of invoke_batch #13174 (findepi)
- Support timestamp(n) SQL type #13231 (findepi)
- Remove elements deprecated since v 38. #13245 (findepi)
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
68 Andrew Lamb
34 Piotr Findeisen
24 Jonathan Chen
19 Emil Ejbyfeldt
17 Jax Liu
12 Bruce Ritchie
11 Jonah Gao
9 Jay Zhan
8 Mustafa Akur
8 kamille
7 Sergei Grebnov
7 Tornike Gurgenidze
6 JasonLi
6 Oleks V
6 Val Lorentz
6 jcsherin
5 Burak Şen
5 Samuel Colvin
5 Yongting You
5 dependabot[bot]
4 HuSen
4 Jagdish Parihar
4 Simon Vandel Sillesen
4 wiedld
3 Alihan Çelikcan
3 Andy Grove
3 AnthonyZhOon
3 Austin Liu
3 Berkay Şahin
3 Daniel Hegberg
3 Daniël Heres
3 Lordworms
3 Michael J Ward
3 OussamaSaoudi
3 Qianqian
3 Tai Le Manh
3 Victor Barua
3 doupache
3 ngli-me
3 yi wang
2 Adrian Garcia Badaracco
2 Alex Huang
2 Brent Gardner
2 Dharan Aditya
2 Dmitrii Blaginin
2 Duong Cong Toai
2 Filippo Rossi
2 Georgi Krastev
2 June
2 Max Norfolk
2 Peter Toth
2 Tim Saucer
2 Yasser Latreche
2 peasee
2 waruto
1 Abdullah Sabaa Allil
1 Agaev Guseyn
1 Albert Skalt
1 Andrey Koshchiy
1 Arttu
1 Baris Palaska
1 Bruno Volpato
1 Bryce Mecum
1 Daniel Mesejo
1 Dmitry Bugakov
1 Eason
1 Edmondo Porcu
1 Eduard Karacharov
1 Frederic Branczyk
1 Fredrik Meringdal
1 Haile
1 Jan
1 JonasDev1
1 Justus Flerlage
1 Leslie Su
1 Marco Neumann
1 Marko Milenković
1 Martin Hilton
1 Matthew Turner
1 Nick Cameron
1 Paul
1 Smith Cruise
1 Tomoaki Kawada
1 WeblWabl
1 Weston Pace
1 Xiangpeng Hao
1 Xwg
1 Yuance.Li
1 epsio-banay
1 iamthinh
1 juroberttyb
1 mertak-synnada
1 neyama
1 smarticen
1 zhuliquan
1 张林伟
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.