Release v0.4.0 · Eventual-Inc/Daft

What's Changed 🚀

build: Use uv for maturin builds instead @raunakab (#3540)

💥 Breaking Changes

feat: Default native runner @colin-ho (#3608)
chore!: upgrade Ray pins and pyarrow pins @jaychia (#3612)
chore!: drop support for Python 3.8 @kevinzwang (#3592)
chore!: remove pyarrow-based file reader @kevinzwang (#3587)

✨ Features

feat: Default native runner @colin-ho (#3608)
feat(swordfish): Progress Bar @colin-ho (#3571)
feat(connect): df.show @universalmind303 (#3560)
feat(connect): support DdlParse @andrewgazelka (#3580)
feat(swordfish): Optimize grouped aggregations @colin-ho (#3534)
feat(swordfish): Enable left/right joins to build probe table on either side @colin-ho (#3548)
feat: Add DataType inference from Python types @jaychia (#3555)
feat(shuffles): Locality aware pre shuffle merge @colin-ho (#3505)
feat: Implement count-distinct for sql @raunakab (#3553)
feat(connect): add drop support @andrewgazelka (#3345)
feat: support for basic subquery execution @kevinzwang (#3536)
feat(connect): add df.filter @andrewgazelka (#3346)
feat: Make serialization code not unwrap and panic on failures @raunakab (#3546)
feat: Unity Catalog writes using daft.DataFrame.write_deltalake() @anilmenon14 (#3522)
feat(connect): add parquet support @andrewgazelka (#3360)
feat: Add iterators to more types @raunakab (#3539)
feat(optimizer): Add scaffolding to create join graphs from logical plans @desmondcheongzx (#3501)
feat(tpcds-benchmarking): Add basic tpcds benchmarking for local testing @raunakab (#3509)
feat(list): add fixed-size list support for value_counts @andrewgazelka (#3521)
feat(parquet): Limit parallel tasks in remote parquet reader @colin-ho (#3490)
feat(parquet): Target parquet writes by size bytes instead of rows @colin-ho (#3457)
feat: cross join @kevinzwang (#3437)
[FEAT] connect: remove excessive warnings from spark connect @universalmind303 (#3499)
[CHORE] connect, test: df.withColumn @andrewgazelka (#3359)
[FEAT]: expr simplifier @universalmind303 (#3393)
[FEAT] shuffle testing @raunakab (#3492)
[FEAT]: add coalesce to dataframe and SQL @universalmind303 (#3482)
[FEAT] add register-table helper to sql-catalog @chuanlei-coding (#2837)
[FEAT] Respect resource request for projections in swordfish @colin-ho (#3460)
[FEAT] Enable Actor Pool UDFs by default @kevinzwang (#3488)
[FEAT] connect: add modulus operator and withColumns support @andrewgazelka (#3351)
[FEAT] connect: createDataFrame @andrewgazelka (#3363)
[FEAT] Support parquet RLE decoding for booleans @desmondcheongzx (#3477)
[FEAT] Cap parallelism on local parquet reader @colin-ho (#3310)
[FEAT] connect: add binary operators @andrewgazelka (#3350)
[FEAT] connect: support basic column operations @andrewgazelka (#3362)
[FEAT] extend build-commit workflow to support different compile-archs @raunakab (#3459)
[FEAT] Add count-distinct aggregation @raunakab (#3455)
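
Among the features above, #3457 switches Parquet writes from a row-count target to a byte-size target. Here is a minimal, conceptual Python sketch of that idea, assuming each row comes with a known size estimate. `split_by_target_bytes` is a hypothetical helper for illustration only. It is not Daft's writer and not a Daft API. It closes the current output file once the next row would push it past the byte target, so files end up roughly equal in bytes even when row widths vary.

```python
from typing import Iterable, List


def split_by_target_bytes(row_sizes: Iterable[int], target_bytes: int) -> List[List[int]]:
    """Group row indices into output files by estimated byte size.

    Hypothetical helper for illustration; not part of Daft.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    files: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for idx, size in enumerate(row_sizes):
        # Roll over to a new file if this row would overshoot the target.
        # A single oversized row still gets a file of its own.
        if current and current_bytes + size > target_bytes:
            files.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += size
    if current:
        files.append(current)
    return files
```

For example, `split_by_target_bytes([40, 40, 40, 100, 10], target_bytes=100)` yields `[[0, 1], [2], [3], [4]]`. A fixed row-count target would have grouped these rows without regard to how wide each one is.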

🐛 Bug Fixes

fix(udf): udf call with empty table and batch size @kevinzwang (#3604)
fix: use arrow's schema instead of spark's for local rel @universalmind303 (#3602)
fix: guard concurrent extension datatype setting with a lock @jaychia (#3589)
fix(parquet): Fix parquet reads of required fields nested within optional fields @desmondcheongzx (#3598)
fix: boolean and/or expressions with null @kevinzwang (#3544)
fix(run-cluster-workflow): Add null check when parsing metadata @raunakab (#3507)
fix(tpcds): fix bugs in tpcds datagen script @universalmind303 (#3495)
[BUG] Fix build commit workflow @raunakab (#3487)
[BUG]: don't panic on count(distinct) @universalmind303 (#3481)
[BUG] Block on parquet schema future in estimate_size_bytes @colin-ho (#3484)
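
Fix #3544 concerns boolean `and`/`or` expressions when an operand is null. SQL engines conventionally use three-valued (Kleene) logic in this situation: `false AND null` is `false` and `true OR null` is `true`, while other combinations involving null stay null. The sketch below shows that truth table in plain Python, with `None` standing in for null. We assume these are the semantics the fix targets. The functions are illustrative only and are not Daft's implementation.

```python
from typing import Optional


def kleene_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # False dominates: if either side is definitely False, the result is False.
    if a is False or b is False:
        return False
    # Otherwise, any null operand makes the result unknown (null).
    if a is None or b is None:
        return None
    return True


def kleene_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # True dominates: if either side is definitely True, the result is True.
    if a is True or b is True:
        return True
    # Otherwise, any null operand makes the result unknown (null).
    if a is None or b is None:
        return None
    return False
```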

🚀 Performance

perf: filter null join key optimization rule @kevinzwang (#3583)
perf: lazily import pyiceberg and unity catalog if available @jaychia (#3565)
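
The null join key rule in #3583 rests on a simple observation: under SQL equality, a null key never matches anything. For an inner join, rows with null keys can therefore be filtered out before the join without changing the result. This cuts the data that reaches the shuffle or hash-table build. The following pure-Python sketch checks that equivalence on a toy hash join. `hash_inner_join` and `drop_null_keys` are hypothetical names used for illustration. They do not show how Daft's optimizer implements the rule.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Optional[Any], Any]  # (join_key, payload)


def hash_inner_join(left: List[Row], right: List[Row]) -> List[Tuple[Any, Any, Any]]:
    # Build a hash table on the right side; null keys can never match.
    table: Dict[Any, List[Any]] = defaultdict(list)
    for key, payload in right:
        if key is not None:
            table[key].append(payload)
    out = []
    # Probe with the left side, skipping null keys for the same reason.
    for key, payload in left:
        if key is None:
            continue
        for match in table.get(key, []):
            out.append((key, payload, match))
    return out


def drop_null_keys(rows: List[Row]) -> List[Row]:
    # The rewrite: an "IS NOT NULL" filter pushed below the join.
    return [row for row in rows if row[0] is not None]
```

With `left = [(1, "a"), (None, "b"), (2, "c")]` and `right = [(1, "x"), (None, "y"), (3, "z")]`, both `hash_inner_join(left, right)` and `hash_inner_join(drop_null_keys(left), drop_null_keys(right))` return `[(1, "a", "x")]`. For outer joins, null-key rows on the preserved side must be kept in the output. In general, only the non-preserved side can be filtered this way.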

♻️ Refactor

refactor: allow InMemory to take in non python based entries @universalmind303 (#3554)
refactor: create a rust based PartitionSet @universalmind303 (#3515)
refactor(swordfish): Generic broadcast state bridge @colin-ho (#3508)

📖 Documentation

docs: update tpch benchmark link @ccmao1130 (#3542)
docs: Enable Linting of docstrings @samster25 (#3506)

✅ Tests

test(connect): add more tests for createDataFrame @andrewgazelka (#3607)
test: Add more size estimation tests from our s3 bucket @jaychia (#3514)

👷 CI

ci: Always download logs @jaychia (#3588)
ci: Add ability to array-ify args and run multiple jobs @raunakab (#3584)
ci: Add "build" label type to accepted PR titles @raunakab (#3541)
ci: add a tool to launch workloads on cluster @jaychia (#3516)
ci(release-drafter): use conventional commit labels @andrewgazelka (#3503)

🔧 Maintenance

chore!: upgrade Ray pins and pyarrow pins @jaychia (#3612)
chore: add warning for native runner @jaychia (#3613)
chore!: drop support for Python 3.8 @kevinzwang (#3592)
chore!: remove pyarrow-based file reader @kevinzwang (#3587)
chore: Fix ordering in sql tests + pin docker images in read_sql tests @colin-ho (#3596)
chore: move symbolic and boolean algebra code into new crate @kevinzwang (#3570)
[CHORE] use conventional commits @andrewgazelka (#3493)
[CHORE] connect, test: df.withColumn @andrewgazelka (#3359)
[CHORE] Add tests for parquet size estimations @jaychia (#3405)
[CHORE] Move all python wrapping logic to separate module @raunakab (#3458)

Full Changelog: v0.3.15...v0.3.16
