Releases: kwai/blaze

v4.0.1

10 Dec 07:08
02082df

New Features

  • Initial support for the ORC input file format.
  • Initial support for the RSS framework and the Apache Celeborn shuffle service.

Improvement

  • Optimize AggExec by implementing columnar-based aggregation.
  • Use a custom hashmap implementation for aggregation.
  • Support specialized count(0).
  • Optimize bloom filter by reusing the same bloom filter within the same executor.
  • Optimize bloom filter by supporting shrinking.
  • Optimize reading parquet files by supporting parallel reading.
  • Improve spill file deletion logic.
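
One common way to shrink a bloom filter without rebuilding it is to keep the bit count a power of two and fold the upper half onto the lower half with a bitwise OR. The sketch below illustrates that general technique only; the release notes do not describe how Blaze implements shrinking, and the `BloomFilter` class, its method names, and the SHA-256 based hashing are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical, not Blaze's implementation).

    The bit count is always a power of two so that the filter can be halved.
    """

    def __init__(self, num_bits: int, num_hashes: int = 3):
        if num_bits <= 0 or num_bits & (num_bits - 1) != 0:
            raise ValueError("num_bits must be a power of two")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _hashes(self, item: str):
        # Derive num_hashes 64-bit values by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little")

    def add(self, item: str) -> None:
        for h in self._hashes(item):
            self.bits[h % self.num_bits] = True

    def might_contain(self, item: str) -> bool:
        return all(self.bits[h % self.num_bits] for h in self._hashes(item))

    def shrink(self) -> None:
        """Halve the filter by OR-ing the upper half onto the lower half.

        Because the new size divides the old size, h % new_size equals
        (h % old_size) % new_size, so every previously set bit stays reachable.
        """
        if self.num_bits < 2:
            raise ValueError("filter cannot be shrunk further")
        half = self.num_bits // 2
        self.bits = [self.bits[i] or self.bits[i + half] for i in range(half)]
        self.num_bits = half


# Usage: keys added before shrinking are still reported after shrinking.
bf = BloomFilter(1024)
keys = [f"key-{i}" for i in range(20)]
for key in keys:
    bf.add(key)
bf.shrink()
assert all(bf.might_contain(key) for key in keys)
```

Folding never introduces false negatives, but it raises the false-positive rate, so it only pays off when the filter was sized for far more keys than it actually received.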

Bug fixes

  • Fix file not found for paths with URL-encoded characters.
  • Fix HashAggregate conversion throwing ScalaReflectionException.
  • Fix pruning error while reading parquet files with multiple row groups.
  • Fix incorrect number of tasks due to missing shuffleOrigin.
  • Fix record batch creation error when hash joining with empty input.
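
File-not-found errors on paths with URL-encoded characters typically come from decoding a path the wrong number of times. The sketch below shows that general failure mode using Python's standard `urllib.parse`; it is not a description of the actual Blaze bug or fix, and the directory name is a made-up example of a name that literally contains the text `%3A` on disk.

```python
from urllib.parse import quote, unquote

# Hypothetical on-disk directory name: it literally contains "%3A" and a space.
literal_name = "dt=2024-01-01 00%3A00"

# URI form of the name: the space becomes %20 and the literal "%" becomes %25.
encoded = quote(literal_name, safe="=")

# Decoding exactly once recovers the real on-disk name.
decoded_once = unquote(encoded)
assert decoded_once == literal_name

# Decoding twice turns the literal "%3A" into ":", which names a different
# (most likely nonexistent) directory.
decoded_twice = unquote(decoded_once)
assert decoded_twice != literal_name

# Using the encoded form directly as a file name is wrong as well.
assert encoded != literal_name
```

The practical rule this suggests is to decide at one boundary whether a string is a URI or a raw file-system path, and to convert between the two exactly once.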

Other

  • Upgrade datafusion/arrow dependencies to v42/v53.
  • Replace gxhash with foldhash for better compatibility on some hardware.
  • Other minor improvements and fixes.

Full Changelog: v4.0.0...v4.0.1

v4.0.0

10 Oct 06:16

New features

  • supports spark3.0/3.1/3.2/3.3/3.4/3.5.
  • supports integrating with Apache Celeborn.
  • supports native ORC input format.
  • supports bloom filter join introduced in spark 3.5.
  • supports forceShuffledHashJoin for running tpch/tpcds benchmarks.
  • newly supported native expressions/functions: year, month, day, md5.

Bug fixes

  • add missing UDTF.terminate() invocations.
  • fix NPE while executing some native spark physical plans.

Performance

  • use custom implemented hash table for faster joining, supporting SIMD, bulk searching, memory prefetching, etc.
  • improve shuffle write performance.
  • reuse FSDataInputStream for same input file.

Full Changelog: v3.0.1...v4.0.0

v3.0.1

23 Jul 13:48
6f27604

blaze-v3.0.1

Features

  • Supports spark3.0/3.2/3.3.

Performance

  • fix GetJsonObject conversion, supporting faster get_json_object with sonic-rs.

Bugfix

  • fix childOrderingRequiredTag computation logic.

v3.0.0 [yanked]

01 Jul 07:46

blaze-v3.0.0 [yanked]

Features

  • Supports using spark.io.compression.codec for shuffle/broadcast compression
  • Supports date type casting
  • Refactor join implementations to support existence joins and BHJ building hash map on driver side

Performance

  • Fixed performance issues when running on spark3 with default configurations
  • Use cached parquet metadata
  • Refactor native broadcast to avoid duplicated broadcast jobs
  • Supports spark333 batch shuffle reading

Bugfix

  • Fix in_list conversion in from_proto.rs

v2.0.9.1

11 May 03:49
4180741
release version 2.0.9.1 (#470)

Co-authored-by: zhangli20

v2.0.9

11 Apr 08:49
3fc6838

v2.0.9

v2.0.8

02 Feb 12:40
1aecd9c

v2.0.8

v2.0.7

09 Nov 11:01
224697d
update blaze version 2.0.7-SNAPSHOT (#312)

Co-authored-by: zhangli20

v2.0.6

26 Sep 07:14
update blaze version 2.0.6-SNAPSHOT