Releases: JDASoftwareGroup/kartothek

Kartothek v3.19.0

12 Feb 10:10

Version 3.19.0 (2021-02-12)

  • Fix an issue where updates on cubes or updates on datasets using
    dask.dataframe might not update all secondary indices, resulting in
    a corrupt state after the update
  • Expose the compression type and row group chunk size in the Cube
    interface via an optional parameter of type
    kartothek.serialization.ParquetSerializer.
  • Add retries to
    kartothek.serialization._parquet.ParquetSerializer.restore_dataframe.
    IOErrors have been observed on long-running ktk + dask tasks. Until
    the root cause is fixed, the serialization is retried to gain more
    stability.
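The retry behavior described in the last bullet can be sketched with a small wrapper; the helper name, retry count, and the flaky reader below are illustrative assumptions, not kartothek's actual implementation:

```python
import time

def with_retries(func, max_retries=3, backoff_seconds=0.0):
    """Retry ``func`` on IOError, guarding against transient storage
    failures seen on long-running dask tasks (sketch only)."""
    def wrapper(*args, **kwargs):
        for attempt in range(max_retries):
            try:
                return func(*args, **kwargs)
            except IOError:
                if attempt == max_retries - 1:
                    raise  # give up after the final attempt
                time.sleep(backoff_seconds)
    return wrapper

# Example: a flaky reader that fails twice before succeeding.
calls = {"n": 0}

def flaky_restore():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient storage error")
    return "dataframe"

restore = with_retries(flaky_restore, max_retries=3)
```

With three attempts allowed, the two transient failures are absorbed and the third call returns normally.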

Kartothek v3.18.0

25 Jan 10:57
5f48231

Version 3.18.0 (2021-01-25)

  • Add cube.suppress_index_on to switch off the default index
    creation for dimension columns
  • Fix an import issue with the zstd module for
    kartothek.core._zmsgpack.
  • Fix a bug in
    kartothek.io_components.read.dispatch_metapartitions_from_factory
    where dispatch_by=[] would be treated like dispatch_by=None,
    not merging all dataset partitions into a single partition.
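The distinction fixed in the last bullet can be modeled in plain Python: grouping by an empty column list should collapse everything into one partition, whereas None means no grouping at all. The function below is a hypothetical model of those semantics, not kartothek's code:

```python
def dispatch_groups(partitions, dispatch_by):
    """Model of the dispatch semantics: ``None`` keeps one group per
    partition, while an empty list merges everything into one group."""
    if dispatch_by is None:
        return [[p] for p in partitions]  # no merging at all
    if dispatch_by == []:
        return [list(partitions)]         # one single merged partition
    # Otherwise, group partitions by the values of the dispatch columns.
    groups = {}
    for p in partitions:
        key = tuple(p[col] for col in dispatch_by)
        groups.setdefault(key, []).append(p)
    return list(groups.values())

parts = [{"country": "DE"}, {"country": "US"}, {"country": "DE"}]
```

Here `dispatch_groups(parts, None)` yields three groups, `dispatch_groups(parts, [])` yields one, and `dispatch_groups(parts, ["country"])` yields two.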

Kartothek v3.17.3

04 Dec 10:22
3a3fa18

Version 3.17.3 (2020-12-04)

  • Allow pyarrow==2 as a dependency.

Kartothek v3.17.2

01 Dec 14:57
8329362

Version 3.17.2 (2020-12-01)

  • #378 Improve logging information for potential buffer serialization
    errors

Kartothek v3.17.1

24 Nov 15:51
082ace7

Version 3.17.1 (2020-11-24)

Bugfixes

  • Fix GitHub #375 by loosening checks of the supplied store argument

Kartothek v3.17.0

23 Nov 09:23
6bd5047

Version 3.17.0 (2020-11-23)

Improvements

  • Improve performance for "in" predicate literals using long object
    lists as values
  • kartothek.io.eager.commit_dataset now allows modifying the user
    metadata without adding new data.
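A common way to speed up "in" predicates with long literal lists, and the kind of change the first bullet likely refers to (a sketch, not kartothek's actual code), is to convert the list to a set once so each membership test is O(1) instead of a linear scan:

```python
def evaluate_in_predicate(values, literals):
    """Return a boolean mask marking which values appear in ``literals``.
    Building a set once makes each lookup O(1), so the total cost is
    O(len(values) + len(literals)) rather than their product."""
    literal_set = set(literals)
    return [v in literal_set for v in values]

mask = evaluate_in_predicate(["a", "b", "c", "d"], ["b", "d"])
```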

Bugfixes

  • Fix an issue where
    kartothek.io.dask.dataframe.collect_dataset_metadata would return
    improper row group statistics
  • Fix an issue where
    kartothek.io.dask.dataframe.collect_dataset_metadata would execute
    get_parquet_metadata at graph construction time
  • Fix a bug in kartothek.io.eager_cube.remove_partitions where all
    partitions were removed instead of none at all.
  • Fix a bug in
    kartothek.core.dataset.DatasetMetadataBase.get_indices_as_dataframe
    which would raise an IndexError if indices were empty or had not
    been loaded

Kartothek v3.16.0

29 Sep 15:04

Version 3.16.0 (2020-09-29)

New functionality

  • Allow filtering of NaNs using the "==", "!=" and "in" operators
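Because NaN compares unequal to itself, filtering for NaN needs special handling. The sketch below models the behavior described in the bullet (a hypothetical predicate evaluator, not kartothek's implementation): a NaN literal with "==" matches NaN values, "!=" matches everything else, and "in" reduces to a series of "==" checks:

```python
import math

def matches(value, op, literal):
    """Evaluate one predicate, treating NaN literals specially since
    ``float("nan") == float("nan")`` is False in plain Python."""
    literal_is_nan = isinstance(literal, float) and math.isnan(literal)
    value_is_nan = isinstance(value, float) and math.isnan(value)
    if op == "==":
        return value_is_nan if literal_is_nan else value == literal
    if op == "!=":
        return not value_is_nan if literal_is_nan else value != literal
    if op == "in":
        # Membership is "equal to any literal", reusing the NaN-aware "==".
        return any(matches(value, "==", item) for item in literal)
    raise ValueError(f"unsupported operator: {op}")
```

For example, `matches(float("nan"), "==", float("nan"))` is True here, whereas the plain `==` comparison would be False.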

Bugfixes

  • Fix a regression which would not allow the usage of
    non-serializable stores even when using factories

Kartothek 3.15.1

28 Sep 16:06
21ddca4

Version 3.15.1 (2020-09-28)

Note: Identical to 3.15.0 but with a fix in packaging

New functionality

  • Add kartothek.io.dask.dataframe.store_dataset_from_ddf to offer
    write support for a dask dataframe without update support. This
    either forbids overwrites or allows them explicitly, and does not
    update existing datasets.
  • The sort_partitions_by feature now supports multiple columns.
    While this has only a marginal effect on predicate pushdown, it may
    be used to improve the Parquet compression.
  • build_cube_from_dataframe now supports the shuffle methods
    offered by kartothek.io.dask.dataframe.store_dataset_from_ddf and
    kartothek.io.dask.dataframe.update_dataset_from_ddf but writes the
    output in the cube format
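Sorting rows within a partition by several columns before writing lines up runs of equal values, which helps Parquet's run-length and dictionary encodings compress better. A plain-Python model of that idea (the column names are illustrative, and this is not kartothek's code):

```python
def sort_rows(rows, sort_by):
    """Sort rows (dicts) by several columns in priority order, so that
    runs of equal values end up adjacent for better Parquet compression."""
    return sorted(rows, key=lambda row: tuple(row[col] for col in sort_by))

rows = [
    {"country": "US", "city": "NYC"},
    {"country": "DE", "city": "Munich"},
    {"country": "DE", "city": "Berlin"},
]
ordered = sort_rows(rows, sort_by=["country", "city"])
```

After sorting, the two "DE" rows are adjacent and ordered by city, so both columns contain longer runs of repeated or sorted values.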

Improvements

  • Reduce memory consumption during index write.
  • Allow simplekv stores and storefact URLs to be passed explicitly
    as input for the store arguments

Kartothek v3.15.0

28 Sep 14:48

Version 3.15.0 (2020-09-28)

New functionality

  • Add kartothek.io.dask.dataframe.store_dataset_from_ddf to offer
    write support for a dask dataframe without update support. This
    either forbids overwrites or allows them explicitly, and does not
    update existing datasets.
  • The sort_partitions_by feature now supports multiple columns.
    While this has only a marginal effect on predicate pushdown, it may
    be used to improve the Parquet compression.
  • build_cube_from_dataframe now supports the shuffle methods
    offered by kartothek.io.dask.dataframe.store_dataset_from_ddf and
    kartothek.io.dask.dataframe.update_dataset_from_ddf but writes the
    output in the cube format

Improvements

  • Reduce memory consumption during index write.
  • Allow simplekv stores and storefact URLs to be passed explicitly
    as input for the store arguments

Kartothek 3.14.0

27 Aug 09:31
27d31be

Version 3.14.0 (2020-08-27)

New functionality

  • Add hash_dataset functionality

Improvements

  • Expand pandas version pin to include 1.1.X
  • Expand pyarrow version pin to include 1.x
  • Large addition to the documentation for multi-dataset handling
    (Kartothek Cubes)