Releases · JDASoftwareGroup/kartothek
Kartothek v3.19.0
Version 3.19.0 (2021-02-12)
- Fix an issue where updates on cubes or updates on datasets using
  dask.dataframe might not update all secondary indices, resulting in a
  corrupt state after the update
- Expose the compression type and row group chunk size in the Cube
  interface via an optional parameter of type
  kartothek.serialization.ParquetSerializer (see the sketch after this
  list)
- Add retries to
  kartothek.serialization._parquet.ParquetSerializer.restore_dataframe.
  IOErrors on long running kartothek + dask tasks have been observed.
  Until the root cause is fixed, the serialization is retried to gain
  more stability.
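A minimal sketch of the new serializer pass-through, assuming the cube
writer accepts it via a df_serializer keyword as the plain dataset
writers do; the store URL, cube layout, and data are illustrative:

```python
# Sketch only: tune Parquet compression and row group chunk size for cube
# writes. The df_serializer keyword is an assumption mirroring the plain
# dataset writers; everything else is illustrative.
from functools import partial

import pandas as pd
from storefact import get_store_from_url

from kartothek.core.cube.cube import Cube
from kartothek.io.eager_cube import build_cube
from kartothek.serialization import ParquetSerializer

store_factory = partial(get_store_from_url, "hfs:///tmp/ktk_demo")

cube = Cube(
    dimension_columns=["entity"],
    partition_columns=["day"],
    uuid_prefix="demo_cube",
)
df = pd.DataFrame({"entity": ["a", "b"], "day": [1, 1], "value": [0.1, 0.2]})

# ZSTD compression and ~100k rows per row group instead of the defaults
serializer = ParquetSerializer(compression="ZSTD", chunk_size=100_000)
build_cube(data=df, cube=cube, store=store_factory, df_serializer=serializer)
```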
Kartothek v3.18.0
Version 3.18.0 (2021-01-25)
- Add cube.suppress_index_on to switch off the default index creation
  for dimension columns (see the sketch after this list)
- Fixed the import issue of the zstd module for kartothek.core._zmsgpack
- Fix a bug in
  kartothek.io_components.read.dispatch_metapartitions_from_factory
  where dispatch_by=[] would be treated like dispatch_by=None instead of
  merging all dataset partitions into a single partition
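A minimal sketch of suppress_index_on, assuming it is passed to the Cube
constructor; the column names are illustrative:

```python
# Sketch only: skip the default secondary index for one dimension column.
from kartothek.core.cube.cube import Cube

cube = Cube(
    dimension_columns=["entity", "timestamp"],
    partition_columns=["location"],
    uuid_prefix="events_cube",
    suppress_index_on=["timestamp"],  # assumption: constructor keyword
)
```

The listed columns remain cube dimensions; only the automatic index
creation for them is skipped.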
Kartothek v3.17.3
Version 3.17.3 (2020-12-04)
- Allow pyarrow==2 as a dependency.
Kartothek v3.17.2
Version 3.17.2 (2020-12-01)
- #378 Improve logging information for potential buffer serialization
errors
Kartothek v3.17.1
Version 3.17.1 (2020-11-24)
Bugfixes
- Fix GitHub #375 by loosening checks of the supplied store argument
Kartothek v3.17.0
Version 3.17.0 (2020-11-23)
Improvements
- Improve performance for "in" predicate literals using long object
  lists as values
- kartothek.io.eager.commit_dataset now allows modifying the user
  metadata without adding new data (see the sketch after this list)
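A minimal sketch of both improvements, assuming an existing dataset
named my_dataset and a filesystem store; the metadata-only commit via
new_partitions=None is an assumption based on the note above:

```python
# Sketch only: dataset name, columns, and store location are illustrative.
from functools import partial

from storefact import get_store_from_url

from kartothek.io.eager import commit_dataset, read_table

store_factory = partial(get_store_from_url, "hfs:///tmp/ktk_demo")

# "in" predicates with long object lists as literals are now faster.
wanted = [f"id_{i}" for i in range(10_000)]
df = read_table(
    dataset_uuid="my_dataset",
    store=store_factory,
    predicates=[[("entity_id", "in", wanted)]],
)

# Update the user metadata without writing any new partitions
# (assumption: new_partitions=None selects the metadata-only path).
commit_dataset(
    store=store_factory,
    dataset_uuid="my_dataset",
    new_partitions=None,
    metadata={"last_reviewed": "2020-11-23"},
)
```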
Bugfixes
- Fix an issue where
  kartothek.io.dask.dataframe.collect_dataset_metadata would return
  improper row group statistics
- Fix an issue where
  kartothek.io.dask.dataframe.collect_dataset_metadata would execute
  get_parquet_metadata at graph construction time
- Fix a bug in kartothek.io.eager_cube.remove_partitions where all
  partitions were removed instead of none at all
- Fix a bug in
  kartothek.core.dataset.DatasetMetadataBase.get_indices_as_dataframe
  which would raise an IndexError if indices were empty or had not been
  loaded
Kartothek v3.16.0
Version 3.16.0 (2020-09-29)
New functionality
- Allow filtering of NaNs using the "==", "!=" and "in" operators (see
  the sketch below)
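A minimal sketch of NaN filtering with the predicate syntax, assuming an
existing dataset; names are illustrative:

```python
# Sketch only: filter rows on NaN using the new operators.
from functools import partial

import numpy as np
from storefact import get_store_from_url

from kartothek.io.eager import read_table

store_factory = partial(get_store_from_url, "hfs:///tmp/ktk_demo")

# only rows where "value" is NaN
df_nan = read_table(
    dataset_uuid="my_dataset",
    store=store_factory,
    predicates=[[("value", "==", np.nan)]],
)

# only rows where "value" is not NaN
df_rest = read_table(
    dataset_uuid="my_dataset",
    store=store_factory,
    predicates=[[("value", "!=", np.nan)]],
)
```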
Bugfixes
- Fix a regression which would not allow the usage of non-serializable
  stores even when using factories
Kartothek v3.15.1
Version 3.15.1 (2020-09-28)
Note: Identical to 3.15.0 but with a fix in packaging
New functionality
- Add kartothek.io.dask.dataframe.store_dataset_from_ddf to offer write
  support for a dask dataframe without update support. Overwrites are
  forbidden unless explicitly allowed, and existing datasets are not
  updated.
- The sort_partitions_by feature now supports multiple columns. While
  this has only a marginal effect on predicate pushdown, it may be used
  to improve the Parquet compression.
- build_cube_from_dataframe now supports the shuffle methods offered by
  kartothek.io.dask.dataframe.store_dataset_from_ddf and
  kartothek.io.dask.dataframe.update_dataset_from_ddf but writes the
  output in the cube format (see the sketch after this list)
Improvements
- Reduce memory consumption during index write.
- Allow simplekv stores and storefact URLs to be passed explicitly as
  input for the store arguments
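A minimal sketch of the new writer, assuming its keyword set mirrors
update_dataset_from_ddf; dataset name, columns, and store are
illustrative, and the returned dask collection must be computed to
trigger the write:

```python
# Sketch only: write a dask dataframe as a brand-new dataset.
from functools import partial

import dask.dataframe as dd
import pandas as pd
from storefact import get_store_from_url

from kartothek.io.dask.dataframe import store_dataset_from_ddf

store_factory = partial(get_store_from_url, "hfs:///tmp/ktk_demo")

pdf = pd.DataFrame({"p": [0, 0, 1, 1], "a": [3, 1, 2, 0], "b": [1, 0, 1, 0]})
ddf = dd.from_pandas(pdf, npartitions=2)

graph = store_dataset_from_ddf(
    ddf,
    store=store_factory,
    dataset_uuid="new_dataset",
    partition_on=["p"],
    sort_partitions_by=["a", "b"],  # multiple sort columns are now supported
    overwrite=False,  # no update support: overwriting must be opted into
)
graph.compute()  # assumption: a dask collection is returned; compute to write
```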
Kartothek v3.15.0
Version 3.15.0 (2020-09-28)
New functionality
- Add kartothek.io.dask.dataframe.store_dataset_from_ddf to offer write
  support for a dask dataframe without update support. Overwrites are
  forbidden unless explicitly allowed, and existing datasets are not
  updated.
- The sort_partitions_by feature now supports multiple columns. While
  this has only a marginal effect on predicate pushdown, it may be used
  to improve the Parquet compression.
- build_cube_from_dataframe now supports the shuffle methods offered by
  kartothek.io.dask.dataframe.store_dataset_from_ddf and
  kartothek.io.dask.dataframe.update_dataset_from_ddf but writes the
  output in the cube format
Improvements
- Reduce memory consumption during index write.
- Allow simplekv stores and storefact URLs to be passed explicitly as
  input for the store arguments
Kartothek v3.14.0
Version 3.14.0 (2020-08-27)
New functionality
- Add hash_dataset functionality (see the sketch after this list)
Improvements
- Expand pandas version pin to include 1.1.X
- Expand pyarrow version pin to include 1.x
- Large addition to documentation for multi dataset handling (Kartothek
  Cubes)
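A minimal sketch of hash_dataset, assuming it lives in the
dask.dataframe backend and returns a dask collection of content hashes;
the dataset name and store are illustrative:

```python
# Sketch only: hash a dataset's contents, e.g. to compare two copies.
from functools import partial

from storefact import get_store_from_url

from kartothek.io.dask.dataframe import hash_dataset

store_factory = partial(get_store_from_url, "hfs:///tmp/ktk_demo")

hashes = hash_dataset(
    store=store_factory,
    dataset_uuid="my_dataset",
).compute()  # assumption: a dask collection of row/partition hashes
```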