Kartothek 4.0.0 (2021-03-17)

This is a major release of kartothek with breaking API changes.

- Removal of complex user input (see gh427)
- Removal of multi table feature
- Removal of `kartothek.io.merge` module
- `kartothek.core.dataset.DatasetMetadata` now has an attribute called
  `schema`, which replaces the previous attribute `table_meta` and
  returns only a single schema
- All outputs which previously returned a sequence of dictionaries,
  where each key-value pair corresponded to a table-data pair, now
  return only one `pandas.DataFrame`
- All read pipelines now automatically infer the table to read, so it
  is no longer necessary to provide `table` or `table_name` as an input
  argument
- All writing pipelines which previously supported a complex user
  input type now expose an argument `table_name`, which can be used to
  keep working with legacy datasets (i.e. datasets with an intrinsic,
  non-trivial table name). This usage is discouraged and we recommend
  that users migrate to the default table name (i.e. leave it as
  None / `table`)
- All pipelines which previously accepted an argument `tables` to
  select the subset of tables to load no longer accept this keyword.
  Instead, the table to load is inferred
- Trying to read a multi-tabled dataset now raises an exception
  telling users that this is no longer supported with kartothek 4.0
- The dict schema for
  `kartothek.core.dataset.DatasetMetadataBase.to_dict` and
  `kartothek.core.dataset.DatasetMetadata.from_dict` changed: the
  dictionary in `table_meta` is replaced by the simple `schema`
- All pipeline arguments which previously accepted a dictionary of
  sequences to describe a table-specific subset of columns now accept
  plain sequences (e.g. `columns`, `categoricals`)
- Remove the following deprecated arguments for io pipelines:
  - label_filter
  - central_partition_metadata
  - load_dynamic_metadata
  - load_dataset_metadata
  - concat_partitions_on_primary_index
- Remove `output_dataset_uuid` and `df_serializer` from
  `kartothek.io.eager.commit_dataset` since these arguments didn't
  have any effect
- Remove `metadata`, `df_serializer`, `overwrite`, `metadata_merger`
  from `kartothek.io.eager.write_single_partition`
- `kartothek.io.eager.store_dataframes_as_dataset` now requires a list
  as an input
- Default value for argument `date_as_object` is now universally set
  to True. The behaviour for `False` will be deprecated and removed in
  the next major release
- Passing `delete_scope` as a delayed object to
  `kartothek.io.dask.dataframe.update_dataset_from_ddf` is no longer
  allowed
- `kartothek.io.dask.dataframe.update_dataset_from_ddf` and
  `kartothek.io.dask.dataframe.store_dataset_from_ddf` now return a
  `dd.core.Scalar` object. This enables all `dask.DataFrame` graph
  optimizations by default.
- Remove argument `table_name` from
  `kartothek.io.dask.dataframe.collect_dataset_metadata`
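
For callers migrating single-table code, the argument and metadata changes listed above can be sketched roughly as follows. This is a minimal illustration and not part of kartothek: the helper names `to_plain_sequence` and `migrate_metadata_dict` are hypothetical, and the dictionary layout (a `table_meta` key mapping table names to schemas) is an assumption based on the description above, not a verified dump of the real metadata format.

```python
from typing import Dict, List, Optional, Sequence, Union

# Hypothetical alias: before 4.0, arguments such as `columns` or
# `categoricals` could be given per table, e.g. {"table": ["a", "b"]}.
LegacyColumns = Union[None, Sequence[str], Dict[str, Sequence[str]]]


def to_plain_sequence(value: LegacyColumns) -> Optional[List[str]]:
    """Turn a legacy per-table mapping into the plain sequence used in 4.0."""
    if value is None:
        return None
    if isinstance(value, dict):
        # Only single-table input can be migrated; multi-table is unsupported.
        if len(value) != 1:
            raise ValueError("expected exactly one table in the legacy mapping")
        (columns,) = value.values()
        return list(columns)
    return list(value)


def migrate_metadata_dict(legacy: dict) -> dict:
    """Return a copy in which a single-entry `table_meta` becomes `schema`.

    Assumed layout: legacy["table_meta"] maps a table name to its schema.
    """
    migrated = dict(legacy)  # shallow copy, the input stays untouched
    table_meta = migrated.pop("table_meta", None)
    if table_meta is None:
        return migrated
    if len(table_meta) != 1:
        raise ValueError("multi-table datasets cannot be migrated")
    (schema,) = table_meta.values()
    migrated["schema"] = schema
    return migrated
```

A call such as `to_plain_sequence({"table": ["a", "b"]})` yields the list that can be passed directly as `columns`. Raising on more than one table mirrors the release's stance that multi-table datasets are no longer supported.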
