Kartothek v4.0.0

@github-actions github-actions released this 17 Mar 17:04
08a8094

Kartothek 4.0.0 (2021-03-17)

This is a major release of kartothek with breaking API changes.

  • Removal of complex user input (see gh427)
  • Removal of multi table feature
  • Removal of the `kartothek.io.merge` module
  • `kartothek.core.dataset.DatasetMetadata` now has an attribute
    called `schema` which replaces the previous attribute `table_meta`
    and returns only a single schema
  • All outputs which previously returned a sequence of dictionaries
    where each key-value pair would correspond to a table-data pair now
    return only one `pandas.DataFrame`
  • All read pipelines will now automatically infer the table to read
    such that it is no longer necessary to provide `table` or
    `table_name` as an input argument
  • All writing pipelines which previously supported a complex user
    input type now expose an argument `table_name` which can be used to
    continue usage of legacy datasets (i.e. datasets with an intrinsic,
    non-trivial table name). This usage is discouraged and we recommend
    that users migrate to the default table name (i.e. leave it as
    None or `table`)
  • All pipelines which previously accepted an argument `tables` to
    select the subset of tables to load no longer accept this keyword.
    Instead the to-be-loaded table will be inferred
  • Trying to read a multi-tabled dataset will now cause an exception
    telling users that this is no longer supported with kartothek 4.0
  • The dict schema for
    `kartothek.core.dataset.DatasetMetadataBase.to_dict` and
    `kartothek.core.dataset.DatasetMetadata.from_dict` changed,
    replacing the dictionary in `table_meta` with the simple `schema`
  • All pipeline arguments which previously accepted a dictionary of
    sequences to describe a table-specific subset of columns now accept
    plain sequences (e.g. `columns`, `categoricals`)
  • Remove the following deprecated arguments for io pipelines:
    • label_filter
    • central_partition_metadata
    • load_dynamic_metadata
    • load_dataset_metadata
    • concat_partitions_on_primary_index
  • Remove `output_dataset_uuid` and `df_serializer` from
    `kartothek.io.eager.commit_dataset` since these arguments didn't
    have any effect
  • Remove `metadata`, `df_serializer`, `overwrite`, `metadata_merger`
    from `kartothek.io.eager.write_single_partition`
  • `kartothek.io.eager.store_dataframes_as_dataset` now requires a
    list as an input
  • Default value for argument `date_as_object` is now universally set
    to True. The behaviour for `False` will be deprecated and removed
    in the next major release
  • Passing `delete_scope` as a delayed object to
    `kartothek.io.dask.dataframe.update_dataset_from_ddf` is no longer
    allowed
  • `kartothek.io.dask.dataframe.update_dataset_from_ddf` and
    `kartothek.io.dask.dataframe.store_dataset_from_ddf` now return a
    `dd.core.Scalar` object. This enables all `dask.DataFrame` graph
    optimizations by default.
  • Remove argument `table_name` from
    `kartothek.io.dask.dataframe.collect_dataset_metadata`
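
The keyword-argument changes listed above can be illustrated with a small migration helper. This is only a sketch under stated assumptions: `migrate_read_kwargs`, `REMOVED_ARGS` and `PER_TABLE_ARGS` are hypothetical names that are not part of kartothek, and the code only rewrites a plain dictionary of keyword arguments without calling kartothek itself. It drops the arguments the notes list as removed and turns a per-table dictionary for `columns` or `categoricals` into a plain sequence.

```python
from typing import Any, Dict

# Arguments the 4.0.0 notes list as removed or no longer accepted.
REMOVED_ARGS = {
    "label_filter",
    "central_partition_metadata",
    "load_dynamic_metadata",
    "load_dataset_metadata",
    "concat_partitions_on_primary_index",
    "tables",
}

# Arguments that used to take {table_name: [columns]} and now take a
# plain sequence (the notes name these two as examples).
PER_TABLE_ARGS = {"columns", "categoricals"}


def migrate_read_kwargs(legacy: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper (not a kartothek API): rewrite 3.x-style
    keyword arguments into the shape described in the 4.0.0 notes."""
    migrated: Dict[str, Any] = {}
    for name, value in legacy.items():
        if name in REMOVED_ARGS:
            # Removed in 4.0.0, so it is simply dropped.
            continue
        if name in PER_TABLE_ARGS and isinstance(value, dict):
            if len(value) != 1:
                # Multi-table datasets are no longer supported.
                raise ValueError(
                    f"{name!r} refers to {len(value)} tables; "
                    "only single-table datasets can be migrated"
                )
            (per_table,) = value.values()
            value = list(per_table)
        migrated[name] = value
    return migrated


legacy_kwargs = {
    "dataset_uuid": "my_dataset",
    "tables": ["table"],
    "columns": {"table": ["a", "b"]},
    "label_filter": None,
}
print(migrate_read_kwargs(legacy_kwargs))
# {'dataset_uuid': 'my_dataset', 'columns': ['a', 'b']}
```

The helper raises instead of guessing when a per-table dictionary names more than one table, which mirrors the note that multi-table datasets now cause an exception.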