
v1.13.0 - 2024-05-15

@amontanez24 released this 15 May 20:38
· 181 commits to main since this release

This release adds a utility function called get_random_subset that helps users take a subset of their multi-table data so that modeling runs faster. Given a dictionary mapping table names to DataFrames, the metadata, the name of a main table, and the desired number of rows for that main table, it subsamples the data in a way that maintains referential integrity.
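To make "maintains referential integrity" concrete, here is a minimal conceptual sketch. It is not the SDV implementation and does not call get_random_subset; it uses plain lists of dicts in place of DataFrames, and the helper name subsample_with_integrity and the example tables are hypothetical. The idea shown is only the core one: sample the main table first, then keep the child rows whose foreign keys still point at a sampled parent.

```python
import random


def subsample_with_integrity(parents, children, primary_key, foreign_key,
                             num_rows, seed=0):
    """Sample parent rows, then keep only the child rows that reference them.

    Hypothetical helper for illustration; not part of SDV.
    """
    rng = random.Random(seed)
    # Sample the main (parent) table down to the requested size.
    sampled_parents = rng.sample(parents, min(num_rows, len(parents)))
    kept_keys = {row[primary_key] for row in sampled_parents}
    # Drop child rows whose foreign key no longer has a matching parent.
    sampled_children = [row for row in children if row[foreign_key] in kept_keys]
    return sampled_parents, sampled_children


# Toy data: 10 users, 30 sessions, each user owning exactly 3 sessions.
users = [{"user_id": i} for i in range(10)]
sessions = [{"session_id": i, "user_id": i % 10} for i in range(30)]

sub_users, sub_sessions = subsample_with_integrity(
    users, sessions, primary_key="user_id", foreign_key="user_id", num_rows=3
)
```

Every row left in sub_sessions references a user that is still present in sub_users, which is the property the release describes. A real multi-table schema can have several levels of parent and child tables, which this two-table sketch does not cover.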

This release also adds two new local file handlers: the CSVHandler and the ExcelHandler. These let users easily load data from, and save synthetic data to, these file types. The handlers return data and metadata in the multi-table format, so we also added the function get_table_metadata to get a SingleTableMetadata object from a MultiTableMetadata object.
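As a rough illustration of the "multi-table format" that the handlers work with (a dictionary keyed by table name), the sketch below round-trips a folder of CSV files using only the Python standard library. It does not use CSVHandler or reproduce its API; the functions write_tables and read_tables and the sample tables are hypothetical, and rows are plain dicts rather than DataFrames.

```python
import csv
import tempfile
from pathlib import Path


def write_tables(tables, folder):
    """Write each table (a non-empty list of dict rows) to <folder>/<name>.csv."""
    for name, rows in tables.items():
        with open(Path(folder) / f"{name}.csv", "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)


def read_tables(folder):
    """Read every CSV in the folder into a dict keyed by file stem (table name)."""
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as handle:
            tables[path.stem] = list(csv.DictReader(handle))
    return tables


# Values are strings because the csv module reads every field back as text.
tables = {
    "users": [{"user_id": "1", "country": "FR"}],
    "sessions": [{"session_id": "10", "user_id": "1"}],
}

with tempfile.TemporaryDirectory() as folder:
    write_tables(tables, folder)
    loaded = read_tables(folder)
```

Picking a single entry out of such a dictionary (for example loaded["users"]) is the data-side analogue of what get_table_metadata does for metadata: it narrows a multi-table object down to one table.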

Finally, this release fixes some bugs that prevented synthesizers from working with data that had numerical column names.

New Features

Bugs Fixed

  • Metadata detection crashes when the column names are integers (AttributeError: 'int' object has no attribute 'lower') - Issue #1933 by @lajohn4747
  • Synthesizers crash when column names are integers (TypeError: unsupported operand) - Issue #1935 by @lajohn4747
  • Switch parameter order in drop_unknown_references - Issue #1944 by @R-Palazzo
  • Unexpected NaN values in sequence_index when dataframe isn't reset - Issue #1973 by @fealho
  • Fix pandas DtypeWarning in download_demo - Issue #1980 by @fealho
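The AttributeError quoted in Issue #1933 is the standard Python failure when a string method is called on an integer. The snippet below reproduces that class of error on a plain list of column names and shows one common remedy, casting names to str before normalizing them. This is an illustrative assumption about the kind of fix involved, not the actual SDV patch; the helper name normalize is hypothetical.

```python
column_names = ["Age", 0, 1]  # mixed string and integer column names


def normalize(name):
    # Cast to str first so integer names do not raise AttributeError.
    return str(name).lower()


try:
    # Calling .lower() directly fails as soon as an integer name is reached.
    [name.lower() for name in column_names]
    failed = False
except AttributeError:
    failed = True

normalized = [normalize(name) for name in column_names]
```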

Maintenance

  • Only run unit and integration tests on oldest and latest Python versions for macOS - Issue #1948 by @frances-h

Internal

  • Update code to remove FutureWarning related to 'enforce_uniqueness' parameter - Issue #1995 by @pvk-developer