Release v1.13.0 - 2024-05-15 · sdv-dev/SDV

This release adds a utility function called get_random_subset that helps users take a subset of their multi-table data so that modeling can be done more quickly. Given a dictionary of table names mapped to DataFrames, the metadata, a main table, and a desired number of rows for the main table, it subsamples the data in a way that maintains referential integrity.
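To illustrate what "maintains referential integrity" means here, the following is a minimal pure-Python sketch of the idea, not SDV's implementation. The helper name subsample_with_integrity, the list-of-dicts table layout, and the foreign_keys tuple format are all assumptions made for this example. It only prunes tables that descend from the main table, whereas a full implementation also has to decide what to do with parent and unrelated tables.

```python
import random


def subsample_with_integrity(tables, main_table, num_rows, foreign_keys, seed=0):
    """Hypothetical helper (not part of SDV).

    tables: dict mapping table name -> list of row dicts
    foreign_keys: list of (child_table, child_column, parent_table, parent_column)
    """
    rng = random.Random(seed)
    result = {name: list(rows) for name, rows in tables.items()}

    # Randomly keep at most num_rows rows of the main table.
    main_rows = result[main_table]
    result[main_table] = rng.sample(main_rows, min(num_rows, len(main_rows)))

    # Drop child rows whose parent row is gone; repeat until nothing changes
    # so that grandchildren are pruned as well.
    changed = True
    while changed:
        changed = False
        for child, child_col, parent, parent_col in foreign_keys:
            parent_keys = {row[parent_col] for row in result[parent]}
            kept = [row for row in result[child] if row[child_col] in parent_keys]
            if len(kept) != len(result[child]):
                result[child] = kept
                changed = True
    return result


# Toy data: 10 users, each with exactly 3 sessions.
tables = {
    "users": [{"user_id": i} for i in range(10)],
    "sessions": [{"session_id": i, "user_id": i % 10} for i in range(30)],
}
foreign_keys = [("sessions", "user_id", "users", "user_id")]

subset = subsample_with_integrity(tables, "users", 3, foreign_keys)
kept_users = {row["user_id"] for row in subset["users"]}
```

After the call, every remaining session row points at a user that is still present, which is the property the release note describes.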

This release also adds two new local file handlers: the CSVHandler and the ExcelHandler. These let users easily load data from, and save synthetic data to, these file types. The handlers return data and metadata in the multi-table format, so we also added the function get_table_metadata to get a SingleTableMetadata object from a MultiTableMetadata object.
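As a rough picture of the multi-table data format mentioned above (a dictionary mapping table names to tables), here is a small standard-library sketch that round-trips a folder of CSV files. It does not use the CSVHandler itself; the helper names read_csv_folder and write_csv_folder are hypothetical, and tables are represented as lists of row dicts rather than DataFrames.

```python
import csv
import tempfile
from pathlib import Path


def write_csv_folder(tables, folder):
    """Hypothetical helper: write each non-empty table to <folder>/<name>.csv."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for name, rows in tables.items():
        with (folder / f"{name}.csv").open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)


def read_csv_folder(folder):
    """Hypothetical helper: map each CSV file's stem to its rows."""
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            tables[path.stem] = list(csv.DictReader(handle))
    return tables


original = {
    "users": [{"user_id": "1", "name": "Ada"}],
    "sessions": [{"session_id": "10", "user_id": "1"}],
}

with tempfile.TemporaryDirectory() as folder:
    write_csv_folder(original, folder)
    loaded = read_csv_folder(folder)
```

Values come back as strings because the csv module does no type inference, which is one reason a dedicated handler that returns typed tables alongside metadata is convenient.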

Finally, this release fixes some bugs that prevented synthesizers from working with data that had numerical column names.
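The sketch below reproduces the general class of failure behind these bugs: string methods such as lower() are not defined on integer column names. It is an illustration only. The helper normalize_column_names is hypothetical, and casting names to str is one possible way to tolerate them, not necessarily the fix applied in SDV.

```python
def normalize_column_names(columns):
    """Hypothetical helper: lower-case column names, tolerating non-string names."""
    return [str(name).lower() for name in columns]


columns = ["Age", 0, 1]

# Calling a string method directly on an integer column name fails.
try:
    [name.lower() for name in columns]
except AttributeError as error:
    message = str(error)

# Casting each name to str first avoids the crash.
normalized = normalize_column_names(columns)
```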

New Features

Add get_random_subset poc utility function - Issue #1877 by @R-Palazzo
Add usage logging - Issue #1903 by @pvk-developer
Move function drop_unknown_references from poc to be directly under utils - Issue #1947 by @R-Palazzo
Add CSVHandler - Issue #1949 by @pvk-developer
Add ExcelHandler - Issue #1950 by @pvk-developer
Add get_table_metadata function - Issue #1951 by @R-Palazzo
Save usage log file as a csv - Issue #1974 by @frances-h
Split out metadata creation from data import in the local files handlers - Issue #1975 by @pvk-developer
Improve error message when trying to sample before fitting (single table) - Issue #1978 by @R-Palazzo

Bugs Fixed

Metadata detection crashes when the column names are integers (AttributeError: 'int' object has no attribute 'lower') - Issue #1933 by @lajohn4747
Synthesizers crash when column names are integers (TypeError: unsupported operand) - Issue #1935 by @lajohn4747
Switch parameter order in drop_unknown_references - Issue #1944 by @R-Palazzo
Unexpected NaN values in sequence_index when dataframe isn't reset - Issue #1973 by @fealho
Fix pandas DtypeWarning in download_demo - Issue #1980 by @fealho

Maintenance

Only run unit and integration tests on oldest and latest python versions for macos - Issue #1948 by @frances-h

Internal

Update code to remove FutureWarning related to 'enforce_uniqueness' parameter - Issue #1995 by @pvk-developer
