Use statistics in Faker CTAS #24585

Open: nineinchnick wants to merge 4 commits into master from faker-range-constraint-views

nineinchnick (Member) commented Dec 26, 2024

Description

When a table is created with CREATE TABLE AS SELECT in the Faker connector, use the source table's statistics to:

  • set the default_limit table property to the estimated number of rows from the source table
  • set the min and max column properties based on the statistics
  • detect high-cardinality integer columns and use sequences for them
  • detect low-cardinality columns and generate dictionaries to select values from
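
The four steps in the list above could be sketched roughly as follows. This is an illustration in Python, not the connector's Java code: the `ColumnStats` class, the `strategy` key, and both thresholds are assumptions made up for the example. Only `default_limit`, `min`, and `max` are property names mentioned in this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnStats:
    """Hypothetical summary of one source column's statistics."""
    name: str
    is_integer: bool
    distinct_count: Optional[float] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None


# Assumed thresholds, not taken from the pull request.
HIGH_CARDINALITY_RATIO = 0.99  # nearly every row has a distinct value
LOW_CARDINALITY_LIMIT = 1000   # few enough values to keep in a dictionary


def plan_column(stats: ColumnStats, row_count: float) -> dict:
    """Derive column properties and a generation strategy from statistics."""
    props = {}
    # Range constraints come directly from the min/max statistics, if known.
    if stats.min_value is not None:
        props["min"] = stats.min_value
    if stats.max_value is not None:
        props["max"] = stats.max_value

    if stats.distinct_count is None or row_count <= 0:
        # Without usable statistics, fall back to plain random values.
        props["strategy"] = "random"
    elif stats.is_integer and stats.distinct_count / row_count >= HIGH_CARDINALITY_RATIO:
        # High-cardinality integer column: generate a sequence.
        props["strategy"] = "sequence"
    elif stats.distinct_count <= LOW_CARDINALITY_LIMIT:
        # Low-cardinality column: pick values from a generated dictionary.
        props["strategy"] = "dictionary"
    else:
        props["strategy"] = "random"
    return props


def plan_table(row_count: float, columns: list) -> dict:
    """Combine the table-level row estimate with the per-column plans."""
    return {
        "default_limit": int(row_count),
        "columns": {c.name: plan_column(c, row_count) for c in columns},
    }
```

With these assumed thresholds, an integer `id` column whose distinct count matches the row count would get a sequence, while a `status` column with a handful of distinct values would get a dictionary.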

Additional context and related issues

The previous attempt, #24098, was abandoned after #24147 was reported. This time, views are used only for sequence columns. If that turns out not to be very useful, we can stop creating the views automatically, or make this yet another column property.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Faker
* Use statistics when using `CREATE TABLE AS SELECT` in the Faker connector. ({issue}`issuenumber`)

nineinchnick (Member, Author) commented

@raunaqmorarka this is the last one, I promise :-)

When creating a table in the Faker connector from an existing table,
gather column statistics to determine range constraints and set them as
column properties.

When creating a table in the Faker connector from an existing table,
use column statistics to identify low-cardinality columns and generate
their values from a randomly generated set.

nineinchnick force-pushed the faker-range-constraint-views branch from 8af4d29 to c40b9fa on December 31, 2024 15:12

nineinchnick (Member, Author) commented

@raunaqmorarka and @losipiuk this is ready for review. It's the last one about Faker; I don't have anything else planned for it.
