Ingest fails when columns don't align across sets #100

Open
shntnu opened this issue May 3, 2018 · 3 comments

@shntnu
Member

shntnu commented May 3, 2018

Some image.csv files may be missing a few columns. Because odo enforces a NOT NULL constraint on all columns, ingesting such a file throws an error.
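
A minimal sketch of this failure mode, using the standard-library sqlite3 module rather than odo itself. The table and column names are hypothetical, and the `insert_row` helper is only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: every column is declared NOT NULL, as described above.
conn.execute(
    "CREATE TABLE Image (ImageNumber INTEGER NOT NULL, Count_Cells INTEGER NOT NULL)"
)


def insert_row(row):
    """Insert a dict as one row; columns absent from the dict are left unset."""
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO Image ({columns}) VALUES ({placeholders})",
        tuple(row.values()),
    )


insert_row({"ImageNumber": 1, "Count_Cells": 42})  # all columns present

try:
    insert_row({"ImageNumber": 2})  # Count_Cells is missing
    failed = False
except sqlite3.IntegrityError as error:
    failed = True
    print(f"ingest failed: {error}")
```

The second insert is rejected because the missing column would have to be stored as NULL.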

@shntnu
Member Author

shntnu commented May 4, 2018

We'd want to optionally specify a "reference" Image.csv that has all the columns.

One way to do this would be to specify the folder name in seed:

def seed(source, target, config_file, skip_image_prefix=True, reference_set=None):

Then reorder directories so that reference_set comes first (i.e. check whether the reference set exists, and if so move it to the front):

directories = sorted(list(cytominer_database.utils.find_directories(source)))

This would reduce the complexity a bit, because the only case we need to handle is CSVs with fewer columns than the image CSV in reference_set.
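
The reordering could be sketched roughly as follows. `move_reference_first` is a hypothetical helper, not part of cytominer_database, and raising an error when the reference set is not found is an assumption (silently falling back to the sorted order would be another option):

```python
import os


def move_reference_first(directories, reference_set=None):
    """Return the sorted directories with reference_set, if given, moved to the front.

    Hypothetical helper: reference_set is matched against the last path
    component of each directory.
    """
    ordered = sorted(directories)
    if reference_set is None:
        return ordered

    matches = [
        d for d in ordered
        if os.path.basename(os.path.normpath(d)) == reference_set
    ]
    if not matches:
        # Assumption: a missing reference set is treated as an error.
        raise ValueError(f"reference set not found: {reference_set}")

    reference = matches[0]
    return [reference] + [d for d in ordered if d != reference]
```

With this ordering, the first directory ingested would be the one whose image CSV has all the columns.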

@shntnu
Member Author

shntnu commented May 4, 2018

@bethac07 proposed an alternative: randomly sample n image CSVs and pick the one with the most columns as the reference.
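
A rough sketch of that sampling approach, assuming the list of image CSV paths has already been collected. `count_columns`, `pick_reference`, and the `random_state` parameter are hypothetical names:

```python
import csv
import random


def count_columns(csv_path):
    """Count the columns in the header row of a CSV file."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])
    return len(header)


def pick_reference(image_csvs, n, random_state=None):
    """Sample up to n image CSVs and return the one with the most columns."""
    if not image_csvs:
        raise ValueError("no image CSVs to sample from")
    rng = random.Random(random_state)
    sample = rng.sample(image_csvs, min(n, len(image_csvs)))
    return max(sample, key=count_columns)
```

Note that a sample can miss the widest file, so this trades certainty for not having to name a reference set up front.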

@diskontinuum
Contributor

diskontinuum commented Sep 24, 2020

This was addressed in the latest merged PR, "Parquet_integration #122".
Choose the --parquet option; the added functionality (determining a reference schema for the table columns, opening a writer, and converting all subsequent files to that reference schema) solves the issue.
The --sqlite option does not use the additional functionality (yet).
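
To illustrate the general idea only (this is not the code from the PR, and it writes CSV with the standard library rather than Parquet), a sketch in which the first file's header is assumed to be the reference schema. `ingest_with_reference` is a hypothetical name:

```python
import csv
import io


def ingest_with_reference(csv_texts):
    """Concatenate CSV documents, conforming each one to a reference header.

    Assumption: the first document defines the reference columns. Columns
    missing from later documents are written as empty values, and columns
    not in the reference are dropped.
    """
    output = io.StringIO()
    writer = None
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if writer is None:
            # Determine the reference schema and open the writer once.
            reference = list(reader.fieldnames or [])
            writer = csv.DictWriter(
                output,
                fieldnames=reference,
                restval="",
                extrasaction="ignore",
                lineterminator="\n",
            )
            writer.writeheader()
        for row in reader:
            writer.writerow(row)
    return output.getvalue()
```

Every row written has the same columns in the same order, which is what lets files with missing columns be ingested together.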
