Artifact cleanup #1341
Now that the column names have been cleaned up quite a bit, is this a good PR to add docstrings to each workflow step to document the expected schema of the input/output dataframes? As an alternative approach to docstrings, what do you think about using pydantic models to reference and validate the schema of each dataframe?
I would much prefer Pydantic models, as I've noticed some of those docstrings were out of date anyway. I think we can add that at any time, though. FWIW, I also added a docs page for the parquet schemas to help answer user questions.
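For reference, the Pydantic idea discussed above could look roughly like the sketch below. It is only an illustration: the `DocumentRow` model, its column names (`id`, `title`, `text`), and the `validate_rows` helper are hypothetical and not taken from this PR. It assumes each dataframe row is available as a plain dict, for example via `df.to_dict("records")`.

```python
from typing import Dict, List, Optional

from pydantic import BaseModel, ValidationError


class DocumentRow(BaseModel):
    """Hypothetical schema for one row of a workflow step's output table."""

    id: str
    title: str
    text: Optional[str] = None  # optional column, may be absent


def validate_rows(rows: List[dict]) -> Dict[int, str]:
    """Map the index of each invalid row to its validation error message."""
    errors: Dict[int, str] = {}
    for index, row in enumerate(rows):
        try:
            DocumentRow(**row)
        except ValidationError as exc:
            errors[index] = str(exc)
    return errors


# Rows as they might come out of df.to_dict("records") (assumed, not shown here).
rows = [
    {"id": "doc-1", "title": "First", "text": "hello"},
    {"id": "doc-2"},  # missing the required "title" column
]
errors = validate_rows(rows)
```

Because the model is ordinary code, it doubles as schema documentation and would fail loudly if a column were renamed or dropped, which is the drift problem noted with the docstrings.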
Cleans up a large number of duplicated or unused fields in the output artifacts, and renames a few things for consistency.