Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: data list metadata #2529

Merged
merged 8 commits into from
Nov 22, 2024
Merged

Feat: data list metadata #2529

merged 8 commits into from
Nov 22, 2024

Conversation

chrismclarke
Copy link
Member

@chrismclarke chrismclarke commented Nov 15, 2024

PR Checklist

  • PR title descriptive (can be used in release notes)

Description

Add methods to parser to add a _metadata column to data_lists and track data types of non-string columns

This is designed to improve on the type infer methods used in #2513 (can update post-merge), so that set_data operations can correctly use the expected column data types.

Author Notes

Use
There are 2 methods now included to infer data types within the parser

  1. If the author includes an @metadata row can manually specify type: number/boolean within the corresponding column cell.
  2. Find first non-empty data entry and use type from that (will check all rows if required).

Note - only number or boolean types need to be specified as assume string by default, an if using nested object types these will automatically be detected when creating the data_list (columns would be formatted with . notation, e.g. time.hour and time.minute.

Data Changes
The corresponding generated _metadata will be included in the parsed data_lists github. Metadata is only included if specific entries exist (so will not appear for simple data_lists where every column is represented by string values)

Quality Control
I had originally planned to include errors or warnings when the data_type could not be inferred (e.g. no data in the column), however realised this is probably a fairly common case and having lots of warnings would not be so helpful (example screenshot)

Example - warning log output - not currently in use
image

So instead, when data types cannot be inferred it will simply be assumed that they are designed to be strings. The data types shouldn't really be all that important unless using set_data operation to update data from action list strings (or in the future when adding new rows).

Similarly metadata is only included for columns that are not formatted as strings, to avoid overload of potentially less helpful information.

Future Plans
With the use of the metadata column we could look to expand in the future to include more things features such as data validation or default values

Review Notes

The easiest way to examine the new metadata is to sync and review the deployment repo diffs generated

yarn workflow sync;
yarn workflow repo open

Dev Notes

If this happens to be merged before #2513 I can update, but I don't think it should block it in any way (non-breaking changes, can just be implemented in future update when reviewing add_data action)

Git Issues

Closes #2524

Screenshots/Videos

Example - metadata now included in data_lists for all non-string columns (as inferred from data)
image

@github-actions github-actions bot added feature Work on app features/modules scripts Work on backend scripts and devops labels Nov 15, 2024
@chrismclarke chrismclarke marked this pull request as ready for review November 15, 2024 01:31
@github-actions github-actions bot added feature Work on app features/modules and removed feature Work on app features/modules labels Nov 15, 2024
Copy link
Collaborator

@esmeetewinkel esmeetewinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without @metadata row (plh_kids_tz deployment): Functional test passed.
image

With @metadata row (debug deployment): I didn't expect to see the item {"id": "@metadata", "no": "type: number"} in the list of rows. Did you leave it there intentionally? This causes a problem when looping over this list.
debug_data_list
debug_metadata template
image
image

image

image

@chrismclarke
Copy link
Member Author

chrismclarke commented Nov 22, 2024

With @metadata row (debug deployment): I didn't expect to see the item {"id": "@metadata", "no": "type: number"} in the list of rows. Did you leave it there intentionally? This causes a problem when looping over this list. debug_data_list debug_metadata template

Thanks for catching this @esmeetewinkel , no I had forgotten to strip away the metadata row after processing so have now updated the PR if you're able to take another look?

Should now display as expected with metadata row removed from list:
image

image

image

Copy link
Collaborator

@esmeetewinkel esmeetewinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional test passed.

Copy link
Collaborator

@jfmcquade jfmcquade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice solution, looks good. And nice to have tests as always, I can confirm they are passing.

@chrismclarke chrismclarke merged commit 0bbf468 into master Nov 22, 2024
6 checks passed
@chrismclarke chrismclarke deleted the feat/data-list-metadata branch November 22, 2024 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Work on app features/modules scripts Work on backend scripts and devops
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Provide alternate ways to define the schema of a data_list
3 participants