Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Table output in Data.read_many #11546

Merged
merged 87 commits into from
Nov 29, 2024

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Nov 13, 2024

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.
  • If meaningful changes were made to logic or tests affecting Enso Cloud integration in the libraries,
    or the Snowflake database integration, a run of the Extra Tests has been scheduled.
    • If applicable, it is suggested to paste a link to a successful run of the Extra Tests.

@radeusgd radeusgd added the CI: No changelog needed Do not require a changelog entry for this PR. label Nov 13, 2024
@radeusgd radeusgd self-assigned this Nov 13, 2024
@radeusgd radeusgd force-pushed the wip/radeusgd/11311-read-many-to-table branch 2 times, most recently from 4c4b4e0 to 68df118 Compare November 14, 2024 13:44
@radeusgd radeusgd marked this pull request as ready for review November 14, 2024 13:58
Comment on lines 11 to 21
type Return_As_Table
## Returns a table with a new column `Value` containing the objects loaded
from each file.

When the source for files to load was a table, all columns from the
original table are also retained. In case of name clashes, the newly
added columns will get a suffix.

When the source was a simple Vector, the returned table will also contain
a `File Name` column.
Table_Of_Objects
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name Table_Of_Objects was a working name. I'm not sure if we want to keep it or come up with something better?

Definitely Table_Of_Tables in Excel sounded much better. But here it may not necessarily be tables, it may be various objects. So Table_Of_Objects is quite precise, but it sounds weird.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should change approach altogether and try something like Added_Column? (meaning that the loaded files are added as a column to the input)

  • Values_Column
  • Table_With_New_Column

any thoughts on these names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Table_With_New_Column or Table_Of_Objects seem the best to me, Values_Column I think could end up being confusing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about more of a boolean? Merged/UnMerged. Merged True/False? I feel like it would be hard for someone to guess what table_of_objects is going to do without reading the help or trying it

Copy link
Member

@jdunkerley jdunkerley Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if:

type Return_As
   ## Single table with values expanded
   Merged_Table

   ## PRIVATE
       Add this for backwards compatiblity then hide in widget?
   Table_Of_Tables

   ## 
   Individual_Cells

   ## Objects returned as a raw vector allowing zipping to input
   Vector

Not sure on name yet either - will think some more

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sorry that wasn't clear. When I said boolean I didn't mean a boolean type. More text options that were Merged or Not Merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector is handled by Return_As_Base - the type needs to be split up so that the Vector return works even if Table is not imported. It is opaque to the user - the dropdown displays all available types and autoscoping works as shown in tests.

I'd not add Table_Of_Tables to Data.read_many. It is supported in Excel_Workbook.read_many to retain backwards compatibility. But adding it to a brand new function seems unnecessary - why add an obsolete option?

I was in my head thinking of it as a single type used by both functions but I appreciate we are actually building this dynamically. Just for working out how it might look in the dropdown thinking as a single type is a bit easier!

If the ..Table_Of_Tables was an option not shown in the dropdown but in one of the mix in types then no special handling would be needed for the two reads to share the same object was my thought.

I wonder if:

  • ..As_Vector
  • ..With_New_Column
  • ..As_Merged_Table

Might be cleanest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ..Table_Of_Tables was an option not shown in the dropdown but in one of the mix in types then no special handling would be needed for the two reads to share the same object was my thought.

But then more code would be affected by the legacy options, I'd rather not grow places that take legacy options when not needed. Unless there's any problem with the current approach, I'd keep the compatibility conversion as-is.

I wonder if:

  • ..As_Vector
  • ..With_New_Column
  • ..As_Merged_Table

Might be cleanest?

That sounds quite clear to me 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I like those options

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The options outlined in the comment above have been implemented. Excel_Workbook.read_many also allows old options for compatibility, but Data.read_many (the new component) only works with new ones.

@@ -16,146 +7,3 @@ type Match_Columns

Note: column names are not compared.
By_Position

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No meaningful code changes in this file, just moving the helpers to separate module - Match_Columns_Helpers.

@radeusgd
Copy link
Member Author

  • Updating the widget to be a vector of file pickers.
  • Fixing In_List to throw missing argument.
    image

@radeusgd
Copy link
Member Author

Maybe out of scope for this MR, but this warning is very long

image

Warning: No common type was found for types: Char (variable length, max_size=unlimited), Char (variable length, max_size=unlimited), Char (variable length, max_size=unlimited), Integer (64 bits) when unifying column [Sales], so the values were converted to text. If you want to have mixed types instead, please cast one of the columns to Mixed beforehand.

Wonder if we should

  1. Remove the duplicate types
  2. Remove the last sentence about Mixed

So then

Warning: No common type was found for types: Char (variable length, max_size=unlimited), Integer (64 bits) when unifying column [Sales], so the values were converted to text.

Sure that's an easy change - just added distinct and removed the suffix, now we have:
image

Comment on lines 45 to 46
# TODO ideally we'd like to use `Meta` but it currently has no support for UnresolvedConstructor so we use a 'hack'.
is_the_default = value.to_text == "..As_Merged_Table"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking to_text feels like a bit of a hack, but currently Meta seems to lack support for UnresolvedConstructor - Meta.meta returns Primitive and does not allow any inspecting.

It feels like it may prove useful in the future, @jdunkerley do we want to create a ticket for the engine team to add support for Unresolved_Constructor in Meta?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have done it via

type My_Private_Type
    As_Merged_Table

and trying to convert to that.

I agree we could probably do with something on Meta.
cc: @JaroslavTulach for visibility.

Comment on lines 45 to 46
# TODO ideally we'd like to use `Meta` but it currently has no support for UnresolvedConstructor so we use a 'hack'.
is_the_default = value.to_text == "..As_Merged_Table"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have done it via

type My_Private_Type
    As_Merged_Table

and trying to convert to that.

I agree we could probably do with something on Meta.
cc: @JaroslavTulach for visibility.

@radeusgd radeusgd added the CI: Clean build required CI runners will be cleaned before and after this PR is built. label Nov 28, 2024
@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Nov 29, 2024
@mergify mergify bot merged commit dc55b4e into develop Nov 29, 2024
39 checks passed
@mergify mergify bot deleted the wip/radeusgd/11311-read-many-to-table branch November 29, 2024 12:06
jdunkerley pushed a commit that referenced this pull request Nov 29, 2024
- Closes #11311

(cherry picked from commit dc55b4e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Clean build required CI runners will be cleaned before and after this PR is built. CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Data.read_many
5 participants