test: disallow empty structs #9310
Conversation
51e35ce to c2da371 (compare)
Indeed, it is currently only supported on BigQuery, Dask, and pandas. It could possibly be implemented in other backends with some effort. Still, if you stay in ibis-land, perhaps it is useful, e.g. for datatype manipulation or for intermediate calculations; it's not that hard for us to support, so why not. Based on your comment in the other PR, I thought your desired behavior was to allow empty structs, so I was working towards that. I'm fine switching these tests to check that empty structs are disallowed instead, if that is the desired behavior, but I want to nail down the semantics here before I go back to #8666.
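For illustration, a rough sketch of the "disallow" behavior under discussion; the constructor calls and the exception type here are assumptions for the sketch, not necessarily the exact ibis API:

```python
# Hypothetical sketch of the behavior being discussed: constructing an empty
# struct type or an empty schema should fail fast rather than silently
# producing struct<>. The exact exception type is an assumption.
import pytest

import ibis
import ibis.expr.datatypes as dt


def test_empty_struct_type_is_rejected():
    # A struct type with zero fields is rejected outright.
    with pytest.raises(Exception):  # exact exception type is an assumption
        dt.Struct({})


def test_empty_schema_is_rejected():
    # For consistency, an empty schema is rejected the same way.
    with pytest.raises(Exception):
        ibis.schema({})
```

The point is just that an empty field mapping errors at construction time instead of propagating a `struct<>` type through the expression system.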
Well, famous last words :)
I don't think I've seen a strong justification for their support in the form of a concrete use case, so I'd be in favor of dropping support for empty structs. If we go down that route, we should also drop support for empty schemas for consistency, since they also have no documented, concrete, user-facing use case.
I'm in favor of allowing both empty structs and empty schemas; this is the PyArrow behaviour.
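For context, PyArrow itself does accept zero-field struct types and empty schemas; a quick demonstration of PyArrow's behaviour only (not of any ibis API):

```python
# Demonstrates PyArrow's behaviour only: Arrow allows a struct type with zero
# fields and a schema with zero columns; an array of an empty struct type
# still carries a length.
import pyarrow as pa

empty_struct = pa.struct([])   # struct<> -- a struct type with no fields
empty_schema = pa.schema([])   # a schema with no columns

arr = pa.array([{}, {}], type=empty_struct)
print(arr.type)   # struct<>
print(len(arr))   # 2
```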
Someone really needs to provide a concrete use case for empty structs and schemas to justify keeping it around, given that effectively one backend supports it. "PyArrow supports it" is not a strong enough reason to do something that isn't well supported across our backends at all.
See ibis-project#9310 for a discussion of whether empty structs should be allowed or disallowed. We settled on disallowing them for now, since pyarrow is just about the only backend to support them, and it is much easier to change our minds and allow them later than to disallow them later. Perhaps empty structs are useful if you stay in ibis-land, e.g. for doing type manipulations or for intermediate calculations, but until we see a concrete use case, don't worry about it.

I had to adjust the dask test case generation because this change revealed a bug in our existing dask tests. `ibis.backends.dask.Backend.get_schema()` uses `PandasData.infer_table(df.head(1))`, so we were inferring the type of the `map_of_*` test column as `struct<>`, because the first element in that test array of values was `{}`. The simple fix was to put a "representative" value first. This doesn't really solve the root problem with how dask does type inference; we probably want to improve it by inferring the column type from a larger sample of values, since a sample size of 1 is absurdly small. I tried doing that, but it was trickier than I expected: with `head(10)` I was only getting two values back, even though there should be 3. That should get solved in a different PR.
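To illustrate the sampling issue (the column name and values below are made up for demonstration, not the actual test data):

```python
# Made-up data illustrating why sampling only the first row is fragile: if the
# first value is {}, a head(1)-based inference sees an "empty struct" and never
# learns the real key/value types.
import pandas as pd

bad_order = pd.DataFrame({"map_of_int64": [{}, {"a": 1, "b": 2}, None]})
print(bad_order.head(1))   # only {} is visible -> inferred as struct<>

# The workaround used here: put a representative value first.
good_order = pd.DataFrame({"map_of_int64": [{"a": 1, "b": 2}, {}, None]})
print(good_order.head(1))  # a populated dict is visible to the inference
```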
Hmm, our pyarrow type roundtrip fuzzing tests are failing. Can I get some help with Hypothesis to limit it to only generate structs with at least one field? I don't understand Hypothesis.
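A generic sketch of how Hypothesis collection strategies can be constrained; the field-name and child-type strategies below are placeholders, not the real strategies in the ibis test suite:

```python
# Generic sketch only: the field-name and child-type strategies are
# placeholders. The key point is that passing min_size=1 to st.dictionaries
# guarantees at least one field, so an empty struct type is never generated.
import hypothesis.strategies as st

field_names = st.text(min_size=1)
child_types = st.sampled_from(["int64", "float64", "string"])  # placeholder

non_empty_struct_fields = st.dictionaries(field_names, child_types, min_size=1)
```

If the fuzzing strategy builds struct fields from a list rather than a dictionary, `st.lists(..., min_size=1)` accepts the same argument.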
@kszucs Can you comment on why empty structs are useful, independent of their existence in PyArrow? I still don't follow what their utility is from the perspective of an end user.
Just to be clear, I swapped this PR to now check that empty structs are disallowed.
Yep, just wanting to keep the discussion all in one place!
Struct values might be the outcome of other operations, like converting a JSON-typed column to a strictly typed struct column, which would be fallible if empty struct types are not supported. User-defined functions may produce empty struct types as well.
None of these operations can actually be executed on anything other than BigQuery, so being able to spell them in the type system is entirely moot.
What's the status of this PR? Does it supersede the work in #8876?
I don't think so. This PR contains tests and a bit of implementation, while #8876 contains more implementation work.
That said, I'm going to close both of them out in an effort to set expectations on the amount of maintenance I can take on right now.
working towards #8289
I'm not sure how useful empty structs are, since it seems like only BigQuery, Dask, and pandas actually support them. Still, if you stay in ibis-land, perhaps they are useful, e.g. for datatype manipulation or for intermediate calculations. It's not that hard for us to support them, so why not.

I'm not sure of the history behind the specific disallowance that I am removing from the type inference.
Relevant context: