[BUG] Schema hints not working properly for json reads #1845
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1845      +/-   ##
==========================================
- Coverage   85.62%   85.53%   -0.10%
==========================================
  Files          55       55
  Lines        6102     6147      +45
==========================================
+ Hits         5225     5258      +33
- Misses        877      889      +12
a9fa929 to e58d3ef
tests/dataframe/test_creation.py
@@ -797,6 +797,45 @@ def test_create_dataframe_json_schema_hints_ignore_random_hint(valid_data: list[
    assert len(pd_df) == len(valid_data)


def test_create_dataframe_json_schema_hints_large_file() -> None:
Nice test 😵
Another, possibly easier, way would be to use 2 files (schema inference is currently only performed on one file by default):
# File 1
{"foo": 1}
# File 2
{"bar": 2}
Then we pass in schema hints {"foo": Int32, "bar": Int32}, and we should see the correct result for both rows.
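A rough sketch of that two-file idea, using only the standard library as a stand-in for the real reader. The helpers `infer_schema`, `read_files`, and `demo` are hypothetical names for illustration, not the project's API, and the string type names are placeholders for real dtypes. It assumes inference looks at the first file only and that hints are merged on top of the inferred schema:

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def infer_schema(path: Path) -> dict[str, str]:
    """Infer field names and Python type names from one newline-delimited JSON file."""
    schema: dict[str, str] = {}
    for line in path.read_text().splitlines():
        if line.strip():
            for key, value in json.loads(line).items():
                schema.setdefault(key, type(value).__name__)
    return schema


def read_files(paths: list[Path], schema_hints: dict[str, str] | None = None) -> list[dict]:
    # Assumption: inference only inspects the first file.
    schema = infer_schema(paths[0])
    # Hints override or extend whatever inference found.
    schema.update(schema_hints or {})
    rows = []
    for path in paths:
        for line in path.read_text().splitlines():
            if line.strip():
                record = json.loads(line)
                # Fields outside the schema are dropped; absent fields become None.
                rows.append({name: record.get(name) for name in schema})
    return rows


def demo() -> tuple[list[dict], list[dict]]:
    with tempfile.TemporaryDirectory() as tmp:
        file1 = Path(tmp) / "file1.json"
        file2 = Path(tmp) / "file2.json"
        file1.write_text('{"foo": 1}\n')
        file2.write_text('{"bar": 2}\n')
        without_hints = read_files([file1, file2])
        with_hints = read_files([file1, file2], {"foo": "int32", "bar": "int32"})
    return without_hints, with_hints
```

In this model, reading without hints loses `bar` entirely because it never appears in the first file, while reading with hints keeps both columns, which is the behaviour the test should pin down.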
oh yeah i like that a lot better! will do this instead of generating a 1mb json haha
Schema hints were not being propagated, which led to fields being dropped.
This stemmed from an issue when reading large JSON files from S3 where the fields changed only late in the file, so schema inference didn't pick them up.
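To illustrate the failure mode, here is a minimal standard-library sketch of sampled inference missing a field that only shows up late in a file. The `sample_rows` cutoff and the helper names are assumptions made for the example; the actual reader's sampling strategy is not described here.

```python
from __future__ import annotations

import json


def infer_fields(lines: list[str], sample_rows: int) -> set[str]:
    """Collect field names from the first `sample_rows` records only."""
    fields: set[str] = set()
    for line in lines[:sample_rows]:
        fields.update(json.loads(line))  # updating a set with a dict adds its keys
    return fields


def read_records(
    lines: list[str],
    sample_rows: int,
    schema_hints: dict[str, str] | None = None,
) -> list[dict]:
    # Hinted field names are added on top of whatever the sample revealed.
    fields = infer_fields(lines, sample_rows) | set(schema_hints or ())
    # Any field not in the final schema is silently dropped from each record.
    return [
        {key: value for key, value in json.loads(line).items() if key in fields}
        for line in lines
    ]


# 1000 records share one shape; a new field appears only in the last record.
lines = [json.dumps({"id": i}) for i in range(1000)]
lines.append(json.dumps({"id": 1000, "late_field": "x"}))

dropped = read_records(lines, sample_rows=100)
kept = read_records(lines, sample_rows=100, schema_hints={"late_field": "string"})
```

Here `dropped[-1]` has lost `late_field` because the sample never saw it, whereas `kept[-1]` retains it once the hint is propagated into the schema.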