I'm profiling parquet files and they are failing as shown below.

```
File "/usr/lib/python3.4/site-packages/spark_df_profiling/base.py", line 326, in describe_date_1d
    stats["min"] = str(stats["min"].to_pydatetime())
AttributeError: 'datetime.datetime' object has no attribute 'to_pydatetime'
```

Line 323, copied below, only checks the 'max' value, and it's the 'min' value that is failing.

```python
if isinstance(stats["max"], pd.tslib.Timestamp):
```

Should 'min' be tested the same way 'max' is?
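A minimal sketch of the symmetric guard being suggested, assuming a `stats` dict shaped like the one in `describe_date_1d` (the values here are hypothetical; note that `pd.tslib.Timestamp` is spelled `pd.Timestamp` in current pandas):

```python
import datetime
import pandas as pd

def format_date_stat(value):
    # Out-of-range dates arrive as plain datetime.datetime objects, which
    # have no to_pydatetime(); only convert genuine pandas Timestamps.
    if isinstance(value, pd.Timestamp):
        value = value.to_pydatetime()
    return str(value)

# Hypothetical stats dict mirroring the locals in describe_date_1d.
stats = {"min": datetime.datetime(2388, 1, 1),   # beyond pandas' ns range
         "max": pd.Timestamp("2017-06-01")}
stats["min"] = format_date_stat(stats["min"])    # no AttributeError now
stats["max"] = format_date_stat(stats["max"])
```

Applying the same `isinstance` test to both 'min' and 'max' means either value can safely be a plain `datetime.datetime`.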
It is an issue with a date whose year the pandas code considers too big, e.g. 2388. This is the code that is ultimately called by line 326. Would it be appropriate to add some logic so that dates that are too big are changed to some special value, or skipped, rather than exiting with an exception? The pandas code that checks the year is copied below.