I'm profiling parquet files and they are failing as shown below.

```
File "/usr/lib/python3.4/site-packages/spark_df_profiling/base.py", line 326, in describe_date_1d
    stats["min"] = str(stats["min"].to_pydatetime())
AttributeError: 'datetime.datetime' object has no attribute 'to_pydatetime'
```

Line 323, copied below, only checks the 'max' value, and it's the 'min' value that is failing.

```python
if isinstance(stats["max"], pd.tslib.Timestamp):
```

Should 'min' be tested the same way 'max' is?
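A minimal sketch of the symmetric guard being suggested, assuming a `stats` dict shaped like the one in `describe_date_1d` (the values here are hypothetical; note that `pd.tslib.Timestamp` is spelled `pd.Timestamp` in current pandas):

```python
import datetime
import pandas as pd

def format_date_stat(value):
    # Out-of-range dates arrive as plain datetime.datetime objects, which
    # have no to_pydatetime(); only convert genuine pandas Timestamps.
    if isinstance(value, pd.Timestamp):
        value = value.to_pydatetime()
    return str(value)

# Hypothetical stats dict mirroring the locals in describe_date_1d.
stats = {"min": datetime.datetime(2388, 1, 1),   # beyond pandas' ns range
         "max": pd.Timestamp("2017-06-01")}
stats["min"] = format_date_stat(stats["min"])    # no AttributeError now
stats["max"] = format_date_stat(stats["max"])
```

Applying the same `isinstance` test to both 'min' and 'max' means either value can safely be a plain `datetime.datetime`.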
It is an issue with a date whose year the pandas code considers too big, e.g. 2388. This is the code that is ultimately called by line 326. Would it be appropriate to add some logic so that dates that are too big are changed to some special value, or skipped, rather than exiting with an exception? The pandas code that checks the year is copied below.