adding datetime-like dtypes to ndarray #270
Comments
I like this idea. I don't see this as being an undue burden on other languages, since they're free to deserialize the timestamps into a regular integer array. ASDF implementations would need to "remember" that the type was timestamp, but there are already other properties of ndarrays that need to be tracked and handled. Some alternative implementation ideas:
|
I also thought about something like When introducing
anyOf:
  - tag: http://asdf-format.org/schemas/ndarray
  - tag: http://asdf-format.org/schemas/ndarray_timedelta
in other schemas where both cases should be allowed. Could this be prevented? (same for using |
That's a good point. It is possible to create |
@perrygreenfield do you have any thoughts on this one? |
I guess implementing |
quick reminder that tag: http://asdf-format.org/schemas/ndarray* should also be possible now
|
Thanks for mentioning this issue! I'll read through it and start taking a look. |
I spent some time looking into this today. One complication (that I don't yet have a solution for) is the associated unit for a
>> dt0 = np.datetime64(0xFFFF, ("s", 42))
>> dt0
numpy.datetime64('1970-02-01T20:34:30','42s')
>> dt0.tobytes()
b'\xff\xff\x00\x00\x00\x00\x00\x00'
>> dt1 = np.datetime64(0xFFFF, "D")
>> dt1
numpy.datetime64('2149-06-06')
>> dt1.tobytes()
b'\xff\xff\x00\x00\x00\x00\x00\x00'
>> dt0 == dt1
False
>> dt0.tobytes() == dt1.tobytes()
True
Conversion to a 'standard' unit will mean that some valid |
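Since the raw bytes alone do not identify the unit, one way to read the complication above is that the unit and its multiplier would have to travel alongside the integer data. The following is a minimal sketch of that idea using only numpy; the helper names `encode` and `decode` and the metadata keys are hypothetical and are not part of asdf or the asdf-standard.

```python
import numpy as np


def encode(arr):
    """Split a datetime64 array into raw int64 values plus unit metadata.

    Hypothetical helper for illustration only, not an asdf API.
    """
    # np.datetime_data returns the (unit, count) pair of a datetime dtype
    unit, count = np.datetime_data(arr.dtype)
    return arr.view("int64"), {"unit": unit, "count": count}


def decode(raw, meta):
    """Rebuild the datetime64 array from raw integers and the stored unit."""
    dtype = np.dtype(f"datetime64[{meta['count']}{meta['unit']}]")
    return raw.view(dtype)


# Same integer payload as dt0 in the comment above, with a 42-second unit
a = np.array([0xFFFF], dtype="int64").view("datetime64[42s]")
raw, meta = encode(a)
b = decode(raw, meta)
```

With the metadata kept, `b` has the same dtype and values as `a`; without it, the integers in `raw` could just as well be interpreted as days.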
True, but that would have to be stored in the
for u in ["as", "fs", "ps", "ns", "us", "ms", "s", "m", "h", "D", "W", "M", "Y"]:
    dtype = np.datetime64(0xFFFF, u).dtype
    print(repr(dtype) + " : " + str(dtype))
dtype('<M8[as]') : datetime64[as]
dtype('<M8[fs]') : datetime64[fs]
dtype('<M8[ps]') : datetime64[ps]
dtype('<M8[ns]') : datetime64[ns]
dtype('<M8[us]') : datetime64[us]
dtype('<M8[ms]') : datetime64[ms]
dtype('<M8[s]') : datetime64[s]
dtype('<M8[m]') : datetime64[m]
dtype('<M8[h]') : datetime64[h]
dtype('<M8[D]') : datetime64[D]
dtype('<M8[W]') : datetime64[W]
dtype('<M8[M]') : datetime64[M]
dtype('<M8[Y]') : datetime64[Y]
Of course, it seems improbable to cover any possible "custom" datetime type dtype like |
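As a rough check of how far a plain dtype string could go, the sketch below round-trips a few datetime-like dtypes through their string form, including one with a multiplier. It only exercises numpy; whether a schema should accept such strings is a separate question, and the helper name `roundtrip` is made up for this example.

```python
import numpy as np


def roundtrip(spec):
    """Parse a dtype string, convert it back to text, and parse it again."""
    dtype = np.dtype(spec)
    restored = np.dtype(str(dtype))
    return dtype, restored


specs = ["datetime64[ns]", "datetime64[42s]", "timedelta64[ms]"]
results = [roundtrip(spec) for spec in specs]

# The unit and multiplier survive the string form
unit_and_count = np.datetime_data(results[1][1])
```

In this sketch the restored dtype compares equal to the original in each case, so the string form appears to carry both the unit and the multiplier.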
I wanted to update this with something more substantial at this point but unfortunately all I can say is I'm still looking into this. I tried implementing this via an extension and things were complicated by the extension needing to follow every asdf standard version (like the Do you have an example of code that works around this limitation (perhaps by converting The |
In the light of discussions around version 2.0 of the asdf-standard (and the version bump of all schemas) I would be interested to hear some opinions about extending the supported dtypes of `ndarray`. Specifically, I am interested in adding support for `datetime` and `timedelta` like dtypes directly to the `ndarray` schema.
I am aware of the existing `time/time-1.1.0` schema, which, while versatile and complex, seems to be rather specific to astropy use cases in some regards. I think working with POSIX/unix datetimes with high (ns) precision is common in many scientific applications.
Currently `core/ndarray-1.0.0` supports the basic (u)int, float and complex dtypes defined in the schema:
asdf-standard/schemas/stsci.edu/asdf/core/ndarray-1.0.0.yaml
Lines 190 to 191 in 29d3410
The asdf python library handles the corresponding numpy mappings here:
When looking at numpy datetime arrays those are basically just integers interpreted as POSIX timestamps or timedeltas.
Unfortunately we cannot store these in an ndarray directly without casting back to integer:
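The snippet originally attached to this point is not preserved here. As an illustration of the kind of cast being described (not the author's code), a minimal numpy-only sketch of converting a nanosecond-precision datetime array to integers and back might look like this:

```python
import numpy as np

# Two timestamps at nanosecond precision
stamps = np.array(
    ["2021-01-01T00:00:00.000000001", "2021-01-01T00:00:01"],
    dtype="datetime64[ns]",
)

# Cast to plain integers (nanoseconds since the POSIX epoch) for storage
as_int = stamps.astype("int64")

# On load, the reader has to know the unit to restore the timestamps
restored = as_int.astype("datetime64[ns]")
```

The integer array stores fine, but the `datetime64[ns]` interpretation has to be remembered separately, which is the inconvenience described above.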
This makes handling of numpy `datetime` arrays somewhat irritating (I noticed this when working with pandas and xarray objects in asdf).
I think natively supporting numpy `datetime` dtypes would simplify a lot of things when using asdf with other libraries that make use of numpy's `datetime` dtypes, thus possibly expanding asdf to be used more widely (at least throughout the python/scipy ecosystem).
In principle, supporting more dtypes should be as easy as extending the standard schema and plugin lists for the asdf-standard schema as well as the python mapping (it seems to work but I have not looked into it in detail).
Of course, one issue with adding dtypes to the core ndarray schema is that all libraries implementing the asdf-standard (asdf-cpp?) would have to add support for these specific datetime dtypes. Honestly, I am not aware of how many asdf implementations there are for other languages and how difficult this would be to implement (probably not as easy as with python/numpy).
Another option could be to somehow allow an extension to add support for specific dtypes to ndarray. However I don't know if this can be done in the current implementation of the asdf-standard.