Support float16 datatype #410

eschnett · 2023-11-11T21:07:09Z

I want to store float16 types in ndarrays. Would it be possible to extend scalar-datatype to allow for float16 and complex32 types?

I am specifically looking for the official float16 type (as available in many compilers and in CUDA) and not for bfloat16.


braingram · 2023-11-13T17:11:19Z

Thanks for bringing this up.

I'm currently undecided on adding these types to the standard (vs adding them via an extension). My main concerns about adding more datatypes to the ndarray standard are:

- there are many possible datatypes and a standard that supports them all will become unwieldy
- by including them in the standard, we create the expectation that implementations will need to support all of these datatypes (even if only a small subset are useful)
- this increases the distance between the asdf-standard ndarray definition and the developing array-api standard

@perrygreenfield or @eslavich do either of you have some input on what motivated the current datatypes in ndarray?

The current datatypes:

asdf-standard/resources/schemas/stsci.edu/asdf/core/ndarray-1.1.0.yaml

Line 199 in 3f5d5e8

- type: string

do appear to be a relatively close match to the array-api standard:
https://data-apis.org/array-api/latest/API_specification/data_types.html
(except for bool8 in asdf-standard vs bool in array-api and the inclusion of ascii and ucs4 datatypes in asdf-standard).

I haven't looked into what would be required to "extend" the ndarray schema with an updated datatype (to provide a schema for an extension that could implement float16, complex32 etc). @eschnett have you looked into that and are there changes to the ndarray schema that might make that easier?

eschnett · 2023-11-13T18:07:18Z

I understand your concerns. Let me make an argument from a different point of view:

(1) The array api standard is a standard for Python. I am not really using Python; I'm usually using either C++ or Julia. Here is a list of datatypes supported by C++. That's a relatively small number of types, and these types will come up naturally in many circumstances. Julia supports float16, float32, and float64.

(2) It isn't really important to me whether this is part of the standard or part of an extension. However, asdftool should support these types in a relatively straightforward manner. When I look at the section on extensions then I am afraid that using an extension (not writing – using, as an end user) is somewhat complicated since extensions need to be installed. They probably need to be installed on every system that I am using, and by every collaborator of mine, and in every CI setup that I'm using. I'm afraid that installing a set of extensions might end up significantly more complex than just apt install python-asdf or pip install asdf, and if that is the case, then the implementation overhead of adding float16 to the asdf library could be justified. For example, adding float16 support to asdf-cxx was a relatively small effort and didn't increase the boilerplate code by very much.

(3) Independent of the above it would be very convenient if the content of ASDF files were accessible without loading an extension for datatypes. The HDF5 standard defines a way to define floating-point types, and contains generic code to convert between any kind of floating-point numbers. This allows reading any HDF5 file into any floating-point format. For example, reading float16 into a float32 in memory is possible, even if the local system does not support float16. I understand that this generic code might not be the most efficient, but such a feature would be valuable to have. I assume the layout of a floating-point number is defined in terms of position and length (in number of bits) of sign, mantissa, exponent, etc., with maybe a flag whether denormalized numbers etc. are supported. If ASDF were to add support for additional floating-point numbers, whether via an extension or not, then such a mechanism would be quite convenient.
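To make the idea of a layout-driven conversion concrete, here is a minimal sketch in Python. It is not HDF5's actual code: the function name `decode_float` and the parameters `exp_bits` and `man_bits` are hypothetical, and the layout is assumed to be IEEE-754-like (one sign bit, then the exponent, then the mantissa, with subnormals supported).

```python
import math

def decode_float(bits, exp_bits, man_bits):
    """Decode an IEEE-754-style binary float from its integer bit pattern.

    Assumed layout, most significant bit first: 1 sign bit,
    `exp_bits` exponent bits, `man_bits` mantissa bits.
    """
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1

    if exponent == (1 << exp_bits) - 1:
        # All exponent bits set: infinity or NaN.
        return sign * math.inf if mantissa == 0 else math.nan
    if exponent == 0:
        # Zero or subnormal: there is no implicit leading 1.
        return sign * math.ldexp(mantissa, 1 - bias - man_bits)
    # Normal number: restore the implicit leading 1.
    return sign * math.ldexp((1 << man_bits) | mantissa,
                             exponent - bias - man_bits)

# float16 uses 5 exponent bits and 10 mantissa bits.
one = decode_float(0x3C00, 5, 10)        # 1.0
largest = decode_float(0x7BFF, 5, 10)    # 65504.0
```

The same function decodes float32 with `exp_bits=8, man_bits=23`, which is the appeal of describing a type by its field widths rather than by name. A real implementation would also need to handle byte order and would be vectorized.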

braingram · 2023-11-15T21:00:41Z

To keep this conversation going (which I think is great!) I'm going to comment on a few (but not all) of the points raised.

re 1) float16 would be relatively easy for the python asdf library to support (as it's supported by numpy). complex32 is more difficult as it's not supported by numpy. asdf heavily relies on numpy for array handling and may need to define a new dtype and/or change how NDArrayType interacts with ndarray to support complex32 data. This is not to say that float16 and complex32 shouldn't be supported, I just wanted to highlight the difficulty this would present for the python library.
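As an illustration of the gap, numpy has a `float16` dtype but no complex dtype built from two half-precision values. One possible workaround, sketched below, stores the real and imaginary parts as float16 pairs along a trailing axis. This is only an assumption about how complex32 data could be carried, not something asdf does; the helper names `complex_to_half_pairs` and `half_pairs_to_complex` are hypothetical.

```python
import numpy as np

def complex_to_half_pairs(z):
    """Pack complex values as float16 (real, imag) pairs on a trailing axis."""
    z = np.asarray(z, dtype=np.complex64)
    return np.stack([z.real, z.imag], axis=-1).astype(np.float16)

def half_pairs_to_complex(pairs):
    """Unpack float16 (real, imag) pairs into a complex64 array."""
    pairs = np.asarray(pairs, dtype=np.float16).astype(np.float32)
    return (pairs[..., 0] + 1j * pairs[..., 1]).astype(np.complex64)

z = np.array([1 + 2j, -0.5 + 0.25j])
packed = complex_to_half_pairs(z)          # shape (2, 2), dtype float16
restored = half_pairs_to_complex(packed)   # dtype complex64
```

The packed array uses 4 bytes per complex value, the same as a native complex32 would, but every consumer has to know about the trailing-axis convention.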

re 2) Are there particular commands in asdftool that you'd like to see support the new types? Many parts of this tool automatically detect any installed extensions (via python entry points) but there are certain parts of the tool (like diff) that use the 'raw' tree. Thankfully installing extensions is as easy as pip install asdf-astropy (for adding asdf support for a large number of astropy objects) via the magic of entry points. I'm happy to go into details (and expand the documentation if you see spots that left you with more questions).
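For anyone curious how entry-point discovery works in general, here is a small sketch using only the standard library. The helper name `installed_entry_points` is hypothetical, and the group name `asdf.extensions` is an assumption about what the asdf library registers under; this is not asdf's own discovery code.

```python
from importlib.metadata import entry_points

def installed_entry_points(group):
    """Return the entry points that installed packages registered for `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry point collection.
        return list(eps.select(group=group))
    # Older Pythons return a dict keyed by group name.
    return list(eps.get(group, []))

# Assumed group name; prints nothing if no extension package is installed.
for ep in installed_entry_points("asdf.extensions"):
    print(ep.name, "->", ep.value)
```

Because packages declare their entry points in their own metadata, a pip install is enough for them to show up here without any further configuration.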

re 3) You definitely have me curious to look at the HDF5 standard and implementations to see how that's handled. Since ASDF blocks are just a collection of bytes it should be possible to make an extension that reads these bytes in and converts them to any other format. The python library does something like this for arrays where the block is read as a sequence of bytes and then converted to the dtype from the tree. Internally this is using the same extension API. The asdf-zarr extension also accesses the ASDF block data and converts it to types supported by zarr.
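A rough sketch of that bytes-to-dtype step, assuming numpy and metadata shaped like the `datatype`, `byteorder`, and `shape` fields of an ndarray in the tree. The function name `block_to_array` is hypothetical and this is not the library's actual code path.

```python
import numpy as np

def block_to_array(block, datatype, byteorder, shape):
    """Interpret raw block bytes using metadata like that stored in the tree."""
    order = "<" if byteorder == "little" else ">"
    dtype = np.dtype(datatype).newbyteorder(order)
    return np.frombuffer(block, dtype=dtype).reshape(shape)

# Stand-in for the bytes of a block holding three little-endian float16 values.
raw = np.array([1.0, -2.0, 0.5], dtype="<f2").tobytes()

halves = block_to_array(raw, "float16", "little", (3,))
singles = halves.astype(np.float32)  # widen for code that lacks float16 support
```

The final `astype` is the kind of conversion discussed above: the file stores float16, while the reader works with float32 in memory.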

CagtayFabry · 2023-11-21T08:21:20Z

In light of this issue, would you also consider again adding support for datetime and timedelta dtypes as discussed in #270 ?

braingram · 2023-12-06T00:32:56Z

@eschnett I just merged #411 adding float16 to the standard. I'm not marking this issue as closed since complex32 is also discussed.

I am hoping to make a new asdf-standard release soon to allow the companion PR adding float16 to the python asdf library asdf-format/asdf#1692 to be brought out of draft and reviewed.

braingram mentioned this issue Nov 20, 2023

add float16 support to version 1.6.0 #411

Merged
