Add binary/opaque dtype #34
Labels
category: proposal
discussion of proposed enhancements or new features
priority: low
alternative solution already working and/or relevant to only specific user(s)
Related to NeurodataWithoutBorders/nwb-schema#574 to allow the storage of raw binary data that follows a particular format, e.g., MP4, PNG.
In the HDMF schema language, dtype "bytes" maps to a variable-length string with ASCII encoding.
In HDMF, if I try to write an MP4 byte stream with dtype "bytes" to an HDF5 file, I get the error:
ValueError: VLEN strings do not support embedded NULLs
Here is the error with a simple h5py-based example:
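A minimal sketch of such a reproduction, assuming h5py and NumPy are installed. The payload, file name, dataset name, and helper function are hypothetical stand-ins, not the original snippet:

```python
import h5py

# Hypothetical stand-in for the start of an MP4 byte stream.
# The embedded NULL bytes are what matter here.
payload = b"\x00\x00\x00\x18ftypmp42\x00\x00\x00\x00"


def try_write_as_vlen_ascii(data: bytes) -> str:
    """Try to store `data` as a variable-length ASCII string.

    Returns the error message, or an empty string if the write succeeds.
    """
    # In-memory file (core driver, no backing store), so nothing hits disk.
    with h5py.File("repro.h5", "w", driver="core", backing_store=False) as f:
        try:
            f.create_dataset(
                "video",
                data=data,
                dtype=h5py.string_dtype(encoding="ascii"),
            )
        except ValueError as err:
            return str(err)
    return ""


message = try_write_as_vlen_ascii(payload)
print(message)
```

With the h5py behavior described above, the printed message should be the "embedded NULLs" error.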
The h5py docs recommend against storing raw binary data as variable-length strings with an encoding, and suggest wrapping such data in a NumPy void type, which maps to the HDF5 OPAQUE datatype, instead.
To enable storage of raw binary data, I propose we add a new dtype to the schema language that maps to the HDF5 OPAQUE / void dtype. We can't use the dtype name "bytes" because we use that for ASCII data. What about "binary"?
Alternatively, raw binary data could be stored as a 1-D array of uint8 values, but using dtype uint8, as opposed to OPAQUE, may cause accidental conversion.
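To make the two options concrete, here is a rough sketch at the h5py level, assuming h5py and NumPy are installed. It does not show how HDMF would implement a "binary" dtype. The payload, file name, and dataset names are hypothetical:

```python
import h5py
import numpy as np

# Hypothetical PNG-like header bytes, including embedded NULLs.
payload = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"

# In-memory file (core driver, no backing store), so nothing hits disk.
with h5py.File("opaque_demo.h5", "w", driver="core", backing_store=False) as f:
    # Option 1: wrap the bytes in np.void, which h5py stores as HDF5 OPAQUE.
    opaque = f.create_dataset("image_opaque", data=np.void(payload))
    opaque_dtype_kind = opaque.dtype.kind        # NumPy kind code for void
    opaque_roundtrip = opaque[()].tobytes()      # recover the original bytes

    # Option 2: store the same bytes as a 1-D array of uint8 values.
    # .copy() gives a writable array; np.frombuffer alone is read-only.
    as_uint8 = f.create_dataset(
        "image_uint8",
        data=np.frombuffer(payload, dtype=np.uint8).copy(),
    )
    uint8_dtype_kind = as_uint8.dtype.kind       # NumPy kind code for unsigned int
    uint8_roundtrip = as_uint8[()].tobytes()

print(opaque_dtype_kind, opaque_roundtrip == payload)
print(uint8_dtype_kind, uint8_roundtrip == payload)
```

Both options round-trip the bytes. The difference is that the uint8 dataset is an ordinary numeric array, so nothing in its type marks it as an opaque blob.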