-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HEALPix Formats #80
Comments
I'd like to have a format that would support pointlike. Currently a pointlike binned data file has the following format with HDU tables for BANDS, PIXELS, and GTI
For 8 years of data, front/back 4 bins per decade from 10 MeV to 1 TeV, the file length is 120 MB; with the PSF event classes it is 165 MB. |
@woodmd - This is what I mentioned earlier:
I'm not sure if HIPS has anything concretely that affects our specs here. I'm just mentioning it as an example of a multi-resolution format built on top of HEALPIX. They do have "cubes" (see section 4.3), but there is not much to their cube format, and it's not directly applicable to what we have. The NUNIQ pixel ID scheme might be something to adopt if we ever want to have a format that supports multi-resolution HEALPIX in a single table. It could be a change to the current sparse format: http://gamma-astro-data-formats.readthedocs.io/en/latest/skymaps/healpix/index.html#sparse-format currently to support multi-resolution pixels (within one band), or just a new format added later if deemed useful for some use cases. |
I discussed the proposal for quite some time with Matt yesterday. Although there are a lot of nice things there, there are two things that I'm not really in favor of.
Specifically, for the SPARSE format, the proposal requires the user to access the BANDS to get the slicing information about the pixel index needed to unroll the index. Although it is true that you already have to read extra HDUs to get things like energy bounds, a very important distinction is that these extra HDUs only define the binning of the data, not internals of the data structure. I.e., you can treat the data as an n-dimensional array, and then tack on the bin edges after the fact once you need them. On the other hand, with the proposal here, simply in order to read the data into an n-dimensional array requires you to also access the BANDS table. This means, that for example, to copy a single image from one file to another will require you also to copy the BANDS table. In practice that is often not the case now, as typically the other information in the BANDS table (or the ENERGIES or EBOUNDS) table remains the same for the entire analysis chain. |
I'd also prefer to keep all image data in a single HDU but the current design is partially constrained by the limitations of the FITS format. The solution for other binary formats is to have a hierarchical structure that groups associated data tables together. FITS does support HDU groups (see https://heasarc.gsfc.nasa.gov/fitsio/c/c_user/node60.html) but it's not clear to me how well this functionality is supported by different FITS implementations (for instance I'm pretty doubtful that the FITS I/O layer in the STs currently supports this) and it's not a feature of FITS that I have much familiarity with. Note that I do think it will be valuable to have the flexibility to write maps of different geometries to the same file. This would allow FITS files to be used more as general purpose data containers and would reduce the amount of file bookkeeping required of the user. For instance currently for multi-component LAT analyses one has to keep track of separate counts, exposure, FT1, and source map files for each component. Being able to put counts maps for all components in a single file seems like it could be a nice convenience. Note that the spec as currently written should be entirely backwards-compatible with the current convention of one EBOUNDS table per file and in principle no changes would be required of existing software like the STs or ctools. With regard to the SPARSE format Eric preferred to keep to the convention of having one column per energy slice (band) which for variable-sized images means transposing the table to a single row. Further one could use columns of the same table to define quantities that are currently written to the BANDS table (e.g. energy bounds, nside, npix etc.). This arrangement has the nice feature that it is entirely self-contained in a single HDU and doesn't require one to look in another HDU to figure out the band index mapping. The potential disadvantage is that when dealing with a large number of bands (e.g. as for a time series) one may run into the limit on the number of columns in a FITS BinTable (999). A workaround might be to have the option to put all bands into a single column. However with this one would lose the ability to perform random access into individual bands. I did some performance comparisons between the current proposal and the alternatives I discussed with Eric:
I performed all tests with the astropy.io.fits reader with memmap=True. I found that getting consistent performance numbers is quite challenging and seems to be heavily dependent on the type of disk access (local or NFS-mounted). However I was able to conclude that the relative speed among these solutions differs by no more than a factor of 2-3. Given that disk I/O is a pretty small part of the runtime for photon analysis the guiding factor for choosing a format should probably be convenience and simplicity rather than raw performance. On this basis I'm now leaning more toward going with a column-wise SPARSE format but I'm curious to know if other people have thoughts on this. |
@woodmd did some HEALPIX spec updates in a65a70d and updated the example files in a1a4e3f . So everyone - please have a look at the latest spec here: http://gamma-astro-data-formats.readthedocs.io/en/latest/skymaps/index.html @jknodlseder - Maybe you also want to have a look? Personally, I'm not sure I have much to add to the discussion, but I'll make some comments: @eacharles - I agree that having "one image = one HDU" is very important. But I think this was lost already with EBOUNDS and ENERGIES being in separate HDUs, and there's no simple way to join that into one HDU. (Did anyone talk to FITS WCS experts and try to embed it as a table WCS axis in the header or change to BINTABLE in order to join them?) Is it really important to be able to access pixel data in the SPARSE HEALPIX format without accessing the BANDS table? Are there important use cases to work with the pixels at all without knowing what energy band or event type they are? That said, I haven't thought about SPARSE HEALPIX much, and don't have a preference for how it's serialised.
For me, this would be more important to look at: what do the WCS-based maps formats look like? Can the ENERGIES / EBOUNDS / BANDS table be used for them 1:1 as they are now defined in the HEALPIX maps section? What would be your proposed serialisation format for the pixels from the WCS data (e.g. for different event types as extra axis, or separate IMAGE HDUs, or change to BINTABLE)? Do we want to support non-homogeneous geometry across energy bins or event types there as well? I assume yes for efficiency / flexibility, and then extra axis in an IMAGE HDU doesn't work and we need a BANDS table pointing to multiple 2-dim image HDUs with the pixels? With whatever solution we end up with, I think it's important to clearly state if having multiple maps in a given FITS file is supported or not. And if yes, given an example file that does have multiple maps in a single file, and state how tools can figure out which HDUs (pixels, ENERGIES, EBOUNDS, BANDS) belong to which sky map. |
I think WCS-based maps could probably use most of the conventions of the HEALPix maps:
There's currently a placeholder for documenting the WCS-based map formats (see http://gamma-astro-data-formats.readthedocs.io/en/latest/skymaps/wcs/index.html) so it would probably be good to start working on that in parallel with the HEALPix formats. |
I have posted a first draft of a proposal for HEALPix FITS formats to this page. This thread is intended to collect comments and feedback on this proposal.
The current draft spec was based mostly on the conventions we currently use in the Fermi Science Tools. The HEALPix-specific ST functionality is not yet widely used so we are also open to updating the ST implementation to reflect whatever we agree on for this specification.
The text was updated successfully, but these errors were encountered: