Question about FORMAT specifier's purpose and usefulness in a DataLink-dominated world #7
Comments
On Tue, May 21, 2024 at 10:23:21AM -0700, gpdf wrote:
> `FORMAT` seems sufficiently useful that it's a shame to, in effect,
Hm. Is it? Useful, I mean? With all my time in the VO I've
yet to find a credibly useful application. In particular I cannot
believe that someone would want to have a "good" result swallowed
just because it may be in a format they cannot immediately read.
Also, in the rare cases in which I have put multiple formats for the
same dataset into a table (in SSAP), I've always regretted it. If
that is a typical experience, then FORMAT in effect becomes a
parameter "either hit the sort of format the resource has and get the
same result that you'd get if you left out FORMAT, or miss it and
you'll have an empty result". That, again, does not seem to be
useful behaviour to me.
So... now that we have datalink, I'd be all for dropping FORMAT and
telling people who want to disseminate their data in multiple formats
to use datalink.
I would support dropping it for the same reasons expressed above: it is not actually useful and leads to surprising results.
OK, I'm actually pretty glad to hear you both say this. The recommendation to interactive clients' developers, then, might be "if #this, or any other entry in a DataLink table, appears in multiple rows differing only by content-type, the client may wish to display to the user that they have a choice of available data formats for retrieving the data". (Noting that this is not about "science FITS vs. preview JPEG" but more about "science FITS vs. science HDF5" or, say, "VOTable-TABLEDATA vs. Parquet": choices of content-equivalent representations.)
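A minimal sketch of how a client might detect such a choice, assuming the links response has already been parsed into a list of dictionaries keyed by the DataLink column names (`ID`, `semantics`, `content_type`, `access_url`). The function name, the example identifiers and URLs, and the decision to treat rows sharing `ID` and `semantics` as "the same entry" are illustrative assumptions, not anything the standard prescribes.

```python
from collections import defaultdict


def equivalent_format_choices(links):
    """Return entries that are offered in more than one content type.

    Assumption: two rows describe the same entry when their ID and
    semantics agree; only content_type (and the URL) differ.
    """
    groups = defaultdict(set)
    for row in links:
        groups[(row["ID"], row["semantics"])].add(row["content_type"])
    # Keep only entries with at least two distinct content types.
    return {
        key: sorted(types)
        for key, types in groups.items()
        if len(types) > 1
    }


# Hypothetical links rows; identifiers, URLs and MIME types are made up.
links = [
    {"ID": "ivo://example/obs1", "semantics": "#this",
     "content_type": "application/fits",
     "access_url": "https://example.org/obs1.fits"},
    {"ID": "ivo://example/obs1", "semantics": "#this",
     "content_type": "application/x-hdf5",
     "access_url": "https://example.org/obs1.h5"},
    {"ID": "ivo://example/obs1", "semantics": "#preview",
     "content_type": "image/jpeg",
     "access_url": "https://example.org/obs1.jpg"},
]

choices = equivalent_format_choices(links)
for (dataset_id, semantics), types in choices.items():
    print(f"{dataset_id} {semantics}: available as {', '.join(types)}")
```

Here the `#preview` row is left out because it appears only once, which matches the point that this is about content-equivalent representations rather than "science FITS vs. preview JPEG".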
How do we get this rolling? Is this a deprecation warning to be added to the next SIA 2.x, along with an explanatory sentence or three? Or do we view |
I have never used or implemented SIA, so my opinion may not be the best here. But I would tend to say, let's start DAP without the |
Yes, I think we would just drop FORMAT from DAP entirely. Maybe we don't need to do anything with SIAv2... I can't really think of an erratum that would be at all helpful.
That sounds reasonable to me. Regarding SIAv2: The message one would like to convey is "On some archives, FORMAT is unlikely to do what you think it should do" but it almost seems like that's something that should be in client documentation rather than in the standard. If we were going to issue a new SIAv2.* for some reason, I can think of some better wording, but I don't think issuing an erratum (which we don't fold back into the source document anyway) is going to get the message out effectively.
Hmm! I'm not sure I follow you there. I think many of us agreed that DAP will come with the new RETRIEVEMODE capability (cutout or retrieval, for example). So if the service is able to transform the data, it can also create a new format on the fly. In that case the FORMAT parameter forces that transformation. This is consistent with the access_url being a SODA one, and consistent with FORMAT in SODA. For the issue raised by Gregory (D-F) and extensively discussed in Sydney (https://wiki.ivoa.net/internal/IVOA/InterOpMay2024DAL/DAL-2-notes), I think the solution would be to change the text to say that FORMAT in combination with DataLink forces the value of the content_type of the #this item in the DataLink service. By the way, notice that after SIAv2 Erratum-2 the initial text for FORMAT is now: "The FORMAT parameter specifies the format returned by the access link. The value is compared with the access_format column from the ObsCore data model in order to select the datasets with the required format. This column describes the format of the response from the access_url (see 3.1.3) so the values could be data file types (e.g. application/fits) or they could be the DataLink [8] MIME type."
Colloquially, the `FORMAT` parameter is intended to apply a constraint to the persistence format of the data product associated with a row in a conceptual underlying ObsCore table. (Obviously there doesn't have to be such a table in order to implement SIAv2/DAP, but the standard is written with reference to a table model.)

From a user perspective, this is clearly meant to enable, for instance, limiting a query to data in FITS format. In the DAP era where tabular datasets may be available from a service, a user might say "I'm only interested in Parquet".
The way the standard is written, though, it's clear that `FORMAT` is meant to be evaluated against the value of `access_format` in the query response. In many archives (CADC, Rubin, probably at least parts of IRSA in the future) we have adopted the "DataLink model" for providing ObsTAP/SIAv2 dataset access, though, where the actual `access_format` value is always the DataLink MIME type.

The standard even acknowledges this:
It seems like we've accidentally shot ourselves in the foot here. No non-IVOA-aware science user would be expected to know about the "DataLink model": they go to Firefly, say, do a query, and, if possible, Firefly will show them the `#this` target from the DataLink links service, rather than making them navigate the indirection on their own. Unless they deliberately click on the part of the UI that lets them see the links response and any associated additional datasets, they won't be aware of DataLink at all. That's a good thing.

So in this situation, if someone does `FORMAT=fits` they are likely to be very surprised by the results.

I realize the difficulties involved in potentially prying `FORMAT` off its mandatory link to `access_format`, but I think it would be worth our having a conversation about whether an interpretation "if `access_format` is the DataLink MIME type, then evaluate the restriction against the `content_type` for the `#this` entry in the resulting links table" would be sustainable, at least as an option.

I recognize that this might require data publishers to add a column to the underlying table to make sure that the "real" data type is efficiently queryable.
`FORMAT` seems sufficiently useful that it's a shame to, in effect, be forced to lose its usefulness in exchange for all the other big advantages of the "DataLink model". For Rubin (IMO) it's still a worthwhile tradeoff if we can't fix this, but... let's try to think this through and fix it.

My guess is that this must have been discussed before, but I haven't found the trail yet.