expose links to all export formats via Signposting #11045

Open
wants to merge 8 commits into
base: develop
9 changes: 9 additions & 0 deletions doc/release-notes/10542-signposting.md
@@ -0,0 +1,9 @@
# Signposting Output Now Contains Links to All Dataset Metadata Export Formats

When Signposting was added in Dataverse 5.14 (#8981), it only provided links for the `schema.org` metadata export format.

The responses to HEAD and GET requests for a dataset landing page, as well as the Signposting "linkset" API, have all been updated to include links to all available dataset metadata export formats (including any external exporters, such as Croissant, that have been enabled).

This provides a lightweight, machine-readable way to first retrieve a list of links (via an HTTP HEAD request, for example) to each available metadata export format and then follow up with a request for the export format of interest.

See also [the docs](https://preview.guides.gdcc.io/en/develop/api/native-api.html#retrieve-signposting-information) and #10542.
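
As a rough illustration of the two-step flow described in this release note, the following Java sketch sends a HEAD request, collects the `rel="describedby"` export links from the "Link" header, and then fetches one format. It is not part of this pull request: the class `SignpostingClient` and its method names are hypothetical, the landing page URL must be supplied by the caller, and the simple regular expression assumes the header layout shown in the documentation example (no commas or semicolons inside quoted parameter values).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Dataverse.
public class SignpostingClient {

    // One Link header entry: <target> followed by its ;name="value" parameters.
    private static final Pattern ENTRY = Pattern.compile("<([^>]*)>((?:\\s*;\\s*[^,;<]+)*)");
    // The exporter name as it appears in the export API query string.
    private static final Pattern EXPORTER = Pattern.compile("[?&]exporter=([^&>]+)");

    /** Maps exporter name (e.g. "ddi") to its export URL for each rel="describedby" entry. */
    public static Map<String, String> exportLinks(String linkHeader) {
        Map<String, String> links = new LinkedHashMap<>();
        Matcher entry = ENTRY.matcher(linkHeader);
        while (entry.find()) {
            String url = entry.group(1);
            String params = entry.group(2);
            Matcher exporter = EXPORTER.matcher(url);
            // Entries without an exporter parameter (e.g. the CSL JSON link) are skipped.
            if (params.contains("rel=\"describedby\"") && exporter.find()) {
                links.put(exporter.group(1), url);
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: SignpostingClient <dataset landing page URL>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();

        // Step 1: a HEAD request returns the "Link" header without the page body.
        HttpRequest head = HttpRequest.newBuilder(URI.create(args[0]))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        String linkHeader = client.send(head, HttpResponse.BodyHandlers.discarding())
                .headers().firstValue("Link").orElse("");

        // Step 2: follow the link for the format of interest (here "ddi").
        String ddiUrl = exportLinks(linkHeader).get("ddi");
        if (ddiUrl != null) {
            HttpRequest get = HttpRequest.newBuilder(URI.create(ddiUrl)).GET().build();
            System.out.println(client.send(get, HttpResponse.BodyHandlers.ofString()).body());
        }
    }
}
```

Keeping the header parsing in a pure function (`exportLinks`) means it can be exercised without network access; only `main` talks to a server.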
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/admin/discoverability.rst
@@ -51,7 +51,7 @@ The Dataverse team has been working with Google on both formats. Google has `ind
Signposting
+++++++++++

The Dataverse software supports `Signposting <https://signposting.org>`_. This allows machines to request more information about a dataset through the `Link <https://tools.ietf.org/html/rfc5988>`_ HTTP header.
The Dataverse software supports `Signposting <https://signposting.org>`_. This allows machines to request more information about a dataset through the `Link <https://tools.ietf.org/html/rfc5988>`_ HTTP header. Links to all enabled metadata export formats are given. See :ref:`metadata-export-formats` for a list.

There are 2 Signposting profile levels, level 1 and level 2. In this implementation,
* Level 1 links are shown `as recommended <https://signposting.org/FAIR/>`_ in the "Link"
52 changes: 43 additions & 9 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -1336,6 +1336,8 @@ Export Metadata of a Dataset in Various Formats

|CORS| Export the metadata of the current published version of a dataset in various formats.

To get a list of available formats, see :ref:`available-exporters` and :ref:`get-export-formats`.

See also :ref:`batch-exports-through-the-api` and the note below:

.. code-block:: bash
@@ -1352,9 +1354,30 @@ The fully expanded example above (without environment variables) looks like this

curl "https://demo.dataverse.org/api/datasets/export?exporter=ddi&persistentId=doi:10.5072/FK2/J8SJZB"

.. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org`` , ``OAI_ORE`` , ``Datacite``, ``oai_datacite`` and ``dataverse_json``. Descriptive names can be found under :ref:`metadata-export-formats` in the User Guide.
.. _available-exporters:

Available Dataset Metadata Exporters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following dataset metadata exporters ship with Dataverse:

- ``Datacite``
- ``dataverse_json``
- ``dcterms``
- ``ddi``
- ``oai_datacite``
- ``oai_dc``
- ``oai_ddi``
- ``OAI_ORE``
- ``schema.org``

These are the strings to pass as ``$METADATA_FORMAT`` in the examples above. Descriptive names for each format can be found under :ref:`metadata-export-formats` in the User Guide.

Additional exporters can be enabled, as described under :ref:`external-exporters` in the Installation Guide. The machine-readable name/identifier for each external exporter can be found under :ref:`inventory-of-external-exporters`. If you are interested in creating your own exporter, see :doc:`/developers/metadataexport`.

.. note:: Additional exporters can be enabled, as described under :ref:`external-exporters` in the Installation Guide. To discover the machine-readable name of each exporter (e.g. ``ddi``), check :ref:`inventory-of-external-exporters` or ``getFormatName`` in the exporter's source code.
To discover the machine-readable names of the exporters (e.g. ``ddi``) that have been enabled on the installation of Dataverse you are using, see :ref:`get-export-formats`. Alternatively, you can use the Signposting "linkset" API documented under :ref:`signposting-api`.

To discover the machine-readable name of exporters generally, check :ref:`inventory-of-external-exporters` or ``getFormatName`` in the exporter's source code.

Schema.org JSON-LD
^^^^^^^^^^^^^^^^^^
@@ -1368,6 +1391,7 @@ Both forms are valid according to Google's Structured Data Testing Tool at https

The standard has further evolved into a format called Croissant. For details, see :ref:`schema.org-head` in the Admin Guide.

The ``schema.org`` format changed after Dataverse 6.4 as well: previously its content type was "application/json", but now it is "application/ld+json".

List Files in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~

@@ -2934,15 +2958,23 @@ Retrieve Signposting Information
Dataverse supports :ref:`discovery-sign-posting` as a discovery mechanism.
Signposting involves the addition of a `Link <https://tools.ietf.org/html/rfc5988>`__ HTTP header providing summary information on GET and HEAD requests to retrieve the dataset page and a separate /linkset API call to retrieve additional information.

Here is an example of a "Link" header:
Signposting Link HTTP Header
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is an example of an HTTP "Link" header from a GET or HEAD request for a dataset landing page:

``Link: <https://doi.org/10.5072/FK2/YD5QDG>;rel="cite-as", <https://doi.org/10.5072/FK2/YD5QDG>;rel="describedby";type="application/vnd.citationstyles.csl+json",<https://demo.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/ld+json", <https://schema.org/AboutPage>;rel="type",<https://schema.org/Dataset>;rel="type", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.5072/FK2/YD5QDG>;rel="license", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/linkset?persistentId=doi:10.5072/FK2/YD5QDG> ; rel="linkset";type="application/linkset+json"``
``Link: <https://doi.org/10.5072/FK2/YD5QDG>;rel="cite-as", <https://doi.org/10.5072/FK2/YD5QDG>;rel="describedby";type="application/vnd.citationstyles.csl+json",<https://demo.dataverse.org/api/datasets/export?exporter=OAI_ORE&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/json",<https://demo.dataverse.org/api/datasets/export?exporter=Datacite&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=oai_dc&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=oai_datacite&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/ld+json",<https://demo.dataverse.org/api/datasets/export?exporter=ddi&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=dcterms&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=html&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="text/html",<https://demo.dataverse.org/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/json",<https://demo.dataverse.org/api/datasets/export?exporter=oai_ddi&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml", <https://schema.org/AboutPage>;rel="type",<https://schema.org/Dataset>;rel="type", <http://creativecommons.org/publicdomain/zero/1.0>;rel="license", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/linkset?persistentId=doi:10.5072/FK2/YD5QDG> ; rel="linkset";type="application/linkset+json"``

The URL for linkset information is discoverable under the ``rel="linkset";type="application/linkset+json`` entry in the "Link" header, such as in the example above.
The URL for linkset information (described below) is discoverable under the ``rel="linkset";type="application/linkset+json"`` entry in the "Link" header, such as in the example above.
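
A client might locate the linkset URL in a "Link" header along the following lines. This is a minimal Java sketch, not code from Dataverse: the class `LinksetLocator` and the method `linksetUrl` are hypothetical names, and the regular expression assumes the header layout of the example above rather than implementing the full Web Linking grammar.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Dataverse.
public class LinksetLocator {

    // One Link header entry: <target> followed by its ;name="value" parameters.
    private static final Pattern ENTRY = Pattern.compile("<([^>]*)>((?:\\s*;\\s*[^,;<]+)*)");

    /** Returns the target of the first entry whose parameters include rel="linkset". */
    public static Optional<String> linksetUrl(String linkHeader) {
        Matcher entry = ENTRY.matcher(linkHeader);
        while (entry.find()) {
            if (entry.group(2).contains("rel=\"linkset\"")) {
                return Optional.of(entry.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened version of the example header; note the spaces around ";" before rel="linkset".
        String header = "<https://doi.org/10.5072/FK2/YD5QDG>;rel=\"cite-as\", "
                + "<https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/linkset"
                + "?persistentId=doi:10.5072/FK2/YD5QDG> ; rel=\"linkset\";type=\"application/linkset+json\"";
        System.out.println(linksetUrl(header).orElse("no linkset link found"));
    }
}
```

Matching on whole `<target>;params` entries, rather than splitting the header on commas, keeps the lookup working even if a target URL happens to contain a comma.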

Signposting Linkset API Endpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The response includes a JSON object conforming to the `Signposting <https://signposting.org>`__ specification. As part of this conformance, unlike most Dataverse API responses, the output is not wrapped in a ``{"status":"OK","data":{`` object.
Signposting is not supported for draft dataset versions.

Like :ref:`get-export-formats`, this API can be used to discover dataset metadata export formats, but here the URLs returned are specific to the dataset in question.

.. code-block:: bash

export SERVER_URL=https://demo.dataverse.org
@@ -4881,12 +4913,14 @@ The fully expanded example above (without environment variables) looks like this

curl "https://demo.dataverse.org/api/info/settings/:MaxEmbargoDurationInMonths"

Get Export Formats
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. _get-export-formats:

Get Dataset Metadata Export Formats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Get the available export formats, including custom formats.
Get the available dataset metadata export formats, including formats from external exporters (see :ref:`available-exporters`).

The response contains an object with available format names as keys, and as values an object with the following properties:
The response contains a JSON object with the available format names as keys (these can be passed to :ref:`export-dataset-metadata-api`) and, as values, objects with the following properties:

* ``displayName``
* ``mediaType``
2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/user/dataset-management.rst
@@ -43,6 +43,8 @@ Additional formats can be enabled. See :ref:`inventory-of-external-exporters` in

Each of these metadata exports contains the metadata of the most recently published version of the dataset.

For each dataset, links to each enabled metadata format are available programmatically via Signposting. For details, see :ref:`discovery-sign-posting` in the Admin Guide and :ref:`signposting-api` in the API Guide.

.. _adding-new-dataset:

Adding a New Dataset
@@ -111,7 +111,11 @@ public Boolean isAvailableToUsers() {

@Override
public String getMediaType() {
return MediaType.APPLICATION_JSON;
// Changed from "application/json" to "application/ld+json" because
// that's what Signposting expects.
return "application/ld+json";
}

}
@@ -16,6 +16,7 @@ Two configurable options allow changing the limit for the number of authors or d

import edu.harvard.iq.dataverse.*;
import edu.harvard.iq.dataverse.dataset.DatasetUtil;
import edu.harvard.iq.dataverse.export.ExportService;
import jakarta.json.Json;
import jakarta.json.JsonArrayBuilder;
import jakarta.json.JsonObjectBuilder;
@@ -28,6 +29,8 @@ Two configurable options allow changing the limit for the number of authors or d
import java.util.logging.Logger;

import static edu.harvard.iq.dataverse.util.json.NullSafeJsonBuilder.jsonObjectBuilder;
import io.gdcc.spi.export.ExportException;
import io.gdcc.spi.export.Exporter;

public class SignpostingResources {
private static final Logger logger = Logger.getLogger(SignpostingResources.class.getCanonicalName());
@@ -72,8 +75,18 @@ public String getLinks() {
}

String describedby = "<" + ds.getGlobalId().asURL().toString() + ">;rel=\"describedby\"" + ";type=\"" + "application/vnd.citationstyles.csl+json\"";
describedby += ",<" + systemConfig.getDataverseSiteUrl() + "/api/datasets/export?exporter=schema.org&persistentId="
+ ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier() + ">;rel=\"describedby\"" + ";type=\"application/ld+json\"";
ExportService instance = ExportService.getInstance();
for (String[] labels : instance.getExportersLabels()) {
String formatName = labels[1];
Exporter exporter;
try {
exporter = ExportService.getInstance().getExporter(formatName);
describedby += ",<" + systemConfig.getDataverseSiteUrl() + "/api/datasets/export?exporter=" + formatName + "&persistentId="
+ ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier() + ">;rel=\"describedby\"" + ";type=\"" + exporter.getMediaType() + "\"";
@qqmyers (Member) commented on Nov 22, 2024:

These (this and line 137) won't work for all permalinks since they don't necessarily have `/` as a separator. I think you can just use `ds.getGlobalId().asString()` instead. For a real dataset, I don't think you can ever have a null GlobalId, so I'm not sure you even need to check for that.

Member Author replied:

@qqmyers thanks, in ca93d60 I corrected the two you found plus two more.
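
The separator problem raised in this thread can be illustrated with a small, self-contained Java sketch. Everything in it is hypothetical: `PidDemo.Pid` is a simplified stand-in rather than Dataverse's `GlobalId` class, and the permalink values are made up for illustration. It only shows why hard-coding `/` between authority and identifier differs from letting the identifier render itself.

```java
// Hypothetical demo; Pid is a simplified stand-in, not Dataverse's GlobalId.
public class PidDemo {

    public static final class Pid {
        final String protocol;
        final String authority;
        final String separator;   // "/" for DOIs; may be empty for some permalinks
        final String identifier;

        public Pid(String protocol, String authority, String separator, String identifier) {
            this.protocol = protocol;
            this.authority = authority;
            this.separator = separator;
            this.identifier = identifier;
        }

        // Renders the identifier with its own separator.
        public String asString() {
            return protocol + ":" + authority + separator + identifier;
        }
    }

    // Mirrors the concatenation in the diff: always inserts "/".
    public static String hardCodedSlash(Pid pid) {
        return pid.protocol + ":" + pid.authority + "/" + pid.identifier;
    }

    public static void main(String[] args) {
        Pid doi = new Pid("doi", "10.5072", "/", "FK2/YD5QDG");
        Pid perma = new Pid("perma", "QE", "", "5A-B7Q"); // illustrative values

        // For a DOI both forms agree.
        System.out.println(hardCodedSlash(doi) + " == " + doi.asString());
        // For a permalink without a separator, the hard-coded "/" yields a different string.
        System.out.println(hardCodedSlash(perma) + " != " + perma.asString());
    }
}
```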

} catch (ExportException ex) {
logger.warning("Could not look up exporter based on " + formatName + ". Exception: " + ex);
}
}
valueList.add(describedby);

String type = "<https://schema.org/AboutPage>;rel=\"type\"";
@@ -112,15 +125,25 @@ public JsonArrayBuilder getJsonLinkset() {
)
);

mediaTypes.add(
jsonObjectBuilder().add(
"href",
systemConfig.getDataverseSiteUrl() + "/api/datasets/export?exporter=schema.org&persistentId=" + ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier()
).add(
"type",
"application/ld+json"
)
);
ExportService instance = ExportService.getInstance();
for (String[] labels : instance.getExportersLabels()) {
String formatName = labels[1];
Exporter exporter;
try {
exporter = ExportService.getInstance().getExporter(formatName);
mediaTypes.add(
jsonObjectBuilder().add(
"href",
systemConfig.getDataverseSiteUrl() + "/api/datasets/export?exporter=" + formatName + "&persistentId=" + ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier()
).add(
"type",
exporter.getMediaType()
)
);
} catch (ExportException ex) {
logger.warning("Could not look up exporter based on " + formatName + ". Exception: " + ex);
}
}
JsonArrayBuilder linksetJsonObj = Json.createArrayBuilder();

JsonObjectBuilder mandatory;
22 changes: 22 additions & 0 deletions src/test/java/edu/harvard/iq/dataverse/api/SignpostingIT.java
@@ -58,6 +58,16 @@ public void testSignposting() {
Response getHtml = given().get(datasetLandingPage);

System.out.println("Link header: " + getHtml.getHeader("Link"));
if (false) {
// Split on commas to make the output more readable.
System.out.println("---");
String header = getHtml.getHeader("Link");
for (String string : header.split(",")) {
System.out.println(string + ",");
}
System.out.println("returning early...");
return;
}

getHtml.then().assertThat().statusCode(OK.getStatusCode());

@@ -67,6 +77,8 @@ public void testSignposting() {
assertTrue(linkHeader.contains(datasetPid));
assertTrue(linkHeader.contains("cite-as"));
assertTrue(linkHeader.contains("describedby"));
// Make sure we get more exporters besides just "schema.org".
assertTrue(linkHeader.contains("oai_datacite"));

Response headHtml = given().head(datasetLandingPage);

@@ -76,6 +88,7 @@ public void testSignposting() {

// Make sure there's Signposting stuff in the "Link" header such as
// the dataset PID, cite-as, etc.
// TODO: The comment above is a repeat and so are some of the assertions below. Consolidate?
linkHeader = getHtml.getHeader("Link");
assertTrue(linkHeader.contains(datasetPid));
assertTrue(linkHeader.contains("cite-as"));
@@ -90,8 +103,10 @@ public void testSignposting() {
System.out.println("Linkset URL: " + linksetUrl);

Response linksetResponse = given().accept(ContentType.JSON).get(linksetUrl);
linksetResponse.prettyPrint();

String responseString = linksetResponse.getBody().asString();
System.out.println("response string: " + responseString);

JsonObject data = JsonUtil.getJsonObject(responseString);
JsonObject lso = data.getJsonArray("linkset").getJsonObject(0);
@@ -107,6 +122,13 @@ public void testSignposting() {
Pattern exporterPattern = Pattern.compile("[<\\[][^()\\[\\]]*?exporter=schema.org[^()\\[\\]]*[>\\]]");
Matcher exporterMatcher = exporterPattern.matcher(linkHeader);
exporterMatcher.find();
// TODO: make an assertion
//assertTrue(exporterMatcher.find());

// Test another
Pattern exporterPattern2 = Pattern.compile("exporter=oai_datacite");
Matcher exporterMatcher2 = exporterPattern2.matcher(linkHeader);
assertTrue(exporterMatcher2.find());

Response exportDataset = UtilIT.exportDataset(datasetPid, "schema.org");
exportDataset.prettyPrint();