-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
views: FAIR signposting level 1 support (HTTP Link headers) #2938
base: master
Are you sure you want to change the base?
Changes from 1 commit
0042190
15672de
17401b6
2e533ed
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,19 +10,21 @@ | |
"""Routes for record-related pages provided by Invenio-App-RDM.""" | ||
|
||
from functools import wraps | ||
from itertools import islice | ||
|
||
from flask import g, make_response, redirect, request, session, url_for | ||
from flask import current_app, g, make_response, redirect, request, session, url_for | ||
from flask_login import login_required | ||
from invenio_communities.communities.resources.serializer import ( | ||
UICommunityJSONSerializer, | ||
) | ||
from invenio_communities.proxies import current_communities | ||
from invenio_pidstore.errors import PIDDoesNotExistError | ||
from invenio_rdm_records.proxies import current_rdm_records | ||
from invenio_rdm_records.resources.serializers.utils import get_vocabulary_props | ||
from invenio_records_resources.services.errors import PermissionDeniedError | ||
from sqlalchemy.orm.exc import NoResultFound | ||
|
||
from invenio_app_rdm.urls import record_url_for | ||
from invenio_app_rdm.urls import download_url_for, export_url_for, record_url_for | ||
|
||
|
||
def service(): | ||
|
@@ -365,20 +367,169 @@ def view(**kwargs): | |
return view | ||
|
||
|
||
def add_signposting(f): | ||
"""Add signposting link to view's response headers.""" | ||
def _get_header(rel, value, link_type=None): | ||
header = f'<{value}> ; rel="{rel}"' | ||
if link_type: | ||
header += f' ; type="{link_type}"' | ||
return header | ||
|
||
|
||
def _get_signposting_cite_as(record): | ||
"""Release self url points to RDM record. | ||
|
||
It points to DataCite URL if the integration is enabled, otherwise it points to the HTML URL. | ||
""" | ||
doi_url = record["links"].get("doi") | ||
html_url = record["links"]["self_html"] | ||
return _get_header("cite-as", doi_url or html_url) | ||
|
||
|
||
def _get_signposting_types(record): | ||
resource_type = record["metadata"]["resource_type"] | ||
props = get_vocabulary_props( | ||
"resourcetypes", | ||
[ | ||
"props.schema.org", | ||
], | ||
resource_type["id"], | ||
) | ||
url_schema_org = props.get("schema.org") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure if there's a better way to do this lookup. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps just check that it is cached so we don't query db on every landing page request There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From what I see these are indeed cached here, which is also mentioned in |
||
return [ | ||
_get_header("type", url_schema_org), | ||
_get_header("type", "https://schema.org/AboutPage"), | ||
] | ||
|
||
|
||
def _get_signposting_authors(record): | ||
authors = [] | ||
# Limit authors to the first 10. | ||
for creator in islice(record["metadata"]["creators"], 0, 10): | ||
for identifier in creator["person_or_org"].get("identifiers", []): | ||
if identifier["scheme"] == "orcid": | ||
authors.append( | ||
_get_header( | ||
"author", "https://orcid.org/" + identifier["identifier"] | ||
) | ||
) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we support other schemes like ROR, etc? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might be that the safer option would be to use something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm now relying on invenio_rdm_records/resources/serializers/signposting/schema.py's serialize_author which picks the first linkable ID. |
||
return authors | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Lars suggested that we might choose to not include authors at all since the list might be long and the full list can be found in the linkset. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we could apply some sensible limit? E.g. if less than 50 authors, include, otherwise don't include at all and basically have people rely on the explicit authors linkset? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm now relying on invenio_rdm_records/resources/serializers/signposting/schema.py's serialize_author which serializes all the authors. |
||
|
||
|
||
def _get_signposting_describedbys(pid_value): | ||
describedbys = [] | ||
for export_format, val in current_app.config.get( | ||
"APP_RDM_RECORD_EXPORTERS", {} | ||
).items(): | ||
url = export_url_for(pid_value=pid_value, export_format=export_format) | ||
content_type = val["content-type"] | ||
describedbys.append(_get_header("describedby", url, content_type)) | ||
return describedbys | ||
|
||
|
||
def _get_signposting_licenses(record): | ||
licenses = [] | ||
for right in record["metadata"].get("rights", []): | ||
# First try to get `props.url` from the standard licenses, | ||
# then try to get the optional `link` from the custom license. | ||
url = right.get("props", {}).get("url") or right.get("link") | ||
if url: | ||
licenses.append(_get_header("license", url)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The FAIR Signposting docs recommends to use SPDX license identifier (e.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unfortunately our IDs are lower-cased (e.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ouch, I tried in the browser and copy-pasting URLs for some reason kept the original case... Ok, this is a bummer, I think we'll have to add the original I think it would be fine to shelve this and just use the |
||
return licenses | ||
|
||
|
||
def _get_signposting_items(files, pid_value): | ||
items = [] | ||
# Checking if the user has access to the files. | ||
if files: | ||
# Limiting the iteration to 100 files maximum. | ||
for file in islice(files.to_dict()["entries"], 0, 100): | ||
url = download_url_for(pid_value=pid_value, filename=file["key"]) | ||
items.append(_get_header("item", url, file["mimetype"])) | ||
return items | ||
|
||
|
||
def _get_signposting_collection(pid_value): | ||
ui_url = record_url_for(pid_value=pid_value) | ||
return _get_header("collection", ui_url, "text/html") | ||
|
||
|
||
def _get_signposting_describes(pid_value): | ||
ui_url = record_url_for(pid_value=pid_value) | ||
return _get_header("describes", ui_url, "text/html") | ||
|
||
|
||
def _get_signposting_linkset(pid_value): | ||
api_url = record_url_for(_app="api", pid_value=pid_value) | ||
return _get_header("linkset", api_url, "application/linkset+json") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note: this is required for level 2 support and was already added in a previous pull request. |
||
|
||
|
||
def add_signposting_landing_page(f): | ||
"""Add signposting links to the landing page view's response headers.""" | ||
|
||
@wraps(f) | ||
def view(*args, **kwargs): | ||
response = make_response(f(*args, **kwargs)) | ||
|
||
# Relies on other decorators having operated before it | ||
pid_value = kwargs["pid_value"] | ||
signposting_link = record_url_for(_app="api", pid_value=pid_value) | ||
record = kwargs["record"] | ||
files = kwargs["files"] | ||
|
||
signposting_headers = [ | ||
_get_signposting_cite_as(record), | ||
*_get_signposting_types(record), | ||
*_get_signposting_authors(record), | ||
*_get_signposting_describedbys(pid_value), | ||
*_get_signposting_licenses(record), | ||
*_get_signposting_items(files, pid_value), | ||
_get_signposting_linkset(pid_value), | ||
] | ||
|
||
response.headers["Link"] = " , ".join(signposting_headers) | ||
|
||
return response | ||
|
||
return view | ||
|
||
|
||
def add_signposting_content_resources(f): | ||
"""Add signposting links to the content resources view's response headers.""" | ||
|
||
@wraps(f) | ||
def view(*args, **kwargs): | ||
response = make_response(f(*args, **kwargs)) | ||
|
||
# Relies on other decorators having operated before it | ||
pid_value = kwargs["pid_value"] | ||
|
||
signposting_headers = [ | ||
_get_signposting_collection(pid_value), | ||
_get_signposting_linkset(pid_value), | ||
] | ||
|
||
response.headers["Link"] = " , ".join(signposting_headers) | ||
|
||
return response | ||
|
||
return view | ||
|
||
|
||
def add_signposting_metadata_resources(f): | ||
"""Add signposting links to the metadata resources view's response headers.""" | ||
|
||
@wraps(f) | ||
def view(*args, **kwargs): | ||
response = make_response(f(*args, **kwargs)) | ||
|
||
# Relies on other decorators having operated before it | ||
pid_value = kwargs["pid_value"] | ||
|
||
signposting_headers = [ | ||
_get_signposting_describes(pid_value), | ||
_get_signposting_linkset(pid_value), | ||
] | ||
|
||
response.headers["Link"] = " , ".join(signposting_headers) | ||
|
||
response.headers["Link"] = ( | ||
f'<{signposting_link}> ; rel="linkset" ; type="application/linkset+json"' # fmt: skip | ||
) | ||
return response | ||
|
||
return view | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,23 +11,58 @@ | |
""" | ||
|
||
|
||
def test_link_in_landing_page_response_headers(running_app, client, record): | ||
res = client.head(f"/records/{record.id}") | ||
def test_link_in_landing_page_response_headers(running_app, client, record_with_file): | ||
ui_url = f"https://127.0.0.1:5000/records/{record_with_file.id}" | ||
api_url = f"https://127.0.0.1:5000/api/records/{record_with_file.id}" | ||
filename = "article.txt" | ||
|
||
res = client.head(f"/records/{record_with_file.id}") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. question/comment: I think the IMHO, it's ok to keep as is, since none of the logic done for generating the header links is that much more complex or adds that big of an overhead compared to the rest of the GET response. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
|
||
|
||
assert ( | ||
res.headers["Link"] | ||
== f'<https://127.0.0.1:5000/api/records/{record.id}> ; rel="linkset" ; type="application/linkset+json"' # noqa | ||
) | ||
assert res.headers["Link"].split(" , ") == [ | ||
f'<{ui_url}> ; rel="cite-as"', | ||
'<https://schema.org/Photograph> ; rel="type"', | ||
'<https://schema.org/AboutPage> ; rel="type"', | ||
# The test record does not have an author with an identifier. | ||
f'<{ui_url}/export/json> ; rel="describedby" ; type="application/json"', | ||
f'<{ui_url}/export/json-ld> ; rel="describedby" ; type="application/ld+json"', | ||
f'<{ui_url}/export/csl> ; rel="describedby" ; type="application/vnd.citationstyles.csl+json"', | ||
f'<{ui_url}/export/datacite-json> ; rel="describedby" ; type="application/vnd.datacite.datacite+json"', | ||
f'<{ui_url}/export/datacite-xml> ; rel="describedby" ; type="application/vnd.datacite.datacite+xml"', | ||
f'<{ui_url}/export/dublincore> ; rel="describedby" ; type="application/x-dc+xml"', | ||
f'<{ui_url}/export/marcxml> ; rel="describedby" ; type="application/marcxml+xml"', | ||
f'<{ui_url}/export/bibtex> ; rel="describedby" ; type="application/x-bibtex"', | ||
f'<{ui_url}/export/geojson> ; rel="describedby" ; type="application/vnd.geo+json"', | ||
f'<{ui_url}/export/dcat-ap> ; rel="describedby" ; type="application/dcat+xml"', | ||
f'<{ui_url}/export/codemeta> ; rel="describedby" ; type="application/ld+json"', | ||
f'<{ui_url}/export/cff> ; rel="describedby" ; type="application/x-yaml"', | ||
# The test record does not have a license. | ||
f'<{ui_url}/files/{filename}> ; rel="item" ; type="text/plain"', | ||
f'<{api_url}> ; rel="linkset" ; type="application/linkset+json"', | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The logic for the landing page is implemented in |
||
] | ||
|
||
|
||
def test_link_in_content_resource_response_headers( | ||
running_app, client, record_with_file | ||
): | ||
ui_url = f"https://127.0.0.1:5000/records/{record_with_file.id}" | ||
api_url = f"https://127.0.0.1:5000/api/records/{record_with_file.id}" | ||
filename = "article.txt" | ||
|
||
res = client.head(f"/records/{record_with_file.id}/files/{filename}") | ||
|
||
assert ( | ||
res.headers["Link"] | ||
== f'<https://127.0.0.1:5000/api/records/{record_with_file.id}> ; rel="linkset" ; type="application/linkset+json"' # noqa | ||
) | ||
assert res.headers["Link"].split(" , ") == [ | ||
f'<{ui_url}> ; rel="collection" ; type="text/html"', | ||
f'<{api_url}> ; rel="linkset" ; type="application/linkset+json"', | ||
] | ||
|
||
|
||
def test_link_in_metadata_resource_response_headers(running_app, client, record): | ||
ui_url = f"https://127.0.0.1:5000/records/{record.id}" | ||
api_url = f"https://127.0.0.1:5000/api/records/{record.id}" | ||
|
||
res = client.head(f"/records/{record.id}/export/bibtex") | ||
|
||
assert res.headers["Link"].split(" , ") == [ | ||
f'<{ui_url}> ; rel="describes" ; type="text/html"', | ||
f'<{api_url}> ; rel="linkset" ; type="application/linkset+json"', | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite a lot of methods added to
decorators.py
, should it be moved to a signposting-specific file?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Agree, I thought the was already some signposting-related directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[ ] Move the code to a signposting-related file or directory.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is now much less code in
decorators.py
now that I rely oninvenio_rdm_records/resources/serializers/signposting/schema.py
.