Merge branch 'develop' into 6783-s3-tests #6783
pdurbin committed Dec 5, 2023
2 parents d4bab07 + b15b89f commit e51bc22
Showing 47 changed files with 919 additions and 379 deletions.
3 changes: 3 additions & 0 deletions conf/solr/9.3.0/solrconfig.xml
@@ -588,6 +588,7 @@
check for "Circuit Breakers tripped" in logs and the corresponding error message should tell
you what transpired (if the failure was caused by tripped circuit breakers).
-->

<!--
<str name="memEnabled">true</str>
<str name="memThreshold">75</str>
@@ -599,10 +600,12 @@
whether the circuit breaker is enabled and the average load over the last minute at which the
circuit breaker should start rejecting queries.
-->

<!--
<str name="cpuEnabled">true</str>
<str name="cpuThreshold">75</str>
-->

</circuitBreaker>

<!-- Request Dispatcher
5 changes: 5 additions & 0 deletions doc/release-notes/10093-signedUrl_improvements.md
@@ -0,0 +1,5 @@
A new version of the standard Dataverse Previewers from https://github.com/gdcc/dataverse-previewers is available. The new version supports the use of signedUrls rather than API keys when previewing restricted files (including files in draft dataset versions). Upgrading is highly recommended.

SignedUrls can now be used with PrivateUrl access tokens, which allows PrivateUrl users to view previewers that are configured to use SignedUrls. See #10093.

Launching a dataset-level configuration tool will automatically generate an API token when needed. This is consistent with how other types of tools work. See #10045.
1 change: 1 addition & 0 deletions doc/release-notes/10104-dataset-citation-deaccessioned.md
@@ -0,0 +1 @@
The getDatasetVersionCitation (/api/datasets/{id}/versions/{versionId}/citation) endpoint now accepts a new optional boolean query parameter `includeDeaccessioned`, which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain the citation.
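
For example, to request the citation for version 1.0 of dataset 24 while considering deaccessioned versions (the hostname, dataset ID, and version below are illustrative):

```
curl -H "Accept:application/json" "http://localhost:8080/api/datasets/24/versions/1.0/citation?includeDeaccessioned=true"
```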
9 changes: 9 additions & 0 deletions doc/release-notes/9547-validation-for-geospatial-metadata.md
@@ -0,0 +1,9 @@
Validation has been added for the Geographic Bounding Box values in the Geospatial metadata block. This will prevent improperly defined bounding boxes from being created via the edit page or metadata imports (issue 9547). This also fixes an issue where existing datasets with invalid bounding boxes were quietly failing to get reindexed.

For the "upgrade" steps section:

Update Geospatial Metadata Block

- `wget https://github.com/IQSS/dataverse/releases/download/v6.1/geospatial.tsv`
- `curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file geospatial.tsv`

4 changes: 4 additions & 0 deletions doc/release-notes/9635-solr-improvements.md
@@ -0,0 +1,4 @@
- As of this release, application-side support has been added for the "circuit breaker" mechanism in Solr, which makes it drop requests more gracefully when the search engine is experiencing load issues.

Please see the "Installing Solr" section of the Installation Prerequisites guide.

7 changes: 7 additions & 0 deletions doc/sphinx-guides/source/api/apps.rst
@@ -94,6 +94,13 @@ This series of Python scripts offers a starting point for migrating datasets from

https://github.com/scholarsportal/dataverse-migration-scripts

idsc.dataverse
~~~~~~~~~~~~~~

This module can, among other things, help you migrate from one Dataverse installation to another (see `migrate.md <https://github.com/iza-institute-of-labor-economics/idsc.dataverse/blob/main/migrate.md>`_).

https://github.com/iza-institute-of-labor-economics/idsc.dataverse

Java
----

5 changes: 5 additions & 0 deletions doc/sphinx-guides/source/api/auth.rst
@@ -80,3 +80,8 @@ To test if bearer tokens are working, you can try something like the following (
export TOKEN=`curl -s -X POST --location "http://keycloak.mydomain.com:8090/realms/test/protocol/openid-connect/token" -H "Content-Type: application/x-www-form-urlencoded" -d "username=user&password=user&grant_type=password&client_id=test&client_secret=94XHrfNRwXsjqTqApRrwWmhDLDHpIYV8" | jq '.access_token' -r | tr -d "\n"`
curl -H "Authorization: Bearer $TOKEN" http://localhost:8080/api/users/:me
Signed URLs
-----------

See :ref:`signed-urls`.
11 changes: 9 additions & 2 deletions doc/sphinx-guides/source/api/changelog.rst
@@ -5,9 +5,16 @@ API Changelog
:local:
:depth: 1

6.1
---

Changes
~~~~~~~
- **/api/datasets/{id}/versions/{versionId}/citation**: This endpoint now accepts a new optional boolean query parameter ``includeDeaccessioned``, which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain the citation. See :ref:`get-citation`.

6.0
---

Changes
~~~~~~~
- **/api/access/datafile**: When a null or invalid API token is provided to download a public (non-restricted) file with this API call, it will result in a ``401`` error response. Previously, the download was allowed (``200`` response). Please note that we noticed this change sometime between 5.9 and 6.0. If you can help us pinpoint the exact version (or commit!), please get in touch. See :doc:`dataaccess`.
2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/api/client-libraries.rst
@@ -62,6 +62,8 @@ There are multiple Python modules for interacting with Dataverse APIs.

`Pooch <https://github.com/fatiando/pooch>`_ is a Python library that allows library and application developers to download data. Among other features, it takes care of various protocols, caching in OS-specific locations, and checksum verification, and adds optional features like progress bars or log messages. Among other popular repositories, Pooch supports Dataverse in the sense that you can reference a Dataverse-hosted dataset by just its DOI; Pooch will determine the data repository type and query the Dataverse API for the contained files and checksums, giving you an easy interface to download them.

`idsc.dataverse <https://github.com/iza-institute-of-labor-economics/idsc.dataverse>`_ reads the metadata and files of datasets from a Dataverse installation (e.g. dataverse.example1.com) and writes them into ~/.idsc/dataverse/api/dataverse.example1.com, organized into directories of the form PID_type/prefix/suffix, where PID_type is one of hdl, doi, or ark. It can then "export" the local copy from ~/.idsc/dataverse/api/dataverse.example1.com to ~/.idsc/.cache/dataverse.example2.com so that the datasets can be uploaded to dataverse.example2.com.

R
-

25 changes: 17 additions & 8 deletions doc/sphinx-guides/source/api/external-tools.rst
@@ -160,17 +160,25 @@ Authorization Options

When called for datasets or data files that are not public (i.e. in a draft dataset or for a restricted file), external tools are allowed access via the user's credentials. This is accomplished by one of two mechanisms:

.. _signed-urls:

Signed URLs
^^^^^^^^^^^

The signed URL mechanism is more secure than exposing API tokens and therefore recommended.

- Configured via the ``allowedApiCalls`` section of the manifest (a sample fragment follows this list). The tool will be provided with signed URLs allowing the specified access to the given dataset or datafile for the specified amount of time. The tool will not be able to access any other datasets or files the user may have access to and will not be able to make calls other than those specified.
- For tools invoked via a GET call, Dataverse will include a callback query parameter with a Base64-encoded value. The decoded value is a signed URL that can be called to retrieve a JSON response containing all of the queryParameters and allowedApiCalls specified in the manifest.
- For tools invoked via POST, Dataverse will send a JSON body including the requested queryParameters and allowedApiCalls. Dataverse expects the response to the POST to indicate a redirect, which Dataverse will use to open the tool.

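For illustration, here is a minimal ``allowedApiCalls`` fragment, modeled on the ``fabulousFileTool.json`` example manifest linked below (the call name and timeout values are illustrative):

.. code-block:: json

  {
    "allowedApiCalls": [
      {
        "name": "retrieveDataFile",
        "httpMethod": "GET",
        "urlTemplate": "/api/v1/access/datafile/{fileId}",
        "timeOut": 270
      }
    ]
  }
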
API Token
^^^^^^^^^

The API token mechanism is deprecated. Because it is less secure than signed URLs, it is not recommended for new external tools.

- Configured by including an ``{apiToken}`` value in the ``queryParameters``. When this is present, Dataverse will send the user's API token to the tool. With the user's API token, the tool can perform any action via the Dataverse API that the user could. External tools configured via this method should be assessed for their trustworthiness.
- For tools invoked via GET, the API token is sent via a query parameter in the request URL, which could be cached in the browser's history.
- For tools invoked via POST, Dataverse will send a JSON body including the apiToken.

Internationalization of Your External Tool
++++++++++++++++++++++++++++++++++++++++++
@@ -187,6 +195,7 @@ Using Example Manifests to Get Started
++++++++++++++++++++++++++++++++++++++

Again, you can use :download:`fabulousFileTool.json <../_static/installation/files/root/external-tools/fabulousFileTool.json>` or :download:`dynamicDatasetTool.json <../_static/installation/files/root/external-tools/dynamicDatasetTool.json>` as a starting point for your own manifest file.
Additional working examples, including ones using :ref:`signed-urls`, are available at https://github.com/gdcc/dataverse-previewers.

Testing Your External Tool
--------------------------
12 changes: 12 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -2491,6 +2491,8 @@ Get Dataset By Private URL Token
curl "$SERVER_URL/api/datasets/privateUrlDatasetVersion/$PRIVATE_URL_TOKEN"
.. _get-citation:

Get Citation
~~~~~~~~~~~~

@@ -2502,6 +2504,16 @@ Get Citation
curl -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/{version}/citation?persistentId=$PERSISTENT_IDENTIFIER"
By default, deaccessioned dataset versions are not included in the search when applying the ``:latest`` or ``:latest-published`` identifiers. Additionally, when filtering by a specific version tag, you will get a "not found" error if the version is deaccessioned and you do not enable the ``includeDeaccessioned`` option described below.

If you want to include deaccessioned dataset versions, you must set the ``includeDeaccessioned`` query parameter to ``true``.

Usage example:

.. code-block:: bash

  curl -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/citation?persistentId=$PERSISTENT_IDENTIFIER&includeDeaccessioned=true"
Get Citation by Private URL Token
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

37 changes: 36 additions & 1 deletion doc/sphinx-guides/source/installation/config.rst
@@ -539,6 +539,27 @@ If you wish to change which store is used by default, you'll need to delete the
It is also possible to set maximum file upload size limits per store. See the :ref:`:MaxFileUploadSizeInBytes` setting below.

.. _labels-file-stores:

Labels for File Stores
++++++++++++++++++++++

If you find yourself adding many file stores with various configurations such as per-file limits and direct upload, you might find it helpful to make the label descriptive.

For example, instead of simply labeling an S3 store as "S3"...

.. code-block:: none

  ./asadmin create-jvm-options "\-Ddataverse.files.s3xl.label=S3"
... you might want to include some extra information such as the example below.

.. code-block:: none

  ./asadmin create-jvm-options "\-Ddataverse.files.s3xl.label=S3XL, Filesize limit: 100GB, direct-upload"
Please keep in mind that the UI will only show so many characters, so labels are best kept short.
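
You can verify the labels you have set with something like the following (a sketch using the standard ``list-jvm-options`` asadmin subcommand; adjust the grep pattern to your store ids)::

  ./asadmin list-jvm-options | grep 'dataverse.files'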

.. _storage-files-dir:

File Storage
@@ -2868,7 +2889,6 @@ To enable setting file-level PIDs per collection::

When :AllowEnablingFilePIDsPerCollection is true, setting File PIDs to be enabled/disabled for a given collection can be done via the Native API - see :ref:`collection-attributes-api` in the Native API Guide.


.. _:IndependentHandleService:

:IndependentHandleService
@@ -3109,6 +3129,21 @@ If ``:SolrFullTextIndexing`` is set to true, the content of files of any size wi

``curl -X PUT -d 314572800 http://localhost:8080/api/admin/settings/:SolrMaxFileSizeForFullTextIndexing``


.. _:DisableSolrFacets:

:DisableSolrFacets
++++++++++++++++++

Setting this to ``true`` will make the collection ("dataverse") page show search results without the usual search facets on the left side of the page. A message will be shown in that column informing users that facets are temporarily unavailable. Generating the facets is more resource-intensive for Solr than the main search results themselves, so applying this measure will significantly reduce the load on the search engine when its performance becomes an issue.

This setting can be used in combination with the "circuit breaker" mechanism on the Solr side (see the "Installing Solr" section of the Installation Prerequisites guide). An admin can choose to enable it, or even create an automated system for enabling it in response to Solr beginning to drop incoming requests with the HTTP code 503.

To enable the setting::

curl -X PUT -d true "http://localhost:8080/api/admin/settings/:DisableSolrFacets"
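
Once Solr recovers, facets can be restored by removing the setting, following the usual pattern for deleting database settings (shown here as a sketch)::

  curl -X DELETE "http://localhost:8080/api/admin/settings/:DisableSolrFacets"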


.. _:SignUpUrl:

:SignUpUrl
19 changes: 19 additions & 0 deletions doc/sphinx-guides/source/installation/prerequisites.rst
@@ -211,6 +211,25 @@ Finally, you need to tell Solr to create the core "collection1" on startup::

echo "name=collection1" > /usr/local/solr/solr-9.3.0/server/solr/collection1/core.properties

The Dataverse collection ("dataverse") page uses Solr very heavily. On a busy instance this may make the search engine the performance bottleneck, causing these pages to take progressively longer to load, potentially affecting the overall performance of the application and/or causing Solr itself to crash. If you observe this on your instance, we recommend uncommenting the following lines in the ``<circuitBreaker ...>`` section of the ``solrconfig.xml`` file::

<str name="memEnabled">true</str>
<str name="memThreshold">75</str>

and::

<str name="cpuEnabled">true</str>
<str name="cpuThreshold">75</str>

This will activate Solr's "circuit breaker" mechanisms, which make it start dropping incoming requests with the HTTP code 503 when it is experiencing load issues. As of Dataverse 6.1, the collection page will recognize this condition and display a customizable message informing users that the search engine is unavailable because of heavy load, with the assumption that the condition is transient, and suggesting that they try again later. This is still an inconvenience to users, but it handles the problem more gracefully than letting the pages time out or crash. You may need to experiment and adjust the threshold values defined in the lines above.
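
To observe whether the circuit breakers are currently tripping, you can query Solr directly and watch for a 503 status code (an illustrative check; adjust the host and core name to your installation)::

  curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:8983/solr/collection1/select?q=*:*"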

If this becomes a common issue, another temporary workaround an admin may choose is to enable the following setting::

curl -X PUT -d true "http://localhost:8080/api/admin/settings/:DisableSolrFacets"

This will make the collection page show the search results without the usual search facets on the left side of the page. Another customizable message will be shown in that column informing users that facets are temporarily unavailable. Generating these facets is more resource-intensive for Solr than the main search results themselves, so applying this measure will significantly reduce the load on the search engine.


Solr Init Script
================

8 changes: 4 additions & 4 deletions scripts/api/data/metadatablocks/geospatial.tsv
@@ -8,10 +8,10 @@
otherGeographicCoverage Other Other information on the geographic coverage of the data. text 4 #VALUE, FALSE FALSE FALSE TRUE FALSE FALSE geographicCoverage geospatial
geographicUnit Geographic Unit Lowest level of geographic aggregation covered by the Dataset, e.g., village, county, region. text 5 TRUE FALSE TRUE TRUE FALSE FALSE geospatial
geographicBoundingBox Geographic Bounding Box The fundamental geometric description for any Dataset that models geography is the geographic bounding box. It describes the minimum box, defined by west and east longitudes and north and south latitudes, which includes the largest geographic extent of the Dataset's geographic coverage. This element is used in the first pass of a coordinate-based search. Inclusion of this element in the codebook is recommended, but is required if the bound polygon box is included. none 6 FALSE FALSE TRUE FALSE FALSE FALSE geospatial
westLongitude Westernmost (Left) Longitude Westernmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -180,0 <= West Bounding Longitude Value <= 180,0. text 7 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
eastLongitude Easternmost (Right) Longitude Easternmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -180,0 <= East Bounding Longitude Value <= 180,0. text 8 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
northLongitude Northernmost (Top) Latitude Northernmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -90,0 <= North Bounding Latitude Value <= 90,0. text 9 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
southLongitude Southernmost (Bottom) Latitude Southernmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -90,0 <= South Bounding Latitude Value <= 90,0. text 10 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
#controlledVocabulary DatasetField Value identifier displayOrder
country Afghanistan 0
country Albania 1