Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul authored Sep 27, 2023
2 parents 81be260 + 5fc7b30 commit 06b7255
Showing 39 changed files with 827 additions and 72 deletions.
2 changes: 1 addition & 1 deletion conf/solr/9.3.0/schema.xml
@@ -246,7 +246,7 @@
<!-- SCHEMA-FIELDS::BEGIN -->
<field name="accessToSources" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="actionsToMinimizeLoss" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="alternativeTitle" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="alternativeTitle" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="alternativeURL" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="astroFacility" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="astroInstrument" type="text_en" multiValued="true" stored="true" indexed="true"/>
9 changes: 9 additions & 0 deletions doc/release-notes/9428-alternative-title.md
@@ -0,0 +1,9 @@
Alternative Title is now repeatable.
- You will need to update the database with the updated citation block:
`curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file scripts/api/data/metadatablocks/citation.tsv`
- You will also need to update the Solr schema:
Change the `alternativeTitle` field to `multiValued="true"` in `/usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml`, then
reload the Solr schema: `curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"`

Because Alternative Title is now repeatable, JSON API output is no longer backward compatible: the value of Alternative Title has changed from a simple string to an array.
For example, instead of `"value": "Alternative Title"`, the value can be `"value": ["Alternative Title1", "Alternative Title2"]`.
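
A quick way to see the new shape (a sketch; `24` is a placeholder dataset id and `:latest` is the standard version selector of the existing native API):

```bash
curl "https://demo.dataverse.org/api/datasets/24/versions/:latest"
```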
7 changes: 7 additions & 0 deletions doc/release-notes/9692-files-api-extension.md
@@ -0,0 +1,7 @@
The following API endpoints have been added:

- /api/files/{id}/downloadCount
- /api/files/{id}/dataTables
- /access/datafile/{id}/userPermissions

The getVersionFiles endpoint (/api/datasets/{id}/versions/{versionId}/files) has been extended to support pagination and ordering.
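
For example, a sketch of the extended getVersionFiles call (`limit`, `offset`, and `orderCriteria` as documented in the API guide; server and ids are placeholders):

```bash
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?limit=10&offset=20&orderCriteria=Newest"
```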
14 changes: 14 additions & 0 deletions doc/release-notes/9859-ORE and Bag updates.md
@@ -0,0 +1,14 @@
Dataverse's OAI_ORE metadata export format and archival BagIt exports
(which include the OAI-ORE metadata export file) have been updated to include
information about the dataset version state, e.g. RELEASED or DEACCESSIONED,
and to indicate which version of Dataverse was used to create the archival Bag.
As part of the latter, the current OAI_ORE metadata format has been given a 1.0.0
version designation. It is expected that any future changes to the OAI_ORE export
format will result in a version change, and that tools such as DVUploader that can
recreate datasets from archival Bags will start indicating which version(s) of the
OAI_ORE format they can read.

Dataverse installations that have been using archival Bags may wish to update any
existing archival Bags they have, e.g. by deleting existing Bags and using the Dataverse
[archival Bag export API](https://guides.dataverse.org/en/latest/installation/config.html#bagit-export-api-calls)
to generate updated versions.
9 changes: 8 additions & 1 deletion doc/sphinx-guides/source/admin/integrations.rst
@@ -217,7 +217,14 @@ Sponsored by the `Ontario Council of University Libraries (OCUL) <https://ocul.o
RDA BagIt (BagPack) Archiving
+++++++++++++++++++++++++++++

-A Dataverse installation can be configured to submit a copy of published Datasets, packaged as `Research Data Alliance conformant <https://www.rd-alliance.org/system/files/Research%20Data%20Repository%20Interoperability%20WG%20-%20Final%20Recommendations_reviewed_0.pdf>`_ zipped `BagIt <https://tools.ietf.org/html/draft-kunze-bagit-17>`_ bags to the `Chronopolis <https://libraries.ucsd.edu/chronopolis/>`_ via `DuraCloud <https://duraspace.org/duracloud/>`_, to a local file system, or to `Google Cloud Storage <https://cloud.google.com/storage>`_.
+A Dataverse installation can be configured to submit a copy of published Dataset versions, packaged as `Research Data Alliance conformant <https://www.rd-alliance.org/system/files/Research%20Data%20Repository%20Interoperability%20WG%20-%20Final%20Recommendations_reviewed_0.pdf>`_ zipped `BagIt <https://tools.ietf.org/html/draft-kunze-bagit-17>`_ bags to `Chronopolis <https://libraries.ucsd.edu/chronopolis/>`_ via `DuraCloud <https://duraspace.org/duracloud/>`_, to a local file system, to any S3 store, or to `Google Cloud Storage <https://cloud.google.com/storage>`_.
Submission can be automated to occur upon publication, or can be done periodically (via external scripting).
The archival status of each Dataset version can be seen in the Dataset page version table and queried via API.

The archival Bags include all of the files and metadata in a given dataset version and are sufficient to recreate the dataset, e.g. in a new Dataverse instance, or potentially in another RDA-conformant repository.
Specifically, the archival Bags include an OAI-ORE Map serialized as JSON-LD that describes the dataset and its files, as well as information about the version of Dataverse used to export the archival Bag.

The `DVUploader <https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader>`_ includes functionality to recreate a Dataset from an archival Bag produced by Dataverse (using the Dataverse API to do so).

For details on how to configure this integration, see :ref:`BagIt Export` in the :doc:`/installation/config` section of the Installation Guide.

4 changes: 4 additions & 0 deletions doc/sphinx-guides/source/api/client-libraries.rst
@@ -52,8 +52,12 @@ There are multiple Python modules for interacting with Dataverse APIs.

`EasyDataverse <https://github.com/gdcc/easyDataverse>`_ is a Python library designed to simplify the management of Dataverse datasets in an object-oriented way, giving users the ability to upload, download, and update datasets with ease. By utilizing metadata block configurations, EasyDataverse automatically generates Python objects that contain all the necessary details required to create the native Dataverse JSON format used to create or edit datasets. Adding files and directories is also possible with EasyDataverse and requires no additional API calls. This library is particularly well-suited for client applications such as workflows and scripts as it minimizes technical complexities and facilitates swift development.

`python-dvuploader <https://github.com/gdcc/python-dvuploader>`_ implements Jim Myers' excellent `dv-uploader <https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader>`_ as a Python module. It offers parallel direct uploads to Dataverse backend storage, streams files directly instead of buffering them in memory, and supports multi-part uploads, chunking data accordingly.

`pyDataverse <https://github.com/gdcc/pyDataverse>`_ primarily allows developers to manage Dataverse collections, datasets and datafiles. Its intention is to help with data migrations and DevOps activities such as testing and configuration management. The module is developed by `Stefan Kasberger <http://stefankasberger.at>`_ from `AUSSDA - The Austrian Social Science Data Archive <https://aussda.at>`_.

`UBC's Dataverse Utilities <https://ubc-library-rc.github.io/dataverse_utils/>`_ are a set of Python console utilities which allow one to upload datasets from a tab-separated-value spreadsheet, bulk release multiple datasets, bulk delete unpublished datasets, quickly duplicate records, replace licenses, and more. For additional information see their `PyPi page <https://pypi.org/project/dataverse-utils/>`_.

`dataverse-client-python <https://github.com/IQSS/dataverse-client-python>`_ had its initial release in 2015. `Robert Liebowitz <https://github.com/rliebz>`_ created this library while at the `Center for Open Science (COS) <https://centerforopenscience.org>`_ and the COS uses it to integrate the `Open Science Framework (OSF) <https://osf.io>`_ with Dataverse installations via an add-on which itself is open source and listed on the :doc:`/api/apps` page.

`Pooch <https://github.com/fatiando/pooch>`_ is a Python library that allows library and application developers to download data. Among other features, it takes care of various protocols, caching in OS-specific locations, and checksum verification, and adds optional features like progress bars or log messages. Among other popular repositories, Pooch supports Dataverse in the sense that you can reference a Dataverse-hosted dataset by just its DOI; Pooch will determine the data repository type and query the Dataverse API for the contained files and checksums, giving you an easy interface to download them.
18 changes: 17 additions & 1 deletion doc/sphinx-guides/source/api/dataaccess.rst
@@ -83,7 +83,7 @@

``/api/access/datafile/$id``

-.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.
+.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``. However, this file access method only works when the ``:FilePIDsEnabled`` option is enabled, which an admin can configure. For further information, refer to :ref:`:FilePIDsEnabled`.

Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* ::
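
  curl "http://$SERVER/api/access/datafile/:persistentId?persistentId=doi:10.5072/FK2/J8SJZB"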

@@ -403,3 +403,19 @@ This method returns a list of Authenticated Users who have requested access to t
A curl example using an ``id``::

curl -H "X-Dataverse-key:$API_TOKEN" -X GET http://$SERVER/api/access/datafile/{id}/listRequests

Get User Permissions on a File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``/api/access/datafile/{id}/userPermissions``

This method returns the permissions that the calling user has on a particular file.

In particular, the user permissions that this method checks, returned as booleans, are the following:

* Can download the file
* Can edit the dataset that owns the file

A curl example using an ``id``::

curl -H "X-Dataverse-key:$API_TOKEN" -X GET "http://$SERVER/api/access/datafile/{id}/userPermissions"
7 changes: 7 additions & 0 deletions doc/sphinx-guides/source/api/metrics.rst
@@ -163,3 +163,10 @@ The following table lists the available metrics endpoints (not including the Mak
/api/info/metrics/uniquefiledownloads/toMonth/{yyyy-MM},"count by id, pid","json, csv",collection subtree,published,y,cumulative up to month specified,unique download counts per file id to the specified month. PIDs are also included in output if they exist
/api/info/metrics/tree,"id, ownerId, alias, depth, name, children",json,collection subtree,published,y,"tree of dataverses starting at the root or a specified parentAlias with their id, owner id, alias, name, a computed depth, and array of children dataverses","underlying code can also include draft dataverses, this is not currently accessible via api, depth starts at 0"
/api/info/metrics/tree/toMonth/{yyyy-MM},"id, ownerId, alias, depth, name, children",json,collection subtree,published,y,"tree of dataverses in existence as of specified date starting at the root or a specified parentAlias with their id, owner id, alias, name, a computed depth, and array of children dataverses","underlying code can also include draft dataverses, this is not currently accessible via api, depth starts at 0"

Related API Endpoints
---------------------

The following endpoints are not under the metrics namespace but also return counts:

- :ref:`file-download-count`
106 changes: 105 additions & 1 deletion doc/sphinx-guides/source/api/native-api.rst
Expand Up @@ -958,6 +958,29 @@ The fully expanded example above (without environment variables) looks like this
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files"
This endpoint supports optional pagination, through the ``limit`` and ``offset`` query params:

.. code-block:: bash
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?limit=10&offset=20"
Ordering of the results is also optionally supported, via the ``orderCriteria`` query param, which accepts the following values:

* ``NameAZ`` (Default)
* ``NameZA``
* ``Newest``
* ``Oldest``
* ``Size``
* ``Type``

Please note that these values are case-sensitive and must be typed exactly as shown for the endpoint to recognize them.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?orderCriteria=Newest"
View Dataset Files and Folders as a Directory Index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -2088,10 +2111,12 @@ The API call requires a Json body that includes the list of the fileIds that the
curl -H "X-Dataverse-key: $API_TOKEN" -H "Content-Type:application/json" "$SERVER_URL/api/datasets/:persistentId/files/actions/:unset-embargo?persistentId=$PERSISTENT_IDENTIFIER" -d "$JSON"
.. _Archival Status API:

Get the Archival Status of a Dataset By Version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Archiving is an optional feature that may be configured for a Dataverse installation. When that is enabled, this API call be used to retrieve the status. Note that this requires "superuser" credentials.
+Archival :ref:`BagIt Export` is an optional feature that may be configured for a Dataverse installation. When it is enabled, this API call can be used to retrieve the status. Note that this requires "superuser" credentials.

``GET /api/datasets/$dataset-id/$version/archivalStatus`` returns the archival status of the specified dataset version.
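
A curl sketch following the pattern above (``24`` and ``1.0`` are placeholder values; a superuser API token is assumed):

.. code-block:: bash

  curl -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/24/1.0/archivalStatus"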

@@ -2702,6 +2727,85 @@ The fully expanded example above (without environment variables) looks like this
Note: The ``id`` returned in the json response is the id of the file metadata version.

Getting File Data Tables
~~~~~~~~~~~~~~~~~~~~~~~~

This endpoint is oriented toward tabular files and provides a JSON representation of the data tables for an existing tabular file. ``ID`` is the database id of the file to get the data tables from, and ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file.

A curl example using an ``ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=24

  curl "$SERVER_URL/api/files/$ID/dataTables"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/files/24/dataTables"

A curl example using a ``PERSISTENT_ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_ID=doi:10.5072/FK2/AAA000

  curl "$SERVER_URL/api/files/:persistentId/dataTables?persistentId=$PERSISTENT_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/files/:persistentId/dataTables?persistentId=doi:10.5072/FK2/AAA000"

Note that if the requested file is not tabular, the endpoint will return an error.

.. _file-download-count:

Getting File Download Count
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Provides the download count for a particular file, where ``ID`` is the database id of the file and ``PERSISTENT_ID`` is its persistent id (DOI or Handle).

A curl example using an ``ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=24

  curl "$SERVER_URL/api/files/$ID/downloadCount"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/files/24/downloadCount"

A curl example using a ``PERSISTENT_ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_ID=doi:10.5072/FK2/AAA000

  curl "$SERVER_URL/api/files/:persistentId/downloadCount?persistentId=$PERSISTENT_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/files/:persistentId/downloadCount?persistentId=doi:10.5072/FK2/AAA000"

If you are interested in download counts for multiple files, see :doc:`/api/metrics`.

Updating File Metadata
~~~~~~~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/container/configbaker-image.rst
@@ -86,7 +86,7 @@ Maven modules packaging target with activated "container" profile from the proje

If you specifically want to build a config baker image *only*, try

-``mvn -Pct package -Ddocker.filter=dev_bootstrap``
+``mvn -Pct docker:build -Ddocker.filter=dev_bootstrap``

The build of config baker involves copying Solr configset files. The Solr version used is inherited from Maven,
acting as the single source of truth. Also, the tag of the image should correspond to the application image, as
14 changes: 14 additions & 0 deletions doc/sphinx-guides/source/developers/testing.rst
@@ -225,6 +225,20 @@ If ``dataverse.siteUrl`` is absent, you can add it with:

``./asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8080"``

dataverse.oai.server.maxidentifiers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The OAI Harvesting tests require that the paging limit for ListIdentifiers be set to 2, in order to trigger the paging behavior without having to create and export too many datasets:

``./asadmin create-jvm-options "-Ddataverse.oai.server.maxidentifiers=2"``

dataverse.oai.server.maxrecords
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The OAI Harvesting tests require that the paging limit for ListRecords be set to 2, in order to trigger the paging behavior without having to create and export too many datasets:

``./asadmin create-jvm-options "-Ddataverse.oai.server.maxrecords=2"``

Identifier Generation
^^^^^^^^^^^^^^^^^^^^^
