Merge pull request #9783 from IQSS/9714-files-api-extension-filters
Dataset files API extension for filters
kcondon authored Oct 4, 2023
2 parents 7b0bdc8 + c00ad79 commit 9174e38
Showing 20 changed files with 1,110 additions and 174 deletions.
14 changes: 14 additions & 0 deletions doc/release-notes/9714-files-api-extension-filters.md
@@ -0,0 +1,14 @@
The getVersionFiles endpoint (/api/datasets/{id}/versions/{versionId}/files) has been extended to support optional filtering by:

- Access status: through the `accessStatus` query parameter, which supports the following values:

  - Public
  - Restricted
  - EmbargoedThenRestricted
  - EmbargoedThenPublic

- Category name: through the `categoryName` query parameter, which returns the files to which the given category has been added.


- Content type: through the `contentType` query parameter, which returns the files matching the requested content type, for example "image/png".
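
Since `accessStatus` only accepts a fixed set of values (and the API guide notes that filter values are case sensitive), a client script may want to check a value before sending a request. The following is a minimal sketch for a POSIX shell; `is_valid_access_status` is a hypothetical helper name, not something Dataverse provides.

```shell
# Hypothetical helper: succeeds only for the four access status values
# listed in the release note. The match is exact, so "public" is rejected.
is_valid_access_status() {
  case "$1" in
    Public|Restricted|EmbargoedThenRestricted|EmbargoedThenPublic) return 0 ;;
    *) return 1 ;;
  esac
}

STATUS="EmbargoedThenPublic"
if is_valid_access_status "$STATUS"; then
  echo "accessStatus=$STATUS"
else
  echo "unsupported accessStatus: $STATUS" >&2
fi
```

Checking on the client side is optional; it simply catches typos such as a lowercase value before a request is made.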
3 changes: 3 additions & 0 deletions doc/release-notes/9785-files-api-extension-search-text.md
@@ -0,0 +1,3 @@
The getVersionFiles endpoint (/api/datasets/{id}/versions/{versionId}/files) has been extended to support optional filtering by search text through the `searchText` query parameter.

The search will be applied to the labels and descriptions of the dataset files.
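
Search text often contains spaces or other characters that are not safe in a URL, so it may need percent-encoding before being placed in the `searchText` parameter. The sketch below shows one way to do that in a POSIX shell. `urlencode` is a hypothetical helper, and it assumes ASCII input only.

```shell
# Hypothetical helper: percent-encodes a string for use as a query value.
# Assumes ASCII input; multi-byte characters are not handled.
urlencode() {
  s=$1
  out=""
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character
    case "$c" in
      [A-Za-z0-9._~-]) out="${out}${c}" ;;
      *) out="${out}$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

SEARCH_TEXT="survey data"
echo "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?searchText=$(urlencode "$SEARCH_TEXT")"
```

As an alternative, curl can do the encoding itself when called with `-G` together with `--data-urlencode "searchText=survey data"`.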
6 changes: 6 additions & 0 deletions doc/release-notes/9834-files-api-extension-counts.md
@@ -0,0 +1,6 @@
Implemented the following new endpoints:

- getVersionFileCounts (/api/datasets/{id}/versions/{versionId}/files/counts): Given a dataset and its version, retrieves file counts based on different criteria (Total count, per content type, per access status and per category name).


- setFileCategories (/api/files/{id}/metadata/categories): Updates the categories (by name) for an existing file. If the specified categories do not exist, they will be created.
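
The API guide's curl examples for setFileCategories send a JSON object with a `categories` array. As a rough sketch, that payload could be assembled from a list of names as shown below; `categories_json` is a hypothetical helper and assumes the names contain no double quotes or backslashes.

```shell
# Hypothetical helper: prints {"categories":["A","B"]} for its arguments.
# Assumes category names contain no double quotes or backslashes.
categories_json() {
  json='{"categories":['
  sep=""
  for name in "$@"; do
    json="${json}${sep}\"${name}\""
    sep=","
  done
  printf '%s\n' "${json}]}"
}

categories_json Data Documentation
```

The result can then be passed to curl in the same way as in the guide, for example `-F "jsonData=$(categories_json Data Documentation)"`.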
@@ -0,0 +1,14 @@
Implemented the following new endpoints:

- userFileAccessRequested (/api/access/datafile/{id}/userFileAccessRequested): Returns true or false depending on whether or not the calling user has requested access to a particular file.


- hasBeenDeleted (/api/files/{id}/hasBeenDeleted): Reports whether a particular file that existed in a previous version of the dataset no longer exists in the latest version.


In addition, the DataFile API payload has been extended to include the following fields:

- tabularData: Boolean field indicating whether the DataFile is of tabular type


- fileAccessRequest: Boolean field indicating whether file access requests are enabled on the Dataset (the DataFile's owner)
12 changes: 12 additions & 0 deletions doc/sphinx-guides/source/api/dataaccess.rst
@@ -404,6 +404,18 @@ A curl example using an ``id``::

    curl -H "X-Dataverse-key:$API_TOKEN" -X GET http://$SERVER/api/access/datafile/{id}/listRequests

User Has Requested Access to a File:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``/api/access/datafile/{id}/userFileAccessRequested``

This method returns true or false depending on whether or not the calling user has requested access to a particular file.

A curl example using an ``id``::

    curl -H "X-Dataverse-key:$API_TOKEN" -X GET "http://$SERVER/api/access/datafile/{id}/userFileAccessRequested"


Get User Permissions on a File:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

165 changes: 159 additions & 6 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -970,6 +970,45 @@ This endpoint supports optional pagination, through the ``limit`` and ``offset``

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?limit=10&offset=20"

Category name filtering is also optionally supported, returning only the files to which the requested category has been added.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?categoryName=Data"

Content type filtering is also optionally supported, returning only the files that match the requested content type.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?contentType=image/png"

Filtering by search text is also optionally supported. The search is applied to the labels and descriptions of the dataset files and returns the files that contain the searched text in either of those fields.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?searchText=word"

File access filtering is also optionally supported, using one of the following values:

* ``Public``
* ``Restricted``
* ``EmbargoedThenRestricted``
* ``EmbargoedThenPublic``

If no filter is specified, files with any of the above access statuses are returned.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?accessStatus=Public"

Ordering criteria for sorting the results are also optionally supported, using one of the following values:

* ``NameAZ`` (Default)
@@ -979,14 +1018,42 @@ Ordering criteria for sorting the results is also optionally supported. In parti
* ``Size``
* ``Type``

Please note that these values are case sensitive and must be correctly typed for the endpoint to recognize them.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?orderCriteria=Newest"

Please note that both filtering and ordering criteria values are case sensitive and must be correctly typed for the endpoint to recognize them.

Keep in mind that you can combine all of the above query params depending on the results you are looking for.
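
To illustrate combining parameters, the sketch below joins several ``name=value`` pairs into a single request URL. ``files_url`` is a hypothetical helper, and the values are assumed to be URL-safe already.

```shell
# Hypothetical helper: appends name=value pairs to a base URL as a query string.
# Assumes every value is already URL-safe.
files_url() {
  base=$1
  shift
  query=""
  for pair in "$@"; do
    if [ -z "$query" ]; then
      query="?${pair}"
    else
      query="${query}&${pair}"
    fi
  done
  printf '%s\n' "${base}${query}"
}

SERVER_URL=https://demo.dataverse.org
files_url "$SERVER_URL/api/datasets/24/versions/1.0/files" \
  "categoryName=Data" "accessStatus=Public" "orderCriteria=Newest" "limit=10"
```

The printed URL can be passed to curl in double quotes, as in the examples above, so that the shell does not interpret the ``&`` separators.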

Get File Counts in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Get file counts for the given dataset and version.

The returned file counts are based on different criteria:

- Total (The total file count)
- Per content type
- Per category name
- Per access status (Possible values: Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic)

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export ID=24
  export VERSION=1.0

  curl "$SERVER_URL/api/datasets/$ID/versions/$VERSION/files/counts"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts"

View Dataset Files and Folders as a Directory Index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -2832,13 +2899,13 @@ A curl example using an ``ID``

  export SERVER_URL=https://demo.dataverse.org
  export ID=24

-  curl "$SERVER_URL/api/files/$ID/downloadCount"
+  curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/$ID/downloadCount"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

-  curl "https://demo.dataverse.org/api/files/24/downloadCount"
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/24/downloadCount"

A curl example using a ``PERSISTENT_ID``
@@ -2848,16 +2915,53 @@ A curl example using a ``PERSISTENT_ID``

  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_ID=doi:10.5072/FK2/AAA000

-  curl "$SERVER_URL/api/files/:persistentId/downloadCount?persistentId=$PERSISTENT_ID"
+  curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/:persistentId/downloadCount?persistentId=$PERSISTENT_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

-  curl "https://demo.dataverse.org/api/files/:persistentId/downloadCount?persistentId=doi:10.5072/FK2/AAA000"
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/downloadCount?persistentId=doi:10.5072/FK2/AAA000"

If you are interested in download counts for multiple files, see :doc:`/api/metrics`.

File Has Been Deleted
~~~~~~~~~~~~~~~~~~~~~

Reports whether a particular file that existed in a previous version of the dataset no longer exists in the latest version.

A curl example using an ``ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=24

  curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/$ID/hasBeenDeleted"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/24/hasBeenDeleted"

A curl example using a ``PERSISTENT_ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_ID=doi:10.5072/FK2/AAA000

  curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/:persistentId/hasBeenDeleted?persistentId=$PERSISTENT_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/hasBeenDeleted?persistentId=doi:10.5072/FK2/AAA000"

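
The two URL forms used in the examples for this endpoint (database ``ID`` in the path, or ``:persistentId`` in the path plus a ``persistentId`` query parameter) can be selected by a small wrapper. This is only a sketch: ``file_endpoint_url`` is a hypothetical helper, and treating any value that starts with ``doi:`` or ``hdl:`` as a persistent ID is an assumption made for illustration.

```shell
# Hypothetical helper: builds a /api/files URL for either identifier form.
# Values starting with "doi:" or "hdl:" are treated as persistent IDs
# (this prefix check is an assumption for illustration only).
file_endpoint_url() {
  server=$1
  file_ref=$2
  action=$3
  case "$file_ref" in
    doi:*|hdl:*)
      printf '%s\n' "${server}/api/files/:persistentId/${action}?persistentId=${file_ref}" ;;
    *)
      printf '%s\n' "${server}/api/files/${file_ref}/${action}" ;;
  esac
}

file_endpoint_url https://demo.dataverse.org 24 hasBeenDeleted
file_endpoint_url https://demo.dataverse.org doi:10.5072/FK2/AAA000 hasBeenDeleted
```

Either printed URL can be passed to curl together with the ``X-Dataverse-key`` header, as in the examples above.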
Updating File Metadata
~~~~~~~~~~~~~~~~~~~~~~
@@ -2907,6 +3011,55 @@ Also note that dataFileTags are not versioned and changes to these will update t
.. _EditingVariableMetadata:

Updating File Metadata Categories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Updates the categories for an existing file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a JSON string expressing the category names, passed as ``jsonData`` in the examples below.

Although categories can also be updated with the previous endpoint, this one is more practical when only the categories, and no other metadata fields, need to be updated.

A curl example using an ``ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=24

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
    -F 'jsonData={"categories":["Category1","Category2"]}' \
    "$SERVER_URL/api/files/$ID/metadata/categories"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
    -F 'jsonData={"categories":["Category1","Category2"]}' \
    "https://demo.dataverse.org/api/files/24/metadata/categories"

A curl example using a ``PERSISTENT_ID``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_ID=doi:10.5072/FK2/AAA000

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
    -F 'jsonData={"categories":["Category1","Category2"]}' \
    "$SERVER_URL/api/files/:persistentId/metadata/categories?persistentId=$PERSISTENT_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
    -F 'jsonData={"categories":["Category1","Category2"]}' \
    "https://demo.dataverse.org/api/files/:persistentId/metadata/categories?persistentId=doi:10.5072/FK2/AAA000"

Note that if the specified categories do not exist, they will be created.

Editing Variable Level Metadata
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 changes: 3 additions & 0 deletions modules/dataverse-parent/pom.xml
@@ -199,6 +199,9 @@

        <!-- Container related -->
        <fabric8-dmp.version>0.43.4</fabric8-dmp.version>

        <!-- Persistence -->
        <querydsl.version>5.0.0</querydsl.version>
    </properties>

    <pluginRepositories>
14 changes: 14 additions & 0 deletions pom.xml
@@ -252,6 +252,20 @@
            <artifactId>expressly</artifactId>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>com.querydsl</groupId>
            <artifactId>querydsl-apt</artifactId>
            <version>${querydsl.version}</version>
            <classifier>jakarta</classifier>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>com.querydsl</groupId>
            <artifactId>querydsl-jpa</artifactId>
            <version>${querydsl.version}</version>
            <classifier>jakarta</classifier>
        </dependency>

        <dependency>
            <groupId>commons-io</groupId>
2 changes: 1 addition & 1 deletion src/main/java/edu/harvard/iq/dataverse/DataFileTag.java
@@ -58,7 +58,7 @@ public enum TagType {Survey, TimeSeries, Panel, Event, Genomics, Network, Geospa

    private static final Map<TagType, String> TagTypeToLabels = new HashMap<>();

-    private static final Map<String, TagType> TagLabelToTypes = new HashMap<>();
+    public static final Map<String, TagType> TagLabelToTypes = new HashMap<>();


    static {