Merge branch 'develop' into 8243-improve-language-controlled-vocab
landreev committed Feb 14, 2024
2 parents 8796a1d + f456e51 commit 56f3b4b
Showing 22 changed files with 322 additions and 65 deletions.
4 changes: 4 additions & 0 deletions doc/release-notes/3437-new-index-api-added.md
@@ -0,0 +1,4 @@
(This API was added as a side feature of PR #10222. The main point of that PR was an improvement in the OAI set housekeeping logic, which I believe is too obscure a part of the system to warrant a release note by itself, but the new API below needs to be announced.)

A new Index API endpoint has been added that allows an admin to clear an individual dataset from Solr.
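
For example, to clear the Solr entry for a dataset by its database id:

``curl -X DELETE http://localhost:8080/api/admin/index/datasets/<DATABASE_ID>``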

14 changes: 14 additions & 0 deletions doc/release-notes/9983-unique-constraints.md
@@ -0,0 +1,14 @@
This release adds two missing database constraints that will ensure that the externalvocabularyvalue table has only one entry for each uri and that the oaiset table has only one set for each spec. (In the very unlikely case that your existing database already has duplicate entries, the install would fail. You can check for duplicates by running

SELECT uri, count(*) FROM externalvocabularyvalue GROUP BY uri HAVING count(*) > 1;

and

SELECT spec, count(*) FROM oaiset GROUP BY spec HAVING count(*) > 1;

and then removing any rows reported before upgrading.)
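
One way to delete such duplicates, as a sketch only (this assumes PostgreSQL and that each table's surrogate primary key column is named id; verify against your schema before running it):

DELETE FROM externalvocabularyvalue a USING externalvocabularyvalue b WHERE a.uri = b.uri AND a.id > b.id;

The same pattern, matching on spec, works for the oaiset table.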




TODO: Whoever assembles the release notes should make sure to include the standard note about reloading the citation, astrophysics, and biomedical metadata blocks (plus any others from other PRs) after upgrading.
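
A sketch of that standard step, assuming the usual admin API for loading metadata block TSVs (adjust the file path to wherever the updated TSVs live on your server):

``curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file scripts/api/data/metadatablocks/citation.tsv``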
@@ -6,3 +6,4 @@ File Previewers explore file "A set of tools that display the content of files -
Data Curation Tool configure file "A GUI for curating data by adding labels, groups, weights and other details to assist with informed reuse. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Curation-Tool for the installation instructions."
Ask the Data query file Ask the Data is an experimental tool that allows you to ask natural language questions about the data contained in Dataverse tables (tabular data). See the README.md file at https://github.com/IQSS/askdataverse/tree/main/askthedata for the instructions on adding Ask the Data to your Dataverse installation.
TurboCurator by ICPSR configure dataset TurboCurator generates metadata improvements for title, description, and keywords. It relies on OpenAI's ChatGPT and ICPSR best practices. See the `TurboCurator Dataverse Administrator <https://turbocurator.icpsr.umich.edu/tc/adminabout/>`_ page for more details on how it works and on adding TurboCurator to your Dataverse installation.
JupyterHub explore file The `Dataverse-to-JupyterHub Data Transfer Connector <https://forgemia.inra.fr/dipso/eosc-pillar/dataverse-jupyterhub-connector>`_ is a tool that simplifies the transfer of data between Dataverse repositories and the cloud-based platform JupyterHub. It is designed for researchers, scientists, and data analysts, facilitating collaboration on projects by seamlessly moving datasets and files. The tool is a lightweight client-side web application built using React and relies on the Dataverse External Tool feature, allowing for easy deployment on modern integration systems. Currently optimized for small to medium-sized files, future plans include extending support for larger files and signed Dataverse endpoints. For more details, you can refer to the external tool manifest: https://forgemia.inra.fr/dipso/eosc-pillar/dataverse-jupyterhub-connector/-/blob/master/externalTools.json
10 changes: 10 additions & 0 deletions doc/sphinx-guides/source/admin/integrations.rst
@@ -197,6 +197,16 @@ Avgidea Data Search

Researchers can use a Google Sheets add-on to search a Dataverse installation's CSV data and then import that data into a sheet. See `Avgidea Data Search <https://www.avgidea.io/avgidea-data-platform.html>`_ for details.

JupyterHub
++++++++++

The `Dataverse-to-JupyterHub Data Transfer Connector <https://forgemia.inra.fr/dipso/eosc-pillar/dataverse-jupyterhub-connector>`_ streamlines data transfer between Dataverse repositories and the cloud-based platform JupyterHub, enhancing collaborative research.
This connector facilitates seamless two-way transfer of datasets and files, emphasizing the potential of an integrated research environment.
It is a lightweight client-side web application built using React and relying on the Dataverse External Tool feature, allowing for easy deployment on modern integration systems. Currently, it supports small to medium-sized files, with plans to enable support for large files and signed Dataverse endpoints in the future.

What kind of user is the feature intended for?
The feature is intended for researchers, scientists, and data analysts who work with Dataverse instances and JupyterHub and are looking to ease the data transfer process. See the `presentation <https://harvard.zoom.us/rec/share/0RpoN_a7HPXF9jpBovtvxVgcaEbqrv5ZBSIKISVemdZjswGxOzbalQYpjebCbLA1.y2ZjRXYxhq8C_SU7>`_ for details.

.. _integrations-discovery:

Discoverability
4 changes: 2 additions & 2 deletions doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -37,8 +37,8 @@ tab-separated value (TSV). [1]_\ :sup:`,`\ [2]_ While it is technically
possible to define more than one metadata block in a TSV file, it is
good organizational practice to define only one in each file.

The metadata block TSVs shipped with the Dataverse Software are in `/tree/develop/scripts/api/data/metadatablocks
<https://github.com/IQSS/dataverse/tree/develop/scripts/api/data/metadatablocks>`__ and the corresponding ResourceBundle property files `/tree/develop/src/main/java <https://github.com/IQSS/dataverse/tree/develop/src/main/java>`__ of the Dataverse Software GitHub repo. Human-readable copies are available in `this Google Sheets
The metadata block TSVs shipped with the Dataverse Software are in `/scripts/api/data/metadatablocks
<https://github.com/IQSS/dataverse/tree/develop/scripts/api/data/metadatablocks>`__ with the corresponding ResourceBundle property files in `/src/main/java/propertyFiles <https://github.com/IQSS/dataverse/tree/develop/src/main/java/propertyFiles>`__ of the Dataverse Software GitHub repo. Human-readable copies are available in `this Google Sheets
document <https://docs.google.com/spreadsheets/d/13HP-jI_cwLDHBetn9UKTREPJ_F4iHdAvhjmlvmYdSSw/edit#gid=0>`__ but they tend to get out of sync with the TSV files, which should be considered authoritative. The Dataverse Software installation process operates on the TSVs, not the Google spreadsheet.

About the metadata block TSV
14 changes: 12 additions & 2 deletions doc/sphinx-guides/source/admin/solr-search-index.rst
@@ -26,8 +26,8 @@ Remove all Solr documents that are orphaned (i.e. not associated with objects in

``curl http://localhost:8080/api/admin/index/clear-orphans``

Clearing Data from Solr
~~~~~~~~~~~~~~~~~~~~~~~
Clearing ALL Data from Solr
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Please note that the moment you issue this command, it will appear to end users looking at the root Dataverse installation page that all data is gone! This is because the root Dataverse installation page is powered by the search index.

@@ -86,6 +86,16 @@ To re-index a dataset by its database ID:

``curl http://localhost:8080/api/admin/index/datasets/7504557``

Clearing a Dataset from Solr
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This API clears the Solr entry for the specified dataset. It can be useful if you have reason to hide a published dataset from search results and/or Collection pages, but don't want to destroy and purge it from the database just yet.

``curl -X DELETE http://localhost:8080/api/admin/index/datasets/<DATABASE_ID>``

This can, of course, be reversed by re-indexing the dataset with the API described above.
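
For example:

``curl http://localhost:8080/api/admin/index/datasets/<DATABASE_ID>``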


Manually Querying Solr
----------------------

6 changes: 3 additions & 3 deletions scripts/api/data/metadatablocks/astrophysics.tsv
@@ -2,13 +2,13 @@
astrophysics Astronomy and Astrophysics Metadata
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id
astroType Type The nature or genre of the content of the files in the dataset. text 0 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
astroInstrument Instrument The instrument used to collect the data. text 2 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
astroInstrument Instrument The instrument used to collect the data. text 2 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
astroObject Object Astronomical Objects represented in the data (Given as SIMBAD recognizable names preferred). text 3 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
resolution.Spatial Spatial Resolution The spatial (angular) resolution that is typical of the observations, in decimal degrees. text 4 TRUE FALSE FALSE TRUE FALSE FALSE astrophysics
resolution.Spectral Spectral Resolution The spectral resolution that is typical of the observations, given as the ratio \u03bb/\u0394\u03bb. text 5 TRUE FALSE FALSE TRUE FALSE FALSE astrophysics
resolution.Temporal Time Resolution The temporal resolution that is typical of the observations, given in seconds. text 6 FALSE FALSE FALSE FALSE FALSE FALSE astrophysics
coverage.Spectral.Bandpass Bandpass Conventional bandpass name text 7 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
coverage.Spectral.Bandpass Bandpass Conventional bandpass name text 7 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
coverage.Spectral.CentralWavelength Central Wavelength (m) The central wavelength of the spectral bandpass, in meters. Enter a floating-point number. float 8 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
coverage.Spectral.Wavelength Wavelength Range The minimum and maximum wavelength of the spectral bandpass. Enter a floating-point number. none 9 FALSE FALSE TRUE FALSE FALSE FALSE astrophysics
coverage.Spectral.MinimumWavelength Minimum (m) The minimum wavelength of the spectral bandpass, in meters. Enter a floating-point number. float 10 TRUE FALSE FALSE TRUE FALSE FALSE coverage.Spectral.Wavelength astrophysics
2 changes: 1 addition & 1 deletion scripts/api/data/metadatablocks/biomedical.tsv
@@ -13,7 +13,7 @@
studyAssayOtherTechnologyType Other Technology Type If Other was selected in Technology Type, list any other technology types that were used in this Dataset. text 9 TRUE FALSE TRUE TRUE FALSE FALSE biomedical
studyAssayPlatform Technology Platform The manufacturer and name of the technology platform used in the assay (e.g. Bruker AVANCE). text 10 TRUE TRUE TRUE TRUE FALSE FALSE biomedical
studyAssayOtherPlatform Other Technology Platform If Other was selected in Technology Platform, list any other technology platforms that were used in this Dataset. text 11 TRUE FALSE TRUE TRUE FALSE FALSE biomedical
studyAssayCellType Cell Type The name of the cell line from which the source or sample derives. text 12 TRUE TRUE TRUE TRUE FALSE FALSE biomedical
studyAssayCellType Cell Type The name of the cell line from which the source or sample derives. text 12 TRUE FALSE TRUE TRUE FALSE FALSE biomedical
#controlledVocabulary DatasetField Value identifier displayOrder
studyDesignType Case Control EFO_0001427 0
studyDesignType Cross Sectional EFO_0001428 1
2 changes: 1 addition & 1 deletion scripts/api/data/metadatablocks/citation.tsv
@@ -70,7 +70,7 @@
seriesName Name The name of the dataset series text 66 #VALUE TRUE FALSE FALSE TRUE FALSE FALSE series citation
seriesInformation Information Can include 1) a history of the series and 2) a summary of features that apply to the series textbox 67 #VALUE FALSE FALSE FALSE FALSE FALSE FALSE series citation
software Software Information about the software used to generate the Dataset none 68 , FALSE FALSE TRUE FALSE FALSE FALSE citation https://www.w3.org/TR/prov-o/#wasGeneratedBy
softwareName Name The name of software used to generate the Dataset text 69 #VALUE FALSE TRUE FALSE FALSE FALSE FALSE software citation
softwareName Name The name of software used to generate the Dataset text 69 #VALUE FALSE FALSE FALSE FALSE FALSE FALSE software citation
softwareVersion Version The version of the software used to generate the Dataset, e.g. 4.11 text 70 #NAME: #VALUE FALSE FALSE FALSE FALSE FALSE FALSE software citation
relatedMaterial Related Material Information, such as a persistent ID or citation, about the material related to the Dataset, such as appendices or sampling information available outside of the Dataset textbox 71 FALSE FALSE TRUE FALSE FALSE FALSE citation
relatedDatasets Related Dataset Information, such as a persistent ID or citation, about a related dataset, such as previous research on the Dataset's subject textbox 72 FALSE FALSE TRUE FALSE FALSE FALSE citation http://purl.org/dc/terms/relation
@@ -19,6 +19,8 @@

import jakarta.ejb.EJB;
import jakarta.ejb.Stateless;
import jakarta.ejb.TransactionAttribute;
import jakarta.ejb.TransactionAttributeType;
import jakarta.inject.Named;
import jakarta.json.Json;
import jakarta.json.JsonArray;
@@ -34,6 +36,7 @@
import jakarta.persistence.NoResultException;
import jakarta.persistence.NonUniqueResultException;
import jakarta.persistence.PersistenceContext;
import jakarta.persistence.PersistenceException;
import jakarta.persistence.TypedQuery;

import org.apache.commons.codec.digest.DigestUtils;
@@ -46,7 +49,6 @@
import org.apache.http.impl.client.HttpClients;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

import edu.harvard.iq.dataverse.settings.SettingsServiceBean;

/**
@@ -448,6 +450,7 @@ public JsonObject getExternalVocabularyValue(String termUri) {
* @param cvocEntry - the configuration for the DatasetFieldType associated with this term
* @param term - the term uri as a string
*/
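// Note: runs in its own transaction, so a failure to persist one term (e.g., a
// duplicate blocked by the new unique constraint on uri) does not roll back the
// caller's surrounding transaction.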
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public void registerExternalTerm(JsonObject cvocEntry, String term) {
String retrievalUri = cvocEntry.getString("retrieval-uri");
String prefix = cvocEntry.getString("prefix", null);
@@ -518,6 +521,8 @@ public void process(HttpResponse response, HttpContext context) throws HttpExcep
logger.fine("Wrote value for term: " + term);
} catch (JsonException je) {
logger.severe("Error retrieving: " + retrievalUri + " : " + je.getMessage());
} catch (PersistenceException e) {
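// Most likely another transaction persisted the same term first; with the
// unique constraint on uri in place, this is safe to ignore.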
logger.fine("Problem persisting: " + retrievalUri + " : " + e.getMessage());
}
} else {
logger.severe("Received response code : " + statusCode + " when retrieving " + retrievalUri
@@ -284,7 +284,7 @@ public void setDisplayOnCreate(boolean displayOnCreate) {
}

public boolean isControlledVocabulary() {
return controlledVocabularyValues != null && !controlledVocabularyValues.isEmpty();
return allowControlledVocabulary;
}

/**
2 changes: 1 addition & 1 deletion src/main/java/edu/harvard/iq/dataverse/DataversePage.java
@@ -362,7 +362,7 @@ public void initFeaturedDataverses() {
List<Dataverse> featuredSource = new ArrayList<>();
List<Dataverse> featuredTarget = new ArrayList<>();
featuredSource.addAll(dataverseService.findAllPublishedByOwnerId(dataverse.getId()));
featuredSource.addAll(linkingService.findLinkingDataverses(dataverse.getId()));
featuredSource.addAll(linkingService.findLinkedDataverses(dataverse.getId()));
List<DataverseFeaturedDataverse> featuredList = featuredDataverseService.findByDataverseId(dataverse.getId());
for (DataverseFeaturedDataverse dfd : featuredList) {
Dataverse fd = dfd.getFeaturedDataverse();
25 changes: 24 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Index.java
@@ -215,7 +215,7 @@ public Response clearSolrIndex() {
return error(Status.INTERNAL_SERVER_ERROR, ex.getLocalizedMessage());
}
}

@GET
@Path("{type}/{id}")
public Response indexTypeById(@PathParam("type") String type, @PathParam("id") Long id) {
@@ -326,6 +326,29 @@ public Response indexDatasetByPersistentId(@QueryParam("persistentId") String pe
}
}

/**
* Clears the entry for a dataset from Solr
*
* @param id numeric database id of the dataset
* @return response; returns 404 if there is no such dataset in the database,
* but will attempt to clear the entry from Solr regardless.
*/
@DELETE
@Path("datasets/{id}")
public Response clearDatasetFromIndex(@PathParam("id") Long id) {
Dataset dataset = datasetService.find(id);
// We'll attempt to delete the Solr document regardless of whether the
// dataset exists in the database:
String response = indexService.removeSolrDocFromIndex(IndexServiceBean.solrDocIdentifierDataset + id);
if (dataset != null) {
return ok("Sent request to clear Solr document for dataset " + id + ": " + response);
} else {
return notFound("Could not find dataset " + id + " in the database. Requested to clear from Solr anyway: " + response);
}
}


/**
* This is just a demo of the modular math logic we use for indexAll.
*/
28 changes: 28 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/api/TestApi.java
@@ -44,6 +44,34 @@ public Response getExternalToolsforFile(@PathParam("id") String idSupplied, @Que
return wr.getResponse();
}
}

@GET
@Path("datasets/{id}/externalTool/{toolId}")
public Response getExternalToolforDatasetById(@PathParam("id") String idSupplied, @PathParam("toolId") String toolId, @QueryParam("type") String typeSupplied) {
ExternalTool.Type type;
try {
type = ExternalTool.Type.fromString(typeSupplied);
} catch (IllegalArgumentException ex) {
return error(BAD_REQUEST, ex.getLocalizedMessage());
}
Dataset dataset;
try {
dataset = findDatasetOrDie(idSupplied);
List<ExternalTool> datasetTools = externalToolService.findDatasetToolsByType(type);
for (ExternalTool tool : datasetTools) {
// Only build the handler and JSON for the tool that was actually requested:
if (tool.getId().toString().equals(toolId)) {
ApiToken apiToken = externalToolService.getApiToken(getRequestApiKey());
ExternalToolHandler externalToolHandler = new ExternalToolHandler(tool, dataset, apiToken, null);
return ok(externalToolService.getToolAsJsonWithQueryParameters(externalToolHandler));
}
}
} catch (WrappedResponse wr) {
return wr.getResponse();
}
return error(BAD_REQUEST, "Could not find external tool with id of " + toolId);
}

@Path("files/{id}/externalTools")
@GET