Merge branch 'develop' into 8243-improve-language-controlled-vocab
landreev committed Feb 14, 2024
2 parents 8796a1d + f456e51 commit 56f3b4b
Showing 22 changed files with 322 additions and 65 deletions.
4 changes: 4 additions & 0 deletions doc/release-notes/3437-new-index-api-added.md
@@ -0,0 +1,4 @@
(This API was added as a side feature of PR #10222. The main point of that PR was an improvement in the OAI set housekeeping logic, which I believe is too obscure a part of the system to warrant a release note by itself, but the new API below needs to be announced.)

A new Index API endpoint has been added that allows an admin to clear an individual dataset from Solr.
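
For example, to clear the Solr entry for a dataset by its database id:

``curl -X DELETE http://localhost:8080/api/admin/index/datasets/<DATABASE_ID>``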

14 changes: 14 additions & 0 deletions doc/release-notes/9983-unique-constraints.md
@@ -0,0 +1,14 @@
This release adds two missing database constraints that will ensure that the externalvocabularyvalue table has only one entry for each uri and that the oaiset table has only one set for each spec. (In the very unlikely case that your existing database already has duplicate entries, the install would fail. You can check for duplicates by running

SELECT uri, count(*) FROM externalvocabularyvalue GROUP BY uri HAVING count(*) > 1;

and

SELECT spec, count(*) FROM oaiset GROUP BY spec HAVING count(*) > 1;

and then removing any rows reported before upgrading.)
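
One way to delete such duplicates, as a sketch only (this assumes PostgreSQL and that each table's surrogate primary key column is named id; verify against your schema before running it):

DELETE FROM externalvocabularyvalue a USING externalvocabularyvalue b WHERE a.uri = b.uri AND a.id > b.id;

The same pattern, matching on spec, works for the oaiset table.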




TODO: Whoever assembles the release notes should make sure to include the standard note about reloading the citation, astrophysics, and biomedical metadata blocks (plus any others from other PRs) after upgrading.
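
A sketch of that standard step, assuming the usual admin API for loading metadata block TSVs (adjust the file path to wherever the updated TSVs live on your server):

``curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file scripts/api/data/metadatablocks/citation.tsv``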
@@ -6,3 +6,4 @@ File Previewers explore file "A set of tools that display the content of files -
Data Curation Tool configure file "A GUI for curating data by adding labels, groups, weights and other details to assist with informed reuse. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Curation-Tool for the installation instructions."
Ask the Data query file Ask the Data is an experimental tool that allows you to ask natural language questions about the data contained in Dataverse tables (tabular data). See the README.md file at https://github.com/IQSS/askdataverse/tree/main/askthedata for the instructions on adding Ask the Data to your Dataverse installation.
TurboCurator by ICPSR configure dataset TurboCurator generates metadata improvements for title, description, and keywords. It relies on OpenAI's ChatGPT and ICPSR best practices. See the `TurboCurator Dataverse Administrator <https://turbocurator.icpsr.umich.edu/tc/adminabout/>`_ page for more details on how it works and on adding TurboCurator to your Dataverse installation.
JupyterHub explore file The `Dataverse-to-JupyterHub Data Transfer Connector <https://forgemia.inra.fr/dipso/eosc-pillar/dataverse-jupyterhub-connector>`_ is a tool that simplifies the transfer of data between Dataverse repositories and the cloud-based platform JupyterHub. It is designed for researchers, scientists, and data analysts, facilitating collaboration on projects by seamlessly moving datasets and files. The tool is a lightweight client-side web application built using React and relies on the Dataverse External Tool feature, allowing for easy deployment on modern integration systems. Currently optimized for small to medium-sized files, future plans include extending support for larger files and signed Dataverse endpoints. For more details, you can refer to the external tool manifest: https://forgemia.inra.fr/dipso/eosc-pillar/dataverse-jupyterhub-connector/-/blob/master/externalTools.json
10 changes: 10 additions & 0 deletions doc/sphinx-guides/source/admin/integrations.rst
@@ -197,6 +197,16 @@ Avgidea Data Search

Researchers can use a Google Sheets add-on to search a Dataverse installation's CSV data and then import that data into a sheet. See `Avgidea Data Search <https://www.avgidea.io/avgidea-data-platform.html>`_ for details.

JupyterHub
++++++++++

The `Dataverse-to-JupyterHub Data Transfer Connector <https://forgemia.inra.fr/dipso/eosc-pillar/dataverse-jupyterhub-connector>`_ streamlines data transfer between Dataverse repositories and the cloud-based platform JupyterHub, enhancing collaborative research.
This connector facilitates seamless two-way transfer of datasets and files, emphasizing the potential of an integrated research environment.
It is a lightweight client-side web application built using React and relying on the Dataverse External Tool feature, allowing for easy deployment on modern integration systems. Currently, it supports small to medium-sized files, with plans to enable support for large files and signed Dataverse endpoints in the future.

What kind of user is the feature intended for?
The feature is intended for researchers, scientists, and data analysts who work with Dataverse instances and JupyterHub and are looking to ease the data transfer process. See the `presentation <https://harvard.zoom.us/rec/share/0RpoN_a7HPXF9jpBovtvxVgcaEbqrv5ZBSIKISVemdZjswGxOzbalQYpjebCbLA1.y2ZjRXYxhq8C_SU7>`_ for details.

.. _integrations-discovery:

Discoverability
4 changes: 2 additions & 2 deletions doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -37,8 +37,8 @@ tab-separated value (TSV). [1]_\ :sup:`,`\ [2]_ While it is technically
possible to define more than one metadata block in a TSV file, it is
good organizational practice to define only one in each file.

The metadata block TSVs shipped with the Dataverse Software are in `/tree/develop/scripts/api/data/metadatablocks
<https://github.com/IQSS/dataverse/tree/develop/scripts/api/data/metadatablocks>`__ and the corresponding ResourceBundle property files `/tree/develop/src/main/java <https://github.com/IQSS/dataverse/tree/develop/src/main/java>`__ of the Dataverse Software GitHub repo. Human-readable copies are available in `this Google Sheets
The metadata block TSVs shipped with the Dataverse Software are in `/scripts/api/data/metadatablocks
<https://github.com/IQSS/dataverse/tree/develop/scripts/api/data/metadatablocks>`__ with the corresponding ResourceBundle property files in `/src/main/java/propertyFiles <https://github.com/IQSS/dataverse/tree/develop/src/main/java/propertyFiles>`__ of the Dataverse Software GitHub repo. Human-readable copies are available in `this Google Sheets
document <https://docs.google.com/spreadsheets/d/13HP-jI_cwLDHBetn9UKTREPJ_F4iHdAvhjmlvmYdSSw/edit#gid=0>`__ but they tend to get out of sync with the TSV files, which should be considered authoritative. The Dataverse Software installation process operates on the TSVs, not the Google spreadsheet.

About the metadata block TSV
14 changes: 12 additions & 2 deletions doc/sphinx-guides/source/admin/solr-search-index.rst
@@ -26,8 +26,8 @@ Remove all Solr documents that are orphaned (i.e. not associated with objects in

``curl http://localhost:8080/api/admin/index/clear-orphans``

Clearing Data from Solr
~~~~~~~~~~~~~~~~~~~~~~~
Clearing ALL Data from Solr
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Please note that the moment you issue this command, it will appear to end users looking at the root Dataverse installation page that all data is gone! This is because the root Dataverse installation page is powered by the search index.

@@ -86,6 +86,16 @@ To re-index a dataset by its database ID:

``curl http://localhost:8080/api/admin/index/datasets/7504557``

Clearing a Dataset from Solr
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This API clears the Solr entry for the specified dataset. It can be useful if you have reason to hide a published dataset from search results and/or Collection pages, but don't want to destroy and purge it from the database just yet.

``curl -X DELETE http://localhost:8080/api/admin/index/datasets/<DATABASE_ID>``

This can, of course, be reversed by re-indexing the dataset with the API described above.
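
For example:

``curl http://localhost:8080/api/admin/index/datasets/<DATABASE_ID>``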


Manually Querying Solr
----------------------

6 changes: 3 additions & 3 deletions scripts/api/data/metadatablocks/astrophysics.tsv
@@ -2,13 +2,13 @@
astrophysics Astronomy and Astrophysics Metadata
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id
astroType Type The nature or genre of the content of the files in the dataset. text 0 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
astroInstrument Instrument The instrument used to collect the data. text 2 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
astroInstrument Instrument The instrument used to collect the data. text 2 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
astroObject Object Astronomical Objects represented in the data (Given as SIMBAD recognizable names preferred). text 3 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
resolution.Spatial Spatial Resolution The spatial (angular) resolution that is typical of the observations, in decimal degrees. text 4 TRUE FALSE FALSE TRUE FALSE FALSE astrophysics
resolution.Spectral Spectral Resolution The spectral resolution that is typical of the observations, given as the ratio \u03bb/\u0394\u03bb. text 5 TRUE FALSE FALSE TRUE FALSE FALSE astrophysics
resolution.Temporal Time Resolution The temporal resolution that is typical of the observations, given in seconds. text 6 FALSE FALSE FALSE FALSE FALSE FALSE astrophysics
coverage.Spectral.Bandpass Bandpass Conventional bandpass name text 7 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
coverage.Spectral.Bandpass Bandpass Conventional bandpass name text 7 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
coverage.Spectral.CentralWavelength Central Wavelength (m) The central wavelength of the spectral bandpass, in meters. Enter a floating-point number. float 8 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
coverage.Spectral.Wavelength Wavelength Range The minimum and maximum wavelength of the spectral bandpass. Enter a floating-point number. none 9 FALSE FALSE TRUE FALSE FALSE FALSE astrophysics
coverage.Spectral.MinimumWavelength Minimum (m) The minimum wavelength of the spectral bandpass, in meters. Enter a floating-point number. float 10 TRUE FALSE FALSE TRUE FALSE FALSE coverage.Spectral.Wavelength astrophysics
2 changes: 1 addition & 1 deletion scripts/api/data/metadatablocks/biomedical.tsv
@@ -13,7 +13,7 @@
studyAssayOtherTechnologyType Other Technology Type If Other was selected in Technology Type, list any other technology types that were used in this Dataset. text 9 TRUE FALSE TRUE TRUE FALSE FALSE biomedical
studyAssayPlatform Technology Platform The manufacturer and name of the technology platform used in the assay (e.g. Bruker AVANCE). text 10 TRUE TRUE TRUE TRUE FALSE FALSE biomedical
studyAssayOtherPlatform Other Technology Platform If Other was selected in Technology Platform, list any other technology platforms that were used in this Dataset. text 11 TRUE FALSE TRUE TRUE FALSE FALSE biomedical
studyAssayCellType Cell Type The name of the cell line from which the source or sample derives. text 12 TRUE TRUE TRUE TRUE FALSE FALSE biomedical
studyAssayCellType Cell Type The name of the cell line from which the source or sample derives. text 12 TRUE FALSE TRUE TRUE FALSE FALSE biomedical
#controlledVocabulary DatasetField Value identifier displayOrder
studyDesignType Case Control EFO_0001427 0
studyDesignType Cross Sectional EFO_0001428 1
2 changes: 1 addition & 1 deletion scripts/api/data/metadatablocks/citation.tsv
@@ -70,7 +70,7 @@
seriesName Name The name of the dataset series text 66 #VALUE TRUE FALSE FALSE TRUE FALSE FALSE series citation
seriesInformation Information Can include 1) a history of the series and 2) a summary of features that apply to the series textbox 67 #VALUE FALSE FALSE FALSE FALSE FALSE FALSE series citation
software Software Information about the software used to generate the Dataset none 68 , FALSE FALSE TRUE FALSE FALSE FALSE citation https://www.w3.org/TR/prov-o/#wasGeneratedBy
softwareName Name The name of software used to generate the Dataset text 69 #VALUE FALSE TRUE FALSE FALSE FALSE FALSE software citation
softwareName Name The name of software used to generate the Dataset text 69 #VALUE FALSE FALSE FALSE FALSE FALSE FALSE software citation
softwareVersion Version The version of the software used to generate the Dataset, e.g. 4.11 text 70 #NAME: #VALUE FALSE FALSE FALSE FALSE FALSE FALSE software citation
relatedMaterial Related Material Information, such as a persistent ID or citation, about the material related to the Dataset, such as appendices or sampling information available outside of the Dataset textbox 71 FALSE FALSE TRUE FALSE FALSE FALSE citation
relatedDatasets Related Dataset Information, such as a persistent ID or citation, about a related dataset, such as previous research on the Dataset's subject textbox 72 FALSE FALSE TRUE FALSE FALSE FALSE citation http://purl.org/dc/terms/relation
@@ -19,6 +19,8 @@

import jakarta.ejb.EJB;
import jakarta.ejb.Stateless;
import jakarta.ejb.TransactionAttribute;
import jakarta.ejb.TransactionAttributeType;
import jakarta.inject.Named;
import jakarta.json.Json;
import jakarta.json.JsonArray;
@@ -34,6 +36,7 @@
import jakarta.persistence.NoResultException;
import jakarta.persistence.NonUniqueResultException;
import jakarta.persistence.PersistenceContext;
import jakarta.persistence.PersistenceException;
import jakarta.persistence.TypedQuery;

import org.apache.commons.codec.digest.DigestUtils;
@@ -46,7 +49,6 @@
import org.apache.http.impl.client.HttpClients;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

import edu.harvard.iq.dataverse.settings.SettingsServiceBean;

/**
@@ -448,6 +450,7 @@ public JsonObject getExternalVocabularyValue(String termUri) {
* @param cvocEntry - the configuration for the DatasetFieldType associated with this term
* @param term - the term uri as a string
*/
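// Note: runs in its own transaction, so a failure to persist one term (e.g., a
// duplicate blocked by the new unique constraint on uri) does not roll back the
// caller's surrounding transaction.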
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public void registerExternalTerm(JsonObject cvocEntry, String term) {
String retrievalUri = cvocEntry.getString("retrieval-uri");
String prefix = cvocEntry.getString("prefix", null);
@@ -518,6 +521,8 @@ public void process(HttpResponse response, HttpContext context) throws HttpExcep
logger.fine("Wrote value for term: " + term);
} catch (JsonException je) {
logger.severe("Error retrieving: " + retrievalUri + " : " + je.getMessage());
} catch (PersistenceException e) {
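// Most likely another transaction persisted the same term first; with the
// unique constraint on uri in place, this is safe to ignore.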
logger.fine("Problem persisting: " + retrievalUri + " : " + e.getMessage());
}
} else {
logger.severe("Received response code : " + statusCode + " when retrieving " + retrievalUri
@@ -284,7 +284,7 @@ public void setDisplayOnCreate(boolean displayOnCreate) {
}

public boolean isControlledVocabulary() {
return controlledVocabularyValues != null && !controlledVocabularyValues.isEmpty();
return allowControlledVocabulary;
}

/**
2 changes: 1 addition & 1 deletion src/main/java/edu/harvard/iq/dataverse/DataversePage.java
@@ -362,7 +362,7 @@ public void initFeaturedDataverses() {
List<Dataverse> featuredSource = new ArrayList<>();
List<Dataverse> featuredTarget = new ArrayList<>();
featuredSource.addAll(dataverseService.findAllPublishedByOwnerId(dataverse.getId()));
featuredSource.addAll(linkingService.findLinkingDataverses(dataverse.getId()));
featuredSource.addAll(linkingService.findLinkedDataverses(dataverse.getId()));
List<DataverseFeaturedDataverse> featuredList = featuredDataverseService.findByDataverseId(dataverse.getId());
for (DataverseFeaturedDataverse dfd : featuredList) {
Dataverse fd = dfd.getFeaturedDataverse();
25 changes: 24 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Index.java
@@ -215,7 +215,7 @@ public Response clearSolrIndex() {
return error(Status.INTERNAL_SERVER_ERROR, ex.getLocalizedMessage());
}
}

@GET
@Path("{type}/{id}")
public Response indexTypeById(@PathParam("type") String type, @PathParam("id") Long id) {
@@ -326,6 +326,29 @@ public Response indexDatasetByPersistentId(@QueryParam("persistentId") String pe
}
}

/**
* Clears the entry for a dataset from Solr
*
* @param id numeric database id of the dataset
* @return response; returns 404 if there is no such dataset in the database,
* but will attempt to clear the entry from Solr regardless.
*/
@DELETE
@Path("datasets/{id}")
public Response clearDatasetFromIndex(@PathParam("id") Long id) {
Dataset dataset = datasetService.find(id);
// We'll attempt to delete the Solr document regardless of whether the
// dataset exists in the database:
String response = indexService.removeSolrDocFromIndex(IndexServiceBean.solrDocIdentifierDataset + id);
if (dataset != null) {
return ok("Sent request to clear Solr document for dataset " + id + ": " + response);
} else {
return notFound("Could not find dataset " + id + " in the database. Requested to clear from Solr anyway: " + response);
}
}


/**
* This is just a demo of the modular math logic we use for indexAll.
*/
28 changes: 28 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/api/TestApi.java
@@ -44,6 +44,34 @@ public Response getExternalToolsforFile(@PathParam("id") String idSupplied, @Que
return wr.getResponse();
}
}

@GET
@Path("datasets/{id}/externalTool/{toolId}")
public Response getExternalToolforDatasetById(@PathParam("id") String idSupplied, @PathParam("toolId") String toolId, @QueryParam("type") String typeSupplied) {
ExternalTool.Type type;
try {
type = ExternalTool.Type.fromString(typeSupplied);
} catch (IllegalArgumentException ex) {
return error(BAD_REQUEST, ex.getLocalizedMessage());
}
Dataset dataset;
try {
dataset = findDatasetOrDie(idSupplied);
List<ExternalTool> datasetTools = externalToolService.findDatasetToolsByType(type);
for (ExternalTool tool : datasetTools) {
// Only build the handler and JSON for the tool that was actually requested:
if (tool.getId().toString().equals(toolId)) {
ApiToken apiToken = externalToolService.getApiToken(getRequestApiKey());
ExternalToolHandler externalToolHandler = new ExternalToolHandler(tool, dataset, apiToken, null);
return ok(externalToolService.getToolAsJsonWithQueryParameters(externalToolHandler));
}
}
} catch (WrappedResponse wr) {
return wr.getResponse();
}
return error(BAD_REQUEST, "Could not find external tool with id of " + toolId);
}

@Path("files/{id}/externalTools")
@GET