diff --git a/doc/release-notes/3437-new-index-api-added.md b/doc/release-notes/3437-new-index-api-added.md
new file mode 100644
index 00000000000..2f40c65073f
--- /dev/null
+++ b/doc/release-notes/3437-new-index-api-added.md
@@ -0,0 +1,4 @@
+(This API was added as a side feature of PR #10222. The main point of that PR was an improvement in the OAI set housekeeping logic, which I believe is too obscure a part of the system to warrant a release note by itself, but the new API below needs to be announced.)
+
+A new Index API endpoint, `DELETE /api/admin/index/datasets/{id}`, has been added, allowing an admin to clear an individual dataset from Solr.
+
diff --git a/doc/release-notes/9983-unique-constraints.md b/doc/release-notes/9983-unique-constraints.md
new file mode 100644
index 00000000000..d889beb0718
--- /dev/null
+++ b/doc/release-notes/9983-unique-constraints.md
@@ -0,0 +1,14 @@
+This release adds two missing database constraints that ensure that the externalvocabularyvalue table has only one entry for each uri and that the oaiset table has only one set for each spec. (In the very unlikely case that your existing database already has duplicate entries, the upgrade would fail.) This can be checked by running
+
+SELECT uri, count(*) FROM externalvocabularyvalue GROUP BY uri HAVING count(*) > 1;
+
+and
+
+SELECT spec, count(*) FROM oaiset GROUP BY spec HAVING count(*) > 1;
+
+and then removing the duplicate rows these queries report, as sketched below.
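+
+One way to remove such duplicates (a sketch only, not part of the automated upgrade: it relies on PostgreSQL's ctid system column, assumes nothing else references the duplicate rows, and should be tested against a backup first):
+
+-- keep one row per uri, delete the rest:
+DELETE FROM externalvocabularyvalue a USING externalvocabularyvalue b WHERE a.uri = b.uri AND a.ctid > b.ctid;
+
+-- keep one row per spec, delete the rest:
+DELETE FROM oaiset a USING oaiset b WHERE a.spec = b.spec AND a.ctid > b.ctid;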
+
+TODO: Whoever puts the release notes together should make sure there is the standard note about reloading metadata blocks for the citation, astrophysics, and biomedical blocks (plus any others from other PRs) after upgrading; the standard reload commands are sketched below.
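+
+For reference, the standard reload commands look like the following (a sketch: it uses the in-repo TSV paths under scripts/api/data/metadatablocks/, while the final notes may point to versioned downloads instead):
+
+curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @scripts/api/data/metadatablocks/citation.tsv -H "Content-type: text/tab-separated-values"
+curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @scripts/api/data/metadatablocks/astrophysics.tsv -H "Content-type: text/tab-separated-values"
+curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @scripts/api/data/metadatablocks/biomedical.tsv -H "Content-type: text/tab-separated-values"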
\ No newline at end of file
diff --git a/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv b/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv
index c22392a7c5e..3bdbc3a482d 100644
--- a/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv
+++ b/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv
@@ -6,3 +6,4 @@ File Previewers explore file "A set of tools that display the content of files -
 Data Curation Tool configure file "A GUI for curating data by adding labels, groups, weights and other details to assist with informed reuse. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Curation-Tool for the installation instructions."
 Ask the Data query file Ask the Data is an experimental tool that allows you ask natural language questions about the data contained in Dataverse tables (tabular data). See the README.md file at https://github.com/IQSS/askdataverse/tree/main/askthedata for the instructions on adding Ask the Data to your Dataverse installation.
 TurboCurator by ICPSR configure dataset TurboCurator generates metadata improvements for title, description, and keywords. It relies on open AI's ChatGPT & ICPSR best practices. See the `TurboCurator Dataverse Administrator `_ page for more details on how it works and adding TurboCurator to your Dataverse installation.
+JupyterHub explore file The `Dataverse-to-JupyterHub Data Transfer Connector `_ is a tool that simplifies the transfer of data between Dataverse repositories and the cloud-based platform JupyterHub. It is designed for researchers, scientists, and data analysts, facilitating collaboration on projects by seamlessly moving datasets and files. The tool is a lightweight client-side web application built using React and relies on the Dataverse External Tool feature, allowing for easy deployment on modern integration systems. It is currently optimized for small to medium-sized files; future plans include extending support to larger files and signed Dataverse endpoints. For more details, refer to the external tool manifest: https://forgemia.inra.fr/dipso/eosc-pillar/dataverse-jupyterhub-connector/-/blob/master/externalTools.json
diff --git a/doc/sphinx-guides/source/admin/integrations.rst b/doc/sphinx-guides/source/admin/integrations.rst
index cae44d42dbf..1542c900ba2 100644
--- a/doc/sphinx-guides/source/admin/integrations.rst
+++ b/doc/sphinx-guides/source/admin/integrations.rst
@@ -197,6 +197,15 @@ Avgidea Data Search
 Researchers can use a Google Sheets add-on to search for Dataverse installation's CSV data and then import that data into a sheet. See `Avgidea Data Search `_ for details.
 
+JupyterHub
+++++++++++
+
+The `Dataverse-to-JupyterHub Data Transfer Connector `_ streamlines data transfer between Dataverse repositories and the cloud-based platform JupyterHub, enhancing collaborative research.
+This connector facilitates seamless two-way transfer of datasets and files, demonstrating the potential of an integrated research environment.
+It is a lightweight client-side web application built using React that relies on the Dataverse External Tool feature, allowing for easy deployment on modern integration systems. It currently supports small to medium-sized files, with plans to add support for large files and signed Dataverse endpoints in the future.
+
+The feature is intended for researchers, scientists, and data analysts who work with Dataverse installations and JupyterHub and want to ease the data transfer process. See this `presentation `_ for details.
+
 .. _integrations-discovery:
 
 Discoverability
diff --git a/doc/sphinx-guides/source/admin/metadatacustomization.rst b/doc/sphinx-guides/source/admin/metadatacustomization.rst
index 5bd28bfa103..78eadd9b2ce 100644
--- a/doc/sphinx-guides/source/admin/metadatacustomization.rst
+++ b/doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -37,8 +37,8 @@ tab-separated value (TSV). [1]_\ :sup:`,`\ [2]_
 While it is technically possible to define more than one metadata block in a TSV file, it is good organizational practice to define only one in each file.
 
-The metadata block TSVs shipped with the Dataverse Software are in `/tree/develop/scripts/api/data/metadatablocks
-`__ and the corresponding ResourceBundle property files `/tree/develop/src/main/java `__ of the Dataverse Software GitHub repo. Human-readable copies are available in `this Google Sheets
+The metadata block TSVs shipped with the Dataverse Software are in `/scripts/api/data/metadatablocks
+`__ with the corresponding ResourceBundle property files in `/src/main/java/propertyFiles `__ of the Dataverse Software GitHub repo. Human-readable copies are available in `this Google Sheets
 document `__ but they tend to get out of sync with the TSV files, which should be considered authoritative. The Dataverse Software installation process operates on the TSVs, not the Google spreadsheet.
 
 About the metadata block TSV
diff --git a/doc/sphinx-guides/source/admin/solr-search-index.rst b/doc/sphinx-guides/source/admin/solr-search-index.rst
index e6f7b588ede..3f7b9d5b547 100644
--- a/doc/sphinx-guides/source/admin/solr-search-index.rst
+++ b/doc/sphinx-guides/source/admin/solr-search-index.rst
@@ -26,8 +26,8 @@ Remove all Solr documents that are orphaned (i.e. not associated with objects in the database):
 
 ``curl http://localhost:8080/api/admin/index/clear-orphans``
 
-Clearing Data from Solr
-~~~~~~~~~~~~~~~~~~~~~~~
+Clearing ALL Data from Solr
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Please note that the moment you issue this command, it will appear to end users looking at the root Dataverse installation page that all data is gone! This is because the root Dataverse installation page is powered by the search index.
 
@@ -86,6 +86,16 @@ To re-index a dataset by its database ID:
 
 ``curl http://localhost:8080/api/admin/index/datasets/7504557``
 
+Clearing a Dataset from Solr
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This API clears the Solr entry for the specified dataset. It can be useful if you want to hide a published dataset from search results and collection pages but are not ready to destroy and purge it from the database.
+
+``curl -X DELETE http://localhost:8080/api/admin/index/datasets/7504557``
+
+This can of course be reversed by re-indexing the dataset with the re-indexing API above.
+
+
 Manually Querying Solr
 ----------------------
diff --git a/scripts/api/data/metadatablocks/astrophysics.tsv b/scripts/api/data/metadatablocks/astrophysics.tsv
index 4039d32cb75..92792d404c9 100644
--- a/scripts/api/data/metadatablocks/astrophysics.tsv
+++ b/scripts/api/data/metadatablocks/astrophysics.tsv
@@ -2,13 +2,13 @@ astrophysics Astronomy and Astrophysics Metadata
 #datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id
 astroType Type The nature or genre of the content of the files in the dataset. text 0 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
- astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
- astroInstrument Instrument The instrument used to collect the data. text 2 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
+ astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
+ astroInstrument Instrument The instrument used to collect the data. text 2 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
 astroObject Object Astronomical Objects represented in the data (Given as SIMBAD recognizable names preferred). text 3 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
 resolution.Spatial Spatial Resolution The spatial (angular) resolution that is typical of the observations, in decimal degrees. text 4 TRUE FALSE FALSE TRUE FALSE FALSE astrophysics
 resolution.Spectral Spectral Resolution The spectral resolution that is typical of the observations, given as the ratio \u03bb/\u0394\u03bb. text 5 TRUE FALSE FALSE TRUE FALSE FALSE astrophysics
 resolution.Temporal Time Resolution The temporal resolution that is typical of the observations, given in seconds. text 6 FALSE FALSE FALSE FALSE FALSE FALSE astrophysics
- coverage.Spectral.Bandpass Bandpass Conventional bandpass name text 7 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
+ coverage.Spectral.Bandpass Bandpass Conventional bandpass name text 7 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
 coverage.Spectral.CentralWavelength Central Wavelength (m) The central wavelength of the spectral bandpass, in meters. Enter a floating-point number. float 8 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
 coverage.Spectral.Wavelength Wavelength Range The minimum and maximum wavelength of the spectral bandpass. Enter a floating-point number.
none 9 FALSE FALSE TRUE FALSE FALSE FALSE astrophysics coverage.Spectral.MinimumWavelength Minimum (m) The minimum wavelength of the spectral bandpass, in meters. Enter a floating-point number. float 10 TRUE FALSE FALSE TRUE FALSE FALSE coverage.Spectral.Wavelength astrophysics diff --git a/scripts/api/data/metadatablocks/biomedical.tsv b/scripts/api/data/metadatablocks/biomedical.tsv index 28d59130c34..d70f754336a 100644 --- a/scripts/api/data/metadatablocks/biomedical.tsv +++ b/scripts/api/data/metadatablocks/biomedical.tsv @@ -13,7 +13,7 @@ studyAssayOtherTechnologyType Other Technology Type If Other was selected in Technology Type, list any other technology types that were used in this Dataset. text 9 TRUE FALSE TRUE TRUE FALSE FALSE biomedical studyAssayPlatform Technology Platform The manufacturer and name of the technology platform used in the assay (e.g. Bruker AVANCE). text 10 TRUE TRUE TRUE TRUE FALSE FALSE biomedical studyAssayOtherPlatform Other Technology Platform If Other was selected in Technology Platform, list any other technology platforms that were used in this Dataset. text 11 TRUE FALSE TRUE TRUE FALSE FALSE biomedical - studyAssayCellType Cell Type The name of the cell line from which the source or sample derives. text 12 TRUE TRUE TRUE TRUE FALSE FALSE biomedical + studyAssayCellType Cell Type The name of the cell line from which the source or sample derives. text 12 TRUE FALSE TRUE TRUE FALSE FALSE biomedical #controlledVocabulary DatasetField Value identifier displayOrder studyDesignType Case Control EFO_0001427 0 studyDesignType Cross Sectional EFO_0001428 1 diff --git a/scripts/api/data/metadatablocks/citation.tsv b/scripts/api/data/metadatablocks/citation.tsv index 2f39086464d..bcc7ed4866d 100644 --- a/scripts/api/data/metadatablocks/citation.tsv +++ b/scripts/api/data/metadatablocks/citation.tsv @@ -70,7 +70,7 @@ seriesName Name The name of the dataset series text 66 #VALUE TRUE FALSE FALSE TRUE FALSE FALSE series citation seriesInformation Information Can include 1) a history of the series and 2) a summary of features that apply to the series textbox 67 #VALUE FALSE FALSE FALSE FALSE FALSE FALSE series citation software Software Information about the software used to generate the Dataset none 68 , FALSE FALSE TRUE FALSE FALSE FALSE citation https://www.w3.org/TR/prov-o/#wasGeneratedBy - softwareName Name The name of software used to generate the Dataset text 69 #VALUE FALSE TRUE FALSE FALSE FALSE FALSE software citation + softwareName Name The name of software used to generate the Dataset text 69 #VALUE FALSE FALSE FALSE FALSE FALSE FALSE software citation softwareVersion Version The version of the software used to generate the Dataset, e.g. 
4.11 text 70 #NAME: #VALUE FALSE FALSE FALSE FALSE FALSE FALSE software citation relatedMaterial Related Material Information, such as a persistent ID or citation, about the material related to the Dataset, such as appendices or sampling information available outside of the Dataset textbox 71 FALSE FALSE TRUE FALSE FALSE FALSE citation relatedDatasets Related Dataset Information, such as a persistent ID or citation, about a related dataset, such as previous research on the Dataset's subject textbox 72 FALSE FALSE TRUE FALSE FALSE FALSE citation http://purl.org/dc/terms/relation diff --git a/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java index ce2b00086ec..6223cd83773 100644 --- a/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java @@ -19,6 +19,8 @@ import jakarta.ejb.EJB; import jakarta.ejb.Stateless; +import jakarta.ejb.TransactionAttribute; +import jakarta.ejb.TransactionAttributeType; import jakarta.inject.Named; import jakarta.json.Json; import jakarta.json.JsonArray; @@ -34,6 +36,7 @@ import jakarta.persistence.NoResultException; import jakarta.persistence.NonUniqueResultException; import jakarta.persistence.PersistenceContext; +import jakarta.persistence.PersistenceException; import jakarta.persistence.TypedQuery; import org.apache.commons.codec.digest.DigestUtils; @@ -46,7 +49,6 @@ import org.apache.http.impl.client.HttpClients; import org.apache.http.protocol.HttpContext; import org.apache.http.util.EntityUtils; - import edu.harvard.iq.dataverse.settings.SettingsServiceBean; /** @@ -448,6 +450,7 @@ public JsonObject getExternalVocabularyValue(String termUri) { * @param cvocEntry - the configuration for the DatasetFieldType associated with this term * @param term - the term uri as a string */ + @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW) public void registerExternalTerm(JsonObject cvocEntry, String term) { String retrievalUri = cvocEntry.getString("retrieval-uri"); String prefix = cvocEntry.getString("prefix", null); @@ -518,6 +521,8 @@ public void process(HttpResponse response, HttpContext context) throws HttpExcep logger.fine("Wrote value for term: " + term); } catch (JsonException je) { logger.severe("Error retrieving: " + retrievalUri + " : " + je.getMessage()); + } catch (PersistenceException e) { + logger.fine("Problem persisting: " + retrievalUri + " : " + e.getMessage()); } } else { logger.severe("Received response code : " + statusCode + " when retrieving " + retrievalUri diff --git a/src/main/java/edu/harvard/iq/dataverse/DatasetFieldType.java b/src/main/java/edu/harvard/iq/dataverse/DatasetFieldType.java index 824b486a42d..01785359e0e 100644 --- a/src/main/java/edu/harvard/iq/dataverse/DatasetFieldType.java +++ b/src/main/java/edu/harvard/iq/dataverse/DatasetFieldType.java @@ -284,7 +284,7 @@ public void setDisplayOnCreate(boolean displayOnCreate) { } public boolean isControlledVocabulary() { - return controlledVocabularyValues != null && !controlledVocabularyValues.isEmpty(); + return allowControlledVocabulary; } /** diff --git a/src/main/java/edu/harvard/iq/dataverse/DataversePage.java b/src/main/java/edu/harvard/iq/dataverse/DataversePage.java index 943a74327d5..3dbc22902b0 100644 --- a/src/main/java/edu/harvard/iq/dataverse/DataversePage.java +++ b/src/main/java/edu/harvard/iq/dataverse/DataversePage.java @@ -362,7 +362,7 @@ public void initFeaturedDataverses() { List 
featuredSource = new ArrayList<>(); List featuredTarget = new ArrayList<>(); featuredSource.addAll(dataverseService.findAllPublishedByOwnerId(dataverse.getId())); - featuredSource.addAll(linkingService.findLinkingDataverses(dataverse.getId())); + featuredSource.addAll(linkingService.findLinkedDataverses(dataverse.getId())); List featuredList = featuredDataverseService.findByDataverseId(dataverse.getId()); for (DataverseFeaturedDataverse dfd : featuredList) { Dataverse fd = dfd.getFeaturedDataverse(); diff --git a/src/main/java/edu/harvard/iq/dataverse/api/Index.java b/src/main/java/edu/harvard/iq/dataverse/api/Index.java index 4910c460b6a..c30a77acb58 100644 --- a/src/main/java/edu/harvard/iq/dataverse/api/Index.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/Index.java @@ -215,7 +215,7 @@ public Response clearSolrIndex() { return error(Status.INTERNAL_SERVER_ERROR, ex.getLocalizedMessage()); } } - + @GET @Path("{type}/{id}") public Response indexTypeById(@PathParam("type") String type, @PathParam("id") Long id) { @@ -326,6 +326,29 @@ public Response indexDatasetByPersistentId(@QueryParam("persistentId") String pe } } + /** + * Clears the entry for a dataset from Solr + * + * @param id the database id of the dataset + * @return response; + * returns 404 if there is no such dataset in the database, but will attempt to + * clear the entry from Solr regardless. + */ + @DELETE + @Path("datasets/{id}") + public Response clearDatasetFromIndex(@PathParam("id") Long id) { + Dataset dataset = datasetService.find(id); + // We'll attempt to delete the Solr document regardless of whether the + // dataset exists in the database: + String response = indexService.removeSolrDocFromIndex(IndexServiceBean.solrDocIdentifierDataset + id); + if (dataset != null) { + return ok("Sent request to clear Solr document for dataset " + id + ": " + response); + } else { + return notFound("Could not find dataset " + id + " in the database. Requested to clear from Solr anyway: " + response); + } + } + + /** + * This is just a demo of the modular math logic we use for indexAll.
*/ diff --git a/src/main/java/edu/harvard/iq/dataverse/api/TestApi.java b/src/main/java/edu/harvard/iq/dataverse/api/TestApi.java index 10510013495..b9db44b2671 100644 --- a/src/main/java/edu/harvard/iq/dataverse/api/TestApi.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/TestApi.java @@ -44,6 +44,34 @@ public Response getExternalToolsforFile(@PathParam("id") String idSupplied, @Que return wr.getResponse(); } } + + @GET + @Path("datasets/{id}/externalTool/{toolId}") + public Response getExternalToolForDatasetById(@PathParam("id") String idSupplied, @PathParam("toolId") String toolId, @QueryParam("type") String typeSupplied) { + ExternalTool.Type type; + try { + type = ExternalTool.Type.fromString(typeSupplied); + } catch (IllegalArgumentException ex) { + return error(BAD_REQUEST, ex.getLocalizedMessage()); + } + Dataset dataset; + try { + dataset = findDatasetOrDie(idSupplied); + List datasetTools = externalToolService.findDatasetToolsByType(type); + for (ExternalTool tool : datasetTools) { + // Only build the handler and JSON for the tool that was asked for: + if (!tool.getId().toString().equals(toolId)) { + continue; + } + ApiToken apiToken = externalToolService.getApiToken(getRequestApiKey()); + ExternalToolHandler externalToolHandler = new ExternalToolHandler(tool, dataset, apiToken, null); + JsonObjectBuilder toolToJson = externalToolService.getToolAsJsonWithQueryParameters(externalToolHandler); + return ok(toolToJson); + } + } catch (WrappedResponse wr) { + return wr.getResponse(); + } + return error(BAD_REQUEST, "Could not find an external tool with the id " + toolId); } @Path("files/{id}/externalTools") @GET diff --git a/src/main/java/edu/harvard/iq/dataverse/dataaccess/ImageThumbConverter.java b/src/main/java/edu/harvard/iq/dataverse/dataaccess/ImageThumbConverter.java index 2de37174a3b..1be2bb79e0f 100644 --- a/src/main/java/edu/harvard/iq/dataverse/dataaccess/ImageThumbConverter.java +++ b/src/main/java/edu/harvard/iq/dataverse/dataaccess/ImageThumbConverter.java @@ -208,6 +208,7 @@ private static boolean generatePDFThumbnail(StorageIO storageIO, int s // will run the ImageMagick on it, and will save its output in another temp // file, and will save it as an "auxiliary" file via the driver. boolean tempFilesRequired = false; + File tempFile = null; try { Path pdfFilePath = storageIO.getFileSystemPath(); @@ -225,7 +226,7 @@ private static boolean generatePDFThumbnail(StorageIO storageIO, int s } if (tempFilesRequired) { - InputStream inputStream = null; + InputStream inputStream = null; try { storageIO.open(); inputStream = storageIO.getInputStream(); @@ -234,12 +235,11 @@ private static boolean generatePDFThumbnail(StorageIO storageIO, int s return false; } - File tempFile; OutputStream outputStream = null; try { tempFile = File.createTempFile("tempFileToRescale", ".tmp"); outputStream = new FileOutputStream(tempFile); - //Reads/transfers all bytes from the input stream to the output stream. + //Reads/transfers all bytes from the input stream to the output stream. 
inputStream.transferTo(outputStream); } catch (IOException ioex) { logger.warning("GenerateImageThumb: failed to save pdf bytes in a temporary file."); @@ -270,6 +270,12 @@ private static boolean generatePDFThumbnail(StorageIO storageIO, int s logger.warning("failed to save generated pdf thumbnail, as AUX file " + THUMBNAIL_SUFFIX + size + "!"); return false; } + finally { + // The temp file is only created when tempFilesRequired is true: + if (tempFile != null) { + tempFile.delete(); + } + } } return true; @@ -371,6 +377,14 @@ private static boolean generateImageThumbnailFromInputStream(StorageIO logger.warning("Failed to rescale and/or save the image: " + ioex.getMessage()); return false; } + finally { + if (tempFileRequired && tempFile != null) { + tempFile.delete(); + } + } return true; diff --git a/src/main/java/edu/harvard/iq/dataverse/harvest/server/OAIRecordServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/harvest/server/OAIRecordServiceBean.java index 1b4a7bc7db0..cc15d4c978b 100644 --- a/src/main/java/edu/harvard/iq/dataverse/harvest/server/OAIRecordServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/harvest/server/OAIRecordServiceBean.java @@ -40,10 +40,6 @@ @Stateless @Named public class OAIRecordServiceBean implements java.io.Serializable { - @EJB - OAISetServiceBean oaiSetService; - @EJB - IndexServiceBean indexService; @EJB DatasetServiceBean datasetService; @EJB @@ -55,13 +51,24 @@ public class OAIRecordServiceBean implements java.io.Serializable { EntityManager em; private static final Logger logger = Logger.getLogger("edu.harvard.iq.dataverse.harvest.server.OAIRecordServiceBean"); - - public void updateOaiRecords(String setName, List datasetIds, Date updateTime, boolean doExport) { - updateOaiRecords(setName, datasetIds, updateTime, doExport, logger); - } - public void updateOaiRecords(String setName, List datasetIds, Date updateTime, boolean doExport, Logger setUpdateLogger) { - + /** + * Updates the OAI records for the specified set + * @param setName name of the OAI set + * @param datasetIds ids of the datasets that are candidates for this OAI set + * @param updateTime time stamp of this update + * @param doExport attempt to export datasets that haven't been exported yet + * @param confirmed true if the datasetIds above were looked up in the database, + * as opposed to in the search engine; meaning it is + * confirmed that any dataset not on this list that's currently + * in the set is no longer in the database and should be + * marked as deleted without any further checks. Otherwise + * we'll want to double-check whether the dataset still exists + * as published. This is to prevent marking existing datasets + * as deleted during a full reindex etc. + * @param setUpdateLogger dedicated Logger + */ + public void updateOaiRecords(String setName, List datasetIds, Date updateTime, boolean doExport, boolean confirmed, Logger setUpdateLogger) { // create Map of OaiRecords List oaiRecords = findOaiRecordsBySetName(setName); Map recordMap = new HashMap<>(); @@ -101,9 +108,6 @@ public void updateOaiRecords(String setName, List datasetIds, Date updateT DatasetVersion releasedVersion = dataset.getReleasedVersion(); Date publicationDate = releasedVersion == null ?
null : releasedVersion.getReleaseTime(); - //if (dataset.getPublicationDate() != null - // && (dataset.getLastExportTime() == null - // || dataset.getLastExportTime().before(dataset.getPublicationDate()))) { if (publicationDate != null && (dataset.getLastExportTime() == null || dataset.getLastExportTime().before(publicationDate))) { @@ -125,7 +129,9 @@ public void updateOaiRecords(String setName, List datasetIds, Date updateT } // anything left in the map should be marked as removed! - markOaiRecordsAsRemoved( recordMap.values(), updateTime, setUpdateLogger); + markOaiRecordsAsRemoved(recordMap.values(), updateTime, confirmed, setUpdateLogger); + + } @@ -162,7 +168,7 @@ record = new OAIRecord(setName, dataset.getGlobalId().asString(), new Date()); } } - + /* // Updates any existing OAI records for this dataset // Should be called whenever there's a change in the release status of the Dataset // (i.e., when it's published or deaccessioned), so that the timestamps and @@ -201,13 +207,31 @@ public void updateOaiRecordsForDataset(Dataset dataset) { logger.fine("Null returned - no records found."); } } +*/ - public void markOaiRecordsAsRemoved(Collection records, Date updateTime, Logger setUpdateLogger) { + public void markOaiRecordsAsRemoved(Collection records, Date updateTime, boolean confirmed, Logger setUpdateLogger) { for (OAIRecord oaiRecord : records) { if ( !oaiRecord.isRemoved() ) { - setUpdateLogger.fine("marking OAI record "+oaiRecord.getGlobalId()+" as removed"); - oaiRecord.setRemoved(true); - oaiRecord.setLastUpdateTime(updateTime); + boolean confirmedRemoved = confirmed; + if (!confirmedRemoved) { + Dataset lookedUp = datasetService.findByGlobalId(oaiRecord.getGlobalId()); + if (lookedUp == null) { + confirmedRemoved = true; + } else if (lookedUp.getLastExportTime() == null) { + confirmedRemoved = true; + } else { + boolean isReleased = lookedUp.getReleasedVersion() != null; + if (!isReleased) { + confirmedRemoved = true; + } + } + } + + if (confirmedRemoved) { + setUpdateLogger.fine("marking OAI record "+oaiRecord.getGlobalId()+" as removed"); + oaiRecord.setRemoved(true); + oaiRecord.setLastUpdateTime(updateTime); + } } else { setUpdateLogger.fine("OAI record "+oaiRecord.getGlobalId()+" is already marked as removed."); } diff --git a/src/main/java/edu/harvard/iq/dataverse/harvest/server/OAISetServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/harvest/server/OAISetServiceBean.java index d5c78c36b98..b3a09391bf3 100644 --- a/src/main/java/edu/harvard/iq/dataverse/harvest/server/OAISetServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/harvest/server/OAISetServiceBean.java @@ -171,6 +171,8 @@ public void exportOaiSet(OAISet oaiSet, Logger exportLogger) { String query = managedSet.getDefinition(); List datasetIds; + boolean databaseLookup = false; // As opposed to a search engine lookup + try { if (!oaiSet.isDefaultSet()) { datasetIds = expandSetQuery(query); @@ -181,6 +183,7 @@ public void exportOaiSet(OAISet oaiSet, Logger exportLogger) { // including the unpublished drafts and deaccessioned ones. // Those will be filtered out further down the line. datasetIds = datasetService.findAllLocalDatasetIds(); + databaseLookup = true; } } catch (OaiSetException ose) { datasetIds = null; @@ -191,7 +194,7 @@ public void exportOaiSet(OAISet oaiSet, Logger exportLogger) { // they will be properly marked as "deleted"! -- L.A. 
4.5 //if (datasetIds != null && !datasetIds.isEmpty()) { exportLogger.info("Calling OAI Record Service to re-export " + datasetIds.size() + " datasets."); - oaiRecordService.updateOaiRecords(managedSet.getSpec(), datasetIds, new Date(), true, exportLogger); + oaiRecordService.updateOaiRecords(managedSet.getSpec(), datasetIds, new Date(), true, databaseLookup, exportLogger); //} managedSet.setUpdateInProgress(false); diff --git a/src/main/java/propertyFiles/Bundle.properties b/src/main/java/propertyFiles/Bundle.properties index 157f2ecaf54..f1c8381816c 100644 --- a/src/main/java/propertyFiles/Bundle.properties +++ b/src/main/java/propertyFiles/Bundle.properties @@ -875,7 +875,7 @@ dataverse.option.deleteDataverse=Delete Dataverse dataverse.publish.btn=Publish dataverse.publish.header=Publish Dataverse dataverse.nopublished=No Published Dataverses -dataverse.nopublished.tip=In order to use this feature you must have at least one published dataverse. +dataverse.nopublished.tip=In order to use this feature you must have at least one published or linked dataverse. dataverse.contact=Email Dataverse Contact dataverse.link=Link Dataverse dataverse.link.btn.tip=Link to Your Dataverse diff --git a/src/main/resources/db/migration/V6.1.0.3__9983-missing-unique-constraints.sql b/src/main/resources/db/migration/V6.1.0.3__9983-missing-unique-constraints.sql new file mode 100644 index 00000000000..6cb3a455e4e --- /dev/null +++ b/src/main/resources/db/migration/V6.1.0.3__9983-missing-unique-constraints.sql @@ -0,0 +1,16 @@ +DO $$ +BEGIN + + BEGIN + ALTER TABLE externalvocabularyvalue ADD CONSTRAINT externalvocabularyvalue_uri_key UNIQUE(uri); + EXCEPTION + WHEN duplicate_table THEN RAISE NOTICE 'Table unique constraint externalvocabularyvalue_uri_key already exists'; + END; + + BEGIN + ALTER TABLE oaiset ADD CONSTRAINT oaiset_spec_key UNIQUE(spec); + EXCEPTION + WHEN duplicate_table THEN RAISE NOTICE 'Table unique constraint oaiset_spec_key already exists'; + END; + +END $$; \ No newline at end of file diff --git a/src/test/java/edu/harvard/iq/dataverse/api/ExternalToolsIT.java b/src/test/java/edu/harvard/iq/dataverse/api/ExternalToolsIT.java index 9a280f475a1..22abf6fa2e3 100644 --- a/src/test/java/edu/harvard/iq/dataverse/api/ExternalToolsIT.java +++ b/src/test/java/edu/harvard/iq/dataverse/api/ExternalToolsIT.java @@ -197,7 +197,7 @@ public void testDatasetLevelTool1() { .statusCode(OK.getStatusCode()) .body("data.displayName", CoreMatchers.equalTo("DatasetTool1")); - long toolId = JsonPath.from(addExternalTool.getBody().asString()).getLong("data.id"); + Long toolId = JsonPath.from(addExternalTool.getBody().asString()).getLong("data.id"); Response getExternalToolsByDatasetIdInvalidType = UtilIT.getExternalToolsForDataset(datasetId.toString(), "invalidType", apiToken); getExternalToolsByDatasetIdInvalidType.prettyPrint(); @@ -205,12 +205,12 @@ .statusCode(BAD_REQUEST.getStatusCode()) .body("message", CoreMatchers.equalTo("Type must be one of these values: [explore, configure, preview, query].")); - Response getExternalToolsByDatasetId = UtilIT.getExternalToolsForDataset(datasetId.toString(), "explore", apiToken); + Response getExternalToolsByDatasetId = UtilIT.getExternalToolForDatasetById(datasetId.toString(), "explore", apiToken, toolId.toString()); getExternalToolsByDatasetId.prettyPrint(); getExternalToolsByDatasetId.then().assertThat() - .body("data[0].displayName", CoreMatchers.equalTo("DatasetTool1")) - .body("data[0].scope", CoreMatchers.equalTo("dataset")) - 
.body("data[0].toolUrlWithQueryParams", CoreMatchers.equalTo("http://datasettool1.com?datasetPid=" + datasetPid + "&key=" + apiToken)) + .body("data.displayName", CoreMatchers.equalTo("DatasetTool1")) + .body("data.scope", CoreMatchers.equalTo("dataset")) + .body("data.toolUrlWithQueryParams", CoreMatchers.equalTo("http://datasettool1.com?datasetPid=" + datasetPid + "&key=" + apiToken)) .statusCode(OK.getStatusCode()); //Delete the tool added by this test... @@ -271,15 +271,14 @@ public void testDatasetLevelToolConfigure() { .statusCode(OK.getStatusCode()) .body("data.displayName", CoreMatchers.equalTo("Dataset Configurator")); - long toolId = JsonPath.from(addExternalTool.getBody().asString()).getLong("data.id"); - - Response getExternalToolsByDatasetId = UtilIT.getExternalToolsForDataset(datasetId.toString(), "configure", apiToken); + Long toolId = JsonPath.from(addExternalTool.getBody().asString()).getLong("data.id"); + Response getExternalToolsByDatasetId = UtilIT.getExternalToolForDatasetById(datasetId.toString(), "configure", apiToken, toolId.toString()); getExternalToolsByDatasetId.prettyPrint(); getExternalToolsByDatasetId.then().assertThat() - .body("data[0].displayName", CoreMatchers.equalTo("Dataset Configurator")) - .body("data[0].scope", CoreMatchers.equalTo("dataset")) - .body("data[0].types[0]", CoreMatchers.equalTo("configure")) - .body("data[0].toolUrlWithQueryParams", CoreMatchers.equalTo("https://datasetconfigurator.com?datasetPid=" + datasetPid)) + .body("data.displayName", CoreMatchers.equalTo("Dataset Configurator")) + .body("data.scope", CoreMatchers.equalTo("dataset")) + .body("data.types[0]", CoreMatchers.equalTo("configure")) + .body("data.toolUrlWithQueryParams", CoreMatchers.equalTo("https://datasetconfigurator.com?datasetPid=" + datasetPid)) .statusCode(OK.getStatusCode()); //Delete the tool added by this test... @@ -594,7 +593,7 @@ public void testFileLevelToolWithAuxFileReq() throws IOException { .statusCode(OK.getStatusCode()) .body("data.displayName", CoreMatchers.equalTo("HDF5 Tool")); - long toolId = JsonPath.from(addExternalTool.getBody().asString()).getLong("data.id"); + Long toolId = JsonPath.from(addExternalTool.getBody().asString()).getLong("data.id"); Response getTool = UtilIT.getExternalTool(toolId); getTool.prettyPrint(); @@ -610,13 +609,13 @@ public void testFileLevelToolWithAuxFileReq() throws IOException { .body("data", Matchers.hasSize(0)); // The tool shows for a true HDF5 file. The NcML aux file is available. Requirements met. - Response getToolsForTrueHdf5 = UtilIT.getExternalToolsForFile(trueHdf5.toString(), "preview", apiToken); + Response getToolsForTrueHdf5 = UtilIT.getExternalToolForFileById(trueHdf5.toString(), "preview", apiToken, toolId.toString()); getToolsForTrueHdf5.prettyPrint(); getToolsForTrueHdf5.then().assertThat() .statusCode(OK.getStatusCode()) - .body("data[0].displayName", CoreMatchers.equalTo("HDF5 Tool")) - .body("data[0].scope", CoreMatchers.equalTo("file")) - .body("data[0].contentType", CoreMatchers.equalTo("application/x-hdf5")); + .body("data.displayName", CoreMatchers.equalTo("HDF5 Tool")) + .body("data.scope", CoreMatchers.equalTo("file")) + .body("data.contentType", CoreMatchers.equalTo("application/x-hdf5")); //Delete the tool added by this test... 
Response deleteExternalTool = UtilIT.deleteExternalTool(toolId); diff --git a/src/test/java/edu/harvard/iq/dataverse/api/HarvestingServerIT.java b/src/test/java/edu/harvard/iq/dataverse/api/HarvestingServerIT.java index cffe730a806..57a12224c89 100644 --- a/src/test/java/edu/harvard/iq/dataverse/api/HarvestingServerIT.java +++ b/src/test/java/edu/harvard/iq/dataverse/api/HarvestingServerIT.java @@ -23,6 +23,7 @@ import static org.junit.jupiter.api.Assertions.assertFalse; import static org.junit.jupiter.api.Assertions.assertNotNull; +import static org.junit.jupiter.api.Assertions.assertNull; import static org.junit.jupiter.api.Assertions.assertTrue; import static org.junit.jupiter.api.Assertions.assertEquals; @@ -39,6 +40,7 @@ public class HarvestingServerIT { private static String adminUserAPIKey; private static String singleSetDatasetIdentifier; private static String singleSetDatasetPersistentId; + private static Integer singleSetDatasetDatabaseId; private static List extraDatasetsIdentifiers = new ArrayList<>(); @BeforeAll @@ -84,7 +86,7 @@ private static void setupDatasets() { // create dataset: Response createDatasetResponse = UtilIT.createRandomDatasetViaNativeApi(dataverseAlias, adminUserAPIKey); createDatasetResponse.prettyPrint(); - Integer datasetId = UtilIT.getDatasetIdFromResponse(createDatasetResponse); + singleSetDatasetDatabaseId = UtilIT.getDatasetIdFromResponse(createDatasetResponse); // retrieve the global id: singleSetDatasetPersistentId = UtilIT.getDatasetPersistentIdFromResponse(createDatasetResponse); @@ -104,13 +106,13 @@ private static void setupDatasets() { // So wait for all of this to finish. UtilIT.sleepForReexport(singleSetDatasetPersistentId, adminUserAPIKey, 10); - // ... And let's create 4 more datasets for a multi-dataset experiment: + // ... And let's create 5 more datasets for a multi-dataset experiment: - for (int i = 0; i < 4; i++) { + for (int i = 0; i < 5; i++) { // create dataset: createDatasetResponse = UtilIT.createRandomDatasetViaNativeApi(dataverseAlias, adminUserAPIKey); createDatasetResponse.prettyPrint(); - datasetId = UtilIT.getDatasetIdFromResponse(createDatasetResponse); + Integer datasetId = UtilIT.getDatasetIdFromResponse(createDatasetResponse); // retrieve the global id: String thisDatasetPersistentId = UtilIT.getDatasetPersistentIdFromResponse(createDatasetResponse); @@ -415,6 +417,11 @@ public void testSetEditAPIandOAIlistSets() throws InterruptedException { // OAI set with a single dataset, and attempt to retrieve // it and validate the OAI server responses of the corresponding // ListIdentifiers, ListRecords and GetRecord methods. + // Finally, we will make sure that the OAI record survives + // a reexport when the control dataset is dropped from the search + // index temporarily (if, for example, the site admin cleared their + // solr index in order to reindex everything from scratch - which + // can take a while on a large database). This is per #3437. @Test public void testSingleRecordOaiSet() throws InterruptedException { // Let's try and create an OAI set with the "single set dataset" that @@ -569,6 +576,83 @@ public void testSingleRecordOaiSet() throws InterruptedException { assertEquals("Medicine, Health and Life Sciences", responseXmlPath.getString("OAI-PMH.GetRecord.record.metadata.dc.subject")); // ok, looks legit! 
+ + // Now, let's clear this dataset from Solr: + Response solrClearResponse = UtilIT.indexClearDataset(singleSetDatasetDatabaseId); + assertEquals(200, solrClearResponse.getStatusCode()); + solrClearResponse.prettyPrint(); + + // Now, let's re-export the set. The search query that defines the set + // will no longer find the dataset. However, since the dataset still + // exists in the database, and would in real life be reindexed again, + // we don't want to mark the OAI record for the dataset as "deleted" + // just yet. (This is new behavior, as of 6.2.) + // So, let's re-export the set... + + exportSetResponse = UtilIT.exportOaiSet(setName); + assertEquals(200, exportSetResponse.getStatusCode()); + Thread.sleep(1000L); // wait for just a second, to be safe + + // OAI Test 5. Check ListIdentifiers again: + + Response listIdentifiersResponse = UtilIT.getOaiListIdentifiers(setName, "oai_dc"); + assertEquals(OK.getStatusCode(), listIdentifiersResponse.getStatusCode()); + + // Validate the service section of the OAI response: + responseXmlPath = validateOaiVerbResponse(listIdentifiersResponse, "ListIdentifiers"); + + // ... and confirm that the record for our dataset is still listed + // as active: + List ret = responseXmlPath.getList("OAI-PMH.ListIdentifiers.header"); + + assertEquals(1, ret.size()); + assertEquals(singleSetDatasetPersistentId, responseXmlPath + .getString("OAI-PMH.ListIdentifiers.header.identifier")); + assertEquals(setName, responseXmlPath + .getString("OAI-PMH.ListIdentifiers.header.setSpec")); + // ... and, most importantly, make sure the record does not have a + // `status="deleted"` attribute: + assertNull(responseXmlPath.getString("OAI-PMH.ListIdentifiers.header.@status")); + + // Now, let's destroy this dataset for real, and make sure the + // "deleted" attribute is actually added once the set is re-exported: + + Response destroyDatasetResponse = UtilIT.destroyDataset(singleSetDatasetPersistentId, adminUserAPIKey); + assertEquals(200, destroyDatasetResponse.getStatusCode()); + destroyDatasetResponse.prettyPrint(); + + // Confirm that it no longer exists: + Response datasetNotFoundResponse = UtilIT.nativeGet(singleSetDatasetDatabaseId, adminUserAPIKey); + assertEquals(404, datasetNotFoundResponse.getStatusCode()); + + // Repeat the whole production with re-exporting the set and checking + // ListIdentifiers: + + exportSetResponse = UtilIT.exportOaiSet(setName); + assertEquals(200, exportSetResponse.getStatusCode()); + Thread.sleep(1000L); // wait for just a second, to be safe + System.out.println("re-exported the dataset again, with the control dataset destroyed"); + + // OAI Test 6. Check ListIdentifiers again: + + listIdentifiersResponse = UtilIT.getOaiListIdentifiers(setName, "oai_dc"); + assertEquals(OK.getStatusCode(), listIdentifiersResponse.getStatusCode()); + + // Validate the service section of the OAI response: + responseXmlPath = validateOaiVerbResponse(listIdentifiersResponse, "ListIdentifiers"); + + // ... and confirm that the record for our dataset is still listed... + ret = responseXmlPath.getList("OAI-PMH.ListIdentifiers.header"); + assertEquals(1, ret.size()); + assertEquals(singleSetDatasetPersistentId, responseXmlPath + .getString("OAI-PMH.ListIdentifiers.header.identifier")); + + // ... 
BUT, it should be marked as "deleted" now: + assertEquals("deleted", responseXmlPath.getString("OAI-PMH.ListIdentifiers.header.@status")); } @@ -589,9 +673,13 @@ public void testMultiRecordOaiSet() throws InterruptedException { // in the class init: String setName = UtilIT.getRandomString(6); - String setQuery = "(dsPersistentId:" + singleSetDatasetIdentifier; + String setQuery = ""; for (String persistentId : extraDatasetsIdentifiers) { - setQuery = setQuery.concat(" OR dsPersistentId:" + persistentId); + if (setQuery.equals("")) { + setQuery = "(dsPersistentId:" + persistentId; + } else { + setQuery = setQuery.concat(" OR dsPersistentId:" + persistentId); + } } setQuery = setQuery.concat(")"); @@ -732,7 +820,6 @@ public void testMultiRecordOaiSet() throws InterruptedException { boolean allDatasetsListed = true; - allDatasetsListed = persistentIdsInListIdentifiers.contains(singleSetDatasetIdentifier); for (String persistentId : extraDatasetsIdentifiers) { allDatasetsListed = allDatasetsListed && persistentIdsInListIdentifiers.contains(persistentId); } @@ -857,12 +944,11 @@ public void testMultiRecordOaiSet() throws InterruptedException { // Record the last identifier listed on this final page: persistentIdsInListRecords.add(ret.get(0).substring(ret.get(0).lastIndexOf('/') + 1)); - // Finally, let's confirm that the expected 5 datasets have been listed + // Finally, let's confirm again that the expected 5 datasets have been listed // as part of this Set: allDatasetsListed = true; - allDatasetsListed = persistentIdsInListRecords.contains(singleSetDatasetIdentifier); for (String persistentId : extraDatasetsIdentifiers) { allDatasetsListed = allDatasetsListed && persistentIdsInListRecords.contains(persistentId); } @@ -905,7 +991,7 @@ public void testInvalidQueryParams() { // TODO: // What else can we test? // Some ideas: - // - Test handling of deleted dataset records + // - Test handling of deleted dataset records - DONE! // - Test "from" and "until" time parameters // - Validate full verb response records against XML schema // (for each supported metadata format, possibly?) diff --git a/src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java b/src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java index ec41248a65f..2b884589b5b 100644 --- a/src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java +++ b/src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java @@ -1494,6 +1494,11 @@ static Response reindexDataset(String persistentId) { return response; } + static Response indexClearDataset(Integer datasetId) { + return given() + .delete("/api/admin/index/datasets/" + datasetId); + } + static Response reindexDataverse(String dvId) { Response response = given() .get("/api/admin/index/dataverses/" + dvId); @@ -2066,7 +2071,7 @@ static Response indexClear() { return given() .get("/api/admin/index/clear"); } - + static Response index() { return given() .get("/api/admin/index"); @@ -2339,6 +2344,21 @@ static Response getExternalToolsForDataset(String idOrPersistentIdOfDataset, Str } return requestSpecification.get("/api/admin/test/datasets/" + idInPath + "/externalTools?type=" + type + optionalQueryParam); } + + static Response getExternalToolForDatasetById(String idOrPersistentIdOfDataset, String type, String apiToken, String toolId) { + String idInPath = idOrPersistentIdOfDataset; // Assume it's a number. + String optionalQueryParam = ""; // If idOrPersistentId is a number we'll just put it in the path. 
+ if (!NumberUtils.isCreatable(idOrPersistentIdOfDataset)) { + idInPath = ":persistentId"; + optionalQueryParam = "&persistentId=" + idOrPersistentIdOfDataset; + } + RequestSpecification requestSpecification = given(); + if (apiToken != null) { + requestSpecification = given() + .header(UtilIT.API_TOKEN_HTTP_HEADER, apiToken); + } + return requestSpecification.get("/api/admin/test/datasets/" + idInPath + "/externalTool/" + toolId + "?type=" + type + optionalQueryParam); + } static Response getExternalToolsForFile(String idOrPersistentIdOfFile, String type, String apiToken) { String idInPath = idOrPersistentIdOfFile; // Assume it's a number.