IQSS/11095- Account for multivalue needed by cvoc scripts #11096

Open. Wants to merge 3 commits into base: develop.
Changes from 2 commits
7 changes: 7 additions & 0 deletions doc/release-notes/11095-fix-extcvoc-indexing.md
@@ -0,0 +1,7 @@
+Some External Controlled Vocabulary scripts/configurations, when used on a single-valued metadata field, could cause
+indexing of the dataset to fail (e.g. when the script tried to index both the identifier and the name of the identified entity).
+Dataverse has been updated so that the call to /api/admin/index/solr/schema correctly indicates that a multi-valued Solr field is needed in these cases.
+Configuring the Solr schema and running the update-fields.sh script, as is usually recommended when using custom metadata blocks, will resolve the issue.

+The overall release notes should include a Solr update (which will hopefully be required by an update to 9.7.0 anyway), and our standard instructions
+should be changed to recommend using the update-fields.sh script when using custom metadata blocks *and/or external vocabulary scripts*.
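After upgrading, an admin may want to confirm that a field covered by an external vocabulary script is now declared multi-valued. The sketch below scans the text returned by /api/admin/index/solr/schema for one field. It assumes each field appears as a `<field name=... type=... multiValued=...` element in the attribute order that getSolrSchema() emits; the class and method names are hypothetical and not part of Dataverse.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Dataverse): reads the multiValued attribute for one
// field from the <field .../> lines returned by /api/admin/index/solr/schema.
final class SchemaFieldCheck {

    // Assumes the attribute order used by Index.getSolrSchema():
    // <field name="..." type="..." multiValued="true|false" stored="true" indexed="true"/>
    private static final Pattern FIELD_LINE = Pattern.compile(
            "<field name=\"([^\"]+)\" type=\"[^\"]*\" multiValued=\"(true|false)\"");

    private SchemaFieldCheck() {
    }

    /** Returns the multiValued flag for fieldName, or empty if the field is not listed. */
    static Optional<Boolean> isMultiValued(String schemaOutput, String fieldName) {
        Matcher m = FIELD_LINE.matcher(schemaOutput);
        while (m.find()) {
            if (m.group(1).equals(fieldName)) {
                return Optional.of(Boolean.parseBoolean(m.group(2)));
            }
        }
        return Optional.empty();
    }
}
```

Lines that do not start with `<field name=` (for example any `<copyField .../>` entries) are simply not matched, so the whole response body can be passed in unchanged.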
6 changes: 4 additions & 2 deletions doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -559,8 +559,7 @@ Using External Vocabulary Services

The Dataverse software has a mechanism to associate specific fields defined in metadata blocks with one or more vocabularies managed by external services. The mechanism relies on trusted third-party JavaScript. The mapping from field type to external vocabularies is managed via the :ref:`:CVocConf <:CVocConf>` setting.

-*This functionality is considered 'experimental'. It may require significant effort to configure and is likely to evolve in subsequent Dataverse software releases.*

+*This functionality may require significant effort to configure and is likely to evolve in subsequent Dataverse software releases.*

The effect of configuring this mechanism is similar to that of defining a field in a metadata block with 'allowControlledVocabulary=true':

@@ -585,6 +584,9 @@ Configuration involves specifying which fields are to be mapped, to which Solr f
These are all defined in the :ref:`:CVocConf <:CVocConf>` setting as a JSON array. Details about the required elements as well as example JSON arrays are available at https://github.com/gdcc/dataverse-external-vocab-support, along with an example metadata block that can be used for testing.
The scripts required can be hosted locally or retrieved dynamically from https://gdcc.github.io/ (similar to how dataverse-previewers work).

+Since external vocabulary scripts can change how fields are indexed (e.g. storing both an identifier and a name, and/or values in different languages),
+the Solr schema should be updated as described in :ref:`update-solr-schema` after adding new scripts to your configuration.

Please note that as an alternative to the :ref:`:CVocConf` setting described above, the :ref:`:ControlledVocabularyCustomJavaScript` setting can be used.

Protecting MetadataBlocks
3 changes: 3 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -4653,6 +4653,9 @@ The commands below should give you an idea of how to load the configuration, but

``curl -X PUT --upload-file cvoc-conf.json http://localhost:8080/api/admin/settings/:CVocConf``

+Since external vocabulary scripts can change how fields are indexed (e.g. storing both an identifier and a name, and/or values in different languages),
+the Solr schema should be updated as described in :ref:`update-solr-schema` after adding new scripts to your configuration.

.. _:ControlledVocabularyCustomJavaScript:

:ControlledVocabularyCustomJavaScript
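For admins who script configuration from Java rather than the shell, here is a minimal sketch using java.net.http (Java 11+) that builds the same PUT request as the `curl` command in the config.rst change, plus a GET for the /api/admin/index/solr/schema endpoint named in the release note. The class and method names are hypothetical; the base URL and endpoint paths are the ones given in this PR.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper (not part of Dataverse): builds the admin API requests used
// when adding external vocabulary scripts. Endpoint paths are taken from the PR text.
final class CvocAdminRequests {

    private CvocAdminRequests() {
    }

    // Same effect as:
    //   curl -X PUT --upload-file cvoc-conf.json http://localhost:8080/api/admin/settings/:CVocConf
    // except that the JSON is passed in as a string (e.g. read with Files.readString).
    static HttpRequest uploadCvocConf(String baseUrl, String cvocConfJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/admin/settings/:CVocConf"))
                .PUT(HttpRequest.BodyPublishers.ofString(cvocConfJson))
                .build();
    }

    // Requests the generated <field .../> definitions; with this PR, fields covered by
    // the :CVocConf setting are reported with multiValued="true".
    static HttpRequest fetchSolrSchema(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/admin/index/solr/schema"))
                .GET()
                .build();
    }

    // Sends a request to a running instance and returns the response body as text.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Against a running instance, one would call `send(uploadCvocConf("http://localhost:8080", json))` and then `send(fetchSolrSchema("http://localhost:8080"))`. The fetched fields still have to be merged into Solr's schema and picked up by Solr; the update-fields.sh script referenced in the guides remains the documented way to do that.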
11 changes: 6 additions & 5 deletions src/main/java/edu/harvard/iq/dataverse/api/Index.java
@@ -44,6 +44,7 @@
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
+import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
@@ -451,11 +452,11 @@ public Response clearOrphans(@QueryParam("sync") String sync) {
public String getSolrSchema() {

StringBuilder sb = new StringBuilder();

-for (DatasetFieldType datasetField : datasetFieldService.findAllOrderedByName()) {
+Map<Long, JsonObject> cvocTermUriMap = datasetFieldSvc.getCVocConf(true);
+for (DatasetFieldType datasetFieldType : datasetFieldService.findAllOrderedByName()) {
//ToDo - getSolrField() creates/returns a new object - just get it once and re-use
-String nameSearchable = datasetField.getSolrField().getNameSearchable();
-SolrField.SolrType solrType = datasetField.getSolrField().getSolrType();
+String nameSearchable = datasetFieldType.getSolrField().getNameSearchable();
+SolrField.SolrType solrType = datasetFieldType.getSolrField().getSolrType();
String type = solrType.getType();
if (solrType.equals(SolrField.SolrType.EMAIL)) {
/**
@@ -474,7 +475,7 @@ public String getSolrSchema() {
*/
logger.info("email type detected (" + nameSearchable + ") See also https://github.com/IQSS/dataverse/issues/759");
}
-String multivalued = datasetField.getSolrField().isAllowedToBeMultivalued().toString();
+String multivalued = Boolean.toString(datasetFieldType.getSolrField().isAllowedToBeMultivalued() || cvocTermUriMap.containsKey(datasetFieldType.getId()));
// <field name="datasetId" type="text_general" multiValued="false" stored="true" indexed="true"/>
sb.append(" <field name=\"" + nameSearchable + "\" type=\"" + type + "\" multiValued=\"" + multivalued + "\" stored=\"true\" indexed=\"true\"/>\n");
}
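To make the new rule easier to review in isolation, here is a self-contained restatement of the multiValued decision added to getSolrSchema(). It is a sketch, not the Dataverse code: the class and method names are hypothetical, a `Map<Long, String>` stands in for the `Map<Long, JsonObject>` returned by `getCVocConf(true)`, and only the map's keys (assumed here to be dataset field type IDs) are consulted.

```java
import java.util.Map;

// Hypothetical restatement of the rule in Index.getSolrSchema(): a field is declared
// multiValued if its type allows multiple values OR it has an external vocabulary
// (CVocConf) entry. String values stand in for the JsonObject values of the real map.
final class SolrFieldLine {

    private SolrFieldLine() {
    }

    static boolean multiValued(boolean allowedToBeMultivalued, long fieldTypeId,
                               Map<Long, String> cvocTermUriMap) {
        // The long ID is autoboxed to Long, matching the map's key type.
        return allowedToBeMultivalued || cvocTermUriMap.containsKey(fieldTypeId);
    }

    // Same attribute layout as the <field .../> line appended in getSolrSchema().
    static String fieldLine(String nameSearchable, String type, boolean multiValued) {
        return "<field name=\"" + nameSearchable + "\" type=\"" + type
                + "\" multiValued=\"" + multiValued + "\" stored=\"true\" indexed=\"true\"/>\n";
    }
}
```

Because the two conditions are joined with `||`, the existing isAllowedToBeMultivalued() behaviour is preserved: the CVocConf check can only turn a single-valued declaration into a multi-valued one, never the reverse.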