diff --git a/duplicates.md b/duplicates.md
new file mode 100644
index 00000000..91e89bac
--- /dev/null
+++ b/duplicates.md
@@ -0,0 +1,105 @@
+# Duplicate detection endpoint
+[Back to the list of all defined endpoints](endpoints.md)
+
+## Main Endpoint
+**/api/submission/duplicates**
+
+Provide access to basic duplicate detection services. These services use Solr and the Levenshtein distance operator
+to detect potential duplicates of a given item, useful during submission and workflow review.
+
+See `dspace/config/modules/duplicate-detection.cfg` for configuration properties and examples.
+
+## Single duplicate
+
+Not implemented. (a duplicate only makes sense in the context of a search by item)
+
+## All duplicates
+
+Not implemented. (a duplicate only makes sense in the context of a search by item)
+
+## Search
+
+**GET /api/submission/duplicates/search/findByItem?uuid=<:uuid>**
+
+Provides a list of items that may be duplicates, if this feature is enabled, given the uuid as a parameter.
+
+Note that although this appears in the submission category, the item UUID can also refer to an archived item.
+Currently, the only frontend use of this feature is in workspace and workflow, so it is categorised as such.
+
+Each potential duplicate has the following attributes:
+
+* title: The item title
+* uuid: The item UUID
+* owningCollectionName: Name of the owning collection, if present
+* workspaceItemId: Integer ID of the workspace item, if present
+* workflowItemId: Integer ID of the workflow item, if present
+* metadata: A list of metadata values copied from the item, as per configuration
+* type: The value is always DUPLICATE. This is the 'type' category used for serialization/deserialization.
+
+Example
+
+```json
+{
+  "potentialDuplicates": [
+    {
+      "title": "Example Item",
+      "uuid": "5ca83276-f003-460d-98b6-dd3c30708749",
+      "owningCollectionName": "Publishers",
+      "workspaceItemId": null,
+      "workflowItemId": null,
+      "metadata": {
+        "dc.title": [
+          {
+            "value": "Example Item",
+            "language": null,
+            "authority": null,
+            "confidence": -1,
+            "place": 0
+          }
+        ],
+        "dspace.entity.type": [
+          {
+            "value": "Publication",
+            "language": null,
+            "authority": null,
+            "confidence": -1,
+            "place": 0
+          }
+        ]
+      },
+      "type": "DUPLICATE"
+    }, {
+      "title": "Example Itom",
+      "uuid": "32f8f6e4-c79e-4322-aae7-07ee535f70a6",
+      "owningCollectionName": null,
+      "workspaceItemId": 51,
+      "workflowItemId": null,
+      "metadata": {
+        "dc.title": [{
+          "value": "Example Itom",
+          "language": null,
+          "authority": null,
+          "confidence": -1,
+          "place": 0
+        }]
+      },
+      "type": "DUPLICATE"
+    }, {
+      "title": "Exaple Item",
+      "uuid": "0647ff45-48f5-4c1b-b6d7-f5dbbc160856",
+      "owningCollectionName": null,
+      "workspaceItemId": 52,
+      "workflowItemId": null,
+      "metadata": {
+        "dc.title": [{
+          "value": "Exaple Item",
+          "language": null,
+          "authority": null,
+          "confidence": -1,
+          "place": 0
+        }]
+      },
+      "type": "DUPLICATE"
+    }]
+}
+```
\ No newline at end of file
diff --git a/endpoints.md b/endpoints.md
index fb26516e..519434a8 100644
--- a/endpoints.md
+++ b/endpoints.md
@@ -48,6 +48,7 @@
 * [/api/integration/suggestions](suggestions.md)
 * [/api/integration/suggestionsources](suggestionsources.md)
 * [/api/integration/suggestiontargets](suggestiontargets.md)
+* [/api/submission/duplicates](duplicates.md)
 
 ## Endpoints Under Development/Discussion
 * [/api/authz/resourcepolicies](resourcepolicies.md)
diff --git a/items.md b/items.md
index aa5c26bb..a0e1067d 100644
--- a/items.md
+++ b/items.md
@@ -585,4 +585,4 @@ Return codes:
 * 204 No content - if the operation succeed
 * 401 Unauthorized - if you are not authenticated
 * 403 Forbidden - if you are not logged in with sufficient permissions
-* 404 Not found - if the item doesn't exist (or was already deleted)
+* 404 Not found - if the item doesn't exist (or was already deleted)
\ No newline at end of file
diff --git a/submission.md b/submission.md
index 40a6b3dd..007e9ad3 100644
--- a/submission.md
+++ b/submission.md
@@ -32,7 +32,7 @@ This is the WorkspaceItem object you created.
 
 It is **important** to keep the `id` of the WorkspaceItem, as this is necessary to update it or access it again. For example, using the `id`, you can load up the current state of your WorkspaceItem
 ```
-GET /api/sumission/workspaceitems/<:id>
+GET /api/submission/workspaceitems/<:id>
 ```
 
 In the response, you'll see a list of `sections` which are available to complete for this WorkspaceItem.
diff --git a/submissionsection-types.md b/submissionsection-types.md
index 99fe96fd..dd75b70a 100644
--- a/submissionsection-types.md
+++ b/submissionsection-types.md
@@ -13,5 +13,6 @@ cclicense | [/config/submissioncclicenses](submissioncclicenses.md) | [example](
 access | [/config/submissionaccessoptions](submissionaccessoptions.md) | [example](workspaceitem-data-access.md)
 sherpaPolicies | n/a | [example](workspaceitem-data-sherpa-policy.md)
 identifiers | n/a | [example](workspaceitem-data-identifiers.md)
+duplicates | n/a | [example](workspaceitem-data-duplicates.md)
 
 n/a --> not applicable. The sectionType doesn't require/support any extra configuration
\ No newline at end of file
diff --git a/workspaceitem-data-duplicates.md b/workspaceitem-data-duplicates.md
new file mode 100644
index 00000000..ee663f42
--- /dev/null
+++ b/workspaceitem-data-duplicates.md
@@ -0,0 +1,89 @@
+# WorkspaceItem data of duplicates sectionType
+[Back to the definition of the workspaceitems endpoint](workspaceitems.md)
+
+This section's data represents a list of potential duplicates detected for this workspace item.
+
+It is a JSON object with the following structure (matches the response from the [duplicate search endpoint](duplicates.md)):
+
+```json
+{
+  "potentialDuplicates": [
+    {
+      "title": "Example Item",
+      "uuid": "5ca83276-f003-460d-98b6-dd3c30708749",
+      "owningCollectionName": "Publishers",
+      "workspaceItemId": null,
+      "workflowItemId": null,
+      "metadata": {
+        "dc.title": [
+          {
+            "value": "Example Item",
+            "language": null,
+            "authority": null,
+            "confidence": -1,
+            "place": 0
+          }
+        ],
+        "dspace.entity.type": [
+          {
+            "value": "Publication",
+            "language": null,
+            "authority": null,
+            "confidence": -1,
+            "place": 0
+          }
+        ]
+      },
+      "type": "DUPLICATE"
+    }, {
+      "title": "Example Itom",
+      "uuid": "32f8f6e4-c79e-4322-aae7-07ee535f70a6",
+      "owningCollectionName": null,
+      "workspaceItemId": 51,
+      "workflowItemId": null,
+      "metadata": {
+        "dc.title": [{
+          "value": "Example Itom",
+          "language": null,
+          "authority": null,
+          "confidence": -1,
+          "place": 0
+        }]
+      },
+      "type": "DUPLICATE"
+    }, {
+      "title": "Exaple Item",
+      "uuid": "0647ff45-48f5-4c1b-b6d7-f5dbbc160856",
+      "owningCollectionName": null,
+      "workspaceItemId": 52,
+      "workflowItemId": null,
+      "metadata": {
+        "dc.title": [{
+          "value": "Exaple Item",
+          "language": null,
+          "authority": null,
+          "confidence": -1,
+          "place": 0
+        }]
+      },
+      "type": "DUPLICATE"
+    }]
+}
+```
+The potential duplicates listed in the section have all been detected by a special Solr search that compares the
+Levenshtein edit distance between the in-progress item title and other item titles (normalised).
+
+Each potential duplicate has the following attributes:
+
+* title: The item title
+* uuid: The item UUID
+* owningCollectionName: Name of the owning collection, if present
+* workspaceItemId: Integer ID of the workspace item, if present
+* workflowItemId: Integer ID of the workflow item, if present
+* metadata: A list of metadata values copied from the item, as per configuration
+* type: The value is always DUPLICATE. This is the 'type' category used for serialization/deserialization.
+
+See `dspace/config/modules/duplicate-detection.cfg` for configuration properties.
+
+## Patch operations
+There are no PATCH methods implemented for this section.
diff --git a/workspaceitems.md b/workspaceitems.md
index c79ccfad..b85c0f1b 100644
--- a/workspaceitems.md
+++ b/workspaceitems.md
@@ -60,8 +60,31 @@ Provide detailed information about a specific workspaceitem. The JSON response d
       "doi" : "https://doi.org/10.5072/dspace/2",
       "otherIdentifiers" : [ ]
     },
+    "duplicates": {
+      "potentialDuplicates": [
+        {
+          "title": "Sample Submission Item",
+          "uuid": "5ca83276-f003-460d-98b6-dd3c30708749",
+          "owningCollectionName": "Another Collection",
+          "workspaceItemId": null,
+          "workflowItemId": null,
+          "metadata": {
+            "dc.title": [
+              {
+                "value": "Example Item",
+                "language": null,
+                "authority": null,
+                "confidence": -1,
+                "place": 0
+              }
+            ]
+          },
+          "type": "DUPLICATE"
+        }
+      ]
+    },
     "traditional-page1": {
-      "dc.title" : [{value: "Sample Submission Item"}],
+      "dc.title" : [{value: "Sample Submission Item"}],
       "dc.contributor.author" : [
         {value: "Bollini, Andrea", authority: "rp00001", confidence: 600}
       ]