Merge pull request esmero#200 from alliomeria/1.4.0
Documentation New Guides & Updates for 1.4.0 : round 1
alliomeria authored Jun 18, 2024
2 parents edce9dd + 86835e6 commit 3481c68
Showing 25 changed files with 557 additions and 75 deletions.
37 changes: 37 additions & 0 deletions docs/101_guides_list.md
@@ -0,0 +1,37 @@
---
title: Archipelago 101 - Core Documentation Guides
tags:
- Archipelago 101
- Documentation
---


# Archipelago 101: Core Documentation Guides

Top 10 guides we recommend you review as you get started working with Archipelago:

1. [Metadata in Archipelago](metadatainarchipelago.md): a long and worthwhile read that covers the fundamentals of Archipelago's architecture and approach to metadata and data

2. [Strawberryfield Formatters](strawberryfield-formatters.md): overview of the general setup of an Archipelago Digital Object (ADO) page and the way your ADO JSON metadata and data are output

3. [Primer on Display Modes & How to Create a Webform as an Input Method](webformsasinput.md): deeper look at Display Modes and Form Modes, two ways you'll be interacting with your ADOs most frequently

4. [Twig Templates and Archipelago](metadatatwigs.md): a great place to dive into one of Archipelago's best loved feature areas

5. [Archipelago Multi Importer](ami_index.md): all about Archipelago's batch ingest and update functionality

6. [Search and Solr Overview](search_solr_index.md): for repositories, it's all about the search
* [In-a-nutshell : JSON data to Strawberry Keyname Providers to Solr](search_solr_index.md#in-a-nutshell-json-data-to-strawberry-keyname-providers-to-solr): essential overview of the pipeline from JSON data into and out of Solr
* [Strawberry Key Name Providers, Solr Field, and Facet Configuration](strawberry_key_name_providers.md): fundamental information for site administrators

7. [Advanced Batch Find and Replace](find_and_replace.md): targeted batch updates for your ADO metadata

8. [Strawberry Runners Post-Processing Configuration](strawberryrunners.md): background post-processing defaults and options for all your file transformation and data indexing needs

9. [Archipelago Local Deployment Guide](archipelago-deployment-readme.md): get your own local Archipelago up and running in about 15 minutes

10. [Archipelago Presentations, Events, and Additional Resources](presentations_events.md): features recordings and links to different Archipelago workshops, conference presentations, and other helpful references

___

Thank you for reading! Please contact us on our [Archipelago Commons Google Group](https://groups.google.com/forum/#!forum/archipelago-commons) with any questions or feedback.
8 changes: 8 additions & 0 deletions docs/find_and_replace.md
@@ -91,6 +91,14 @@ After reviewing the 'Important Notes & Workflow Recommendations' below, please s

The Actions available through Archipelago's Advanced Batch Find and Replace can potentially have repository-wide effects. It is strongly recommended that you proceed with caution when executing any of the available Actions.


!!! warning "Adding New Facets"

The default Facets available through Archipelago's Advanced Batch Find and Replace each have an important configuration selection already made. For every [new Facet you add](strawberry_key_name_providers.md) for Find and Replace, you need to select both the `VBO Batch Facet processor` option under 'VBO batch handler' and the `Use URL based facets in VBO Batches` option under 'VBO batch handler settings'. These selections ensure that the "visible" list/count of objects you filter using a Facet is respected when VBO actually executes the batch changes for any Find and Replace Action.
Also, please be aware that Drupal's VBO does not pass a "limit" (unless your View is configured to show a defined number of results, which most users will never use). Because of that, when you run a VBO-based action, the batch is limited by default to the Limit defined in Search API/Solr. You can view this Limit at
'~yoursite/admin/config/search/search-api/server/esmero_solr/edit', under the Advanced Tab. For example, if the Search API/Solr Limit is set to 100, and you see 1000 objects in your Find and Replace results and select all 1000 for a batch change, only 100 changes will be processed when you run your Find and Replace Action. Archipelago cannot currently work around this VBO-related behavior (if this affects you, please open an issue; perhaps a way can be found!).
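
The arithmetic behind this warning can be sketched in a few lines. This is only an illustration of the behavior described above, not Archipelago or VBO code; the function name `processed_count` is hypothetical.

```python
def processed_count(selected: int, search_api_limit: int) -> int:
    """Number of objects a VBO-based action would actually change.

    Assumption (from the warning above): VBO passes no limit of its own,
    so the Search API/Solr Limit caps the batch.
    """
    return min(selected, search_api_limit)


# 1000 objects selected, Search API/Solr Limit set to 100.
print(processed_count(selected=1000, search_api_limit=100))
```

In practice this means the Search API/Solr Limit should be at least as large as the biggest batch you intend to run.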


## Simulation Mode

Before executing any of the available Find and Replace Actions, the best-practice workflow recommendation is to **always** first run in Simulation Mode:
67 changes: 67 additions & 0 deletions docs/iiif-content-search.md
@@ -0,0 +1,67 @@
---
title: IIIF Content Search API Integration
tags:
- IIIF
- IIIF Server Settings Form
- IIIF Content Search API
- Solr
- Solr Fields
- Solr Index
---

# IIIF Content Search API Integration

Beginning in release 1.3.0 and now fully mature in 1.4.0, Archipelago features IIIF Content Search API integration with attendant default configurations and settings.

Through a non-trifling amount of code and maths, Archipelago speaks the IIIF Content Search API language using data from your Archipelago's Digital Objects, to enable you to search within Mirador (or other supported viewers) for specific hits within OCR, VTT file, or manually created textual annotations.

Please also see the related [IIIF Server Settings Form](iiif_server_settings.md), and Strawberry Runners guides for [Reviewing and adjusting the `pager` and `ocr` Post-Processor operations](strawberryrunners_pager_ocr.md) and [Reviewing and Adjusting the `subtitle` Post-Processor operations](strawberryrunners_subtitle.md).

## 1. IIIF Manifest Templates

First, Archipelago's default IIIF Manifest templates explicitly declare, in the 'service' key, that they support all three versions of the IIIF Content Search API.

```JSON
"service": [
{
"id": "{{ baseurl }}iiifcontentsearch/v2/do/{{ node.uuid.value }}/metadatadisplayexposed/iiifmanifest/mode/advanced/page/0",
"type": "SearchService2"
},
{
"id": "{{ baseurl }}iiifcontentsearch/v1/do/{{ node.uuid.value }}/metadatadisplayexposed/iiifmanifest/mode/advanced/page/0",
"type": "SearchService1",
"@context": "http://iiif.io/api/search/1/context.json",
"profile": "http://iiif.io/api/search/1/search"
},
{
"@id": "{{ baseurl }}iiifcontentsearch/v1/do/{{ node.uuid.value }}/metadatadisplayexposed/iiifmanifest/mode/advanced/page/0",
"@context": "http://iiif.io/api/search/0/context.json",
"profile": "http://iiif.io/api/search/0/search"
}
],
```
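
As a rough illustration of how a client might read this `service` key, the following Python sketch collects the declared search endpoints from a manifest that has already been parsed into a dictionary. The function name `find_search_services`, the `example.org` URLs, and the placeholder UUID are assumptions made for the example; the sketch also assumes `service` is a list, as in the snippet above.

```python
def find_search_services(manifest: dict) -> list[dict]:
    """Collect the search service entries declared in a manifest's 'service' key."""
    found = []
    for service in manifest.get("service", []):
        # Newer entries use "id"; older-style entries use "@id".
        url = service.get("id") or service.get("@id")
        if url is None:
            continue
        found.append({
            "url": url,
            "type": service.get("type"),
            "profile": service.get("profile"),
        })
    return found


# Hypothetical, already-rendered manifest fragment (placeholder host and UUID).
BASE = "https://example.org/iiifcontentsearch"
UUID = "00000000-0000-0000-0000-000000000000"
TAIL = "metadatadisplayexposed/iiifmanifest/mode/advanced/page/0"
example_manifest = {
    "service": [
        {"id": f"{BASE}/v2/do/{UUID}/{TAIL}", "type": "SearchService2"},
        {
            "@id": f"{BASE}/v1/do/{UUID}/{TAIL}",
            "@context": "http://iiif.io/api/search/0/context.json",
            "profile": "http://iiif.io/api/search/0/search",
        },
    ]
}

services = find_search_services(example_manifest)
for entry in services:
    print(entry["url"], entry["type"], entry["profile"])
```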

## 2. API Endpoints Exposure

Next, the default Exposed Metadata API Endpoints (generated from the IIIF templates) provide the specific structure needed for the IIIF Content Search API. Each search URL tells Archipelago which template is being used (“the template containing it”), the IIIF Content Search API version, whether the mode is simple or advanced, and the UUID of the Archipelago Digital Object being searched (the one that contains the raw data feeding the template, or at least its top-level/parent object).

## 3. Pathway into and out of the Solr Index

Then, Archipelago's backend recreates the ADO's IIIF manifest using this data (basically repeating what the client did before), but uses JMESPath expressions to extract just what is needed. It flips the order of the structure, making IIIF Image IDs the "top keys" that reference their canvases and, if present, their #xywh selectors (for the annotation text).

Using this transformed data, Archipelago's backend search can be limited to the OCR generated from those images only (important, as Archipelago repositories can contain millions of OCR'd documents). Archipelago's internal search then natively returns the relevant hits within the specified ADO via the [Bavarian State Library’s Solr OCR highlight plugin](https://github.com/dbmdz/solr-ocrhighlighting/). These hits are then reprocessed into IIIF-compliant (W3C) annotations and returned as results on “canvases with images”.
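
The "flipping" step can be pictured with a small Python sketch. This is not Archipelago's implementation (which, per the text above, relies on JMESPath expressions); it uses plain loops and assumes a IIIF Presentation 3 style manifest where each Canvas holds painting annotations whose `body` is an Image. The function name and the sample URLs are hypothetical.

```python
def index_canvases_by_image(manifest: dict) -> dict[str, str]:
    """Map each IIIF Image ID (top key) to the ID of the Canvas that shows it."""
    by_image = {}
    for canvas in manifest.get("items", []):
        for page in canvas.get("items", []):        # AnnotationPages
            for annotation in page.get("items", []):
                body = annotation.get("body", {})
                if isinstance(body, dict) and body.get("type") == "Image" and "id" in body:
                    by_image[body["id"]] = canvas["id"]
    return by_image


def make_canvas(number: int) -> dict:
    """Build a minimal, hypothetical Canvas with one painted image."""
    canvas_id = f"https://example.org/canvas/{number}"
    return {
        "id": canvas_id,
        "type": "Canvas",
        "items": [{
            "type": "AnnotationPage",
            "items": [{
                "type": "Annotation",
                "motivation": "painting",
                "body": {"id": f"https://example.org/iiif/3/image{number}", "type": "Image"},
                "target": canvas_id,
            }],
        }],
    }


sample_manifest = {"type": "Manifest", "items": [make_canvas(1), make_canvas(2)]}
image_index = index_canvases_by_image(sample_manifest)
print(image_index)
```

With an index shaped like this, a backend can restrict its OCR search to the listed images and then map every hit back to the Canvas it belongs to.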

## Things to keep in mind

- To make this performant, Archipelago uses two levels of caches that are invalidated automatically whenever any "ingredient" used is modified.

- Archipelago can also tell the backend to use a "different" template than the one used at the front (Mirador), allowing you to define which "canvases" are searchable. This is not a normal use case, but still a valid one. And you can, per resource, have complex logic and/or different Viewers, even on a one by one basis.

### Acknowledgements

Archipelago's developers would like to extend our gratitude to our community, especially to [Mike](https://github.com/digitaldogsbody) and [Johannes](https://github.com/jbaiter) for their work and help, and everyone else in the IIIF and repository communities for all the amazing tools, viewers, specs and cookbook examples.

___

Thank you for reading! Please contact us on our [Archipelago Commons Google Group](https://groups.google.com/forum/#!forum/archipelago-commons) with any questions or feedback.

86 changes: 86 additions & 0 deletions docs/iiif_server_settings.md
@@ -0,0 +1,86 @@
---
title: IIIF Server Settings Form Default Settings
tags:
- IIIF
- IIIF Server Settings Form
- IIIF Content Search API
- Solr
- Solr Fields
- Solr Index
---

# IIIF Server Settings Form Default Settings

The IIIF Server Settings Form is used to configure different IIIF related settings used throughout your Archipelago environment. We strongly advise keeping the default settings intact. The necessary [Solr Fields](strawberry_key_name_providers.md#creating-a-solr-field) listed below should be set up by default.

You can find the IIIF Configuration Form:

- Through the `Manage` menu > `Configuration` > `Archipelago` > `Configure Strawberry Runners Post Processors`
- Directly at `/admin/config/archipelago/iiif`

![IIIF Server Settings Form](images/iiif_server_settings_form.png)

On the IIIF Server Settings Form page, you will see the following:


1. Note that these 'IIIF Server configuration URLs are used as defaults for field formatters using IIIF, but can be overridden on a one by one basis when setting up your formatters for each Display Mode.'

2. Base URL of your IIIF Media Server public accessible from the Outside World.
- Please provide a publicly accessible IIIF server URL. This URL will be used for AJAX and JS calls. Trailing Slashes will be removed.
- Set to `http://localhost:8183/iiif/2` by default.
- We do not recommend changing this selection.

3. Base URL of your IIIF Media Server accessible from inside this Webserver.
- Please provide Internal IIIF server URL. This URL will be used by Internal Server calls and needs to be locally accessible by your server, e.g 127.0.0.1 or an local Docker alias. Trailing Slashes will be removed.
- Set to `http://esmero-cantaloupe:8182/iiif/2` by default.
- We do not recommend changing this selection.

4. Checkbox to 'Enable IIIF Content Search API V1 and V2 endpoints'.
- Checked by default in later (1.4.0+) versions of Archipelago.
- See the [related (and essential) IIIF Manifest snippet shared here](iiif-content-search.md#1-iiif-manifest-templates)
- APIs are accessible at the following path: "/iiifcontentsearch/{version}/do/{node_uuid}/metadatadisplayexposed/{metadataexposeconfig_entity}/mode/{mode}/page/{page}" with:
- {version} one of [v1,v2]
- {node_uuid} the UUID of the ADO whose Manifest you want to search inside
- {metadataexposeconfig_entity} the machine name of the exposed Metadata Display endpoint used to render the Manifest that is calling the API (e.g iiifmanifest)
- {mode} one of [simple,advanced]. Advanced is the smartest choice. Simple is faster, but requires your Canvas ids to be exactly in this pattern http(s)://domain.ext/do/{node_uuid}/{file_uuid}/canvas/{internal_to_the_file_sequence_order}
- {page} 0 to N depending on the number of results. By default please use 0

5. Checkbox to 'Only allow searches inside a Manifest If the Manifest itself (for an ADO) defines the Search Endpoints as a Service'
- Checked by default in later (1.4.0+) versions of Archipelago.
- If enabled we will double check if the calling IIIF Manifest defines the Endpoint(s) in the `service` key. If unchecked any Manifest will be searchable by calling an API URL directly.

6. IIIF Content Search API: field(s) that holds Parent Nodes
- Strawberry Flavor Data Source Search API Fields that can be used to connect a Strawberry Flavor to a Parent ADO.
- Default specified fields are: `Strawberryfield Flavor Datasource >> SBF Parent ID` and `Strawberryfield Flavor Datasource >> SBF Parent Node >> isPartOf >> ID`

7. Strawberry Runner processors that should be searched against for visual highlights.
- e.g Strawberry Flavor Data might have been generated by the "ocr" strawberry runners processor. A comma separated list of processors (machine names) that generated miniOCR.
- Default is: `ocr`
- If you are using the [Strawberry Runners `pager` and `ocr` post-processors](strawberryrunners_pager_ocr.md), you should always keep this enabled.

8. Strawberry Runner processors that should be searched against for time based media.
- e.g Strawberry Flavor Data might have been generated by the "subtitle" strawberry runners processor. These will have time based fragments and will match IIIF Annotations with motivation supplementing and target the time based media on the parent Canvas. A comma separated list of processors (machine names) that generated time based transcripts encoded as miniOCR.
- Default is: `subtitle`

9. Check to 'Target the VTT Supplementing Annotation'
- If enabled (aligned with the specs) the target of a hit result will point to the supplementing Annotation containing in its body the VTT file. If not, it will point to the Canvas containing in its body a Media Resource (less precise but more compatible with Viewers).
- If you are using the [Strawberry Runners `subtitle` post-processor](strawberryrunners_subtitle.md), you should always keep this enabled.

10. Strawberry Runner processors that should be searched against plain text extractions.
- e.g Strawberry Flavor Data might have been generated by the "text" strawberry runners processor. These will not have coordinates but will match IIIF Annotations with motivation supplementing and target the whole canvas. A comma separated list of processors (machine names) that generated time based transcripts encoded as miniOCR.
- Default is: `text`
- If you are using the [Strawberry Runners `subtitle` post-processor](strawberryrunners_subtitle.md), you should always keep this enabled.

11. IIIF Content Search API: field(s) that hold the URI of the File that produced the Searchable content
- Strawberry Flavor Data Source Search API Fields that hold the URI of the File that generated its content.
- Default specified fields are: `Strawberryfield Flavor Datasource >> Parent File`, `Strawberryfield Flavor Datasource >> SBF source or related URI/URL`, and `Strawberryfield Flavor Datasource >> Parent File >> URI`

12. IIIF Content Search API: Max Results per Page
- Default is: `25`

13. IIIF Content Search API: Max allowed characters/length for a Search term
- Default is: `64`
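
To make the path pattern from item 4 concrete, here is a minimal Python sketch that assembles a search URL from its parts. The helper name `content_search_url`, the `example.org` host, and the placeholder UUID are assumptions for illustration. The `q` query parameter comes from the IIIF Content Search API specification rather than from this guide, and the 64-character check simply mirrors the default shown in item 13.

```python
from urllib.parse import urlencode

ALLOWED_VERSIONS = ("v1", "v2")
ALLOWED_MODES = ("simple", "advanced")


def content_search_url(base_url, node_uuid, term, version="v2",
                       endpoint="iiifmanifest", mode="advanced", page=0,
                       max_term_length=64):
    """Build a IIIF Content Search URL following the path pattern in item 4."""
    if version not in ALLOWED_VERSIONS:
        raise ValueError(f"version must be one of {ALLOWED_VERSIONS}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}")
    if page < 0:
        raise ValueError("page must be 0 or greater")
    if len(term) > max_term_length:
        raise ValueError(f"search term longer than {max_term_length} characters")
    path = (f"/iiifcontentsearch/{version}/do/{node_uuid}"
            f"/metadatadisplayexposed/{endpoint}/mode/{mode}/page/{page}")
    # urlencode escapes the search term (spaces become '+').
    return f"{base_url.rstrip('/')}{path}?{urlencode({'q': term})}"


url = content_search_url(
    "https://example.org/",
    "00000000-0000-0000-0000-000000000000",
    "new york",
)
print(url)
```

The defaults (`v2`, `iiifmanifest`, `advanced`, page `0`) follow the recommendations in item 4; adjust them if your Exposed Metadata Display endpoint uses a different machine name.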

___

Return to the main [Strawberry Runners](strawberryrunners.md) or the [Archipelago Documentation main page](index.md).
Binary file added docs/images/ado-type-to-view-mode-mapping.png
Binary file added docs/images/display-modes-2024.png
Binary file added docs/images/forms-modes-2024.png
Binary file added docs/images/iiif_server_settings_form.png
Binary file added docs/images/manage-display-2024.png
Binary file added docs/images/manage-display-coll.png
Binary file added docs/images/manage-form-display-2024.png
Binary file added docs/images/managing-display-modes-2024.png
Binary file added docs/images/sbr_subtitle.png
Binary file added docs/images/strawberryrunnershome_updated.png
14 changes: 10 additions & 4 deletions docs/index.md
@@ -1,9 +1,15 @@
# Archipelago Commons Intro

Archipelago Commons, or simply Archipelago, is an Open Source Digital Objects Repository / DAM Server Architecture based on the popular CMS [`Drupal 9/10`](https://www.drupal.org) and released under [`GPL V.3 License`](https://www.gnu.org/licenses/gpl-3.0.txt). Archipelago is developed and supported at the [Metropolitan New York Library Council (METRO)](https://metro.org).
Archipelago Commons, or simply Archipelago, is an Open Source Digital Objects Repository / DAM Server Architecture based on the popular CMS [`Drupal 9/10+`](https://www.drupal.org) and released under [`GPL V.3 License`](https://www.gnu.org/licenses/gpl-3.0.txt). Archipelago is developed and supported at the [Metropolitan New York Library Council (METRO)](https://metro.org).

Archipelago is a mix of deeply integrated custom-coded Drupal modules (made with care by us) and a curated and well-configured Drupal instance, running under a discrete and well-planned set of service containers. Learn more about the different [`Software Services`](devops.md) used by Archipelago.
Archipelago is a mix of deeply integrated custom-coded Drupal modules (made with care by us, the [Digital Services Team and METRO](https://metro.org/digital-services)) and a curated and well-configured Drupal instance, running under a discrete and well-planned set of complementary additional service containers. You can learn more about the different [Software Services used by Archipelago here](devops.md), and [Archipelago's unique approach to Metadata here](metadatainarchipelago.md).

Archipelago's primary focus is to serve the greater [`GLAM community`](https://en.wikipedia.org/wiki/GLAM_(industry_sector)) by providing a flexible, consistent, and unified way of describing, storing, linking, exposing metadata and media assets. We respect identities and existing workflows. We endeavor to design Archipelago in ways that empower communities of every size and shape.
Archipelago's primary focus is to serve the greater [`GLAM community`](https://en.wikipedia.org/wiki/GLAM_(industry_sector)) (libraries, archives, museums, universities and colleges, cultural heritage organizations) by providing a flexible, consistent, and unified way of describing, storing, linking, exposing metadata and media assets that make up rich repository collections all around our shared beautiful world. We respect identities and existing workflows, and we endeavor to design Archipelago in ways that empower communities of every size, shade, and shape.

Finally, Archipelago tries to stay humble, slim, and nimble in nature with a small codebase full of inline comments and `@todos`. All of our work is driven by a clear and [concise but thoughtful planned technical roadmap --updated in tandem with new releases](https://github.com/esmero/archipelago-deployment/issues/243).

We recommend you start with the [Core Documentation Guides listed here](101_guides_list.md) as you begin your Archipelago explorations.
___

Thank you for reading! Please contact us on our [Archipelago Commons Google Group](https://groups.google.com/forum/#!forum/archipelago-commons) with any questions or feedback.

Finally, Archipelago tries to stay humble, slim, and nimble in nature with a small code base full of inline comments and `@todos`. All of our work is driven by a clear and [concise but thoughtful planned technical roadmap --updated in tandem with new releases](https://github.com/esmero/archipelago-deployment/issues/243).