ArchivesSpace plugin to create a sitemap for the PUI

BCDigLib/aspace_sitemap

 
 


ArchivesSpace Sitemap Generation for the PUI

Getting started

Download and unpack the latest release of the plugin into your ArchivesSpace plugins directory:

    $ curl ...
    $ cd /path/to/archivesspace/plugins
    $ unzip ...

Add the plugin name to the list of enabled plugins in config/config.rb:

    AppConfig[:plugins] = ['some_plugin', 'aspace_sitemap']

Note

ArchivesSpace versions older than v2.6.0 do not support the slug-related options.

Institutions with large numbers of published objects may need to increase the memory allotted to the application. See http://archivesspace.github.io/archivesspace/user/tuning-archivesspace/

Sitemap generation relies on the Solr index for some checks related to unpublished ancestors, so run the job only after the indexer has completed its first full indexing round.

What does it do?

The plugin adds a new background job that generates a sitemap for the PUI: at least one sitemap file plus a sitemap index. The file(s) can be downloaded and placed on a server of your choice for submission to the search engine(s) of your choice and, optionally, saved to the local filesystem to be served at {pui_host}/sitemap-index.xml. The plugin has two configuration options.
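For orientation, the sketch below builds a sitemap index of the general shape defined by the sitemaps.org protocol. It is not the plugin's own code: the `sitemap_index` and `xml_escape` helpers, the example host, and the `sitemap-N.xml` file names are all assumptions made for illustration.

```ruby
# Characters that must be escaped inside XML text nodes.
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}.freeze

# Hypothetical helper: escape a string for use inside an XML element.
def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: build a sitemap index that points at each sitemap file.
# base_url is where the sitemap files are hosted; filenames are assumed names.
def sitemap_index(base_url, filenames)
  entries = filenames.map do |name|
    loc = xml_escape("#{base_url.chomp('/')}/#{name}")
    "  <sitemap>\n    <loc>#{loc}</loc>\n  </sitemap>"
  end
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    #{entries.join("\n")}
    </sitemapindex>
  XML
end

puts sitemap_index('https://pui.example.org', ['sitemap-1.xml', 'sitemap-2.xml'])
```

Search engines read the index first and then fetch each listed sitemap file, which is why the index is the file that gets submitted.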

Configuration

Configure the plugin by adding the following entries to your config.rb file, modified as appropriate. They are needed only if you submit the sitemap through the tools provided by Google or Bing.

  1. Google requires verification that you own the site. One way to verify is with a verification meta tag.

        # set the meta tag from Google to verify site ownership
        AppConfig[:google_verification_meta_tag] = "your_verification_meta_tag"

  2. Bing also requires verification that you own the site. One way to verify is with a verification meta tag.

        # set the meta tag from Bing to verify site ownership
        AppConfig[:bing_verification_meta_tag] = "your_verification_meta_tag"

How to Use

Users with access to Background Jobs will find a new entry in the Create Jobs menu called ArchivesSpace PUI Sitemap. Once selected, the job asks for several inputs:

  1. What types of objects to include in the sitemap. At least one is required.
  2. The update frequency. For most institutions, yearly is probably fine.
  3. Use human-readable slugs. Slugs generated by the user or the application will be used in the <loc> field if they are available. (v2.6.0+)
  4. Write to local filesystem. The generated sitemaps are stored in AppConfig[:data_directory]/pui_sitemaps and placed at the root of the PUI webspace, i.e. {pui_host}/sitemap-index.xml. The job also updates the PUI's robots.txt file to include the sitemap entry. On startup, any existing sitemaps are copied to the PUI webroot and robots.txt is updated to match. Uncheck this option and fill in the sitemap index base URL (below) if you want to host the sitemaps on an external server.
  5. The sitemap index base URL. This is the location where you will host the sitemaps. It is ignored if Write to local filesystem (above) is selected.
  6. The limit on the number of entries per sitemap file. You should be able to leave this at the default of 50000.
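To make the update frequency and per-file limit concrete, here is a minimal sketch of how a list of URLs could be split across sitemap files. It does not reproduce the plugin's implementation: the `build_sitemaps` helper, the `sitemap-N.xml` naming, and the example URLs are hypothetical. The default of 50000 matches the maximum number of URLs per file allowed by the sitemaps.org protocol.

```ruby
# Hypothetical helper: split URLs across sitemap files, honouring a per-file
# limit. Returns a hash of { assumed file name => XML string }.
# URLs are assumed to be already XML-safe (no bare &, <, or >).
def build_sitemaps(urls, changefreq: 'yearly', limit: 50_000)
  files = {}
  urls.each_slice(limit).with_index(1) do |chunk, n|
    entries = chunk.map do |url|
      "  <url>\n    <loc>#{url}</loc>\n    <changefreq>#{changefreq}</changefreq>\n  </url>"
    end
    files["sitemap-#{n}.xml"] = <<~XML
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      #{entries.join("\n")}
      </urlset>
    XML
  end
  files
end

# Five example URLs with an artificially small limit of two per file.
urls = (1..5).map { |i| "https://pui.example.org/repositories/2/resources/#{i}" }
files = build_sitemaps(urls, limit: 2)
p files.keys # => ["sitemap-1.xml", "sitemap-2.xml", "sitemap-3.xml"]
```

With the default limit, a site needs more than one sitemap file only once it has more than 50000 published objects of the selected types.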

Notes

  1. The 'priority' attribute is not used in the sitemap since there is no mechanism in place to mark objects in the staff interface. Given the large number of objects that are typically published, it seems unlikely that 'priority' would be widely used. Google has also indicated that the priority attribute is not used by their algorithm.
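  2. The option to use slugs (human-readable URLs) is somewhat risky, since slugs are based on changeable metadata: if that metadata changes, a URL already listed in a sitemap may no longer match the object's current slug.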

Joshua Shaw ([email protected])
Digital Library Technologies Group
Dartmouth College Library
