This is a small Ruby (not Rails) project meant to run on Heroku. Its only purpose is to connect to our ArchivesSpace API once a week and export .ead.xml files to a location on S3, where they can be harvested by various partners. All the important code can be found in export_archivesspace_xml/lib/exporter.rb.
More documentation can be found in the wiki.
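To make the export step concrete, here is a minimal sketch of logging in to ArchivesSpace and downloading one EAD file. It is not taken from exporter.rb, which may work differently. The helper names, the example host, and the repository and resource IDs are all hypothetical; the endpoint paths and the X-ArchivesSpace-Session header follow the ArchivesSpace REST API as commonly documented, so check them against your ArchivesSpace version.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical helper: make sure the base URL ends with a slash so URI.join
# appends to it instead of replacing its last path segment.
def with_trailing_slash(base_url)
  base_url.end_with?("/") ? base_url : "#{base_url}/"
end

# Hypothetical helper: URL of the login endpoint for a given user.
def login_uri(base_url, username)
  URI.join(with_trailing_slash(base_url),
           "users/#{URI.encode_www_form_component(username)}/login")
end

# Hypothetical helper: URL of the EAD export for one resource.
def ead_export_uri(base_url, repo_id, resource_id)
  URI.join(with_trailing_slash(base_url),
           "repositories/#{repo_id}/resource_descriptions/#{resource_id}.xml")
end

# Logs in and returns the session token from the JSON response.
def login(base_url, username, password)
  response = Net::HTTP.post_form(login_uri(base_url, username), "password" => password)
  raise "Login failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body).fetch("session")
end

# Downloads the EAD XML for one resource, passing the session token as a header.
def fetch_ead(base_url, session, repo_id, resource_id)
  uri = ead_export_uri(base_url, repo_id, resource_id)
  request = Net::HTTP::Get.new(uri)
  request["X-ArchivesSpace-Session"] = session
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  raise "Export failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end
```

With these helpers, a run would call login once, then fetch_ead for each resource and write the returned XML to S3.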
The files are uploaded to an S3 bucket; this is publicly accessible via a CNAME record at .
The code creates a very simple index.html file in the bucket. The file allows our partners to use a variation on the following command to download all our EAD files:
wget -r -A '*.ead.xml'
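As an illustration of how such an index page can be produced, here is a minimal sketch that turns a list of file names into a flat HTML page of links. The function name build_index_html and the page markup are assumptions made for this example, not the project's actual code.

```ruby
require "erb"

# Hypothetical helper: returns HTML for a flat index page with one link per file.
def build_index_html(filenames)
  links = filenames.sort.map do |name|
    # Escape the name so unusual characters cannot break the markup.
    escaped = ERB::Util.html_escape(name)
    %(<li><a href="#{escaped}">#{escaped}</a></li>)
  end

  [
    "<!DOCTYPE html>",
    "<html>",
    "<head><title>EAD files</title></head>",
    "<body>",
    "<ul>",
    *links,
    "</ul>",
    "</body>",
    "</html>"
  ].join("\n") + "\n"
end

puts build_index_html(["b.ead.xml", "a.ead.xml"])
```

A page like this is enough for wget: -r follows the links it finds in the index, and -A keeps only the files whose names match the given pattern.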
We maintain a CloudFront distribution for the files at . AWS details, including specifics about the SSL cert, are documented on the wiki; see wiki link above.
We maintain a description of the app's infrastructure, such as S3 buckets, in Terraform (details).
Configuration is done via environment variables set on the Heroku project. Here are some of the important ones:
These allow the code to contact ArchivesSpace and download the EADs.
These are needed so the code knows where to put the files.
Note: The IAM permissions associated with this key pair in S3 are minimal: the code can only write files to the ead bucket.
We don't manage these; they're set by Heroku for our add-ons.
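A minimal sketch of reading this kind of configuration at startup and failing early when something is missing. Every variable name below is a hypothetical placeholder, as is the load_settings helper; substitute the names actually set on the Heroku project.

```ruby
# Hypothetical variable names, for illustration only.
REQUIRED_SETTINGS = %w[
  EXAMPLE_ARCHIVESSPACE_URL
  EXAMPLE_ARCHIVESSPACE_USER
  EXAMPLE_ARCHIVESSPACE_PASSWORD
  EXAMPLE_S3_BUCKET
  EXAMPLE_AWS_ACCESS_KEY_ID
  EXAMPLE_AWS_SECRET_ACCESS_KEY
].freeze

# Hypothetical helper: returns a hash of the required settings, or raises a
# KeyError that names every variable that is unset or empty.
# `env` defaults to ENV but accepts any hash-like object, which eases testing.
def load_settings(env = ENV, required = REQUIRED_SETTINGS)
  missing = required.select { |key| env[key].nil? || env[key].empty? }
  unless missing.empty?
    raise KeyError, "Missing environment variables: #{missing.join(', ')}"
  end

  required.to_h { |key| [key, env[key]] }
end
```

Checking all variables up front means a misconfigured scheduled run stops immediately with a clear message, rather than failing partway through an export.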
The project does not include a web dyno, and relies on the Heroku Scheduler to spin up a nightly process.
bundle exec ruby run_check.rb will download each EAD file from the bucket, validate it against the EAD schema, and report any fatal errors.