A collection of scripts and queries for auditing and updating data in ArchivesSpace. Managed by the Yale Archival Management Systems committee (YAMS).
This repository is in active development. See this document for details on future additions to the auditing queries.
- ArchivesSpace 2.4+
- Python 3.4+
- requests
- sshtunnel
- pymysql
- utilities (local Python package)
To connect to the ArchivesSpace database, enter credentials into config_template.yml and rename the file to config.yml. The DBConn class in db_conn_ssh.py will look in the current directory for the config file. To run a query, enter the following into a Terminal:
$ cd /Users/username/filepath
$ python
Python 3.6.2 |Anaconda custom (x86_64)| (default, Sep 21 2017, 18:29:43)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import db_conn_ssh as db
>>> as_db = db.DBConn()
>>> q = 'SELECT title from archival_object LIMIT 10'
>>> results = as_db.run_query(q)
>>> print(results)
title
0 Yale Bowl materials collected by Charles A. Ferry
1 Correspondence
2 Proposed Yale Bowl Memorabilia
3 Photographs
4 Charles A. Ferry Bowl model
5 First batch of concrete
6 Construction album
7 Miscellaneous construction
8 Bowl filled
9 Contracts
Audit queries in this repository can be run individually, or in bulk by executing run_queries.py. An input directory (the location of the queries), an output directory (where results should be stored), and any exclusions (.sql files that should not be run) must be specified in the config.yml file. To run run_queries.py, enter the following into a Terminal:
$ cd /Users/username/filepath
$ python run_queries.py
The run_queries.py script can also be set to run as a cron job or Windows Scheduler task, or automated with a Python scheduler module such as apscheduler.
Update scripts are intended to be run separately based on the results of the audit queries.
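The bulk-run behavior described above (find the .sql files in the input directory, skip the exclusions, save each result to the output directory) can be sketched roughly as follows. This is an illustration only, not the code in run_queries.py: the function name run_all_queries, its parameters, and the assumption that the query callable returns a (column_names, rows) pair are all hypothetical, and CSV is assumed as the output format.

```python
import csv
from pathlib import Path


def run_all_queries(input_dir, output_dir, exclusions, run_query):
    """Run every .sql file in input_dir except those named in exclusions.

    run_query is any callable that takes a SQL string and returns
    (column_names, rows). It is a hypothetical stand-in for the
    repository's database connection, not its actual interface.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for sql_file in sorted(Path(input_dir).glob("*.sql")):
        # Skip any query file listed as an exclusion.
        if sql_file.name in exclusions:
            continue
        columns, rows = run_query(sql_file.read_text(encoding="utf-8"))
        # Save the result next to the other outputs, named after the query.
        target = out / (sql_file.stem + ".csv")
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(columns)
            writer.writerows(rows)
        written.append(target.name)
    return written
```

Passing the query runner in as a callable keeps the sketch independent of any particular database driver, so it can be tried against a local test database before pointing it at ArchivesSpace.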
Script to connect to ArchivesSpace database via SSH.
Searches for queries in a user-defined directory and executes each, saving output to a specified directory. Exclusions can be defined in the config.yml file.
Assorted utility scripts for logging into the ArchivesSpace API, file handling, logging, etc.
Script to delete top-level records. Can be used to delete "orphan" agent and subject records, unassociated containers and digital objects, etc.
Retrieves a list of all agent records.
Retrieves a list of all subject records.
Script to retrieve all subject records via the ArchivesSpace API.
Script to retrieve all agent records via the ArchivesSpace API.
Retrieves all agents linked to descriptive records.
Retrieves all subjects linked to descriptive records.
Retrieves a count of agents linked to descriptive records.
Retrieves a count of subjects linked to descriptive records.
Gets a count of sources (e.g. LCNAF) used in corporate entity name records.
Gets a count of sources used in family name records.
Gets a count of sources used in personal name records.
Gets a count of sources used in subject records.
Retrieves a list of agents not linked to descriptive records.
Retrieves a list of subjects not linked to descriptive records.
Processes output of get_all_agents.sql, get_all_subjects.sql, get_agents.py, or get_subjects.py and returns potential duplicate agent and subject records.
Merges duplicate agent or subject records.
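One simple way to flag potential duplicates of the kind described above is to compare names after normalizing case, accents, and punctuation. The sketch below is an assumption about how such a check could work, not the repository's actual matching logic; the function names and the (record_id, name) input shape are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name):
    """Reduce a heading to a comparison key: no accents, case, or punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", stripped.lower()).strip()


def find_potential_duplicates(records):
    """Group (record_id, name) pairs whose normalized names are identical."""
    groups = defaultdict(list)
    for record_id, name in records:
        groups[normalize(name)].append(record_id)
    # Keep only keys shared by more than one record; ignore empty keys.
    return {key: ids for key, ids in groups.items() if key and len(ids) > 1}
```

Exact matching on a normalized key only catches trivial variants, so results should be reviewed by a person before any records are merged.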
Retrieves all top containers without a container type.
Retrieves all containers not associated with an archival object instance.
Retrieves all digital objects not associated with an archival object instance.
Changes position of digital object records according to user specifications.
Retrieves a count of container profiles currently in use.
Retrieves a count of extent types currently in use.
Retrieves a list of controlled values. Includes values not in use.
Retrieves a count of material types currently in use.
Changes position of enumeration values according to user specifications.
Merges duplicate enumeration values.
Deletes unused enumeration values from the database.
See separate date README for detailed instructions on running date auditing queries and normalization scripts.
See separate file version README for detailed instructions on running file version audit and update scripts.
Returns a rough representation of a finding aid in CSV format.
Retrieves a list of resources with identifiers split across multiple fields.
Moves split identifiers into 'id_0' field.
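As an illustration of the identifier fix described above, the sketch below collapses the four identifier parts of a resource record (id_0 through id_3) into id_0. It is not the repository's update script: the function name, the dictionary-based record, and the "-" separator are assumptions, and the appropriate separator depends on local identifier conventions.

```python
def merge_identifier_parts(resource, separator="-"):
    """Return a copy of resource with id_0..id_3 collapsed into id_0."""
    parts = [resource.get("id_{}".format(i)) for i in range(4)]
    # Join only the parts that are present and non-blank, in order.
    joined = separator.join(part.strip() for part in parts if part and part.strip())
    merged = {k: v for k, v in resource.items() if k not in ("id_1", "id_2", "id_3")}
    merged["id_0"] = joined
    return merged
```

The function returns a new dictionary rather than modifying its input, so the original record can be kept for comparison or logging before anything is posted back.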
Retrieves all notes. Can be limited to return certain types.
Executes all_notes.sql query and returns an analysis of note label usage.
Removes note labels.
Retrieves a list of all users with permissions. Used to analyze the permissions of active users.
Retrieves a list of all users with or without permissions. Used to identify inactive users.
Retrieves a list of external links used in notes and digital object records.
Executes get_links.sql query and checks the results for broken links.
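A broken-link check of the kind described above could be sketched with the standard library as follows. This is a minimal illustration under stated assumptions, not the repository's script: the function names are hypothetical, a HEAD request is assumed to be sufficient, and any status of 400 or above (or no response at all) is treated as broken.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, ValueError, OSError):
        # Malformed URL, DNS failure, timeout, refused connection, etc.
        return None


def find_broken_links(urls, get_status=fetch_status):
    """Return (url, status) pairs for links with no response or a status >= 400."""
    broken = []
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Some servers reject HEAD requests, so a link reported here may still work in a browser; retrying flagged links with a GET request is a reasonable follow-up.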
Retrieves all free-text access notes.
Retrieves all machine-actionable restrictions.
Retrieves all restriction notes and machine-actionable restrictions.