DPLA Subject Term Analysis

This project provides a collection of scripts for analyzing subject terms in DPLA (Digital Public Library of America) metadata.

Overview

This method has three stages:

Obtain
Scrub
Explore

Each stage has its own folder, and the directions for that stage can be found in a markdown file inside it. To use this method, download this repository, then create two additional folders named 'data' and 'vocab'. The data folder will contain the DPLA metadata files for the obtain, scrub, and explore stages. The vocab folder will contain the controlled vocabularies that the obtain and scrub stages work with.
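The folder setup described above can be sketched in Python. This is a minimal sketch, assuming the two folders belong at the top level of the downloaded repository, next to the stage folders; the README does not state the location, and the helper name create_working_folders is hypothetical and not part of the project's scripts.

```python
from pathlib import Path


def create_working_folders(repo_root="."):
    """Create the 'data' and 'vocab' folders under repo_root.

    Assumption: the folders sit at the top level of the repository.
    """
    root = Path(repo_root)
    created = []
    for name in ("data", "vocab"):
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing setup.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in create_working_folders():
        print(f"ready: {folder}")
```

Running the script from the repository root leaves existing folders and their contents untouched, so it can be rerun safely.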

Software requirements

This process was developed on Linux and will most likely need adjustment to work on other operating systems. The required software is:

Apache Spark
Apache Jena ARQ (https://jena.apache.org/documentation/query/index.html)
dos2unix (https://sourceforge.net/projects/dos2unix/)
Python 3
pandas
Jupyter Notebook
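Before starting, it can help to confirm that the required software is installed. The following is a minimal sketch, not part of the project's scripts. The executable names (spark-submit for Apache Spark, arq for Apache Jena ARQ, jupyter for Jupyter Notebook) are assumptions about a typical installation and may differ on a given system, and the helper name find_missing is hypothetical.

```python
import importlib.util
import shutil

# Assumed executable names; adjust them to match the local installation.
COMMANDS = ("spark-submit", "arq", "dos2unix", "python3", "jupyter")
# Python packages that must be importable.
MODULES = ("pandas",)


def find_missing(commands=COMMANDS, modules=MODULES):
    """Return (missing_commands, missing_modules) as two lists."""
    # shutil.which returns None when a command is not on the PATH.
    missing_commands = [c for c in commands if shutil.which(c) is None]
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [
        m for m in modules if importlib.util.find_spec(m) is None
    ]
    return missing_commands, missing_modules


if __name__ == "__main__":
    missing_commands, missing_modules = find_missing()
    for name in missing_commands:
        print(f"command not found on PATH: {name}")
    for name in missing_modules:
        print(f"Python module not importable: {name}")
    if not missing_commands and not missing_modules:
        print("all required software was found")
```

The check only tells whether each tool can be located. It does not verify versions, which this README does not specify.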
