Case Harvester

Case Harvester is a project designed to scrape the Maryland Judiciary Case Search (MJCS) and build a near-complete database of Maryland court cases that can be queried and analyzed without the limitations of the MJCS interface. It is designed to leverage Amazon Web Services (AWS) for scalability and performance.

Our database of cases (with criminal defendant PII redacted) is available to the public at mdcaseexplorer.com, which is built using our Case Explorer software. REST and GraphQL APIs are available. Monthly exports of our database tables can be downloaded from exports.mdcaseexplorer.com.

NOTE: Unless you are modifying Case Harvester for specific purposes, please do not run your own instance, so that MJCS is spared unnecessary load. Instead, use the options described above for viewing the data, or, if you have an AWS account, you are welcome to clone our database directly.

Architecture

Case Harvester is split into three main components: spider, scraper, and parser. Each component is part of a pipeline that finds, downloads, and parses case data from the MJCS. The following diagram shows at a high level how these components interact:

Spider

The spider component is responsible for discovering new case numbers. It does this by submitting search queries to the MJCS and iterating through the results. Because the MJCS only returns a maximum of 500 results, the search algorithm splits queries that return 500 results into a set of more narrowed queries which are then submitted. Each of these queries is then split again if more than 500 results are returned, and so forth, until the MJCS is exhaustively searched for case numbers.
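The recursive splitting described above can be sketched as follows. This is a minimal illustration, not the project's actual spider: it assumes queries are narrowed by extending a case-number prefix one character at a time, and the names `discover`, `make_fake_search`, and `ALPHABET` are hypothetical. The `search` callable stands in for an MJCS query that returns at most `cap` results.

```python
from typing import Callable, List, Set

MAX_RESULTS = 500  # result cap described in the text above
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed character set


def discover(
    prefix: str,
    search: Callable[[str], List[str]],
    alphabet: str = ALPHABET,
    cap: int = MAX_RESULTS,
) -> Set[str]:
    """Collect case numbers under `prefix`, splitting any query that hits the cap."""
    results = search(prefix)
    found = set(results)
    if len(results) < cap:
        # Fewer results than the cap: this query was exhaustive.
        return found
    # The result set may be truncated, so narrow the query and recurse.
    for ch in alphabet:
        found |= discover(prefix + ch, search, alphabet, cap)
    return found


def make_fake_search(cases: Set[str], cap: int) -> Callable[[str], List[str]]:
    """Hypothetical stand-in for MJCS: prefix match, truncated to `cap` results."""
    ordered = sorted(cases)

    def search(prefix: str) -> List[str]:
        return [c for c in ordered if c.startswith(prefix)][:cap]

    return search
```

A query that returns exactly `cap` results is always split, since a full page cannot be distinguished from a truncated one. That costs a few extra queries but never misses results.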

Scraper

The scraper component downloads and stores the case details for every case number discovered by the spider. The full HTML for each case is added to an S3 bucket. Version information is kept for each case, including a timestamp of when each version was downloaded, so changes to a case can be recorded and referenced.
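To make the versioning idea concrete, here is a small in-memory sketch. It is not how the scraper stores data (the real HTML lives in S3); `VersionStore` and `CaseVersion` are hypothetical names, and skipping a download whose content hash matches the latest version is an assumption rather than documented behavior.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class CaseVersion:
    downloaded_at: datetime  # when this version was downloaded
    sha256: str              # content hash used to detect changes
    html: str                # full case-details HTML


@dataclass
class VersionStore:
    versions: Dict[str, List[CaseVersion]] = field(default_factory=dict)

    def put(self, case_number: str, html: str, now: Optional[datetime] = None) -> bool:
        """Record a download; return False if it matches the latest stored version."""
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        history = self.versions.setdefault(case_number, [])
        if history and history[-1].sha256 == digest:
            return False  # assumption: unchanged content is not stored again
        timestamp = now or datetime.now(timezone.utc)
        history.append(CaseVersion(timestamp, digest, html))
        return True

    def latest(self, case_number: str) -> Optional[CaseVersion]:
        """Return the most recent version of a case, or None if never downloaded."""
        history = self.versions.get(case_number, [])
        return history[-1] if history else None
```

Keeping every distinct version alongside its timestamp is what allows a later comparison of how a case changed between downloads.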

Parser

The parser component is a Lambda function that parses the fields of information in the HTML case details for each case, and stores that data in the PostgreSQL database. Each new item added to the scraper S3 bucket triggers a new parser Lambda invocation, which allows for significant scaling.
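As a rough illustration of the trigger, the sketch below shows a Lambda handler reading object locations out of an S3 event notification. It assumes the function is invoked directly by S3 notifications (the project may route events differently), and `objects_from_s3_event` and `handler` are hypothetical names. The actual download and parsing steps are left out.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def objects_from_s3_event(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if not s3:
            continue  # skip records that are not S3 notifications
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[str]]:
    """Hypothetical entry point: one invocation per batch of new objects."""
    objects = objects_from_s3_event(event)
    # A real parser would fetch each object's HTML here and write rows
    # to PostgreSQL; this sketch only reports which keys it received.
    return {"received": [key for _, key in objects]}
```

Because each invocation handles only the objects named in its own event, many invocations can run side by side without coordinating with one another.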

Case details in the MJCS are formatted differently depending on the county and type of case (e.g. district vs circuit court, criminal vs civil, etc.), and whether it is in one of the new MDEC-compatible formats. MJCS assigns a code to each of these different case types:

ODYCRIM: MDEC Criminal Cases
ODYTRAF: MDEC Traffic Cases
ODYCIVIL: MDEC Civil Cases
ODYCVCIT: MDEC Civil Citations
ODYCOSA: MDEC Appellate Court of Maryland (formerly Court of Special Appeals)
ODYCOA: MDEC Supreme Court of Maryland (formerly Court of Appeals)
DSCR: District Court Criminal Cases
DSCIVIL: District Court Civil Cases
DSCP: District Court Civil Citations
DSTRAF: District Court Traffic Cases
K: Circuit Court Criminal Cases
CC: Circuit Court Civil Cases
DV: Domestic Violence Cases
DSK8: Baltimore City Criminal Cases
PG: Prince George's County Circuit Court Criminal Cases
PGV: Prince George's County Circuit Court Civil Cases
MCCI: Montgomery County Civil Cases
MCCR: Montgomery County Criminal Cases
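One way to organize format-specific parsing is a registry keyed by these codes. The sketch below is an assumption about structure, not the project's code: `register`, `PARSERS`, and `parse_case` are hypothetical names, and the two example parsers return placeholder dictionaries instead of real parsed fields.

```python
from typing import Callable, Dict

Parser = Callable[[str], dict]
PARSERS: Dict[str, Parser] = {}  # case type code -> parser function


def register(*codes: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a parser for one or more case type codes."""
    def decorator(fn: Parser) -> Parser:
        for code in codes:
            PARSERS[code] = fn
        return fn
    return decorator


@register("ODYCRIM")
def parse_odycrim(html: str) -> dict:
    # Placeholder: a real parser would extract charges, parties, and so on.
    return {"format": "ODYCRIM", "length": len(html)}


@register("DSCR")
def parse_dscr(html: str) -> dict:
    # Placeholder for the District Court criminal format.
    return {"format": "DSCR", "length": len(html)}


def parse_case(code: str, html: str) -> dict:
    """Dispatch the HTML to the parser registered for this case type code."""
    try:
        parser = PARSERS[code]
    except KeyError:
        raise ValueError(f"no parser registered for case type {code!r}") from None
    return parser(html)
```

Failing loudly on an unregistered code makes it obvious when MJCS introduces a format that no parser handles yet.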

Each parser breaks down the case details to a granular level and stores the data in a number of database tables. This schematic diagram illustrates how this data is represented in the database.

Questions

For questions or more information, email [email protected].
