Releases: dismantl/CaseHarvester
2.0
Release notes for Case Harvester version 2.0:
- New case format parsers:
  - ODYCOSA: Appellate Court of MD (formerly Court of Special Appeals)
  - ODYCOA: Supreme Court of MD (formerly Court of Appeals)
- Daily collection of newly posted case numbers via the Collector component
- Added an Orchestrator component (written in Go) and rewrote the Spider and Scraper components in order to bypass the absurd anti-bot protections on Case Search, including DataDome. I'm not going to document their design; if the anti-transparency nerds at the Maryland Judiciary and DataDome want to figure out how it works, they can read the fucking code 🖕🏻🖕🏻
- Judge info has been added to MDEC formats
Thanks to our current and former sponsors for helping us cover our server costs and for funding research and development on bypassing DataDome.
1.2
Release notes for Case Harvester version 1.2:
- New case format parsers:
  - K: Circuit Court Criminal Cases
  - DSCP: District Court Civil Citations
  - DSTRAF: District Court Traffic Cases
  - PG: Prince George's County Circuit Court Criminal Cases
  - PGV: Prince George's County Circuit Court Civil Cases
  - MCCR: Montgomery County Criminal Cases
  - MCCI: Montgomery County Civil Cases
- Monthly exports of all tables to S3 for public download
- Auto-scale scraper service based on size of scraper queue
- Added column_metadata table to hold info about column meaning and other attributes used in Case Explorer
- Unredact civil cases
- Miscellaneous parser fixes
This release also includes a workaround for a new anti-scraping measure added to the Maryland Judiciary Case Search (MJCS). Worryingly, MJCS now also appears to have a half-completed reCAPTCHA implementation. If it were fully implemented and deployed, it would make scraping significantly harder and would be a major blow to the transparency of the MD court system.
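As a rough illustration of the "auto-scale scraper service based on size of scraper queue" item in this release, the sketch below maps a queue backlog to a task count. It is not the project's actual scaling policy: the function name `desired_scraper_tasks`, the per-task throughput figure, and the minimum and maximum bounds are all assumptions made for the example.

```python
import math


def desired_scraper_tasks(queue_depth: int,
                          messages_per_task: int = 500,
                          min_tasks: int = 0,
                          max_tasks: int = 10) -> int:
    """Return how many scraper tasks to run for a given queue backlog.

    All parameters and defaults are hypothetical; none are taken from
    Case Harvester's real configuration.
    """
    if queue_depth < 0:
        raise ValueError("queue_depth must be non-negative")
    if messages_per_task <= 0:
        raise ValueError("messages_per_task must be positive")
    # One task per `messages_per_task` queued items, rounded up.
    wanted = math.ceil(queue_depth / messages_per_task)
    # Clamp to the configured bounds so the service stays within limits.
    return max(min_tasks, min(max_tasks, wanted))
```

With these assumed defaults, an empty queue scales the service down to zero tasks, and a very large backlog is capped at `max_tasks` rather than growing without bound.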
1.1.1
1.1
Release notes for Case Harvester version 1.1:
- Added new parser for ODYCIVIL cases (~6.5 million)
- New and improved scraping schedule
- Improved concurrency
- Automatic redaction of defendant information
- Configurable user-agent
- Parser executes outside VPC to cut costs
- Removed scraper failed queue
- Added template ECS spider task definition
- Upgraded database engine from Postgres 9.6 to 11.5
- Replaced B-tree indexes with hash indexes
- Updates/fixes to ODYCRIM, ODYTRAF parsers
- Uses HTTPS version of MJCS
1.1rc3
1.1rc2
1.0
Version: 1.0
Codename: Reaper
Description: First stable release of Case Harvester.
Features:
- Automated cloud infrastructure deployment via AWS CloudFormation. No servers to maintain.
- Fully automated and scheduled spidering/scraping
- Comprehensive command line interface for manual use and testing
- Tune concurrency and other settings with configuration profiles or environment variables
- Docker image for easy portability
- Version history recorded as case details get updated over time
- Development and production environments
- Autogenerated database schema documentation
- Alembic database versioning
- Extensible parser class covering six different case formats so far
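The "configuration profiles or environment variables" feature listed above could work roughly as follows. This is a minimal sketch, not Case Harvester's actual configuration code: the profile names, the setting keys, and the `CASEHARVESTER_` prefix are hypothetical and chosen only for illustration.

```python
import os
from typing import Mapping, Optional

# Hypothetical profiles and keys, for illustration only.
PROFILES = {
    "development": {"scraper_concurrency": 2, "user_agent": "CaseHarvester-dev"},
    "production": {"scraper_concurrency": 10, "user_agent": "CaseHarvester"},
}
ENV_PREFIX = "CASEHARVESTER_"  # assumed prefix, not the project's real one


def load_settings(profile: str,
                  env: Optional[Mapping[str, str]] = None) -> dict:
    """Start from a named profile, then let environment variables override it."""
    env = os.environ if env is None else env
    settings = dict(PROFILES[profile])  # copy so the defaults stay untouched
    for key, default in list(settings.items()):
        raw = env.get(ENV_PREFIX + key.upper())
        if raw is not None:
            # Coerce the string from the environment to the default's type.
            settings[key] = type(default)(raw)
    return settings


if __name__ == "__main__":
    # Override one value of the development profile via a fake environment.
    print(load_settings("development",
                        {"CASEHARVESTER_SCRAPER_CONCURRENCY": "4"}))
```

Passing the environment as an argument keeps the function easy to test. Letting environment variables take precedence over the profile is a common convention for containerized deployments, though the project may order these differently.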