
nwhysel/CROL-PDF

 
 


City Record Online - PDF Scraping Project

Community Links

About

As the City embarks on implementing Intro 363-2014 and unlocking its daily actions, we are building a public workgroup to unlock the decades of historical information and make it accessible to all, at no charge.

Our group’s goal is to disassemble digital copies of the City Record and convert them into usable notifications, words, dates, and events. We want to turn procurement solicitations and awards, public hearings, meetings, court notices, property dispositions, agency public hearings, agency rules, and changes in personnel into a powerful archive for all.

This project will start by converting the City Record PDFs, with more than 15 years of data, into usable information. Facilitated by BetaNYC and Socrata, we will turn these files into a first-class collection of information that builds a smarter city. Through this process, businesses, community groups, academics, and the public will learn how their City government works. This unique collaboration of government, industry, hackers, and advocates illustrates that opening up data isn't just about transparency but about building smarter, more inclusive, and more resilient governance.

Project Partners

  • City of New York
  • BetaNYC
  • Citizens Union
  • Dev Bootcamp
  • Ontodia
  • Socrata
  • Sunlight Foundation

How to get started

Currently, there are three things you can do.

Download PDFs

We are setting up several ways to download this treasure trove of information. In total, there are 16.3 GB of archival PDFs. If you have any problems downloading these files, report an issue on GitHub or on the Discussion List.

  • Issues from 1998 to 2008 are scanned documents
  • Issues from March 2008 to present are 'text selectable'
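
Because the archive is split between scanned and text-selectable files, a processing script may want to decide up front which issues need OCR. The sketch below is one hedged way to express that split. The function name `needs_ocr` is hypothetical, and the cutoff day of March 1, 2008 is an assumption, since only the month is stated here.

```python
from datetime import date

# Assumption: the text only says "March 2008"; the exact day is not given,
# so the first of the month is used as a placeholder cutoff.
TEXT_SELECTABLE_FROM = date(2008, 3, 1)


def needs_ocr(issue_date: date) -> bool:
    """Return True if an issue from this date is probably a scanned image.

    Scanned issues carry no text layer, so they would need OCR before
    any text extraction can happen.
    """
    return issue_date < TEXT_SELECTABLE_FROM
```

For example, `needs_ocr(date(2001, 5, 14))` is true, while `needs_ocr(date(2012, 1, 3))` is false. Issues close to the cutoff are worth checking by hand.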

HTML/XML (Complete set)

If you have a tool that will crawl and download websites, you can download all of the PDFs from the City.
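
If you do not have a crawler at hand, the link-collection step can be sketched with the Python standard library alone. This is a minimal illustration, not the project's own tooling: `PdfLinkParser` and `extract_pdf_links` are hypothetical names, and the City's archive URL is not given here, so you would pass in the page HTML and base URL yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of every <a href="..."> that ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Match the extension case-insensitively; skip empty hrefs.
            if name == "href" and value and value.lower().endswith(".pdf"):
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def extract_pdf_links(html, base_url):
    """Return the PDF links found in one page of HTML, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

The returned URLs can then be fetched one at a time, for instance with `urllib.request.urlretrieve`. Links that carry a query string after `.pdf` are not matched by this simple check.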

Dropbox (Complete set)

We have shared the complete collection of files via Dropbox. You can download them individually or you can add the primary folder and sync to a local storage device. (shareable link bit.ly/dropbox-crow)

BitTorrent Sync (Complete set)

As a bit of an experiment, we are using the BitTorrent protocol. This is the complete 16.3 GB set. Please help by downloading and sharing these PDFs. You will need to download BitTorrent Sync and use the following read-only 'secret' passcode: "BDGM4KAQHZ6XII2JNJREDX6VDN3QTLI7G" (shareable link bit.ly/bts-crow)

FTP (In Process)

BetaNYC is hosting an FTP server with all of the PDFs. These files can be fetched anonymously via files.betanyc.us.
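
An anonymous fetch from that server could look roughly like the sketch below, which uses Python's standard `ftplib`. The host name comes from the paragraph above. The directory layout on the server is not documented here, so `remote_dir` is a placeholder you would need to fill in, and `pdf_names` and `download_pdfs` are hypothetical helper names. Since the server is marked "In Process", it may not be reachable yet.

```python
from ftplib import FTP
from pathlib import Path

FTP_HOST = "files.betanyc.us"  # host named in the text above


def pdf_names(names):
    """Keep only the entries whose name ends in .pdf (case-insensitive)."""
    return [n for n in names if n.lower().endswith(".pdf")]


def download_pdfs(remote_dir, local_dir, host=FTP_HOST):
    """Download every PDF in one remote directory over anonymous FTP.

    remote_dir is an assumption: the server's folder layout is not
    described in this README.
    """
    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login()  # calling login() with no arguments logs in anonymously
        ftp.cwd(remote_dir)
        for name in pdf_names(ftp.nlst()):
            with open(target / name, "wb") as fh:
                # RETR streams the file in binary mode, chunk by chunk.
                ftp.retrbinary(f"RETR {name}", fh.write)
```

Nothing runs until `download_pdfs` is called, so the listing filter can be tried on its own first.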

Google Docs (In Process)

If you have Google Drive installed on your computer, you can sync all of the PDFs to local storage. We are in the process of uploading all of the documents. For now, you can only access the text-selectable documents (March 2008 to present) via http://bit.ly/gdocs-crow

Press Releases, Blog Posts, and News Articles

Blog Posts

Press Releases / News Articles
