This is for https://github.com/vipulnaik/donations
Specific issue: vipulnaik/donations#20
There are two scripts here:
scrape.py
: This is the incremental scraper. It will download just the latest version of the top donors/contributors page and will check that against the donations database to see what new donations have been made since the last time the donations database was updated. It then outputs this latest round of donations as a SQL file for use in the donations database.

scrape2.py
: This is the historical scraper. It will download each Internet Archive snapshot of the top donors page and will infer all donations that have been made historically (since early 2015, when the top donors page first appeared). It has two output formats depending on the argument you give to it. When run with the by_donor argument (like ./scrape2.py by_donor > out), it will list each donor along with donations from the DLW database and the donations from the top donors page, so that you can compare which donations are tracked where. It looks like this. When run with the sql argument, it will output SQL tuples of the donations inferred from the top donors page snapshots. It looks like this.
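The inference step that both scrapers describe can be sketched as follows. This is a minimal illustration, not the code in scrape.py or scrape2.py. It assumes the top donors page lists a cumulative total per donor, so that a donation shows up as an increase in a donor's total between two snapshots. The function names `infer_donations` and `to_sql_tuple` and the column order of the SQL tuple are hypothetical. For the incremental case, the totals already in the donations database would play the role of the earlier snapshot.

```python
from decimal import Decimal


def infer_donations(snapshots):
    """Infer donations from dated snapshots of cumulative per-donor totals.

    snapshots: list of (date_string, {donor: cumulative_total}) pairs,
    sorted from oldest to newest. (Assumed input shape, not the real one.)
    Returns a list of (donor, date_string, amount) tuples.
    """
    donations = []
    previous = {}  # last known cumulative total for each donor
    for date, totals in snapshots:
        for donor, total in sorted(totals.items()):
            delta = total - previous.get(donor, Decimal("0"))
            if delta > 0:
                # The snapshot date is only an upper bound on the real date.
                donations.append((donor, date, delta))
        # Keep totals for donors who are missing from this snapshot.
        previous = {**previous, **totals}
    return donations


def to_sql_tuple(donor, date, amount):
    """Format one donation as a SQL tuple (hypothetical column order)."""
    escaped = donor.replace("'", "''")  # escape single quotes for SQL
    return f"('{escaped}', '{date}', {amount:.2f})"


if __name__ == "__main__":
    example = [
        ("2015-03-01", {"Alice": Decimal("100"), "Bob": Decimal("50")}),
        ("2015-06-01", {"Alice": Decimal("250"), "Bob": Decimal("50")}),
    ]
    for row in infer_donations(example):
        print(to_sql_tuple(*row))
```

Under this approach every total in the first snapshot is treated as a single donation dated at that snapshot, so anything given before the first snapshot is lumped together, and several donations made between two snapshots are merged into one.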
CC0.