Scan/Trim/Extract Pipeline for Coronavirus Site
- The code now expects to be run from the root directory of the repo.
- This also applies when running it from an IDE such as VS Code.
- Gets the data from the URLs listed in a Google Sheet.
- Pulls the raw HTML.
- Creates a clean version without the markup.
- Pushes it into a GitHub repo.
- Pulls an image for each page.
- Pushes it to an S3 bucket.
- Fires up a captive browser.
- Takes a screenshot of each URL in a list.
- If the screenshots change, pushes them into git.
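
The "clean version without the markup" and "if they change" steps listed above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the repo's actual implementation: the names `TextExtractor`, `clean_html`, `content_hash`, and `has_changed` are hypothetical, and the real pipeline may use different libraries and a different definition of "changed".

```python
import hashlib
from html.parser import HTMLParser
from typing import Optional


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self) -> str:
        return "\n".join(self._chunks)


def clean_html(raw_html: str) -> str:
    """Return the text content of raw_html with the markup removed."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.text()


def content_hash(data: bytes) -> str:
    """SHA-256 digest of a payload (cleaned text or screenshot bytes)."""
    return hashlib.sha256(data).hexdigest()


def has_changed(data: bytes, previous_hash: Optional[str]) -> bool:
    """True if there is no stored hash yet or the payload differs from it."""
    return previous_hash is None or content_hash(data) != previous_hash


if __name__ == "__main__":
    page = "<html><body><h1>Cases</h1><p>Updated today</p></body></html>"
    cleaned = clean_html(page)
    stored = content_hash(cleaned.encode("utf-8"))
    # An identical re-scan is not treated as a change, so nothing is pushed.
    print(has_changed(cleaned.encode("utf-8"), stored))
```

Comparing a hash of the bytes is the simplest possible change check. It treats any byte-level difference as a change, which for screenshots may be too sensitive (for example, a timestamp rendered on the page), so a real pipeline might need a more tolerant comparison.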