Exercise: Shell scripts
Let's build the scraper we introduced at our very first meeting! Recall the command (it is also located in scraper/scraper.sh):
cutpoint=$(echo "http://ecx.images-amazon.com/images/I/" | wc -m | grep '[0-9]\{1,\}' --only-matching); mkdir -p images; curl --silent http://www.amazon.com/s/\&field-keywords\=ocaml | grep 'http://ecx.images-amazon.com/images/I/[0-9A-Za-z\.\_\,\%\\-]\{0,\}.jpg' --only-matching | while read image; do suffix=$(echo $image | cut -c $cutpoint-); wget $image -O images/$suffix; done
Let's break this down piece-by-piece and rewrite this more elegantly as a shell script!
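As a first piece, consider how the one-liner computes each image's file name. The following is a minimal sketch, assuming bash; the image name 51abcXYZ.jpg is made up for illustration. Because echo appends a newline and wc -m counts it, cutpoint ends up one larger than the prefix length, which is exactly the position of the first character after the prefix. That is what cut -c N- needs. The grep step keeps only the digits, since some versions of wc pad the count with spaces. The last lines show a swapped-in alternative, bash prefix-stripping parameter expansion, which avoids the arithmetic entirely.

```shell
#!/bin/bash
prefix="http://ecx.images-amazon.com/images/I/"

# echo adds a trailing newline and wc -m counts it, so cutpoint = ${#prefix} + 1,
# the position of the first character after the prefix.
# grep -o (short for --only-matching) strips any padding spaces wc adds.
cutpoint=$(echo "$prefix" | wc -m | grep -o '[0-9]\{1,\}')
echo "cutpoint: $cutpoint"

# Hypothetical image URL, for illustration only
image="${prefix}51abcXYZ.jpg"

# cut -c N- keeps characters N through the end of the line
suffix=$(echo "$image" | cut -c "$cutpoint"-)
echo "suffix via cut: $suffix"

# Alternative: strip the literal prefix with bash parameter expansion
suffix2=${image#"$prefix"}
echo "suffix via parameter expansion: $suffix2"
```

Both approaches print 51abcXYZ.jpg here; the parameter expansion version is easier to read and does not depend on the off-by-one trick.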
How did I know to scrape images from http://ecx.images-amazon.com/images/I/? Go to www.amazon.com, search for ocaml, then right-click a result image and click "Inspect Element".
This should open the web inspector with the corresponding HTML highlighted.
If you look more carefully, you'll see the source of the image: it starts with http://ecx.images-amazon.com/images/I/! If you poke around at the other images, you'll see that they all start with this same prefix as well.
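Once you know the prefix, the grep step of the one-liner pulls every matching image URL out of the page. Here is a minimal sketch of that step run on a made-up HTML snippet (the snippet and the image names are hypothetical, not real Amazon output). It uses a slightly tidied version of the original pattern: the dots in the domain are escaped, since an unescaped . matches any character, and the character class is simplified.

```shell
#!/bin/bash
# Made-up HTML standing in for an Amazon search results page
html='<img src="http://ecx.images-amazon.com/images/I/41AbC%2Bx.jpg" alt="book">
<p>no image here</p>
<img src="http://ecx.images-amazon.com/images/I/51xyz_SL160_.jpg">'

# -o (short for --only-matching) prints each match on its own line.
# Escaped dots match a literal "." instead of any character.
urls=$(printf '%s\n' "$html" | grep -o 'http://ecx\.images-amazon\.com/images/I/[A-Za-z0-9._,%-]*\.jpg')
echo "$urls"
```

This prints the two image URLs, one per line, and ignores the line without an image.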
Make sure you have a copy of the bootcamp code repo (the same one used in the scavenger hunt exercise):
git clone https://github.com/hcs/bootcamp-unix.git
cd bootcamp-unix/exercise-scraper
Open up the file scraper.sh in your favorite text editor, and then follow the instructions in the file to finish building the scraper.
When you think you're done, run the scraper with ./scraper.sh. This should create an images directory with the images! To remove the images:
rm -rf images
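If you want to see which files the download loop would create before fetching anything, a dry run helps. This is a minimal sketch, assuming bash, with two hypothetical URLs standing in for the output of the grep step. Instead of calling wget, it prints the command that would run for each image, so nothing is downloaded or written. It uses read -r so backslashes in the input are kept as-is, and strips the prefix with parameter expansion rather than cut.

```shell
#!/bin/bash
prefix="http://ecx.images-amazon.com/images/I/"

# Hypothetical URLs standing in for the output of the grep step
urls="${prefix}41AbC%2Bx.jpg
${prefix}51xyz_SL160_.jpg"

# Dry run of the download loop: print each wget command instead of running it
plan=$(
  printf '%s\n' "$urls" | while read -r image; do
    echo "wget $image -O images/${image#"$prefix"}"
  done
)
echo "$plan"
```

Replacing the echo line with wget "$image" -O "images/${image#"$prefix"}" (after a mkdir -p images) turns this into a real download loop, roughly what the original one-liner does.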
The solution is located in ./scraper_solution.sh. To see the original scraper from the first week and the prettified version of it:
cd ../scraper
ls
There should be two files: scraper.sh (the original version in the demo) and scraper_pretty.sh (a more readable version of it).