hasaniskandar/zaradotcom-scraper

ZaradotcomScraper

zaradotcom-scraper is a web scraper for the JavaScript-generated pages of ZARA (http://www.zara.com/).

Setup

This section describes how to set up the app on a local machine. Make sure PhantomJS and Redis are already installed.
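A quick way to confirm both prerequisites before continuing might look like the sketch below. It assumes the binaries are named `phantomjs` and `redis-cli` and are on `PATH`; `have_cmd` is a hypothetical helper, not part of this repo.

```shell
# Hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report each prerequisite (the binary names are assumptions).
for cmd in phantomjs redis-cli; do
  if have_cmd "$cmd"; then
    echo "$cmd: found"
  else
    echo "$cmd: missing"
  fi
done

# With a Redis server running, `redis-cli ping` should answer PONG.
```

This only checks that the commands exist; it does not verify versions or that a Redis server is actually running.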

Clone and set up locally

# Clone the repo
git clone https://github.com/hasaniskandar/zaradotcom-scraper.git

# Go to the app root directory
cd zaradotcom-scraper

# If you are using RVM
rvm 2.1.0@zaradotcom-scraper --create --ruby-version

# Install gems
bundle install

# Setup the database
bundle exec rake db:setup

Start the app

bundle exec rails server

Start the worker

QUEUE=* bundle exec rake environment resque:work

Getting Started

  • Go to localhost:3000.
  • Click New scrape to start a new scrape and wait until it is done (note that it may take several hours, depending on your internet connection and resources).
  • When it is done, download links will be visible.

Heroku Setup

This section describes the steps required to deploy to Heroku on a free plan with an unverified account.

Set up the web app

Create an app on the Cedar stack with heroku-buildpack-multi:

heroku apps:create zaradotcom-scraper --stack cedar \
                                      --buildpack https://github.com/ddollar/heroku-buildpack-multi.git

Set REDIS_URL manually to use Redis without an add-on:

# Replace the url below with the correct one:
heroku config:set REDIS_URL="redis://my-username:my-password@my-redis-host:9999/" --app zaradotcom-scraper
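Because the URL is typed by hand, a rough shape check before setting it can catch obvious mistakes. The sketch below assumes the URL carries a username, password, host, and port, as in the example above; `looks_like_redis_url` is a hypothetical helper and does not contact the Redis server.

```shell
# Hypothetical helper: rough shape check for redis://user:password@host:port/
# It only matches the pattern; it does not connect to anything.
looks_like_redis_url() {
  case "$1" in
    redis://*:*@*:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   looks_like_redis_url "$MY_REDIS_URL" || echo "REDIS_URL looks malformed"
```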

Deploy:

# Push to Heroku
git push heroku master

# Compile assets and migrate the database
heroku run rake assets:precompile db:migrate --app zaradotcom-scraper

Scale to 1 web dyno:

heroku ps:scale web=1 --app zaradotcom-scraper

Set up the worker app

Create another app with the same stack and buildpack as the first one, and also set its remote:

heroku apps:create zaradotcom-scraper-worker --stack cedar \
                                             --buildpack https://github.com/ddollar/heroku-buildpack-multi.git \
                                             --remote heroku-worker

Set DATABASE_URL and REDIS_URL to exactly the same values as the web app:

heroku config:set DATABASE_URL="`heroku config:get DATABASE_URL --app zaradotcom-scraper`" \
                  REDIS_URL="`heroku config:get REDIS_URL --app zaradotcom-scraper`" \
                  --app zaradotcom-scraper-worker
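To confirm that both apps ended up with the same values, something like the following sketch could be used. It relies only on `heroku config:get` as used above; `config_matches` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if config var $1 is set and has the same
# value in apps $2 and $3. Requires the Heroku CLI and access to both apps.
config_matches() {
  cm_a="$(heroku config:get "$1" --app "$2")"
  cm_b="$(heroku config:get "$1" --app "$3")"
  [ -n "$cm_a" ] && [ "$cm_a" = "$cm_b" ]
}

# Usage:
#   for var in DATABASE_URL REDIS_URL; do
#     config_matches "$var" zaradotcom-scraper zaradotcom-scraper-worker \
#       || echo "$var differs or is unset"
#   done
```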

Set LD_LIBRARY_PATH and PATH to make heroku-buildpack-phantomjs work with heroku-buildpack-multi:

heroku config:set PATH="/usr/local/bin:/usr/bin:/bin:/app/vendor/phantomjs/bin" \
                  LD_LIBRARY_PATH="/usr/local/lib:/usr/lib:/lib:/app/vendor/phantomjs/lib" \
                  --app zaradotcom-scraper-worker
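A typo in either colon-separated list is easy to miss. As a minimal sketch, the hypothetical helper `path_has` below checks whether a directory appears as an exact entry of such a list.

```shell
# Hypothetical helper: succeeds if directory $1 is an exact entry of the
# colon-separated list $2.
path_has() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires the Heroku CLI):
#   path_has /app/vendor/phantomjs/bin \
#     "$(heroku config:get PATH --app zaradotcom-scraper-worker)" \
#     || echo "phantomjs bin directory is not in PATH"
```

Wrapping the list in colons lets the first and last entries match the same way as the middle ones, and keeps a prefix such as `/app/vendor` from matching by accident.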

Deploy:

# Push to Heroku
git push heroku-worker master

# Compile assets and migrate the database
heroku run rake assets:precompile db:migrate --app zaradotcom-scraper-worker

Scale to 1 worker dyno:

heroku ps:scale worker=1 --app zaradotcom-scraper-worker

Optionally, email notifications can be enabled to monitor the scraper. A notification is sent to the subscriber(s) whenever a scrape finishes or fails.

# EMAIL_SUBSCRIBERS      => Comma-separated email addresses to receive notifications
# ACTION_MAILER_URL_HOST => Host name of the web app
# SMTP_USERNAME          => Email for SMTP setting
# SMTP_PASSWORD          => Password for SMTP setting
heroku config:set EMAIL_SUBSCRIBERS="first@example.com, second@example.com" \
                  ACTION_MAILER_URL_HOST="zaradotcom-scraper.herokuapp.com" \
                  SMTP_USERNAME="my-username@example.com" \
                  SMTP_PASSWORD="my-password" \
                  --app zaradotcom-scraper-worker

Known Issues
