ZaradotcomScraper

zaradotcom-scraper is a javascript-generated web scraper for ZARA.

Setup

This section describes the steps to setup on local machine. Please make sure PhantomJS and Redis are already installed properly.

Clone and setup locally

# Clone the repo
git clone https://github.com/hasaniskandar/zaradotcom-scraper.git

# Go to the app root directory
cd zaradotcom-scraper

# If you are using RVM
rvm 2.1.0@zaradotcom-scraper --create --ruby-version

# Install gems
bundle install

# Setup the database
bundle exec rake db:setup

Start the app

bundle exec rails server

Start the worker

QUEUE=* bundle exec rake environment resque:work

Getting Started

Go to localhost:3000.
Click New scrape to start new scraper and wait until it is done (Note that it may take several hours, depends on internet connection and resources).
When it is done, download links will be visible.

Heroku Setup

This section describes the required steps to setup to Heroku on a free plan and unverified account.

Setup web app

Create an app on Cedar stack with heroku-buildpack-multi:

heroku apps:create zaradotcom-scraper --stack cedar \
                                      --buildpack https://github.com/ddollar/heroku-buildpack-multi.git

Set REDIS_URL manually to use Redis without an add-on:

# Replace the url below with the correct one:
heroku config:set REDIS_URL="redis://my-username:[email protected]:9999/" --app zaradotcom-scraper

Deploy:

# Push to Heroku
git push heroku master

# Compile assets and migrate the database
heroku run rake assets:precompile db:migrate --app zaradotcom-scraper

Scale 1 web dyno:

heroku ps:scale web=1 --app zaradotcom-scraper

Setup worker app

Create another app with the same stack and buildpack as the first one, and also set its remote:

heroku apps:create zaradotcom-scraper-worker --stack cedar \
                                             --buildpack https://github.com/ddollar/heroku-buildpack-multi.git \
                                             --remote heroku-worker

Set DATABASE_URL and REDIS_URL exactly the same with web app:

heroku config:set DATABASE_URL="`heroku config:get DATABASE_URL --app zaradotcom-scraper`" \
                  REDIS_URL="`heroku config:get REDIS_URL --app zaradotcom-scraper`" \
                  --app zaradotcom-scraper-worker

Set LD_LIBRARY_PATH and PATH to make heroku-buildpack-phantomjs works with heroku-buildpack-multi:

heroku config:set PATH="/usr/local/bin:/usr/bin:/bin:/app/vendor/phantomjs/bin" \
                  LD_LIBRARY_PATH="/usr/local/lib:/usr/lib:/lib:/app/vendor/phantomjs/lib" \
                  --app zaradotcom-scraper-worker

Deploy:

# Push to Heroku
git push heroku-worker master

# Compile assets and migrate the database
heroku run rake assets:precompile db:migrate --app zaradotcom-scraper-worker

Scale 1 worker dyno:

heroku ps:scale worker=1 --app zaradotcom-scraper-worker

Optionally, email notification can be enabled to monitor scraper. Notification will be sent to subscriber(s) whenever scraper is done or error.

# EMAIL_SUBSCRIBERS      => Coma separated email addresses to receive notifications
# ACTION_MAILER_URL_HOST => Host name of the web app
# SMTP_USERNAME          => Email for SMTP setting
# SMTP_PASSWORD          => Password for SMTP setting
heroku config:set EMAIL_SUBSCRIBERS="[email protected], [email protected]" \
                  ACTION_MAILER_URL_HOST="zaradotcom-scraper.herokuapp.com" \
                  SMTP_USERNAME="[email protected]" \
                  SMTP_PASSWORD="my-password" \
                  --app zaradotcom-scraper-worker

Known Issues

Error R14 (Memory quota exceeded): phantomjs increases memory usage each time page is loaded until it exceeds quota limit.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
app		app
bin		bin
config		config
db		db
lib		lib
log		log
public		public
test		test
vendor/assets		vendor/assets
.buildpacks		.buildpacks
.gitignore		.gitignore
Gemfile		Gemfile
Gemfile.lock		Gemfile.lock
Procfile		Procfile
README.md		README.md
Rakefile		Rakefile
config.ru		config.ru

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ZaradotcomScraper

Setup

Clone and setup locally

Start the app

Start the worker

Getting Started

Heroku Setup

Setup web app

Setup worker app

Known Issues

About

Releases

Packages

Languages

hasaniskandar/zaradotcom-scraper

Folders and files

Latest commit

History

Repository files navigation

ZaradotcomScraper

Setup

Clone and setup locally

Start the app

Start the worker

Getting Started

Heroku Setup

Setup web app

Setup worker app

Known Issues

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages