This task is intended to test your ability to consume a webpage, process some data and present it.
Using best-practice coding methods, build a console application that scrapes the Sainsbury's grocery site's Ripe Fruits page and returns a JSON array of all the products on the page.
You need to follow each product link and get the size (in kb) of the linked HTML (no assets) and the product description, both of which must appear in the JSON.
Each element in the JSON `results` array should contain `title`, `unit_price`, `size` and `description` keys corresponding to items in the table.
Additionally, there should be a `total` field that is the sum of all unit prices on the page.
The link to use is: http://hiring-tests.s3-website-eu-west-1.amazonaws.com/2015_Developer_Scrape/5_products.html
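A note on the two derived values: `size` is the byte length of the linked page's raw HTML (assets are never fetched) converted to kilobytes, and `total` is the sum of the scraped unit prices. The sketch below is a minimal, hypothetical illustration of those two computations using plain `file_get_contents`; it is not this project's actual implementation:

```php
<?php
// Hypothetical sketch: derive "size" from the byte length of the linked
// HTML document (no assets fetched) and sum unit prices into a total.

/** Returns e.g. "90.6kb" for the HTML document at $url. */
function htmlSizeInKb(string $url): string
{
    $html = file_get_contents($url);   // raw HTML only
    return round(strlen($html) / 1024, 1) . 'kb';
}

/** Sums the unit_price of every scraped product. */
function totalUnitPrice(array $results): float
{
    return round(array_sum(array_column($results, 'unit_price')), 2);
}
```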
Example JSON:

```json
{
  "results": [
    {
      "title": "Sainsbury's Avocado, Ripe & Ready x2",
      "size": "90.6kb",
      "unit_price": 1.80,
      "description": "Great to eat now - refrigerate at home 1 of 5 a day 1 avocado counts as 1 of your 5..."
    },
    {
      "title": "Sainsbury's Avocado, Ripe & Ready x4",
      "size": "87kb",
      "unit_price": 2.00,
      "description": "Great to eat now - refrigerate at home 1 of 5 a day 1 "
    }
  ],
  "total": 3.80
}
```
Clone the repository and install the dependencies:

```
./composer.phar install
```
Then launch the default command in your shell:

```
bin/scraper
```

Optionally, you can pass a specific grocery-list URL:

```
bin/scraper products-scraper http://another-sainsburys-grocery-list-url
```

You can get formatted JSON output using the `--pretty` option:

```
bin/scraper products-scraper --pretty
```
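A flag like `--pretty` is typically just a switch on PHP's built-in pretty printer; the sketch below shows that common pattern, but the flag's actual handling inside `ProductsScraperCommand` may differ:

```php
<?php
// Hypothetical sketch: toggle pretty-printed JSON on a --pretty flag.
// The real option handling lives in ProductsScraperCommand.
function renderJson(array $output, bool $pretty): string
{
    $flags = $pretty ? JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES : 0;
    return json_encode($output, $flags);
}

echo renderJson(['results' => [], 'total' => 0.0], true), PHP_EOL;
```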
Run the test suites (PHPUnit and Behat) with:

```
bin/phpunit
bin/behat
```
I designed the app per the requirements: the code should be as concise as possible and get straight to the point, without giving up decoupled code. I could have used a dependency injection container, even a small one like Pimple, but I preferred to keep it simple and initialize the objects and their relations directly in the app.
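For illustration, such manual wiring could look like the sketch below; the constructor signatures and the Symfony Console bootstrap are assumptions for the example, not code taken from this repository:

```php
<?php
// Hypothetical wiring sketch: collaborators are built by hand instead of
// being resolved from a DI container. Constructor signatures are assumed.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\Console\Application;

$listScraper   = new ProductListScraper();     // scrapes the grocery list page
$detailScraper = new ProductDetailScraper();   // follows each product link
$infoScraper   = new ProductsInfoScraper($listScraper, $detailScraper);

$application = new Application('scraper');
$application->add(new ProductsScraperCommand($infoScraper));
$application->run();
```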
- Command: `ProductsScraperCommand` is responsible for starting the app and handling the available command options
- Service: `ProductsInfoScraper` is responsible for calling the scrapers and collecting the product info
- Scrapers: `ProductDetailScraper` and `ProductListScraper` extract data from the product detail pages and the grocery list page, respectively
- Models: `Product`, `Products` and `Url` are the main objects in the domain (see the sketch below)
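To make the model concrete, here is a minimal, hypothetical sketch of what `Product` might look like, derived only from the keys required in the JSON output; the actual class in this repository may be shaped differently:

```php
<?php
// Hypothetical sketch of the Product model; property names follow the
// required JSON keys, but the real class may differ.
class Product implements JsonSerializable
{
    private string $title;
    private string $size;        // e.g. "90.6kb"
    private float $unitPrice;
    private string $description;

    public function __construct(string $title, string $size, float $unitPrice, string $description)
    {
        $this->title       = $title;
        $this->size        = $size;
        $this->unitPrice   = $unitPrice;
        $this->description = $description;
    }

    public function getUnitPrice(): float
    {
        return $this->unitPrice;
    }

    /** Maps the model onto the keys required in the results array. */
    public function jsonSerialize(): array
    {
        return [
            'title'       => $this->title,
            'size'        => $this->size,
            'unit_price'  => $this->unitPrice,
            'description' => $this->description,
        ];
    }
}
```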