Problem Statement:

#1. Write a program that reads the content of the website below, including all its subpages, and performs the following actions:

  1. Parse all pages and subpages of the News, Sports, and Business sections.
  2. Extract the content, images, and links.
  3. Dump the content, images, and links into their respective MongoDB collections.
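
The extraction step (item 2) can be sketched as follows. This is a minimal, dependency-free illustration, not the project's actual code: the helper names (`attributeValues`, `toAbsolute`, `visibleText`, `extractPage`, `sectionOf`) are hypothetical, the regex-based HTML handling is a simplification (a real scraper would normally use a proper HTML parser), and the idea that sections map to URL path prefixes such as `/sports/` is an assumption.

```javascript
// Sketch only: regex-based extraction keeps the example self-contained.
// ASSUMPTION: sections correspond to the first URL path segment.
const SECTIONS = ["news", "sports", "business"];

// Collect the values of one attribute (e.g. href, src) from every matching tag.
function attributeValues(html, tag, attr) {
  const pattern = new RegExp(
    `<${tag}\\b[^>]*?\\s${attr}\\s*=\\s*["']([^"']+)["']`,
    "gi"
  );
  return [...html.matchAll(pattern)].map((m) => m[1]);
}

// Resolve relative URLs against the page URL, dropping duplicates and bad values.
function toAbsolute(urls, baseUrl) {
  const resolved = [];
  for (const u of urls) {
    try {
      resolved.push(new URL(u, baseUrl).href);
    } catch {
      // Malformed URL: skip it.
    }
  }
  return [...new Set(resolved)];
}

// Strip script/style blocks and tags, then collapse whitespace.
function visibleText(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Build one record holding the content, images, and links of a page.
function extractPage(html, pageUrl) {
  return {
    url: pageUrl,
    content: visibleText(html),
    images: toAbsolute(attributeValues(html, "img", "src"), pageUrl),
    links: toAbsolute(attributeValues(html, "a", "href"), pageUrl),
  };
}

// Return "news", "sports", or "business" for a URL, or null otherwise.
// Note: this checks only the path, not the host.
function sectionOf(url) {
  const first = new URL(url).pathname.split("/").filter(Boolean)[0];
  return SECTIONS.includes(first) ? first : null;
}

// Usage with a small inline HTML sample.
const sample = `<html><head><style>p{color:red}</style></head><body>
  <p>Match report</p>
  <img src="/img/a.jpg">
  <a href="/sports/cricket">Cricket</a>
</body></html>`;
const page = extractPage(sample, "https://www.firstpost.com/sports/");
console.log(page.content, page.images, page.links, sectionOf(page.links[0]));
```

Links whose section is not `null` could then be queued for crawling as subpages, and each record written to the matching collection.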

The website used for this project is: https://www.firstpost.com/

Instructions to run the Scrapper-Project:

  1. Download the zip file and extract it.
  2. Open the scrapper folder in VS Code.
  3. Connect to the MongoDB server.
  4. Run "npm install" to install all required dependencies.
  5. Run "node app.js" from the scrapper folder.
  6. Wait a few seconds while the scripts run, then exit the command prompt.
  7. You can now access the scraped data in the "news" collection of the "news-scrapper" database on the MongoDB server.
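
As a quick check for step 7, the stored documents can also be read from a small script. The sketch below is an assumption-laden illustration rather than part of the project: it assumes the official `mongodb` Node.js driver is installed, that the server is reachable at `mongodb://localhost:27017` (overridable through a hypothetical `MONGO_URI` environment variable), and the function name `sampleScrapedDocs` and the `--check` flag are made up for this example. Only the database and collection names come from the steps above.

```javascript
// ASSUMPTIONS (not stated in the README): default local server address and
// the official "mongodb" driver being available in node_modules.
const MONGO_URI = process.env.MONGO_URI || "mongodb://localhost:27017";
const DB_NAME = "news-scrapper"; // database name from step 7
const COLLECTION = "news"; // collection name from step 7

// Count the stored documents and fetch a few of them for inspection.
async function sampleScrapedDocs(limit = 5) {
  const { MongoClient } = require("mongodb"); // loaded only when a query runs
  const client = new MongoClient(MONGO_URI);
  try {
    await client.connect();
    const collection = client.db(DB_NAME).collection(COLLECTION);
    const total = await collection.countDocuments();
    const docs = await collection.find({}).limit(limit).toArray();
    return { total, docs };
  } finally {
    await client.close();
  }
}

// Query only when explicitly requested, e.g. `node check.js --check`.
if (process.argv.includes("--check")) {
  sampleScrapedDocs()
    .then(({ total, docs }) => {
      console.log(`${total} documents in ${DB_NAME}.${COLLECTION}`);
      console.log(docs);
    })
    .catch((err) => {
      console.error("Could not read scraped data:", err.message);
      process.exitCode = 1;
    });
}
```

If the count is zero, the scraper most likely exited before it finished writing, so rerunning step 5 and waiting longer in step 6 is a reasonable first thing to try.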