Problem Statement:
#1. Write a program to read the content of any of the websites below, including all its sub-pages, and perform the following actions:
- Parse all pages and sub-pages of the News, Sports and Business sections
- Extract the content, images and links
- Dump the content, images and links into the respective MongoDB collections
The website used for this project is: https://www.firstpost.com/
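The extraction step described above can be sketched roughly as follows. This is a minimal illustration using only Node.js built-ins, not the code in app.js: the helper names (buildRecord, collectAttr, extractText, sectionOf) are hypothetical, the section path prefixes (/news, /sports, /business) are an assumption about the site's URL layout, and the regex-based parsing is a simplification that a real HTML parser would replace.

```javascript
// Hypothetical helpers for illustration; not taken from the project's app.js.
// Assumption: each section lives under a matching first path segment.
const SECTIONS = ["news", "sports", "business"];

// Resolve a raw href/src against the page URL; return null if it cannot be parsed.
function resolveUrl(raw, pageUrl) {
  try {
    return new URL(raw, pageUrl).href;
  } catch {
    return null;
  }
}

// Collect the unique, absolute values of one attribute on one tag (e.g. href on <a>).
function collectAttr(html, tag, attr, pageUrl) {
  const pattern = new RegExp(
    `<${tag}\\b[^>]*?\\s${attr}\\s*=\\s*["']([^"']+)["']`,
    "gi"
  );
  const found = new Set();
  for (const match of html.matchAll(pattern)) {
    const url = resolveUrl(match[1], pageUrl);
    if (url) found.add(url);
  }
  return [...found];
}

// Drop script/style bodies and all tags, then collapse whitespace.
function extractText(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Return the section name for a page URL, or null if it is outside the three sections.
function sectionOf(pageUrl) {
  const first = new URL(pageUrl).pathname.split("/").filter(Boolean)[0];
  return SECTIONS.includes(first) ? first : null;
}

// Build one plain object per page, ready to be stored as a document.
function buildRecord(html, pageUrl) {
  return {
    url: pageUrl,
    section: sectionOf(pageUrl),
    content: extractText(html),
    images: collectAttr(html, "img", "src", pageUrl),
    links: collectAttr(html, "a", "href", pageUrl),
  };
}

// Usage with a small inline sample page (made-up markup, not real site content).
const sampleHtml = `<html><body><h1>Match report</h1><script>var x = 1;</script>
<p>Team wins.</p><img class="hero" src="/images/a.jpg">
<a href="/sports/next-story">Next</a><a href="https://example.com/">Ext</a></body></html>`;

const record = buildRecord(
  sampleHtml,
  "https://www.firstpost.com/sports/match-report"
);
console.log(record);
```

Each record is a plain object, so it could be passed to the MongoDB Node.js driver's insertOne on whichever collection the project uses. The links array doubles as the list of sub-pages to visit next, filtered with sectionOf to stay inside the three sections.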
Instructions to run the Scrapper-Project:
- Download the zip file and extract it.
- Open the scrapper folder in VS Code.
- Connect to the MongoDB server.
- Run the "npm install" command; it installs all required dependencies.
- Run the "node app.js" command from the folder.
- Wait a few seconds while the script runs, then exit the command prompt.
- You can now access the scraped data in the "news" collection of the "news-scrapper" database on the MongoDB server.