Built a web scraper that scrapes news content, images, and links and dumps them into MongoDB collections.

vaibhav604/Assignment-NodeJS-Sraper


Problem Statement:

#1. Write a program to read the content of a given website and all of its subpages, and perform the following actions:

  1. Parse all the pages and subpages of the News, Sports and Business sections
  2. Extract the content, images and links
  3. Dump the content, images and links into the respective MongoDB collections
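
The extraction step (action 2) can be illustrated with a minimal, dependency-free sketch. The README does not say which parsing library the project uses, so this example uses simple regular expressions as a stand-in for a real HTML parser; the function names `extractAttr`, `extractText` and `parsePage` and the shape of the returned object are hypothetical.

```javascript
// Hypothetical helpers; the project's actual parsing code is not shown in the README.

// Collect the values of one attribute (e.g. href or src) from all tags of a given
// name, resolved against the page URL. Regex matching is only an illustration;
// a real HTML parser handles malformed markup far more reliably.
function extractAttr(html, tag, attr, pageUrl) {
  const pattern = new RegExp(
    `<${tag}\\b[^>]*?\\b${attr}\\s*=\\s*["']([^"']+)["']`,
    'gi'
  );
  const values = new Set(); // a Set drops duplicates and keeps first-seen order
  for (const match of html.matchAll(pattern)) {
    try {
      values.add(new URL(match[1], pageUrl).href); // resolve relative URLs
    } catch {
      // Skip values that cannot be resolved to a URL.
    }
  }
  return [...values];
}

// Remove script/style blocks and tags, then collapse whitespace to get plain text.
function extractText(html) {
  return html
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Build one record per page with its content, images and links.
function parsePage(pageUrl, html) {
  return {
    url: pageUrl,
    content: extractText(html),
    images: extractAttr(html, 'img', 'src', pageUrl),
    links: extractAttr(html, 'a', 'href', pageUrl),
  };
}
```

Calling `parsePage(url, html)` on a fetched page would give an object whose `content`, `images` and `links` fields could then be written to the database.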

The website used for this project is https://www.firstpost.com/
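
Visiting the pages and subpages of the three sections could be done with a breadth-first crawl restricted to this site. The sketch below is an assumption about how that might look, not the project's code: the path prefixes in `SECTION_PREFIXES` are guesses at the site's URL layout, and `fetchLinks` is a hypothetical caller-supplied function that returns the absolute links found on a page.

```javascript
// Assumption: section pages live under these path prefixes. The README does not
// list the real section URLs, so adjust them to the site's actual structure.
const SITE_ORIGIN = 'https://www.firstpost.com';
const SECTION_PREFIXES = ['/news', '/sports', '/business'];

// True when the URL is on the target site and inside one of the target sections.
function inTargetSection(url) {
  let parsed;
  try {
    parsed = new URL(url, SITE_ORIGIN);
  } catch {
    return false; // not a resolvable URL
  }
  return (
    parsed.origin === SITE_ORIGIN &&
    SECTION_PREFIXES.some(
      (prefix) =>
        parsed.pathname === prefix || parsed.pathname.startsWith(prefix + '/')
    )
  );
}

// Breadth-first crawl with a visited set and a page limit.
// `fetchLinks(url)` must return a promise of the links found on that page.
async function crawl(startUrls, fetchLinks, maxPages = 50) {
  const visited = new Set();
  const queue = startUrls.filter(inTargetSection);
  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);
    for (const link of await fetchLinks(url)) {
      if (inTargetSection(link) && !visited.has(link)) queue.push(link);
    }
  }
  return [...visited]; // pages in the order they were visited
}
```

The `maxPages` limit keeps a run bounded; a real crawl would also want request delays and error handling around each fetch.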

Instructions to run the Scrapper-Project:

  1. Download the zip file and extract it.
  2. Open the scrapper folder in VS Code.
  3. Connect to the MongoDB server.
  4. Run "npm install" to install all required dependencies.
  5. Run "node app.js" from the same folder.
  6. Wait a few seconds for the scripts to finish running, then exit the command prompt.
  7. You can now access the scraped data in the "news" collection of the "news-scrapper" database through the MongoDB server.
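
To show how scraped pages might end up in that collection, here is a minimal sketch of a save step. It is not taken from the project: the function name `savePages` and the document fields (`url`, `content`, `images`, `links`, `scrapedAt`) are hypothetical, and `collection` is assumed to be any object with an `insertMany` method that behaves like a collection from the official MongoDB Node.js driver.

```javascript
// Hypothetical save step; the document shape is an assumption, not the project's schema.
// `collection` must expose insertMany(docs) returning a promise of { insertedCount }.
async function savePages(collection, pages) {
  const docs = pages
    .filter((page) => page && page.url) // ignore empty or URL-less records
    .map((page) => ({
      url: page.url,
      content: page.content || '',
      images: page.images || [],
      links: page.links || [],
      scrapedAt: new Date(), // when the record was stored
    }));
  if (docs.length === 0) return 0; // nothing to insert, so skip the database call
  const result = await collection.insertMany(docs);
  return result.insertedCount;
}
```

With the `mongodb` package, the collection would typically come from `client.db('news-scrapper').collection('news')` after connecting a `MongoClient`; the connection string depends on the local setup and is not given in the README.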
