This is a Python-based web crawler. Given a starting URL, it visits all pages within the domain but does not follow links to external sites such as Google, Twitter, Facebook, or YouTube.

raviusit/web-crawler

web-crawler

This is a Python-based web crawler. Given a starting URL, http://www.wiprodigital.com/, it visits all pages within that domain but does not follow links to external sites such as "twitter", "facebook", "google", "messenger", "mapsmaker", "linkedin", "bit", "youtube", "mckinsey", "windowsphone", "pac-online", "statcounter", and many others.
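The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names same_domain, crawl, fetch, and max_pages are hypothetical, links are extracted with a simple regular expression instead of BeautifulSoup, and external links are detected by comparing host names instead of matching a list of site names such as "twitter" or "facebook". The page-download step is passed in as a function so the sketch can be tried without a network connection.

```python
import re
from collections import deque

try:  # Python 3
    from urllib.parse import urljoin, urlparse, urldefrag
except ImportError:  # Python 2.7, the version this README targets
    from urlparse import urljoin, urlparse, urldefrag

# Simplistic href matcher; an HTML parser is more robust on real pages.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def same_domain(url, start_url):
    """Return True when url has the same host as start_url."""
    return urlparse(url).netloc.lower() == urlparse(start_url).netloc.lower()


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that stays on the host of start_url.

    fetch is a caller-supplied function (an assumption of this sketch)
    that maps a URL to its HTML text, or None when the page is unavailable.
    Returns (visited_pages, external_links_skipped).
    """
    visited = []
    seen = {start_url}
    external = set()
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        html = fetch(page)
        if html is None:
            continue
        visited.append(page)
        for href in HREF_RE.findall(html):
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(page, href))[0]
            if not link.startswith(("http://", "https://")):
                continue  # skip mailto:, javascript:, and similar schemes
            if not same_domain(link, start_url):
                external.add(link)  # recorded, but never followed
                continue
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, external


if __name__ == "__main__":
    # Tiny in-memory "site" standing in for real HTTP requests.
    pages = {
        "http://example.com/": '<a href="/about">About</a>'
                               '<a href="https://twitter.com/x">Twitter</a>',
        "http://example.com/about": '<a href="/">Home</a>',
    }
    visited, external = crawl("http://example.com/", pages.get)
    print(visited)
    print(sorted(external))
```

Note that an exact host comparison treats www.wiprodigital.com and wiprodigital.com as different sites; a real crawler may want to normalise the host before comparing.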

System requirements

  • Python 2.7
  • Any standard 64-bit Windows or Linux OS
  • pip for Python package management
  • Libraries/modules used: re and urllib2 (both part of the Python 2.7 standard library) and BeautifulSoup (must be installed separately, for example with pip)
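Before running the crawler, it can help to confirm that the modules listed above are importable. The following is a small sketch of such a check; the helper name missing_modules and the REQUIRED list are illustrative and not part of the repository.

```python
import importlib
import sys

# Module names taken from the requirements list above.
REQUIRED = ["re", "urllib2", "BeautifulSoup"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python %d.%d" % sys.version_info[:2])
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

On Python 3 this reports urllib2 as missing, because that module exists only in Python 2.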

To build and run

  • Install Python on the target machine
  • Install the required modules
  • Set PYTHONPATH
  • Execute the program
  • Make sure you have an internet connection
