web-crawler

This is a Python-based web crawler. Given a starting URL (http://www.wiprodigital.com/), it visits all pages within that domain but does not follow links to external sites such as "twitter", "facebook", "google", "messenger", "mapsmaker", "linkedin", "bit", "youtube", "mckinsey", "windowsphone", "pac-online", "statcounter" and many others.
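
The behaviour described above (visit every page on the start URL's domain, skip external hosts) can be modelled as a breadth-first traversal with a host check. The sketch below is not taken from crawler.py and is written for Python 3 rather than the project's Python 2.7. The `get_links` callback, the `max_pages` limit, and the rules that a leading "www." is ignored and that other subdomains count as external are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urlparse


def same_domain(url: str, root: str) -> bool:
    """True if url's host equals root's host, ignoring a leading 'www.'.

    Assumption: other subdomains (e.g. blog.example.com) count as external,
    and links without a host (mailto:, javascript:) are never followed.
    """
    def host(u: str) -> str:
        h = (urlparse(u).hostname or "").lower()
        return h[4:] if h.startswith("www.") else h

    url_host = host(url)
    return url_host != "" and url_host == host(root)


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_pages: int = 500) -> List[str]:
    """Breadth-first crawl that only follows links on start_url's domain.

    get_links(url) is a hypothetical callback returning the absolute URLs
    found on a page; injecting it keeps the traversal testable offline.
    """
    start_url, _ = urldefrag(start_url)
    visited: List[str] = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            link, _ = urldefrag(link)  # "page#top" and "page" are the same page
            if link in seen or not same_domain(link, start_url):
                continue  # already queued, or an external site such as twitter
            seen.add(link)
            queue.append(link)
    return visited


if __name__ == "__main__":
    # Hypothetical link graph standing in for real downloaded pages.
    site = {
        "http://www.wiprodigital.com/": [
            "http://www.wiprodigital.com/about",
            "https://twitter.com/example",
        ],
        "http://www.wiprodigital.com/about": [
            "http://www.wiprodigital.com/#top",
            "http://wiprodigital.com/careers",
        ],
    }
    for page in crawl("http://www.wiprodigital.com/", lambda u: site.get(u, [])):
        print(page)
```

The `seen` set is updated when a link is queued rather than when it is visited, so each page enters the queue at most once even if many pages link to it.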

System Requirements

Python 2.7
Any standard 64-bit Windows or Linux OS
pip for Python package management
Required libraries/modules: re and urllib2 (both part of the Python 2.7 standard library) and BeautifulSoup (third-party; install it with pip)
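
BeautifulSoup in the list above suggests that crawler.py parses each downloaded page to find its links; exactly how it does so is not shown here. As a dependency-free illustration, the Python 3 sketch below uses the standard-library `html.parser` in place of BeautifulSoup and resolves relative links with `urljoin`. `LinkExtractor` and `extract_links` are hypothetical names, not part of the project.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, value.strip()))


def extract_links(html: str, base_url: str) -> List[str]:
    """Return every <a href> on the page as an absolute URL, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = ('<p><a href="/about">About</a> <A HREF="https://twitter.com/x">T</A> '
            '<a name="top">no href</a> <a href="careers.html">Careers</a></p>')
    print(extract_links(page, "http://www.wiprodigital.com/"))
```

A regular expression over `href="..."` (using `re`) is shorter but tends to miss unquoted or unusually formatted attributes, which is why an HTML parser is used here.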

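To Build and Run

Install Python 2.7 on the target machine
Install the required modules (e.g. BeautifulSoup via pip)
Set PYTHONPATH if the modules are not installed in a default location
Run the program (crawler.py), e.g. python crawler.py
Make sure the machine has an internet connection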
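
Because the crawler downloads live pages, a missing connection or a non-HTML response should not stop the whole crawl. The Python 3 sketch below (where urllib2's functionality lives in `urllib.request`) shows one way a download helper could fail softly by returning None. `fetch_html`, its 10-second timeout, and the choice to skip non-HTML content are assumptions, not behaviour confirmed from crawler.py.

```python
import http.client
import urllib.request
from typing import Optional


def fetch_html(url: str, timeout: float = 10.0) -> Optional[str]:
    """Download url and return its HTML as text, or None on any failure.

    Non-HTML responses (images, PDFs, ...) also return None so that the
    caller does not try to parse them for links.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.headers.get_content_type() != "text/html":
                return None
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except (OSError, ValueError, LookupError, http.client.HTTPException):
        # OSError covers URLError/HTTPError and timeouts (e.g. no internet),
        # ValueError covers malformed URLs, LookupError an unknown charset.
        return None
```

Returning None rather than raising lets a crawl loop simply skip pages it cannot read and continue with the rest of the queue.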

raviusit/web-crawler

Folders and files

Latest commit

History

Repository files navigation

web-crawler

system Requirements

To Build and run

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages