vish1/hbcse-crawler
Copyright 2004, 2013 Vishwas Bhat, Apurva Pangam, Tarun Makhija, Vineet Jalali

Multipurpose Internet crawler
-----------------------------

Purpose:
-------
Create a knowledge base for a particular domain, such as mathematics. Using a set of keywords for that domain, crawl the Internet for sites containing the keywords and store the relevant pages locally. An MD5 hash of each page is used to avoid storing the same page twice.

Execute the program:
-------------------
./agentxml.py

Limitations of the crawler:
--------------------------
1. Does not handle websites that require authentication
2. Works only for http sites

Required improvements:
---------------------
1. Reorganisation of the code
2. Improved storage of captured sites

Credits:
-------
The crawler was created as part of a student project at the Homi Bhabha Centre for Science Education under the guidance of Dr. Nagarjuna. We would like to thank him for his support and direction in creating this project.
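The crawl described under "Purpose" works like this: fetch pages, keep those that mention the domain keywords, and skip pages whose MD5 hash has already been seen. The sketch below illustrates that loop using only the Python standard library. It is not the code in `agentxml.py`. The names `crawl`, `is_relevant`, `save_pages`, `fetch_http` and `LinkExtractor` are hypothetical. The following details are also assumptions:

- the breadth-first order;
- the case-insensitive whole-word keyword match;
- hashing the raw page bytes;
- naming saved files after their MD5 digest.

In keeping with the stated limitation, only `http://` URLs are followed.

```python
import hashlib
import re
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects the href targets of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_http(url):
    """Download a page and return its raw bytes (network access happens only here)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def is_relevant(text, keywords):
    """Assumed rule: relevant if any keyword appears as a whole word, ignoring case."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(k.lower()) + r"\b", lowered) for k in keywords)


def crawl(seeds, keywords, fetch=fetch_http, max_pages=50):
    """Breadth-first crawl returning {md5_hex: (url, text)} for relevant, unique pages."""
    store = {}                 # MD5 digest -> (url, text); a known digest means a duplicate page
    seen_urls = set()
    queue = deque(seeds)
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        # Skip URLs already visited and anything that is not plain http.
        if url in seen_urls or urlparse(url).scheme != "http":
            continue
        seen_urls.add(url)
        try:
            raw = fetch(url)
        except OSError:        # urllib's URLError/HTTPError and timeouts derive from OSError
            continue
        fetched += 1
        digest = hashlib.md5(raw).hexdigest()
        text = raw.decode("utf-8", errors="replace")
        if digest not in store and is_relevant(text, keywords):
            store[digest] = (url, text)
        # Queue every link on the page, resolved against the current URL, minus any #fragment.
        parser = LinkExtractor()
        parser.feed(text)
        for link in parser.links:
            queue.append(urldefrag(urljoin(url, link))[0])
    return store


def save_pages(store, out_dir):
    """Write each stored page to <out_dir>/<md5>.html and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for digest, (_url, text) in store.items():
        path = out / f"{digest}.html"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

A typical call would be `save_pages(crawl(["http://example.org/"], ["algebra", "geometry"]), "pages")`, where the seed URL and keywords are placeholders.

The `fetch` parameter separates network access from the crawl logic, so the crawler can be exercised offline with a stub. MD5 is used here only for deduplication, not for security. Because the hash covers the raw bytes, two pages that differ by even one byte are both stored.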
About
Web crawler designed under Dr. Nagarjuna at HBCSE