This is a small Iterable which has the ability to crawl Websites...soooon

LittleRolf/crawl-o-mat

# WebCrawler

This is a small Iterable that crawls websites and outputs all the links it finds. For HTTP communication it uses HttpUnit by Russell Gold. The project was developed as a small school project to show how the Java Iterator works.
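To illustrate the Iterable/Iterator idea, here is a minimal sketch of a depth-limited link iterator. It is not the project's `WebCrawler` class: the name `LinkGraphCrawler` is hypothetical, it walks an in-memory map of page to links instead of fetching pages with HttpUnit, it yields plain strings rather than `WebLink` objects, and it assumes that a depth of 1 means "read only the start page".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical sketch, not the project's WebCrawler.
class LinkGraphCrawler implements Iterable<String> {
    // Stand-in for real HTTP requests: page URL -> links found on that page.
    private final Map<String, List<String>> linksByPage;
    private final String startUrl;
    private final int maxDepth;

    LinkGraphCrawler(Map<String, List<String>> linksByPage, String startUrl, int maxDepth) {
        this.linksByPage = linksByPage;
        this.startUrl = startUrl;
        this.maxDepth = maxDepth;
    }

    // The for-each loop calls this once to obtain a fresh iterator.
    @Override
    public Iterator<String> iterator() {
        return new CrawlIterator();
    }

    private final class CrawlIterator implements Iterator<String> {
        private final Deque<String> pagesToVisit = new ArrayDeque<>();
        private final Deque<Integer> depths = new ArrayDeque<>();
        private final Deque<String> readyLinks = new ArrayDeque<>();
        private final Set<String> seen = new HashSet<>();

        CrawlIterator() {
            seen.add(startUrl);
            pagesToVisit.add(startUrl);
            depths.add(1);
        }

        @Override
        public boolean hasNext() {
            // Read pages lazily until a link is available or no pages are left.
            while (readyLinks.isEmpty() && !pagesToVisit.isEmpty()) {
                String page = pagesToVisit.remove();
                int depth = depths.remove();
                for (String link : linksByPage.getOrDefault(page, List.of())) {
                    if (seen.add(link)) {              // true only for unseen links
                        readyLinks.add(link);
                        if (depth < maxDepth) {        // follow the link only within the depth limit
                            pagesToVisit.add(link);
                            depths.add(depth + 1);
                        }
                    }
                }
            }
            return !readyLinks.isEmpty();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return readyLinks.remove();
        }
    }
}
```

Because the work happens inside `hasNext()`, pages are only read when the loop asks for more links, which is the main point of putting a crawler behind an Iterator.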

NOTE: This project uses Maven as its build tool, so import it as a Maven project in Eclipse or IntelliJ IDEA.

# Usage

A working demo, App.java, is included in the source.

Import at least:

```java
import com.meterware.httpunit.HttpUnitOptions;
import com.meterware.httpunit.WebLink;
```

and any exception classes you need to handle. Then create a new WebCrawler object with two parameters: the full URL string and the depth it should crawl.

```java
WebCrawler exampleCrawler = new WebCrawler("www.example.com", 4);
```

Then iterate over the crawler to get every link it finds (that is, every link that is not generated by JavaScript).

```java
for (WebLink link : exampleCrawler) {
    // do what you want...
}
```
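As one idea for the loop body, the sketch below collects each distinct URL once, in the order it was first seen. `LinkCollector` and `uniqueUrls` are hypothetical names, not part of this project. The helper accepts any Iterable plus a function that turns an element into a URL string, so it can be tried without a network connection.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of the project.
class LinkCollector {
    // Walks the Iterable once and keeps each non-empty URL the first time it appears.
    static <T> Set<String> uniqueUrls(Iterable<T> links, Function<? super T, String> toUrl) {
        Set<String> urls = new LinkedHashSet<>(); // keeps insertion order
        for (T link : links) {
            String url = toUrl.apply(link);
            if (url != null && !url.isEmpty()) {
                urls.add(url); // duplicates are ignored by the set
            }
        }
        return urls;
    }
}
```

With the crawler from the usage example, a call might look like `LinkCollector.uniqueUrls(exampleCrawler, WebLink::getURLString)`, assuming HttpUnit's `WebLink.getURLString()` is the representation you want. Note that it may return relative URLs.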
