# WebCrawler

WebCrawler is a small Iterable that crawls websites and outputs all the links it finds. For HTTP communication it uses HttpUnit by Russell Gold. It was developed as a small school project to show how the Java Iterator works.

NOTE: This project uses Maven as its build management tool, so import it as a Maven project in Eclipse or IntelliJ IDEA.

# Usage

A working demo is available in the source, in App.java.

Import at least the following:

    import com.meterware.httpunit.HttpUnitOptions;
    import com.meterware.httpunit.WebLink;

Also import whichever exception classes you need to handle. Then create a new WebCrawler object with two parameters: the full URL as a String and the depth to which it should crawl.

    WebCrawler exampleCrawler = new WebCrawler("www.example.com", 4);
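
To give a feel for what the depth parameter can mean, here is a minimal sketch of a depth-limited, breadth-first link collection. It is not WebCrawler's actual implementation: the class `DepthLimitedCrawl`, the method `crawl`, and the in-memory `linksByPage` map (standing in for real HTTP requests) are all hypothetical, and it assumes that a depth of 1 means "only the links on the start page".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; WebCrawler itself may interpret depth differently.
class DepthLimitedCrawl {

    // Collects every link seen within maxDepth levels of the start page, breadth-first.
    // linksByPage is an in-memory stand-in for fetching a page and reading its links.
    static List<String> crawl(Map<String, List<String>> linksByPage, String start, int maxDepth) {
        List<String> found = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        visited.add(start);

        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String page : frontier) {
                for (String link : linksByPage.getOrDefault(page, Collections.<String>emptyList())) {
                    found.add(link);              // report every link that was seen
                    if (visited.add(link)) {      // but only follow each page once
                        next.add(link);
                    }
                }
            }
            frontier = next;                      // descend one level
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = new HashMap<>();
        site.put("a", Arrays.asList("b", "c"));
        site.put("b", Arrays.asList("d"));
        System.out.println(crawl(site, "a", 2));
    }
}
```

Tracking visited pages keeps the sketch from looping forever on sites whose pages link back to each other.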

Then iterate over the crawler to get every link it finds, except links that are generated by JavaScript.

    for (WebLink link : exampleCrawler) {
        // do what you want...
    }
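
The for-each loop above works because the crawler is an Iterable. As a rough sketch of the mechanism, the following self-contained example shows a tiny Iterable and the manual iterator calls that a for-each loop performs behind the scenes. The names `LinkSource` and `IterableDemo` are hypothetical and use plain strings instead of `WebLink`; they are not part of this project.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the crawler: iterates over a fixed list of link strings.
class LinkSource implements Iterable<String> {
    private final List<String> links;

    LinkSource(List<String> links) {
        this.links = links;
    }

    @Override
    public Iterator<String> iterator() {
        // Each call returns a fresh iterator that starts at the first element.
        return new Iterator<String>() {
            private int position = 0;

            @Override
            public boolean hasNext() {
                return position < links.size();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return links.get(position++);
            }
        };
    }
}

class IterableDemo {
    // Counts elements by driving the iterator by hand, as a for-each loop does internally.
    static int countManually(Iterable<String> source) {
        int count = 0;
        Iterator<String> it = source.iterator();
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        LinkSource source = new LinkSource(Arrays.asList("http://a.example/", "http://b.example/"));
        for (String link : source) {   // compiles to iterator()/hasNext()/next() calls
            System.out.println(link);
        }
        System.out.println(countManually(source));
    }
}
```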