LittleRolf/crawl-o-mat: a small Iterable that can crawl websites

# WebCrawler

This is a small `Iterable` that can crawl websites and output all the links it finds. For HTTP communication it uses HttpUnit by Russell Gold. The project was developed as a small school project to show how the Java `Iterator` works.
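To make the idea of an Iterable crawler concrete, here is a minimal sketch of how such a class could work. It is not crawl-o-mat's actual `WebCrawler`. The class name `DepthLimitedCrawl`, its `linkFetcher` parameter and the depth semantics (depth 1 means "only the links on the start page") are hypothetical. A plain function from a URL to its outgoing links stands in for the HttpUnit layer so the example runs without network access. The iterator performs a breadth-first crawl, fetches pages lazily inside `hasNext()`, and stops following links beyond `maxDepth`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch, not crawl-o-mat's WebCrawler: a depth-limited,
// breadth-first crawl exposed as an Iterable of discovered links.
public class DepthLimitedCrawl implements Iterable<String> {

    // A page waiting to be fetched, with its distance (in hops) from the start URL.
    private static final class Page {
        final String url;
        final int depth;

        Page(String url, int depth) {
            this.url = url;
            this.depth = depth;
        }
    }

    private final String startUrl;
    private final int maxDepth;
    // Assumed stand-in for the HTTP layer: maps a page URL to the links on that page.
    private final Function<String, List<String>> linkFetcher;

    public DepthLimitedCrawl(String startUrl, int maxDepth,
                             Function<String, List<String>> linkFetcher) {
        this.startUrl = startUrl;
        this.maxDepth = maxDepth;
        this.linkFetcher = linkFetcher;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private final Deque<Page> pending = new ArrayDeque<>(); // pages still to fetch
            private final Deque<String> ready = new ArrayDeque<>();  // links found, not yet returned
            private final Set<String> seen = new HashSet<>();        // URLs that must not be returned again

            {
                // The start page itself is never returned; it is only fetched if maxDepth > 0.
                seen.add(startUrl);
                if (maxDepth > 0) {
                    pending.add(new Page(startUrl, 0));
                }
            }

            @Override
            public boolean hasNext() {
                // Fetch pages lazily until at least one new link is ready or nothing is left.
                while (ready.isEmpty() && !pending.isEmpty()) {
                    Page page = pending.poll();
                    for (String link : linkFetcher.apply(page.url)) {
                        if (seen.add(link)) {
                            ready.add(link);
                            // Only visit the link later if its own links are still within maxDepth.
                            if (page.depth + 1 < maxDepth) {
                                pending.add(new Page(link, page.depth + 1));
                            }
                        }
                    }
                }
                return !ready.isEmpty();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return ready.poll();
            }
        };
    }

    public static void main(String[] args) {
        // A tiny in-memory "site": a -> b, c; b -> c, d; d -> e.
        Map<String, List<String>> site = new HashMap<>();
        site.put("a", List.of("b", "c"));
        site.put("b", List.of("c", "d"));
        site.put("d", List.of("e"));

        for (String link : new DepthLimitedCrawl("a", 2, url -> site.getOrDefault(url, List.of()))) {
            System.out.println(link);
        }
    }
}
```

With the in-memory site in `main`, a depth of 2 yields `b`, `c` and `d` but not `e`, which is three hops from the start page. Doing the fetching inside `hasNext()` keeps the crawl lazy, so a caller that breaks out of the loop early never triggers fetches for pages it did not need.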

NOTE: This project uses Maven as its build-management tool, so import it as a Maven project in Eclipse or IntelliJ IDEA.

# Usage

A working demo is available in the source as `App.java`.

Import at least

```java
import com.meterware.httpunit.HttpUnitOptions;
import com.meterware.httpunit.WebLink;
```

along with whichever exception classes your code needs to handle. Then create a new `WebCrawler` object with two parameters: the full URL string and the depth to which it should crawl.

```java
WebCrawler exampleCrawler = new WebCrawler("http://www.example.com", 4);
```

Then you can iterate over the crawler to get all the links it finds (every link that is not generated by JavaScript):

```java
for (WebLink link : exampleCrawler) {
	// do whatever you want with each link...
}
```
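The for-each loop over `exampleCrawler` implies that `WebCrawler` implements `Iterable<WebLink>`, so the loop is shorthand for explicit `iterator()`, `hasNext()` and `next()` calls, which is the mechanism this project sets out to demonstrate. The generic sketch below shows both forms side by side. The class and method names are hypothetical, and a plain `List` stands in for the crawler so it runs without network access.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ForEachDesugar {

    // Collects elements with an enhanced for loop, as in the usage example.
    static <T> List<T> withForEach(Iterable<T> source) {
        List<T> out = new ArrayList<>();
        for (T item : source) {
            out.add(item);
        }
        return out;
    }

    // The equivalent loop written with explicit Iterator calls,
    // following the translation given in the JLS for an enhanced for over an Iterable.
    static <T> List<T> withExplicitIterator(Iterable<T> source) {
        List<T> out = new ArrayList<>();
        for (Iterator<T> it = source.iterator(); it.hasNext(); ) {
            T item = it.next();
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> links = List.of("http://www.example.com/a", "http://www.example.com/b");
        System.out.println(withForEach(links).equals(withExplicitIterator(links))); // prints true
    }
}
```

Any class that returns a correct `Iterator` from `iterator()` can be used in a for-each loop this way, which is why the crawler only needs to implement `Iterable` to support the usage shown above.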

