Crawling Techniques

Basic Web Application Crawler

The default crawler script analyses pages for links and forms. Discovered forms are automatically parsed and their fields are filled with random data.
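
Conceptually, that works like the minimal sketch below (an illustration built on requests and BeautifulSoup, not the actual Helios code): fetch a page, extract every link and form, and give each named form field a random value.

import random
import string
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def random_value(length=8):
    # Random filler for form fields that have no preset value
    return "".join(random.choices(string.ascii_lowercase, k=length))

def crawl_page(url):
    # Fetch one page and return the links and filled-in forms found on it
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    links = [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]
    forms = []
    for form in soup.find_all("form"):
        data = {field["name"]: field.get("value") or random_value()
                for field in form.find_all(["input", "textarea"])
                if field.get("name")}
        forms.append({"action": urljoin(url, form.get("action", url)),
                      "method": form.get("method", "get").lower(),
                      "data": data})
    return links, forms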

For performance reasons, the crawler enforces the following hard limits:

# maximum allowed variations of post data / GET parameters per URL
max_allowed_checksum=5
max_url_unique_keys=5
max_postdata_per_url=10

# blocks things like search form on every page
max_postdata_unique_keys=5
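
As a rough illustration of how such limits might be enforced (an assumption about the mechanism, not the actual Helios implementation): hashing the sorted set of parameter names lets the crawler recognise the same form wherever it appears, so a search box that is present on every page is only queued a handful of times in total.

import hashlib

# Hypothetical enforcement sketch; the names mirror the config above,
# but the bookkeeping logic is an assumption.
MAX_POSTDATA_PER_URL = 10
MAX_POSTDATA_UNIQUE_KEYS = 5

postdata_per_url = {}      # url -> number of post-data variations queued
unique_key_counts = {}     # checksum of sorted field names -> times queued

def should_queue(url, postdata):
    # Two forms with the same field names produce the same checksum,
    # even if they live on different pages.
    checksum = hashlib.md5("|".join(sorted(postdata)).encode()).hexdigest()
    if unique_key_counts.get(checksum, 0) >= MAX_POSTDATA_UNIQUE_KEYS:
        return False
    if postdata_per_url.get(url, 0) >= MAX_POSTDATA_PER_URL:
        return False
    unique_key_counts[checksum] = unique_key_counts.get(checksum, 0) + 1
    postdata_per_url[url] = postdata_per_url.get(url, 0) + 1
    return True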

Advanced Web Application Crawler

This module replaces the default crawler whenever the target domain uses page states (as ASP.NET / JSP applications do). Instead of raw post data, the crawler works with Events, so the scanners can reproduce valid page state tokens such as the __VIEWSTATE.

Form detection is also extended to pick up dynamic targets and arguments that are set by JavaScript callbacks.
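
Replaying a valid ASP.NET page state typically means re-reading the hidden state fields from the most recent response right before each submission; the sketch below shows that pattern (a hedged illustration, not the module's actual Event mechanism).

import requests
from bs4 import BeautifulSoup

STATE_FIELDS = ("__VIEWSTATE", "__VIEWSTATEGENERATOR", "__EVENTVALIDATION")

def submit_with_page_state(session, url, data):
    # Fetch the form page first, then POST with fresh state tokens.
    # A stale or missing __VIEWSTATE makes ASP.NET reject the request,
    # so the tokens are harvested immediately before posting.
    soup = BeautifulSoup(session.get(url).text, "html.parser")
    for name in STATE_FIELDS:
        field = soup.find("input", attrs={"name": name})
        if field is not None:
            data[name] = field.get("value", "")
    return session.post(url, data=data)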

XHR Processing / The Mefjus Proxy

In addition to basic crawling, the Helios platform offers a Chromedriver extension with an interception proxy, which allows AJAX / XHR requests to be captured and processed.
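
The same capture idea can be reproduced with off-the-shelf tooling; the sketch below uses the third-party selenium-wire package rather than the Mefjus proxy itself (an assumption chosen for the example, since selenium-wire wraps Chromedriver in an interception proxy in a similar way).

from seleniumwire import webdriver  # pip install selenium-wire

# Chromedriver behind an interception proxy: every request the browser
# makes, including AJAX/XHR calls fired by page scripts, is recorded.
driver = webdriver.Chrome()
driver.get("https://example.com/app")

for request in driver.requests:
    if request.response:  # skip requests that never completed
        print(request.method, request.url, request.response.status_code)

driver.quit()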

Manual crawling can also be enabled with --driver --interactive.
