Using Hadoop for website stress testing ("Benign DDos")

Hadoop is engineered to consume the full capacity of every available resource up to the currently-limiting one. So in general, you should never issue requests against external services from a Hadoop job — one-by-one queries against a database; crawling web pages; requests to an external API. The resulting load spike will effectively be attempting what web security folks call a "DDoS", or distributed denial of service attack.

Unless of course you are trying to test a service for resilience against an adversarial DDoS — in which case that assault is a feature, not a bug!

elephant_stampede

        require 'faraday'

        processor :elephant_stampede do

	  def process(logline)
	    beg_at = Time.now.to_f
	    resp = Faraday.get url_to_fetch(logline)
	    yield summarize(resp, beg_at)
	  end

	  def summarize(resp, beg_at)
	    duration = Time.now.to_f - beg_at
	    bytesize = resp.body.bytesize
	    { duration: duration, bytesize: bytesize }
	  end

	  def url_to_fetch(logline)
	    logline.url
	  end
	end

	flow(:mapper){ input > parse_loglines > elephant_stampede }

You must use Wukong’s eventmachine bindings to make more than one simultaneous request per mapper.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

07f-benign_ddos.asciidoc

07f-benign_ddos.asciidoc

Using Hadoop for website stress testing ("Benign DDos")

Files

07f-benign_ddos.asciidoc

Latest commit

History

07f-benign_ddos.asciidoc

File metadata and controls

Using Hadoop for website stress testing ("Benign DDos")