jesusabdullah edited this page Sep 13, 2010 · 6 revisions

Pre-requisites

  • br somewhere on your PATH
  • vanilla Unix shell tools: sort, awk, ssh, netcat (see note 1)
  • password-less ssh access to each machine you plan to use
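Before going further, it may help to confirm these pre-requisites from the machine that will run br. The sketch below is a hypothetical helper, not part of bashreduce; the names check_tools and check_ssh are made up for illustration. It relies only on the shell builtin command -v and on ssh's standard BatchMode option, which makes ssh fail instead of prompting when key-based login is not set up.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not shipped with bashreduce).

# Report every named tool that cannot be found on PATH; non-zero if any is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run a no-op command on each host. BatchMode=yes makes ssh fail rather than
# ask for a password, so a failure means password-less ssh is not ready.
check_ssh() {
  local host failed=0
  for host in "$@"; do
    if ! ssh -o BatchMode=yes "$host" true; then
      echo "no password-less ssh: $host" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example (host names are placeholders):
#   check_tools br sort awk ssh nc && check_ssh host1 host2 host3
```

check_tools only confirms that some nc is installed; it cannot tell which netcat variant it is.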

Installation

The bashreduce installation is straightforward: download and unpack the tarball, or clone this repository with git. Then build the optional performance-boosting utilities (brp and brm) that come with it:

$ cd bashreduce/brutils
$ make
cc -O3 -Wall   -c -o brp.o brp.c
cc -o brp brp.o
cc -O3 -Wall   -c -o brm.o brm.c
cc -o brm brm.o

$ sudo make install
install -c brp /usr/local/bin
install -c brm /usr/local/bin

For convenience, put br somewhere on your PATH.

Configuration

Edit /etc/br.hosts and list the machines you want to use as workers, one host per line. Alternatively, specify the machines at runtime:

$ br -h "host1 host2 host3"

To take advantage of multiple cores, repeat a host name once for each core you want to use on that machine, e.g. br -h "host1 host1 host2".
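Typing a host name many times by hand gets tedious on machines with many cores. A small hypothetical helper can generate the repeated list for either /etc/br.hosts or the -h option. Both the name expand_hosts and the host:cores notation are assumptions of this sketch; br itself does not understand that notation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand "host:cores" pairs into one line per worker,
# repeating each host once per core. A bare "host" counts as one core.
expand_hosts() {
  local spec host cores i
  for spec in "$@"; do
    host=${spec%%:*}
    case $spec in
      *:*) cores=${spec#*:} ;;
      *)   cores=1 ;;
    esac
    i=0
    while [ "$i" -lt "$cores" ]; do
      echo "$host"
      i=$((i + 1))
    done
  done
}

# Write the hosts file (needs root):
#   expand_hosts host1:4 host2:4 | sudo tee /etc/br.hosts
# Or pass the same list on the command line, space-separated:
#   br -h "$(expand_hosts host1:4 host2:4 | paste -sd ' ' -)"
```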

Examples

sorting

$ br < input > output
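Conceptually, a distributed sort splits the input across workers, sorts each share independently, and then merges the already-sorted shares. The function below models that idea on a single machine so it can be tried without any workers. It is an illustrative sketch (local_sort is a made-up name) and does not claim to reproduce how br actually partitions data or moves it over ssh and netcat.

```shell
#!/usr/bin/env bash
# Hypothetical single-machine model of a distributed sort:
# deal lines out to N shares, sort each share, then merge the sorted shares.
local_sort() {
  # $1: number of shares (default 3); stdin: input lines
  local workers=${1:-3} tmp part
  tmp=$(mktemp -d) || return 1

  # 1. Partition: deal input lines round-robin into one file per share.
  awk -v n="$workers" -v dir="$tmp" '{ print > (dir "/part" (NR % n)) }'

  # 2. Each "worker" sorts only its own share.
  for part in "$tmp"/part*; do
    [ -e "$part" ] || continue        # empty input: the glob matched nothing
    sort "$part" > "$part.sorted"
  done

  # 3. Merge: sort -m combines already-sorted files without re-sorting them.
  set -- "$tmp"/*.sorted
  [ -e "$1" ] && sort -m "$@"

  rm -rf "$tmp"
}
```

For example, local_sort 4 < input > output should yield the same lines as sort < input > output, because merging sorted runs with sort -m produces a fully sorted result under the same locale.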

word count

$ br -r "uniq -c" < input > output
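uniq -c only collapses adjacent identical lines, so this example counts words correctly only if the input holds one word per line and the reduce step sees sorted input (which the sorting example suggests br takes care of). It also presumes that br sends every copy of the same line to the same reducer; otherwise one word could be counted in pieces on several workers. The sketch below shows the same pipeline on one machine, plus a tokenizing step; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the word-count pipeline locally.

# Put one word per line: print each whitespace-separated field on its own line.
words_per_line() {
  awk '{ for (i = 1; i <= NF; i++) print $i }'
}

# Sort so identical words become adjacent, then let uniq -c count each run.
local_word_count() {
  words_per_line | LC_ALL=C sort | uniq -c
}

# Against a cluster, the tokenizing step can run locally before br:
#   words_per_line < input | br -r "uniq -c" > output
```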

great big join

$ LC_ALL='C' br -r "join - /tmp/join_data" < input > output
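join expects both of its inputs to be sorted on the join field in the same collating order, which is why the example pins LC_ALL='C'; a lookup file sorted under a different locale can make join miss matches or complain about unsorted input. Because the reduce command runs on the workers, /tmp/join_data presumably has to exist on every worker host. The sketch below prepares the lookup file and checks the join locally on a small sample. The function names are hypothetical, and the scp loop is an assumption about how you might distribute the file, not a bashreduce feature.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the join example.

# Sort the lookup table on its first field under the C locale, matching
# the LC_ALL='C' used when running br.
prepare_join_data() {
  # $1: unsorted lookup table, $2: output path (e.g. /tmp/join_data)
  LC_ALL=C sort -k1,1 "$1" > "$2"
}

# Rough single-machine counterpart of: br -r "join - /tmp/join_data"
# stdin: records keyed on field 1; $1: lookup table already sorted as above.
local_join() {
  LC_ALL=C sort -k1,1 | LC_ALL=C join - "$1"
}

# Assumed distribution step: copy the table to every distinct worker host
# (sort -u drops the repeats used for multiple cores).
#   for host in $(sort -u /etc/br.hosts); do
#     scp /tmp/join_data "$host":/tmp/join_data
#   done
```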

distributed grep

$ br -m "grep pattern" < input > output
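grep works as a map step because each line is filtered independently, so it makes no difference which worker sees which line. If br then sorts and merges the map output as in the sorting example (an assumption worth checking against the br script), the matches come back in sorted order rather than in their original input order. The local model below, with the hypothetical name local_grep, makes that behaviour visible.

```shell
#!/usr/bin/env bash
# Hypothetical local model of a distributed grep: deal lines to N shares,
# run grep on each share independently, then sort the combined matches.
local_grep() {
  # $1: pattern, $2: number of shares (default 3); stdin: input lines
  local pattern=$1 workers=${2:-3} tmp part
  tmp=$(mktemp -d) || return 1

  # Deal input lines round-robin into one file per share.
  awk -v n="$workers" -v dir="$tmp" '{ print > (dir "/part" (NR % n)) }'

  # Map: each share is filtered on its own; no share depends on another.
  for part in "$tmp"/part*; do
    [ -e "$part" ] || continue
    grep -e "$pattern" "$part"
  done | LC_ALL=C sort

  rm -rf "$tmp"
}
```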

Notes

1 There are several versions of netcat. Ubuntu/Debian ship two variants, "openbsd" and "traditional" (packages netcat-openbsd and netcat-traditional). br only works with netcat traditional, so every machine you want to use as a worker must have netcat traditional installed.