jweslley edited this page Sep 13, 2010 · 6 revisions

Pre-requisites

  • br somewhere handy in your path
  • vanilla Unix shell tools: sort, awk, ssh, netcat 1
  • password-less ssh to each machine you plan to use
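
The prerequisites above can be checked with a short script before a first run. This is a minimal sketch, not part of bashreduce: the function name `missing_tools` and the host name `worker1` are made up for illustration, and the netcat binary is assumed to be called `nc`, which may differ on your system.

```shell
# Print the name of every tool from the argument list that is not on PATH.
# (Hypothetical helper, not shipped with bashreduce.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: list whichever prerequisites are missing on this machine.
#   missing_tools br sort awk ssh nc
#
# Password-less ssh can be checked with BatchMode, which makes ssh fail
# instead of prompting for a password (worker1 is a placeholder host):
#   ssh -o BatchMode=yes worker1 true && echo "ssh ok"
```

An empty result from `missing_tools` means every named tool was found.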

Installation

Installing bashreduce is straightforward: download and unpack the tarball, or clone this repository using git. Then build the optional performance-boosting utilities it comes with:

$ cd bashreduce/brutils
$ make
cc -O3 -Wall   -c -o brp.o brp.c
cc -o brp brp.o
cc -O3 -Wall   -c -o brm.o brm.c
cc -o brm brm.o

$ sudo make install
install -c brp /usr/local/bin
install -c brm /usr/local/bin

For convenience, put br somewhere on your PATH.

Configuration

Edit /etc/br.hosts and enter the machines you wish to use as workers, one host per line. Or specify your machines at runtime:

$ br -h "host1 host2 host3"

To take advantage of multiple cores on a machine, repeat its host name in the list.
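
Repeating host names by hand gets tedious for larger machines, so the list can be generated. The sketch below assumes you already know how many workers you want per machine; `repeat_host` is a hypothetical helper and the counts are placeholders.

```shell
# Print HOST repeated COUNT times, separated by single spaces.
# (Hypothetical helper, not shipped with bashreduce.)
repeat_host() {
  host=$1
  count=$2
  out=""
  i=0
  while [ "$i" -lt "$count" ]; do
    out="${out:+$out }$host"   # add a space only between entries
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Usage: four workers on host1 and two on host2.
#   br -h "$(repeat_host host1 4) $(repeat_host host2 2)" < input > output
```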

Examples

sorting

$ br < input > output

word count

$ br -r "uniq -c" < input > output
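
For comparison, here is a single-machine sketch of the same word count. It assumes the input holds one word per line, and it relies on the fact that `uniq -c` only counts adjacent duplicates, which is why the data has to be sorted first. The name `local_wordcount` is made up for illustration.

```shell
# Count occurrences of each line read from stdin.
# sort groups identical lines together; uniq -c then counts each group.
local_wordcount() {
  sort | uniq -c
}

# Usage:
#   local_wordcount < input > output
#
# If the input is running text rather than one word per line, it can be
# split on whitespace first:
#   tr -s '[:space:]' '\n' < input | local_wordcount
```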

great big join

$ LC_ALL='C' br -r "join - /tmp/join_data" < input > output
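
`join` expects both inputs to be sorted on the join field under the same collation, which is presumably why the example sets `LC_ALL='C'`. The single-machine sketch below shows the same idea; it assumes the lookup file is already sorted under the C locale, and `local_join` is a hypothetical name.

```shell
# Join an unsorted input file against a lookup file on the first field.
#   $1: input file (sorted here, under the C locale)
#   $2: lookup file, assumed to be already sorted under LC_ALL=C
local_join() {
  LC_ALL=C sort "$1" | LC_ALL=C join - "$2"
}

# Usage:
#   local_join input /tmp/join_data > output
```

Mixing locales between the sort and the join can make `join` report unsorted input or silently skip lines, so the same setting should be used when preparing the lookup file.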

distributed grep

$ br -m "grep pattern" < input > output

Notes

1 There are several versions of netcat. Ubuntu/Debian has two variants: “openbsd” and “traditional”. br works with netcat traditional, so every machine you plan to use as a worker must have netcat traditional installed.
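
To see which variant a Debian or Ubuntu machine resolves `nc` to, the symlink behind the command can be inspected. This sketch assumes the usual Debian packaging, where the binaries are named `nc.traditional` and `nc.openbsd`; other systems may name them differently, and `netcat_variant` is a hypothetical helper.

```shell
# Classify a resolved netcat binary path by its file name.
# Assumes Debian-style names: nc.traditional and nc.openbsd.
netcat_variant() {
  case "$1" in
    *nc.traditional) echo traditional ;;
    *nc.openbsd)     echo openbsd ;;
    *)               echo unknown ;;
  esac
}

# Usage on a worker (readlink -f follows the alternatives symlink):
#   netcat_variant "$(readlink -f "$(command -v nc)")"
```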
