A research project exploring whether Spark RDDs can be implemented more efficiently without compromising on a user-friendly API.
Tested on Linux with the Rust nightly toolchain.
In one terminal run:
cd r2d2
cargo run --example sum_by_key -- --id 123 --master --port 6969
In another terminal run:
cd r2d2
./run_workers.sh sum_by_key debug
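As a rough orientation for what an example named sum_by_key presumably computes, here is a minimal single-machine sketch of a per-key sum, in the spirit of Spark's reduceByKey with addition. It uses only the Rust standard library and does not touch the r2d2 API; the function name, input data, and types are hypothetical, and the real example in r2d2/examples may differ.

```rust
use std::collections::BTreeMap;

/// Sums the values that share a key.
/// Hypothetical local helper for illustration, not part of the r2d2 API.
fn sum_by_key(pairs: &[(&str, i64)]) -> BTreeMap<String, i64> {
    let mut totals = BTreeMap::new();
    for &(key, value) in pairs {
        // Start each key at 0, then accumulate its values.
        *totals.entry(key.to_string()).or_insert(0) += value;
    }
    totals
}

fn main() {
    // Made-up input; a distributed run would spread these pairs across workers.
    let pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)];
    let totals = sum_by_key(&pairs);
    for (key, total) in &totals {
        println!("{key}: {total}");
    }
}
```

In a distributed setting each worker would typically compute partial sums like these for its own partition before the results are merged by key.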
Example user code can be found in the r2d2/examples directory. Some examples might need a specific folder setup with generated data.
You'll need to look inside the run_workers.sh and spark.toml files first. To run remotely, compile the binary with either the debug or release profile, distribute it to all nodes, and run it on each node on the ports specified in spark.toml.