How to run (local or AWS EC2)

Local

Our Spark implementation currently comes with two sample programs. The Scala examples are in the examples/src/main directory. To run a Scala sample program, use bin/run-example <class> [params] in the top-level spark-gpu directory of the release of our Spark you downloaded. (Behind the scenes, this invokes the more general spark-submit script for launching applications.) For example,

LD_LIBRARY_PATH=/usr/local/cuda/lib64 ./bin/run-example SparkGPUPi 10

LD_LIBRARY_PATH=/usr/local/cuda/lib64 MASTER='local[2]' ./bin/run-example SparkGPULR 8 3200 32 5
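If you prefer to invoke spark-submit yourself, the run-example wrapper above expands to roughly the following. This is a sketch only: the fully qualified class name and the examples jar path are assumptions and may differ in your build.

LD_LIBRARY_PATH=/usr/local/cuda/lib64 ./bin/spark-submit --class org.apache.spark.examples.SparkGPUPi --master 'local[*]' lib/spark-examples-*.jar 10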

You can also run Spark interactively through a modified version of the Scala shell. This is a great way to learn the framework.

LD_LIBRARY_PATH=/usr/local/cuda/lib64 ./bin/spark-shell --master local[2]
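Stock Spark can also pick up native libraries through a launch option instead of the environment; a sketch, assuming the standard --driver-library-path option works unchanged in this build:

./bin/spark-shell --master local[2] --driver-library-path /usr/local/cuda/lib64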

These steps are mostly the same as for the original Spark, described here. Most of the descriptions on this page are borrowed from there to show how few differences there are in executing Spark programs.

AWS EC2

We updated the spark-ec2 script, located in Spark's ec2 directory, so that you can use GPUs with our Spark on Amazon EC2. Most operations are similar to the original spark-ec2, as described here.

Before You Start

  • Create an Amazon EC2 key pair for yourself from the AWS console by clicking Key Pairs on the left sidebar, then creating and downloading a key.
  • Get your Amazon EC2 access key ID and secret access key from the AWS homepage by clicking Account > Security Credentials > Access Credentials.
      • Details are here
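The spark-ec2 script reads the access keys from environment variables, so export them before launching a cluster (placeholder values shown; substitute your own credentials):

export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>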

Launching a Cluster

  • Go into the ec2 directory in the release of our Spark you downloaded.
  • Run ./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> [--spot-price <price>] launch <cluster-name>, where <keypair> is the name of your EC2 key pair, <key-file> is the private key file for your key pair, <num-slaves> is the number of GPU slave nodes to launch, and <cluster-name> is the name of your cluster.
      • Wait 5-10 minutes for the cluster to finish launching

For example,

export AWS_SECRET_ACCESS_KEY=AaBbCcDdEeFGgHhIiJjKkLlMmNnOoPpQqRrSsTtU
export AWS_ACCESS_KEY_ID=ABCDEFG1234567890123
./spark-ec2 --key-pair=awskey --identity-file=awskey.pem --region=us-east-1 --spot-price 0.1 -s 1 launch gpu-spark-cluster

Other options

  • --instance-type=<instance-type> can be used to specify an EC2 instance type for the slave nodes. For now, the script only supports GPU instance types, and the default type is g2.2xlarge.
  • --master-instance-type=<instance-type> can be used to specify an EC2 instance type for the master node. For now, the script only supports 64-bit instance types, and the default type is t2.micro. A combined example follows below.
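For example, to choose both instance types explicitly (g2.8xlarge and m3.medium are illustrative values; whether they are accepted depends on the types the script actually supports):

./spark-ec2 --key-pair=awskey --identity-file=awskey.pem --instance-type=g2.8xlarge --master-instance-type=m3.medium -s 2 launch gpu-spark-cluster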

Running Applications

  • Go into the ec2 directory in the release of our Spark you downloaded.
  • Run ./spark-ec2 -k <keypair> -i <key-file> login <cluster-name>

For example,

./spark-ec2 --key-pair=awskey --identity-file=awskey.pem login gpu-spark-cluster

  • Run a Spark sample application

For example,

<ec2>$ cd spark; MASTER='spark://localhost:7077' ./bin/run-example SparkGPULR 4 16000 4000 5
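You can also use the modified spark-shell interactively on the cluster; a sketch, assuming the same master URL as above:

<ec2>$ cd spark; MASTER='spark://localhost:7077' ./bin/spark-shell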

Terminating a Cluster

  • Go into the ec2 directory in the release of our Spark you downloaded.
  • Run ./spark-ec2 destroy <cluster-name>

For example,

./spark-ec2 destroy gpu-spark-cluster
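If you only want to pause the cluster rather than destroy it, the original spark-ec2 also provides stop and start subcommands; assuming our updated script retains them, the usage is:

./spark-ec2 stop gpu-spark-cluster
./spark-ec2 -k <keypair> -i <key-file> start gpu-spark-cluster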