Create_AMI

leewyang edited this page Feb 21, 2018 · 5 revisions

Create TensorFlowOnSpark (TFoS) AMI on EC2

This tutorial outlines the steps to create a TFoS AMI on AWS EC2 from a p2.xlarge instance running Ubuntu Server 16.04.

A pre-built AMI image is available for you to use. See Get Started on EC2.

Launch an Ubuntu Server Instance

We launch a p2.xlarge instance in Amazon EC2 from an Ubuntu Server 16.04 LTS (HVM) AMI. 16 GB of storage on the root partition is required.

  1. Go to https://us-west-2.console.aws.amazon.com/console
  2. Select EC2
  3. Open Spot Requests and request a Spot Instance
  4. Specify an AMI
  5. Specify the spot max price
  6. Wait for instance to enter running state
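
The console steps above can also be approximated from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured. The AMI ID, key pair name, and maximum price are placeholders, and /dev/sda1 is an assumed root device name that you should check against your chosen AMI. The script writes a launch specification and only prints the request command, so you can review it first.

```shell
# Placeholders: replace these with your own values before use.
AMI_ID="ami-xxxxxxxx"        # an Ubuntu Server 16.04 LTS (HVM) AMI ID for your region
KEY_NAME="my-key-pair"       # name of an existing EC2 key pair
SPOT_MAX_PRICE="0.50"        # maximum spot price in USD per hour
SPEC_FILE="${TMPDIR:-/tmp}/tfos-spot-spec.json"

# Launch specification: p2.xlarge with a 16 GB root volume.
# /dev/sda1 is assumed to be the root device name of the chosen AMI.
cat > "${SPEC_FILE}" <<EOF
{
  "ImageId": "${AMI_ID}",
  "InstanceType": "p2.xlarge",
  "KeyName": "${KEY_NAME}",
  "BlockDeviceMappings": [
    { "DeviceName": "/dev/sda1", "Ebs": { "VolumeSize": 16 } }
  ]
}
EOF

# Print the request for review instead of submitting it.
echo aws ec2 request-spot-instances --region us-west-2 \
  --spot-price "${SPOT_MAX_PRICE}" --instance-count 1 --type one-time \
  --launch-specification "file://${SPEC_FILE}"
```

Remove the leading echo to submit the request once the values look right.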

SSH onto Your Instance

Please follow the [AWS instructions](http://docs.aws.amazon.com/cli/latest/userguide/cli-ec2-keypairs.html) to create a key pair. Here is an example:

export EC2_KEY=ec2_${USER}
export EC2_PEM_FILE=~/.ssh/ec2_${USER}.pem
ec2-create-keypair -O ${AWS_ACCESS_KEY_ID} -W ${AWS_SECRET_ACCESS_KEY} --region us-west-2 ${EC2_KEY}
emacs ${EC2_PEM_FILE}    # paste the private key printed by the previous command, then save
chmod 600 ${EC2_PEM_FILE}
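
If you use the AWS CLI rather than the ec2-create-keypair tool, the key can be written straight to the .pem file without a manual paste. This is a sketch under that assumption; the helper name create_keypair and the AWS override variable are illustrative only and are not part of any AWS tooling.

```shell
# Command used to reach AWS; overriding it (e.g. AWS=echo) gives a dry run.
AWS="${AWS:-aws}"

create_keypair() {
  # $1: key pair name, $2: path of the .pem file to write
  (
    umask 077   # create the file readable by the owner only
    "${AWS}" ec2 create-key-pair --region us-west-2 \
      --key-name "$1" --query 'KeyMaterial' --output text > "$2"
  ) && chmod 600 "$2"
}

# Usage (not run here):
#   create_keypair "ec2_${USER}" ~/.ssh/ec2_${USER}.pem
```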

SSH onto your instance:

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${EC2_PEM_FILE} root@<MASTER>
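
A freshly started instance may refuse SSH connections for a short while after it enters the running state. The following generic retry helper is a sketch for that case; the function name retry and its argument order are our own choices for illustration.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  retry_max="$1"
  retry_delay="$2"
  shift 2
  retry_n=1
  until "$@"; do
    if [ "${retry_n}" -ge "${retry_max}" ]; then
      echo "giving up after ${retry_n} attempts" >&2
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "${retry_delay}"
  done
}

# Usage (not run here): try up to 10 times, 15 seconds apart.
#   retry 10 15 ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
#     -i "${EC2_PEM_FILE}" root@<MASTER> true
```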

Install CUDA 8 + cuDNN 5 (GPU only)

For GPU instances only, you will need to install the CUDA drivers and the cuDNN libraries.

wget https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
rm cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
sudo apt-get update
sudo apt-get install -y cuda
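
Before moving on, it can help to confirm that the driver and toolkit binaries are present. The sketch below assumes the toolkit is reachable at /usr/local/cuda, the location this page uses later for CUDA_ROOT; the helper name check_cuda is illustrative.

```shell
# Report whether the NVIDIA driver utility and the CUDA compiler can be found.
check_cuda() {
  for tool in nvidia-smi /usr/local/cuda/bin/nvcc; do
    if command -v "${tool}" >/dev/null 2>&1; then
      echo "found: ${tool}"
    else
      echo "missing: ${tool}"
    fi
  done
}

check_cuda
```

If both are found, running nvidia-smi and nvcc --version by hand shows the driver and toolkit versions.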

Downloading cuDNN requires logging in to the NVIDIA developer site, so we can’t use wget to fetch the files. Download the two .deb packages named in the commands below from NVIDIA, upload them to your AWS instance, and install them:

sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64.deb
sudo dpkg -i libcudnn5-dev_5.1.10-1+cuda8.0_amd64.deb
sudo mkdir -p /usr/lib/x86_64-linux-gnu/include
sudo cp /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/include
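
To double-check which cuDNN version ended up installed, you can read the version macros from the header. This sketch assumes the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL on separate #define lines, as cuDNN 5 headers do; the helper name cudnn_version is illustrative.

```shell
# Print the cuDNN version recorded in a cudnn.h header as MAJOR.MINOR.PATCH.
cudnn_version() {
  # $1: path to cudnn.h (defaults to the copy made above)
  cudnn_header="${1:-/usr/lib/x86_64-linux-gnu/include/cudnn.h}"
  if [ ! -r "${cudnn_header}" ]; then
    echo "not readable: ${cudnn_header}" >&2
    return 1
  fi
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { print major "." minor "." patch }
  ' "${cudnn_header}"
}

# Usage (not run here):
#   cudnn_version
```

With the packages named above, the reported version should match the 5.1.10 in the package file names.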

Install TensorFlow and TensorFlowOnSpark

TensorFlow now provides pip packages for the latest CPU and GPU builds. In most cases, you can just install TensorFlow via one of the following:

sudo apt-get install -y python-pip python3-pip   # pip for Python 2 and pip3 for Python 3
pip install tensorflow          # Python 2 CPU
pip3 install tensorflow         # Python 3 CPU
pip install tensorflow-gpu      # Python 2 GPU
pip3 install tensorflow-gpu     # Python 3 GPU

TensorFlowOnSpark also provides a pip package, which you can install via:

pip install tensorflowonspark

Test Your Installation

python
>>> import tensorflow as tf
>>> from tensorflowonspark import TFCluster

If you see no errors, then you should be ready to go. If you encounter any issues, you should check the official installation instructions from TensorFlow for more information.
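
The same import check can be run non-interactively, which is convenient when scripting the AMI build. This is a minimal sketch; the PYTHON variable and the check_import helper are names chosen here for illustration.

```shell
# Interpreter to test; set PYTHON=python3 to check a Python 3 installation.
PYTHON="${PYTHON:-python}"

check_import() {
  # $1: Python statement to execute, e.g. "import tensorflow as tf"
  if "${PYTHON}" -c "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "FAILED: $1"
    return 1
  fi
}

# Usage (not run here):
#   check_import "import tensorflow as tf"
#   check_import "from tensorflowonspark import TFCluster"
```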

Installing TensorFlow from Source (OPTIONAL)

For some specialized installations, e.g. to enable RDMA/iverbs, you may need to compile TensorFlow from source.

The following instructions are provided here for easy reference, but the definitive instructions are available on the TensorFlow site.

Install Build Dependencies:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y build-essential git libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev gfortran python-matplotlib libblas-dev liblapack-dev libatlas-base-dev python-dev python-pydot linux-headers-generic linux-image-extra-virtual unzip python-numpy swig python-pandas python-sklearn wget pkg-config zip g++ zlib1g-dev libcurl3-dev

Configure the environment by adding the following lines to your ~/.bash_profile file.

export CUDA_ROOT=/usr/local/cuda 
export CUDA_HOME=$CUDA_ROOT 
export PATH=$PATH:$CUDA_ROOT/bin 
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64:$CUDA_ROOT/extras/CUPTI/lib64
export HADOOP_HOME=/root/ephemeral-hdfs
export SPARK_HOME=/root/spark
export PATH=${PATH}:${HADOOP_HOME}/bin:${SPARK_HOME}/bin
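
After sourcing ~/.bash_profile, a quick check that the referenced directories exist can catch typos early. This is a sketch only; the helper name check_dir_var is illustrative. A "missing" line is not necessarily an error, since the Hadoop and Spark directories exist only once those are installed.

```shell
# Report whether the directory held by an environment variable exists.
check_dir_var() {
  # $1: variable name (for display), $2: its value
  if [ -d "$2" ]; then
    echo "ok: $1=$2"
  else
    echo "missing: $1=$2"
  fi
}

# In a new login shell (or after: . ~/.bash_profile)
check_dir_var CUDA_ROOT   "${CUDA_ROOT:-}"
check_dir_var HADOOP_HOME "${HADOOP_HOME:-}"
check_dir_var SPARK_HOME  "${SPARK_HOME:-}"
```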

Install Java 8:

sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update 
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
sudo apt-get install -y oracle-java8-installer

Install Bazel:

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install bazel
sudo apt-get upgrade bazel

Clone TensorFlow repository and configure your build:

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow 
./configure

You should accept almost all defaults, except for the following prompts:

  • Hadoop File System support? [y/N] y
  • CUDA support? [y/N] y
  • CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
  • Cudnn version you want to use. [Leave empty to use system default]: 5.1.10
  • location where cuDNN 5.1.10 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/lib/x86_64-linux-gnu
  • compute capability of your device [Default is: "3.5,5.2"]: 3.7
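
The answers above can also be supplied as environment variables so that ./configure does not prompt for them. This sketch assumes a TensorFlow 1.x-era configure script that reads these variable names; the set of variables changes between releases, so check them against the configure script in your checkout. The 3.7 value corresponds to the K80 GPUs in p2 instances.

```shell
# Pre-answer the configure prompts listed above (TensorFlow 1.x variable names assumed).
export TF_NEED_HDFS=1                                  # Hadoop File System support
export TF_NEED_CUDA=1                                  # CUDA support
export TF_CUDA_VERSION=8.0                             # CUDA SDK version
export TF_CUDNN_VERSION=5.1.10                         # cuDNN version
export CUDNN_INSTALL_PATH=/usr/lib/x86_64-linux-gnu    # where cuDNN is installed
export TF_CUDA_COMPUTE_CAPABILITIES=3.7                # compute capability of the device

# Then, from the tensorflow checkout (not run here):
#   ./configure
```

Prompts that are not covered by a variable are still asked interactively, so the remaining defaults can be accepted as before.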

Build TensorFlow (Be patient. It can take several hours):

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install the package:

sudo pip install /tmp/tensorflow_pkg/tensorflow-*.whl

Test TensorFlow:

cd
python
>>> import tensorflow as tf

Create AMI image

Clear the Bazel cache and your shell history, then exit from your terminal:

rm -rf /root/.cache/bazel/
cat /dev/null > ~/.bash_history && history -c && exit

Use the Amazon EC2 console to create an AMI image from your instance: Actions -> Image -> Create Image.
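
The image can also be created with the AWS CLI instead of the console. This is a sketch assuming the AWS CLI is installed and configured; the instance ID and image name are placeholders, and the helper name print_create_image_cmd is illustrative. It only prints the command so you can review it before running it.

```shell
# Print (without running) the AWS CLI command that creates an AMI from an instance.
print_create_image_cmd() {
  # $1: instance ID, $2: name for the new AMI
  echo aws ec2 create-image --region us-west-2 --instance-id "$1" --name "$2"
}

# Placeholders: replace with your instance ID and preferred image name.
print_create_image_cmd "i-xxxxxxxxxxxxxxxxx" "tfos-ami"
```

Running the printed command returns the new image ID, and aws ec2 wait image-available --image-ids with that ID blocks until the image is ready.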