The image is built on top of OpenJDK (8-jdk). The latest version (2.4.4) of Apache Spark is installed in this image. Additionally, ssh has been installed and set-up to be executed password-less. For Apache Spark to be deployed in cluster mode, password-less ssh setup is mandatory.
Apache Spark can be executed in local mode without any additional setup.
Start the Docker image by executing the Docker run
command passing /bin/bash
command option to start an interactive shell session.
docker run -it docker-spark /bin/bash
Once an interactive shell session has been established for the Docker container, Spark shell can be started to execute code against Spark in local mode using Scala programming language as follows:
$SPARK_HOME/bin/spark-shell --master local[2]
Dockerfile for Apache Spark v2.4.4
Dockerfile for Apache Spark v2.3.4