
## Linux

Here we'll show you how to install Spark 3.3.2 on Linux. We tested it on Ubuntu 20.04 (also under WSL), but it should work for other Linux distros as well.

### Installing Java

Download OpenJDK 11 or Oracle JDK 11 (the version matters: Spark requires Java 8 or 11).

We'll use OpenJDK.

Download it (e.g. to `~/spark`):

```bash
wget https://download.java.net/java/GA/jdk11/9/GPL/openjdk-11.0.2_linux-x64_bin.tar.gz
```

Unpack it:

```bash
tar xzfv openjdk-11.0.2_linux-x64_bin.tar.gz
```

Define `JAVA_HOME` and add it to `PATH`:

```bash
export JAVA_HOME="${HOME}/spark/jdk-11.0.2"
export PATH="${JAVA_HOME}/bin:${PATH}"
```
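Note that these exports only apply to the current shell session. To keep them across sessions, you can append them to your shell startup file; a minimal sketch, assuming Bash and the `~/spark` location used above:

```bash
# Persist the Java environment variables (assumes Bash; adapt for zsh etc.)
echo 'export JAVA_HOME="${HOME}/spark/jdk-11.0.2"' >> ~/.bashrc
echo 'export PATH="${JAVA_HOME}/bin:${PATH}"' >> ~/.bashrc

# Reload the startup file in the current session
source ~/.bashrc
```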

Check that it works:

```bash
java --version
```

Output:

```
openjdk 11.0.2 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
```

Remove the archive:

```bash
rm openjdk-11.0.2_linux-x64_bin.tar.gz
```

### Installing Spark

Download Spark. Use version 3.3.2:

```bash
wget https://dlcdn.apache.org/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz
```

Unpack it:

```bash
tar xzfv spark-3.3.2-bin-hadoop3.tgz
```

Remove the archive:

```bash
rm spark-3.3.2-bin-hadoop3.tgz
```

Define `SPARK_HOME` and add it to `PATH`:

```bash
export SPARK_HOME="${HOME}/spark/spark-3.3.2-bin-hadoop3"
export PATH="${SPARK_HOME}/bin:${PATH}"
```
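As with the Java variables, you can append these two lines to `~/.bashrc` to persist them. To confirm that the `PATH` update took effect, check that the Spark scripts resolve (`spark-submit --version` is a standard Spark CLI flag):

```bash
# Confirm the Spark launcher scripts are on PATH
which spark-shell

# Print the Spark version banner; it should report 3.3.2
spark-submit --version
```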

### Testing Spark

Execute `spark-shell` and run the following:

```scala
// Create a local range and distribute it as an RDD
val data = 1 to 10000
val distData = sc.parallelize(data)

// Filter on the executors and collect the result back to the driver
distData.filter(_ < 10).collect()
```
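If everything is set up correctly, the last line should return the integers below 10. A typical REPL response looks like this (the `res` counter in your session may differ):

```
res0: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
```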

### PySpark

It's the same for all platforms. Go to [pyspark.md](pyspark.md).