This repository has been archived by the owner on Sep 1, 2022. It is now read-only.

Update build-spark.sh (#7)
radcheb authored Mar 10, 2021
1 parent ab3287d commit b256d3c
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions build-spark.sh
@@ -6,7 +6,7 @@ SPARK_VERSION=2.4.5
HADOOP_VERSION=2.8.5
HIVE_VERSION=1.2.1
AWS_SDK_VERSION=1.11.682
-BIGQUERY_CONNECTOR_VERSION=0.19.0
+BIGQUERY_CONNECTOR_VERSION=0.19.1

# BUILD HIVE FOR HIVE v1 - needed for spark client
git clone https://github.com/apache/hive.git /opt/hive
@@ -39,9 +39,11 @@ mvn dependency:get -Dartifact=net.minidev:json-smart:1.3.1
find /opt/glue -name "*.jar" -exec cp {} jars \;
# Copy configuration
cp /conf/* conf
-# Copy AWS jars
-echo :quit | ./bin/spark-shell --conf spark.jars.packages=com.amazonaws:aws-java-sdk:$AWS_SDK_VERSION,org.apache.hadoop:hadoop-aws:$HADOOP_VERSION,com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:$BIGQUERY_CONNECTOR_VERSION,com.google.cloud.bigdataoss:gcs-connector:hadoop2-2.2.0
+# Copy AWS and Bigquery connector jars
+echo :quit | ./bin/spark-shell --conf spark.jars.packages=com.amazonaws:aws-java-sdk:$AWS_SDK_VERSION,org.apache.hadoop:hadoop-aws:$HADOOP_VERSION,com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:$BIGQUERY_CONNECTOR_VERSION
cp /root/.ivy2/jars/*.jar jars
+# Download GCS connector jar
+wget https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop2-latest.jar -P jars/
# Create archive
DIRNAME=spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION%.*}-glue
mv /opt/spark/dist /opt/spark/$DIRNAME
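
Since this change moves the GCS connector from `spark.jars.packages` to a separate `wget`, a missing jar would only surface later at runtime. One way to catch that before the archive is created is a small check on the `jars` directory. The following is a minimal sketch, not part of `build-spark.sh`: the function name `check_jars` is hypothetical, and the glob patterns for the Ivy-resolved jars in the usage comment are assumptions about how those files are named. Only `gcs-connector-hadoop2-latest.jar` is taken directly from the `wget` URL above.

```shell
# Hypothetical helper (not in the original script): verify that every
# expected jar pattern matches at least one file in the given directory.
# The function body runs in a subshell, so its variables do not leak.
check_jars() (
  dir=$1
  shift
  status=0
  for pattern in "$@"; do
    # $pattern is left unquoted on purpose so the shell expands the glob.
    if ! ls "$dir"/$pattern >/dev/null 2>&1; then
      echo "missing jar matching: $pattern" >&2
      status=1
    fi
  done
  exit "$status"
)

# Possible usage after the wget step (Ivy file name patterns are assumed):
# check_jars jars 'gcs-connector-hadoop2-latest.jar' \
#   '*aws-java-sdk-*.jar' '*hadoop-aws-*.jar' '*spark-bigquery-*.jar' || exit 1
```

The function returns a non-zero status if any pattern has no match, so appending `|| exit 1` to the call would stop the build before an incomplete archive is produced.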