SparkSubmitCommandBuilder Command Builder

SparkSubmitCommandBuilder is used to build a command that spark-submit and SparkLauncher use to launch a Spark application.
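
For instance, a Spark application can be launched programmatically through SparkLauncher, which relies on SparkSubmitCommandBuilder under the covers to assemble the spark-submit command. A minimal sketch (all paths, class names and settings below are made up for illustration):

import org.apache.spark.launcher.SparkLauncher;

public class LaunchSparkApp {
  public static void main(String[] args) throws Exception {
    // All values are hypothetical; SparkLauncher delegates to
    // SparkSubmitCommandBuilder to build the actual spark-submit command.
    Process spark = new SparkLauncher()
      .setSparkHome("/opt/spark")               // assumed Spark installation
      .setAppResource("/path/to/my-app.jar")    // hypothetical application jar
      .setMainClass("com.example.MyApp")        // hypothetical main class
      .setMaster("local[*]")
      .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
      .launch();
    spark.waitFor();
  }
}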

SparkSubmitCommandBuilder uses the first argument to distinguish between shells:

  1. pyspark-shell-main

  2. sparkr-shell-main

  3. run-example

Caution
FIXME Describe run-example

SparkSubmitCommandBuilder parses command-line arguments using OptionParser (which is a SparkSubmitOptionParser). OptionParser comes with the following methods (sketched in the example below):

  1. handle to handle the known options (see the table below). It sets up master, deployMode, propertiesFile, conf, mainClass, sparkArgs internal properties.

  2. handleUnknown to handle unrecognized options that usually lead to Unrecognized option error message.

  3. handleExtraArgs to handle extra arguments that are considered a Spark application’s arguments.

Note
For spark-shell, OptionParser assumes that the application arguments follow spark-submit's own arguments.
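
The sketch below shows how the three callbacks split a command line. It is illustrative only: SparkSubmitOptionParser is package-private to org.apache.spark.launcher, so such a subclass compiles only inside that package, and the fields mirror the description above rather than the exact Spark sources.

import java.util.ArrayList;
import java.util.List;

class SketchOptionParser extends SparkSubmitOptionParser {

  final List<String> sparkArgs = new ArrayList<>();  // recognized spark-submit options
  final List<String> appArgs = new ArrayList<>();    // the application's own arguments
  String appResource;                                // e.g. the application jar

  @Override
  protected boolean handle(String opt, String value) {
    // Known options such as --master, --class or --conf key=value.
    sparkArgs.add(opt);
    if (value != null) {
      sparkArgs.add(value);
    }
    return true;  // keep parsing
  }

  @Override
  protected boolean handleUnknown(String opt) {
    appResource = opt;  // first unrecognized token, e.g. the application jar
    return false;       // stop parsing; the remainder are application arguments
  }

  @Override
  protected void handleExtraArgs(List<String> extra) {
    appArgs.addAll(extra);  // everything after the application resource
  }
}

For a hypothetical spark-submit --master yarn --class com.example.MyApp my-app.jar arg1 arg2, handle sees --master and --class, handleUnknown receives my-app.jar, and handleExtraArgs receives arg1 and arg2.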

SparkSubmitCommandBuilder.buildCommand / buildSparkSubmitCommand

public List<String> buildCommand(Map<String, String> env)
Note
buildCommand is a part of the AbstractCommandBuilder public API.

SparkSubmitCommandBuilder.buildCommand simply passes calls on to the buildSparkSubmitCommand private method (unless it was executed for the pyspark or sparkr scripts, which are not covered in this document).

buildSparkSubmitCommand Internal Method

private List<String> buildSparkSubmitCommand(Map<String, String> env)

buildSparkSubmitCommand starts by building the so-called effective config (see getEffectiveConfig below). When in client mode, buildSparkSubmitCommand adds spark.driver.extraClassPath to the resulting Spark command.

Note
Use spark-submit to have spark.driver.extraClassPath in effect.

buildSparkSubmitCommand builds the first part of the Java command, passing in the extra classpath (only for client deploy mode).

Caution
FIXME Add isThriftServer case.

buildSparkSubmitCommand appends SPARK_SUBMIT_OPTS and SPARK_JAVA_OPTS environment variables.

(only for client deploy mode) …

Caution
FIXME Elaborate on the client deploy mode case.

addPermGenSizeOpt case…elaborate

Caution
FIXME Elaborate on addPermGenSizeOpt

buildSparkSubmitCommand appends org.apache.spark.deploy.SparkSubmit and the command-line arguments (using buildSparkSubmitArgs).
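
To make the shape of the resulting command concrete, the sketch below assembles a command list the way buildSparkSubmitCommand lays it out for client deploy mode. All paths and values are hypothetical; the real method derives them from the effective config and the environment.

import java.util.ArrayList;
import java.util.List;

public class SparkSubmitCommandShape {
  public static void main(String[] args) {
    List<String> cmd = new ArrayList<>();
    cmd.add("/usr/lib/jvm/java/bin/java");                       // Java executable
    cmd.add("-cp");
    cmd.add("/opt/spark/conf:/opt/spark/jars/*:/extra/libs/*");  // incl. spark.driver.extraClassPath (client mode)
    cmd.add("-Xmx2g");                                           // driver memory (client mode)
    cmd.add("org.apache.spark.deploy.SparkSubmit");
    cmd.add("--master");                                         // from buildSparkSubmitArgs
    cmd.add("yarn");
    cmd.add("--class");
    cmd.add("com.example.MyApp");
    cmd.add("/path/to/my-app.jar");
    System.out.println(String.join(" ", cmd));
  }
}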

buildSparkSubmitArgs method

List<String> buildSparkSubmitArgs()

buildSparkSubmitArgs builds a list of command-line arguments for spark-submit.

buildSparkSubmitArgs uses a SparkSubmitOptionParser to add the command-line options that spark-submit recognizes (spark-submit, when executed later on, uses the very same SparkSubmitOptionParser to parse them back).

Table 1. SparkSubmitCommandBuilder Properties and Corresponding SparkSubmitOptionParser Attributes

SparkSubmitCommandBuilder Property | SparkSubmitOptionParser Attribute
verbose                            | VERBOSE
master                             | MASTER [master]
deployMode                         | DEPLOY_MODE [deployMode]
appName                            | NAME [appName]
conf                               | CONF [key=value]*
propertiesFile                     | PROPERTIES_FILE [propertiesFile]
jars                               | JARS [comma-separated jars]
files                              | FILES [comma-separated files]
pyFiles                            | PY_FILES [comma-separated pyFiles]
mainClass                          | CLASS [mainClass]
sparkArgs                          | sparkArgs (passed straight through)
appResource                        | appResource (passed straight through)
appArgs                            | appArgs (passed straight through)
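
For illustration, a builder configured with the hypothetical values below produces an argument list along these lines (the ordering follows the table above, with sparkArgs, appResource and appArgs appended at the end):

import java.util.Arrays;
import java.util.List;

public class BuildSparkSubmitArgsShape {
  public static void main(String[] args) {
    // Hypothetical builder state:
    //   master = "yarn", deployMode = "cluster", mainClass = "com.example.MyApp",
    //   conf = { "spark.executor.memory" -> "2g" },
    //   appResource = "/path/to/my-app.jar", appArgs = ["arg1"]
    List<String> sparkSubmitArgs = Arrays.asList(
      "--master", "yarn",
      "--deploy-mode", "cluster",
      "--conf", "spark.executor.memory=2g",
      "--class", "com.example.MyApp",
      "/path/to/my-app.jar",
      "arg1");
    System.out.println(String.join(" ", sparkSubmitArgs));
  }
}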

getEffectiveConfig Internal Method

Map<String, String> getEffectiveConfig()

getEffectiveConfig internal method builds effectiveConfig, i.e. conf with the Spark properties file loaded (using the loadPropertiesFile internal method), skipping keys that are already present in conf (they were set earlier when the command-line options were parsed in the handle method).

Note
Command-line options (e.g. --driver-class-path) have higher precedence than their corresponding Spark settings in a Spark properties file (e.g. spark.driver.extraClassPath). You can therefore control the final settings by overriding Spark settings on the command line using the command-line options. The properties file itself is read using the UTF-8 charset, and white space around values is trimmed.
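
The precedence can be sketched as follows (a simplified stand-in that assumes conf holds the parsed command-line options and fileProps the loaded properties file; the real method also resolves the properties file location and caches the result):

import java.util.HashMap;
import java.util.Map;

public class EffectiveConfigSketch {
  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();       // from command-line options (handle)
    conf.put("spark.driver.extraClassPath", "/from/command/line");

    Map<String, String> fileProps = new HashMap<>();  // from the Spark properties file
    fileProps.put("spark.driver.extraClassPath", "/from/properties/file");
    fileProps.put("spark.executor.memory", "2g");

    Map<String, String> effectiveConfig = new HashMap<>(conf);
    // Properties-file entries only fill in keys not already set on the command line.
    fileProps.forEach(effectiveConfig::putIfAbsent);

    // Prints both keys; /from/command/line wins for spark.driver.extraClassPath.
    System.out.println(effectiveConfig);
  }
}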

isClientMode Internal Method

private boolean isClientMode(Map<String, String> userProps)

isClientMode checks master first (from the command-line options) and then the spark.master Spark property. It does the same with deployMode and spark.submit.deployMode.

Caution
FIXME Review master and deployMode. How are they set?

isClientMode responds positively when no master is set explicitly or when the deploy mode is client (set explicitly or left unset).
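
A simplified sketch of that check (the exact branching here is an assumption; the real method also handles YARN-specific master values):

import java.util.Map;

public class IsClientModeSketch {
  // Simplified stand-in for SparkSubmitCommandBuilder.isClientMode:
  // builder fields are modeled as parameters; not verbatim Spark code.
  static boolean isClientMode(String master, String deployMode, Map<String, String> userProps) {
    String userMaster = master != null ? master : userProps.get("spark.master");
    String userDeployMode = deployMode != null ? deployMode : userProps.get("spark.submit.deployMode");
    // Client mode when no master was given, or when the deploy mode is
    // client (explicitly, or by default when no deploy mode was given).
    return userMaster == null
      || "client".equals(userDeployMode)
      || userDeployMode == null;
  }

  public static void main(String[] args) {
    System.out.println(isClientMode(null, null, Map.of()));         // true: no master, no deploy mode
    System.out.println(isClientMode("yarn", "cluster", Map.of()));  // false: explicit cluster deploy mode
  }
}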

OptionParser

OptionParser is a custom SparkSubmitOptionParser that SparkSubmitCommandBuilder uses to parse command-line arguments. It defines all the SparkSubmitOptionParser callbacks, i.e. handle, handleUnknown, and handleExtraArgs, for command-line argument handling.

OptionParser’s handle Callback

boolean handle(String opt, String value)

OptionParser comes with a custom handle callback (from the SparkSubmitOptionParser callbacks).

Table 2. handle Method

Command-Line Option     | Property / Behaviour
--master                | master
--deploy-mode           | deployMode
--properties-file       | propertiesFile
--driver-memory         | Sets spark.driver.memory (in conf)
--driver-java-options   | Sets spark.driver.extraJavaOptions (in conf)
--driver-library-path   | Sets spark.driver.extraLibraryPath (in conf)
--driver-class-path     | Sets spark.driver.extraClassPath (in conf)
--conf                  | Expects a key=value pair that it puts in conf
--class                 | Sets mainClass. It may also set allowsMixedArguments and appResource if the execution is for one of the special classes, i.e. spark-shell, SparkSQLCLIDriver, or HiveThriftServer2.
--kill or --status      | Disables isAppResourceReq and adds itself with the value to sparkArgs
--help or --usage-error | Disables isAppResourceReq and adds itself to sparkArgs
--version               | Disables isAppResourceReq and adds itself to sparkArgs
anything else           | Adds an element to sparkArgs

OptionParser’s handleUnknown Method

boolean handleUnknown(String opt)

If allowsMixedArguments is enabled, handleUnknown simply adds the input opt to appArgs and allows for further parsing of the argument list.

Caution
FIXME Where’s allowsMixedArguments enabled?

If isExample is enabled, handleUnknown sets mainClass to org.apache.spark.examples.[opt] (unless the input opt already has the package prefix) and stops further parsing of the argument list.

Caution
FIXME Where’s isExample enabled?

Otherwise, handleUnknown sets appResource and stops further parsing of the argument list.
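
The branching can be sketched as follows (a paraphrase of the behaviour described above with the builder fields modeled as instance fields; not verbatim Spark code):

import java.util.ArrayList;
import java.util.List;

public class HandleUnknownSketch {
  boolean allowsMixedArguments;  // e.g. enabled for spark-shell and other special classes
  boolean isExample;             // enabled for run-example
  String mainClass;
  String appResource;
  final List<String> appArgs = new ArrayList<>();

  boolean handleUnknown(String opt) {
    if (allowsMixedArguments) {
      appArgs.add(opt);    // unknown options become application arguments
      return true;         // keep parsing
    }
    if (isExample) {
      String prefix = "org.apache.spark.examples.";
      mainClass = opt.startsWith(prefix) ? opt : prefix + opt;
      return false;        // stop parsing
    }
    appResource = opt;     // first unknown token is the application resource
    return false;          // stop parsing; the rest go to handleExtraArgs
  }
}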

OptionParser’s handleExtraArgs Method

void handleExtraArgs(List<String> extra)

handleExtraArgs adds all the extra arguments to appArgs.