
SparkSubmitCommandBuilder Command Builder

SparkSubmitCommandBuilder is used to build a command that spark-submit and SparkLauncher use to launch a Spark application.

SparkSubmitCommandBuilder uses the first argument to distinguish the following special cases (two shells and the run-example script):

  1. pyspark-shell-main

  2. sparkr-shell-main

  3. run-example

FIXME Describe run-example
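The first-argument check can be sketched as follows. This assumes a hypothetical FirstArgDispatch helper; the class, enum and method names are illustrative, not Spark's. Only the three marker strings come from the list above. The sketch also assumes that the marker itself is dropped before the remaining arguments are handed to the option parser.

```java
import java.util.List;

// Hypothetical helper: illustrates the first-argument check, not Spark's code.
final class FirstArgDispatch {

  // Illustrative labels for the special first arguments listed above.
  enum Mode { PYSPARK_SHELL, SPARKR_SHELL, RUN_EXAMPLE, DEFAULT }

  // The detected mode plus the arguments left for the option parser.
  static final class Dispatch {
    final Mode mode;
    final List<String> remaining;

    Dispatch(Mode mode, List<String> remaining) {
      this.mode = mode;
      this.remaining = remaining;
    }
  }

  static Dispatch dispatch(List<String> args) {
    if (args.isEmpty()) {
      return new Dispatch(Mode.DEFAULT, args);
    }
    Mode mode;
    switch (args.get(0)) {
      case "pyspark-shell-main":
        mode = Mode.PYSPARK_SHELL;
        break;
      case "sparkr-shell-main":
        mode = Mode.SPARKR_SHELL;
        break;
      case "run-example":
        mode = Mode.RUN_EXAMPLE;
        break;
      default:
        // Not a special marker: every argument goes to the option parser.
        return new Dispatch(Mode.DEFAULT, args);
    }
    // Assumption: the marker is consumed and only the rest is parsed.
    return new Dispatch(mode, args.subList(1, args.size()));
  }
}
```

For example, dispatch(List.of("run-example", "SparkPi", "10")) reports RUN_EXAMPLE and leaves SparkPi and 10 for the parser.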

SparkSubmitCommandBuilder parses command-line arguments using OptionParser (which is a SparkSubmitOptionParser). OptionParser comes with the following methods:

  1. handle to handle the known options (see the table below). It sets up the master, deployMode, propertiesFile, conf, mainClass and sparkArgs internal properties.

  2. handleUnknown to handle unrecognized options, which usually lead to an "Unrecognized option" error message.

  3. handleExtraArgs to handle extra arguments, which are considered a Spark application’s arguments.

For spark-shell, it assumes that the application arguments come after spark-submit's arguments.
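The callback protocol can be sketched as a small parse loop. This is a simplified, hypothetical CallbackOptionParser, not SparkSubmitOptionParser itself. It recognizes only a handful of options and does not support the --opt=value form. It also assumes that a callback returning false stops parsing, with everything after that argument handed to handleExtraArgs.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified parser illustrating the handle / handleUnknown /
// handleExtraArgs callbacks; not Spark's SparkSubmitOptionParser.
abstract class CallbackOptionParser {

  // Options that take a value (illustrative subset).
  private static final Set<String> OPTS_WITH_VALUE =
      Set.of("--master", "--deploy-mode", "--class", "--conf");

  // Options that take no value (illustrative subset).
  private static final Set<String> SWITCHES = Set.of("--help", "--version", "--verbose");

  protected abstract boolean handle(String opt, String value);

  protected abstract boolean handleUnknown(String opt);

  protected abstract void handleExtraArgs(List<String> extra);

  final void parse(List<String> args) {
    int idx = 0;
    boolean stopped = false;
    while (idx < args.size() && !stopped) {
      String arg = args.get(idx);
      if (OPTS_WITH_VALUE.contains(arg)) {
        if (idx + 1 >= args.size()) {
          throw new IllegalArgumentException("Missing argument for option '" + arg + "'.");
        }
        stopped = !handle(arg, args.get(idx + 1));
        idx += 2;
      } else if (SWITCHES.contains(arg)) {
        stopped = !handle(arg, null);
        idx += 1;
      } else {
        // A false return ends option parsing.
        stopped = !handleUnknown(arg);
        idx += 1;
      }
    }
    // Everything after the argument that stopped parsing is "extra".
    handleExtraArgs(args.subList(idx, args.size()));
  }
}
```

With this loop, an argument such as app.jar that makes handleUnknown return false ends option parsing. Everything after it then reaches handleExtraArgs as application arguments, even if it looks like a spark-submit option.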

SparkSubmitCommandBuilder.buildCommand / buildSparkSubmitCommand

public List<String> buildCommand(Map<String, String> env)

buildCommand is a part of the AbstractCommandBuilder public API.

SparkSubmitCommandBuilder.buildCommand simply delegates to the private buildSparkSubmitCommand method, unless it was executed for the pyspark or sparkr scripts, which this document does not cover.

buildSparkSubmitCommand Internal Method

private List<String> buildSparkSubmitCommand(Map<String, String> env)

buildSparkSubmitCommand starts by building the so-called effective config (see getEffectiveConfig Internal Method). When in client mode, buildSparkSubmitCommand adds spark.driver.extraClassPath to the resulting Spark command.

Use spark-submit to have spark.driver.extraClassPath in effect.

buildSparkSubmitCommand builds the first part of the Java command, passing in the extra classpath (only for client deploy mode).

FIXME Add isThriftServer case.

buildSparkSubmitCommand appends SPARK_SUBMIT_OPTS and SPARK_JAVA_OPTS environment variables.

(only for client deploy mode) …

FIXME Elaborate on the client deploy mode case.

addPermGenSizeOpt case … elaborate

FIXME Elaborate on addPermGenSizeOpt

buildSparkSubmitCommand appends org.apache.spark.deploy.SparkSubmit and the command-line arguments (using buildSparkSubmitArgs).
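The overall layout of the resulting command can be sketched as below. Everything here is an assumption for illustration: the SubmitCommandSketch class and its parameters, the plain java launcher, the choice to prepend the extra classpath, and the naive whitespace splitting of the environment variables (a real implementation would need to respect quoting). The client-mode JVM options and the addPermGenSizeOpt step mentioned above are left out.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the command layout described above; not Spark's code.
final class SubmitCommandSketch {

  static List<String> buildSparkSubmitCommand(
      Map<String, String> env,
      String classpath,
      boolean clientMode,
      String driverExtraClassPath,
      List<String> submitArgs) {
    List<String> cmd = new ArrayList<>();
    cmd.add("java");

    // Client mode only: the driver runs in this JVM, so the extra classpath
    // (spark.driver.extraClassPath) is put in front of the launcher's classpath.
    String cp = classpath;
    if (clientMode && driverExtraClassPath != null && !driverExtraClassPath.isEmpty()) {
      cp = driverExtraClassPath + File.pathSeparator + classpath;
    }
    cmd.add("-cp");
    cmd.add(cp);

    // JVM options from the environment, split naively on whitespace.
    for (String var : List.of("SPARK_SUBMIT_OPTS", "SPARK_JAVA_OPTS")) {
      String opts = env.get(var);
      if (opts != null && !opts.trim().isEmpty()) {
        cmd.addAll(Arrays.asList(opts.trim().split("\\s+")));
      }
    }

    // Main class followed by the spark-submit arguments.
    cmd.add("org.apache.spark.deploy.SparkSubmit");
    cmd.addAll(submitArgs);
    return cmd;
  }
}
```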

buildSparkSubmitArgs method

List<String> buildSparkSubmitArgs()

buildSparkSubmitArgs builds a list of command-line arguments for spark-submit.

buildSparkSubmitArgs uses a SparkSubmitOptionParser to add the command-line arguments that spark-submit recognizes (spark-submit, when executed later on, uses the very same SparkSubmitOptionParser to parse them).

Table 1. SparkSubmitCommandBuilder Properties and Corresponding SparkSubmitOptionParser Attributes

SparkSubmitCommandBuilder Property    SparkSubmitOptionParser Attribute
master                                MASTER [master]
deployMode                            DEPLOY_MODE [deployMode]
appName                               NAME [appName]
conf                                  CONF [key=value]*
propertiesFile                        PROPERTIES_FILE [propertiesFile]
jars                                  JARS [comma-separated jars]
files                                 FILES [comma-separated files]
pyFiles                               PY_FILES [comma-separated pyFiles]
mainClass                             CLASS [mainClass]
sparkArgs                             (passed straight through)
appResource                           (passed straight through)
appArgs                               (passed straight through)
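The mapping in Table 1 can be sketched as a builder method. SubmitArgsSketch is a hypothetical holder whose fields are named after the properties in the table. The option strings (--master, --deploy-mode, --name, --conf, --properties-file, --jars, --files, --py-files, --class) are the spark-submit options those attributes stand for, emitted here in the table's row order. Unset properties and empty lists are simply skipped in this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder mirroring the properties in Table 1; not Spark's class.
final class SubmitArgsSketch {
  String master;
  String deployMode;
  String appName;
  final Map<String, String> conf = new LinkedHashMap<>();
  String propertiesFile;
  final List<String> jars = new ArrayList<>();
  final List<String> files = new ArrayList<>();
  final List<String> pyFiles = new ArrayList<>();
  String mainClass;
  final List<String> sparkArgs = new ArrayList<>();
  String appResource;
  final List<String> appArgs = new ArrayList<>();

  // Adds "opt value" only when the value is set.
  private static void addIfSet(List<String> args, String opt, String value) {
    if (value != null) {
      args.add(opt);
      args.add(value);
    }
  }

  List<String> buildSparkSubmitArgs() {
    List<String> args = new ArrayList<>();
    addIfSet(args, "--master", master);
    addIfSet(args, "--deploy-mode", deployMode);
    addIfSet(args, "--name", appName);
    // One "--conf key=value" pair per entry.
    for (Map.Entry<String, String> e : conf.entrySet()) {
      args.add("--conf");
      args.add(e.getKey() + "=" + e.getValue());
    }
    addIfSet(args, "--properties-file", propertiesFile);
    // List-valued properties become a single comma-separated value.
    if (!jars.isEmpty()) {
      addIfSet(args, "--jars", String.join(",", jars));
    }
    if (!files.isEmpty()) {
      addIfSet(args, "--files", String.join(",", files));
    }
    if (!pyFiles.isEmpty()) {
      addIfSet(args, "--py-files", String.join(",", pyFiles));
    }
    addIfSet(args, "--class", mainClass);
    // Passed straight through, in this order.
    args.addAll(sparkArgs);
    if (appResource != null) {
      args.add(appResource);
    }
    args.addAll(appArgs);
    return args;
  }
}
```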

getEffectiveConfig Internal Method

Map<String, String> getEffectiveConfig()

getEffectiveConfig internal method builds effectiveConfig, which is conf with the Spark properties file loaded (using the loadPropertiesFile internal method), skipping keys that are already set (i.e. those set from command-line options in the handle method).

Command-line options (e.g. --driver-class-path) take precedence over their corresponding Spark settings in a Spark properties file (e.g. spark.driver.extraClassPath). You can therefore control the final settings by overriding Spark settings on the command line using the command-line options. When loading the properties file, loadPropertiesFile uses an explicit charset and trims white space around values.
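The precedence rule can be sketched as a merge in which keys that came from the command line win. EffectiveConfigSketch and its signature are hypothetical. The sketch takes an already loaded java.util.Properties object, so reading the file (and choosing its charset) is left to the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the effective-config merge; not Spark's code.
final class EffectiveConfigSketch {

  // conf holds settings from command-line options; fileProps holds the
  // properties file. The input map is not modified.
  static Map<String, String> effectiveConfig(Map<String, String> conf, Properties fileProps) {
    Map<String, String> effective = new HashMap<>(conf);
    for (String key : fileProps.stringPropertyNames()) {
      // Skip keys already set on the command line; trim file values.
      if (!effective.containsKey(key)) {
        effective.put(key, fileProps.getProperty(key).trim());
      }
    }
    return effective;
  }
}
```

Trimming matters because Properties.load skips white space before a value but keeps trailing white space as part of it.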

isClientMode Internal Method

private boolean isClientMode(Map<String, String> userProps)

isClientMode checks master first (from the command-line options) and then the spark.master Spark property. It does the same with deployMode and spark.submit.deployMode.

FIXME Review master and deployMode. How are they set?

isClientMode returns true when no master is set explicitly or when the client deploy mode is set explicitly.
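The decision can be sketched as below. ClientModeSketch is hypothetical and takes master and deployMode as plain parameters. The property keys spark.master and spark.submit.deployMode come from the text above. The third rule (no deploy mode and a master other than yarn-cluster also means client mode) reflects older Spark sources and should be treated as an assumption that may differ between versions.

```java
import java.util.Map;

// Hypothetical sketch of the client-mode decision; not Spark's code.
final class ClientModeSketch {

  // Returns the first non-null, non-empty value, or null.
  private static String firstNonEmpty(String... candidates) {
    for (String s : candidates) {
      if (s != null && !s.isEmpty()) {
        return s;
      }
    }
    return null;
  }

  static boolean isClientMode(String master, String deployMode, Map<String, String> userProps) {
    // Command-line values are checked before the Spark properties.
    String userMaster = firstNonEmpty(master, userProps.get("spark.master"));
    String userDeployMode = firstNonEmpty(deployMode, userProps.get("spark.submit.deployMode"));
    return userMaster == null                 // no master at all
        || "client".equals(userDeployMode)    // explicit client deploy mode
        // Assumption: without a deploy mode, only yarn-cluster means cluster mode.
        || (userDeployMode == null && !userMaster.equals("yarn-cluster"));
  }
}
```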


OptionParser

OptionParser is a custom SparkSubmitOptionParser that SparkSubmitCommandBuilder uses to parse command-line arguments. It defines all the SparkSubmitOptionParser callbacks, i.e. handle, handleUnknown, and handleExtraArgs, for command-line argument handling.

OptionParser’s handle Callback

boolean handle(String opt, String value)

OptionParser comes with a custom handle callback (from the SparkSubmitOptionParser callbacks).

Table 2. handle Method

Command-Line Option       Property / Behaviour
--master                  Sets master
--deploy-mode             Sets deployMode
--properties-file         Sets propertiesFile
--driver-memory           Sets spark.driver.memory (in conf)
--driver-java-options     Sets spark.driver.extraJavaOptions (in conf)
--driver-library-path     Sets spark.driver.extraLibraryPath (in conf)
--driver-class-path       Sets spark.driver.extraClassPath (in conf)
--conf                    Expects a key=value pair that it puts in conf
--class                   Sets mainClass. It may also set allowsMixedArguments and appResource if the execution is for one of the special classes, i.e. spark-shell, SparkSQLCLIDriver, or HiveThriftServer2.
--kill | --status         Disables isAppResourceReq and adds the option and its value to sparkArgs.
--help | --usage-error    Disables isAppResourceReq and adds the option to sparkArgs.
--version                 Disables isAppResourceReq and adds the option to sparkArgs.
anything else             Adds the option (and its value, if any) to sparkArgs.
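Table 2 can be sketched as a switch over the option name. HandleSketch is a hypothetical holder with fields named after the properties in the table. The special-class handling of --class is left out, and the sketch always returns true (keep parsing).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical state holder mirroring the fields named in Table 2; not Spark's class.
final class HandleSketch {
  String master;
  String deployMode;
  String propertiesFile;
  String mainClass;
  boolean isAppResourceReq = true;
  final Map<String, String> conf = new HashMap<>();
  final List<String> sparkArgs = new ArrayList<>();

  boolean handle(String opt, String value) {
    switch (opt) {
      case "--master": master = value; break;
      case "--deploy-mode": deployMode = value; break;
      case "--properties-file": propertiesFile = value; break;
      case "--driver-memory": conf.put("spark.driver.memory", value); break;
      case "--driver-java-options": conf.put("spark.driver.extraJavaOptions", value); break;
      case "--driver-library-path": conf.put("spark.driver.extraLibraryPath", value); break;
      case "--driver-class-path": conf.put("spark.driver.extraClassPath", value); break;
      case "--conf": {
        // Split on the first '=' only, so values may themselves contain '='.
        String[] kv = value.split("=", 2);
        if (kv.length != 2) {
          throw new IllegalArgumentException("Invalid argument to --conf: " + value);
        }
        conf.put(kv[0], kv[1]);
        break;
      }
      case "--class":
        // Special-class handling (allowsMixedArguments, appResource) omitted.
        mainClass = value;
        break;
      case "--kill":
      case "--status":
        isAppResourceReq = false;
        sparkArgs.add(opt);
        sparkArgs.add(value);
        break;
      case "--help":
      case "--usage-error":
      case "--version":
        isAppResourceReq = false;
        sparkArgs.add(opt);
        break;
      default:
        // Any other known option is passed through to spark-submit.
        sparkArgs.add(opt);
        if (value != null) {
          sparkArgs.add(value);
        }
        break;
    }
    return true;
  }
}
```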

OptionParser’s handleUnknown Method

boolean handleUnknown(String opt)

If allowsMixedArguments is enabled, handleUnknown simply adds the input opt to appArgs and allows for further parsing of the argument list.

FIXME Where’s allowsMixedArguments enabled?

If isExample is enabled, handleUnknown sets mainClass to org.apache.spark.examples.[opt] (unless opt already has the package prefix) and stops further parsing of the argument list.

FIXME Where’s isExample enabled?

Otherwise, handleUnknown sets appResource and stops further parsing of the argument list.
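The three branches of handleUnknown can be sketched as follows. HandleUnknownSketch is hypothetical. The org.apache.spark.examples. prefix comes from the text above. Rejecting dash-prefixed arguments with an "Unrecognized option" error in the last branch is an assumption based on the error message mentioned earlier in this document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical state holder for the three handleUnknown branches; not Spark's class.
final class HandleUnknownSketch {
  static final String EXAMPLE_CLASS_PREFIX = "org.apache.spark.examples.";

  boolean allowsMixedArguments;
  boolean isExample;
  String mainClass;
  String appResource;
  final List<String> appArgs = new ArrayList<>();

  // Returns true to keep parsing, false to stop.
  boolean handleUnknown(String opt) {
    if (allowsMixedArguments) {
      // Shell mode: unknown arguments belong to the application.
      appArgs.add(opt);
      return true;
    }
    if (isExample) {
      // run-example: a bare name such as "SparkPi" gets the examples package prefix.
      mainClass = opt.startsWith(EXAMPLE_CLASS_PREFIX) ? opt : EXAMPLE_CLASS_PREFIX + opt;
      return false;
    }
    // Assumption: something that looks like an option cannot be the resource.
    if (opt.startsWith("-")) {
      throw new IllegalArgumentException("Unrecognized option: " + opt);
    }
    // Otherwise the first unrecognized argument is the application resource.
    appResource = opt;
    return false;
  }
}
```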

OptionParser’s handleExtraArgs Method

void handleExtraArgs(List<String> extra)

handleExtraArgs adds all the extra arguments to appArgs.