Jupyter
The Kotlin Spark API also supports Kotlin Jupyter notebooks. To use it, simply add
%use spark
to the top of your notebook. This will get the latest version of the API, together with the latest version of Spark. To pin a specific version of Spark, Scala, or the API itself, pass it as an argument like this:
%use spark(spark=3.3.0, scala=2.13, v=1.1.0)
Other arguments you can pass to this %use magic include displayLimit and displayTruncate:
%use spark(displayLimit=30, displayTruncate=-1)
As well as any Spark property you like:
%use spark(spark.app.name=MyApp, spark.master=local[*])
Inside the notebook, a Spark session is started automatically. It can be accessed via the spark value. sc: JavaSparkContext can also be accessed directly. Otherwise, the API works much as it does outside a notebook.
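As a rough illustration of these two entry points, the cell below uses only standard Spark calls on the spark and sc values. It is a minimal sketch that assumes the default %use spark setup; the sample numbers are made up.

```kotlin
%use spark

// `spark` is the session created by the integration.
// range() is plain Spark API and builds a Dataset holding the numbers 0 to 4.
val numbers = spark.range(5L)
numbers.show()

// `sc` is the JavaSparkContext belonging to the same session.
val rdd = sc.parallelize(listOf(1, 2, 3))
println(rdd.count()) // prints 3
```

If the magic and the code do not work together in one cell in your setup, put %use spark in a cell of its own and run it first.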
There is also support for HTML rendering of Datasets and simple (Java)RDDs. The look of these renders can be adjusted by setting either sparkProperties.displayTruncate (which adjusts the number of characters per cell) or sparkProperties.displayLimit (which adjusts the number of rows per table).
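A short sketch of adjusting the rendering from inside a running notebook could look like the cell below. It assumes that both properties are mutable and accept integers, in line with the displayLimit and displayTruncate arguments of %use spark; the values 5 and 20 are arbitrary.

```kotlin
// Assumption: both properties are mutable and take integers,
// mirroring the displayLimit and displayTruncate arguments of %use spark.
sparkProperties.displayLimit = 5      // render at most 5 rows per table
sparkProperties.displayTruncate = 20  // cut cell text after 20 characters

// A Dataset as the last expression of a cell is rendered as an HTML table.
spark.range(100L)
```

Setting the values in a cell rather than in the %use line makes it possible to change them between renders without restarting the session.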
To use Spark Streaming abilities, instead use
%use spark-streaming
This does not start a Spark session right away, meaning you can call withSparkStreaming(batchDuration) {} in whichever cell you want.
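A minimal sketch of such a cell is shown below. It feeds one hand-made RDD through a queue-backed stream so that no external source is needed. The timeout parameter, the ssc value inside the block, and the automatic start of the streaming context once the block returns are assumptions about the API that are not stated in this section; queueStream, parallelize and print are standard Spark calls.

```kotlin
%use spark-streaming

import java.util.LinkedList
import java.util.Queue
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.streaming.Durations

// Assumption: `timeout` (in milliseconds) stops the stream after about five seconds.
withSparkStreaming(batchDuration = Durations.seconds(1), timeout = 5_000) {
    // Assumption: `ssc` is the JavaStreamingContext exposed inside this block.
    val queue: Queue<JavaRDD<Int>> = LinkedList()
    queue.add(ssc.sparkContext().parallelize(listOf(1, 2, 3)))

    // queueStream() is plain Spark Streaming API: each queued RDD becomes one batch.
    ssc.queueStream(queue).print()
}
```

Using a queue as the source keeps the sketch independent of sockets or files; a real notebook would replace it with an actual input stream.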
Check out the example.
If a running stream is interrupted by Jupyter, an attempt will be made to close the stream itself so no Spark session will remain running in the background.
NOTE: You need kotlin-jupyter-kernel to be at least version 0.11.0.140 for the Kotlin Spark API to work. Also, if the %use spark magic does not output "Spark session has been started...", and %use spark-streaming doesn't work at all, add %useLatestDescriptors above it.