# Livy Instructions
Livy is a RESTful interface for Spark. It can be used to communicate with clusters without requiring Spark to be installed on the user's machine, which is our main use case for it. There are some caveats; for example, Livy crashes when running tasks in Tellurium, and it is not yet clear whether this is a bug in Livy or in Tellurium.
The instructions below were provided by @ShaikAsifullah.
Contact Kyle or Shaik to get details regarding the current deployment. The examples below use a dummy IP.
`LIVY_HOME` is `/home/shaik/Downloads/livy-server-0.3.0`.
`$LIVY_HOME/bin/livy-server start` starts the Livy server, and `$LIVY_HOME/bin/livy-server stop` stops it.
## What configurations does Livy need to connect to Spark?
`SPARK_HOME` is set in `$LIVY_HOME/conf/livy-env.sh`. The host, port, and deploy mode can be configured in `$LIVY_HOME/conf/livy.conf`.
When the crash occurs, the log at `$LIVY_HOME/logs/livy-shaik-server.out` shows:
17/08/06 03:35:56 INFO ContextLauncher: 17/08/06 03:35:56 ERROR PythonInterpreter: Process has died with 1
The communication protocol for running Spark jobs on our cluster is still not easy or user-friendly. Here is how the basics work:
- To run a PySpark job through Livy, we first need to create a session (a scripted version of the full workflow is sketched after this list). It can be created as follows:
curl -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" http://192.168.1.1:5555/sessions
- Find the ID of the session that was just created. You can find it using:
curl http://192.168.1.1:5555/sessions/
- The first session gets ID 0, and the ID increases by 1 every time a new session is created.
- Say we just want to calculate the value of 1 + 1; this is how it can be done through Livy (assuming the session ID is 0):
curl http://192.168.1.1:5555/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"1 + 1"}'
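The same workflow can also be scripted. Below is a minimal sketch using Python's `requests` library against the same dummy host and port; the one-second polling interval is arbitrary.

```python
# Minimal sketch of the Livy workflow above: create a session, wait for it to
# become idle, submit a statement, and poll for the result.
import json
import time

import requests

LIVY = "http://192.168.1.1:5555"
HEADERS = {"Content-Type": "application/json"}

# 1. Create a PySpark session.
session = requests.post(LIVY + "/sessions",
                        data=json.dumps({"kind": "pyspark"}),
                        headers=HEADERS).json()
session_url = "{}/sessions/{}".format(LIVY, session["id"])

# 2. Wait until the session is idle (it starts in the "starting" state).
while requests.get(session_url).json()["state"] != "idle":
    time.sleep(1)

# 3. Submit a statement, e.g. 1 + 1.
statement = requests.post(session_url + "/statements",
                          data=json.dumps({"code": "1 + 1"}),
                          headers=HEADERS).json()
statement_url = "{}/statements/{}".format(session_url, statement["id"])

# 4. Poll until the statement result is available, then print it.
result = requests.get(statement_url).json()
while result["state"] != "available":
    time.sleep(1)
    result = requests.get(statement_url).json()
print(result["output"]["data"]["text/plain"])   # -> 2
```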
We can also run whole functions this way; here is the sample function in the link.
## Wrapper (without Livy)
- The wrapper is built using the fabric and paramiko modules. It is not completely ready yet; paramiko is no longer needed, since the work can be done with fabric alone.
- Here is the link to the Wrapper, which every user (client) needs to install.
- Here is how the user can ship their code to the cluster, run it, and fetch the results.
Component-1
The client writes a normal script, just as they would in a regular Zeppelin note, with one minor change: every additional file that needs to be shipped to the cluster must have its filename prefixed with "file:". See this example. In that example, since we also need to ship the SBML file (line 5), it is written as SBML = "file:huang-ferrell-96.xml".
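For orientation only, here is a rough sketch of what such a client script might look like. The linked example is authoritative; the sweep logic, app name, and the way the shipped file is opened on the workers are placeholders and assumptions.

```python
# Hypothetical client script (sketch only; see the linked example for the real one).
from pyspark import SparkContext

# Files that must be shipped to the cluster are referenced with a "file:" prefix.
SBML = "file:huang-ferrell-96.xml"

sc = SparkContext(appName="huang-ferrell-sweep")

def simulate(seed):
    # Placeholder worker: load the shipped SBML model and run one simulation.
    # (Assumption: the wrapper makes "file:"-prefixed files available under
    # their bare names in the worker's working directory.)
    import roadrunner
    rr = roadrunner.RoadRunner(SBML.replace("file:", ""))
    result = rr.simulate(0, 100, 100)
    return [list(row) for row in result]   # plain lists pickle cleanly for collect()

results = sc.parallelize(range(10)).map(simulate).collect()
```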
Component-2
The SBML/Antimony file that needs to be sent.
Component-3
Additional helper Python files for the simulation. In the example above we used CustomSimulator from our custom_simulator.py file, and since that file also needs to be shipped, it is written as file:FILENAME.
## What needs to be done at the server level?
For every client that needs to be added to the cluster:
- Create a user for the client on the server and set up their home directory.
- Get the client's public key and add it to authorized_keys, or give them a password for their user.
- The home directory of every new user contains a directory named "remote".
- In that directory there should be a default Python file in place.
- This Python file modifies the runnable script provided by the client so that it can run on the cluster (a rough sketch of such a file is given after this list).
- Each client is allotted a fixed number of the available cores and cannot use more than that.
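The exact contents of the default Python file are not documented here; the following is purely an assumption about what it might do, based on the "file:" convention described above.

```python
# Rough sketch (assumption only) of the per-user default Python file in ~/remote:
# rewrite "file:" references in the client's script so they point at the uploaded
# copies before the script is handed to Spark.
import os
import re
import sys

def rewrite_script(script_path, upload_dir):
    """Replace every "file:NAME" reference with the uploaded file's path."""
    with open(script_path) as f:
        code = f.read()
    code = re.sub(r'file:([\w.\-]+)',
                  lambda m: os.path.join(upload_dir, m.group(1)),
                  code)
    with open(script_path, "w") as f:
        f.write(code)

if __name__ == "__main__":
    rewrite_script(sys.argv[1], sys.argv[2])
```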
## What needs to be done on the client side?
- Every client who wants to run simulations on the server must first get permission to access it.
- Install the necessary Python modules (fabric, the WrapperModule, etc.; a rough sketch of what the wrapper does is given below).
- And that’s it.
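For reference, here is a rough sketch of how a fabric-based wrapper could ship a script and its data files to the cluster, run it, and fetch the output. This is not the actual WrapperModule API; the function name, the use of spark-submit, and the fabric 2.x Connection interface are all assumptions, and the host, user, and file names are placeholders.

```python
# Illustrative sketch only; the real WrapperModule interface may differ.
# Assumes the fabric 2.x Connection API and that jobs are launched with spark-submit.
import os

from fabric import Connection

def run_on_cluster(host, user, script, extra_files, remote_dir="remote"):
    """Ship the script and its "file:" dependencies, run it, and return stdout."""
    conn = Connection(host=host, user=user)
    for path in [script] + extra_files:
        # Upload each file into the user's ~/remote directory.
        conn.put(path, remote="{}/{}".format(remote_dir, os.path.basename(path)))
    result = conn.run("cd {} && spark-submit {}".format(remote_dir,
                                                        os.path.basename(script)))
    return result.stdout   # the fetched results

# Example call (placeholder values):
# output = run_on_cluster("192.168.1.1", "alice", "sweep.py",
#                         ["huang-ferrell-96.xml", "custom_simulator.py"])
```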