
# Using Apache Livy with Tellurium

Livy is a RESTful interface for Spark. It can be used to communicate with clusters without requiring Spark to be installed on the user's machine, which is our main use case for it. There are some caveats; for example, Livy crashes when running tasks in Tellurium. It is not yet clear whether this is a bug in Livy or in Tellurium.

The instructions below were provided by @ShaikAsifullah.

## Where is the Livy Server currently running?

Contact Kyle or Shaik to get details regarding the current deployment. The examples below use a dummy IP.

## Where is it located on the master node?

The Livy home directory (`$LIVY_HOME`) is at `/home/shaik/Downloads/livy-server-0.3.0`.

## How to start/stop the Livy Server?

`$LIVY_HOME/bin/livy-server start` starts the Livy Server, and `$LIVY_HOME/bin/livy-server stop` stops it.

## What configurations does Livy need to connect to Spark?

In `$LIVY_HOME/conf/livy-env.sh` we have `SPARK_HOME` provided. In `$LIVY_HOME/conf/livy.conf` we can configure the port, host, and deploy mode.
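As a rough sketch, the relevant entries look like the following; the path and values shown are assumptions for illustration (Livy's stock default port is 8998, while the examples below use 5555):

```
# $LIVY_HOME/conf/livy-env.sh -- tell Livy where Spark lives
export SPARK_HOME=/path/to/spark        # hypothetical path

# $LIVY_HOME/conf/livy.conf -- host, port, and deploy mode
livy.server.host = 0.0.0.0
livy.server.port = 5555                 # assumption, matching the examples below
livy.spark.master = yarn                # assumption
livy.spark.deploy-mode = client         # assumption
```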

## What is the current problem with Livy?

In the log file `$LIVY_HOME/logs/livy-shaik-server.out` it says: `17/08/06 03:35:56 INFO ContextLauncher: 17/08/06 03:35:56 ERROR PythonInterpreter: Process has died with 1`

## How can we communicate with Livy?

Well, the communication protocol for running Spark jobs on our cluster is still not easy or user-friendly. Here is how the basics work:

  1. In order to run a PySpark job using Livy, we need to create a session first. Here is how it can be created: `curl -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" http://192.168.1.1:5555/sessions`
  2. Find the session ID that just got created. You can find it using: `curl http://192.168.1.1:5555/sessions/`
  3. The first session gets ID 0, and the ID increases by 1 every time a new session is created.
  4. Say we just want to calculate the value of 1 + 1. This is how it is done in Livy (assuming the session ID is 0): `curl http://192.168.1.1:5555/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"1 + 1"}'` The same flow is sketched in Python after this list.
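For reference, here is the same flow as a minimal Python sketch using the `requests` library. The host and port are the dummy values from the curl examples above; statements run asynchronously, so the sketch polls until the result is available.

```python
import time
import requests

HOST = "http://192.168.1.1:5555"  # dummy IP from the examples above
HEADERS = {"Content-Type": "application/json"}

# 1. Create a PySpark session.
session = requests.post(HOST + "/sessions",
                        json={"kind": "pyspark"}, headers=HEADERS).json()
session_url = HOST + "/sessions/%d" % session["id"]

# Wait until the session is ready to accept statements.
while requests.get(session_url, headers=HEADERS).json()["state"] != "idle":
    time.sleep(1)

# 2. Submit a statement (the 1 + 1 example).
stmt = requests.post(session_url + "/statements",
                     json={"code": "1 + 1"}, headers=HEADERS).json()
stmt_url = session_url + "/statements/%d" % stmt["id"]

# 3. Poll until the statement finishes, then print its output.
while True:
    result = requests.get(stmt_url, headers=HEADERS).json()
    if result["state"] == "available":
        print(result["output"])
        break
    time.sleep(1)
```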

## Can we prettify the whole thing and make it more user-friendly using some wrapper?

Yes, we can do that, and here is the sample function in the link.

## Wrapper (without Livy)

  1. The wrapper is built using the fabric and paramiko modules. It is not completely ready yet: we no longer need paramiko, and the work can be done using fabric alone.
  2. Here is the link for the wrapper, which every user (client) needs to install.
  3. Here is how the user can ship their code to the cluster, run it, and fetch the results.

## Components Involved

### Component-1

The client writes a normal script, just as they would in a regular Zeppelin note, with just one minor change: for every additional file they want to ship to the cluster, they prefix the filename with `file:`. See this example. In the example above, since we need to ship the SBML file (line 5) too, we wrote it as `SBML = "file:huang-ferrell-96.xml"`.
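A purely illustrative sketch of such a script is below. The file names echo the example referenced above; everything else is a hypothetical placeholder, not the actual example code.

```python
# Hypothetical client script, written just as it would be for Zeppelin.
# Every additional file that must travel to the cluster is named with
# the "file:" prefix so the wrapper knows to ship it alongside the script.

SBML = "file:huang-ferrell-96.xml"    # the SBML model, shipped with the script
HELPER = "file:custom_simulator.py"   # helper module, also shipped

def simulate():
    # ... the normal Tellurium/PySpark simulation code goes here ...
    pass
```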

### Component-2

The SBML/Antimony file that we need to send.

### Component-3

Additional helper Python files for your simulation. In the example above we used `CustomSimulator` from our `custom_simulator.py` file, and since we need to ship that file too, it has to be written as `file:FILENAME`.

## What needs to be done at the server level?

For every client that needs to be added to the cluster:

  1. Create a user for the client on the server and put their home directory in place.
  2. Get the public key from the client and add it to `authorized_keys`, or provide a password for their user.
  3. In the home directory of every new user, we have a new directory named `remote`.
  4. In that directory, there should be a default Python file in place.
  5. This Python file modifies the runnable script provided by the client in order for it to run; a guess at what that step might look like follows this list.
  6. Each client gets a fixed number of cores out of the available cores; the client cannot use more than that allocation.
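The wiki does not spell out what the default Python file actually does beyond modifying the client's script, so the following is only a guess at one plausible implementation: collect the `file:`-prefixed names (for example, to hand them to `spark-submit --files`) and strip the prefix from the script so the names resolve on the cluster. Every name and detail here is an assumption.

```python
import re

# Matches the "file:NAME" convention used in client scripts (assumption).
FILE_PREFIX = re.compile(r'file:([\w./\-]+)')

def rewrite(script_path):
    """Hypothetical server-side rewrite: find the files the client wants
    shipped, strip the "file:" prefix, and write out a runnable script."""
    with open(script_path) as f:
        source = f.read()
    shipped = FILE_PREFIX.findall(source)       # e.g. for spark-submit --files
    rewritten = FILE_PREFIX.sub(r'\1', source)  # drop the prefix in the code
    out_path = script_path + ".rewritten.py"
    with open(out_path, "w") as f:
        f.write(rewritten)
    return out_path, shipped
```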

## What needs to be done on the client side?

  1. Every client who needs to run simulations on the server should first get permission to access it.
  2. Install the necessary Python modules (fabric, the wrapper module, etc.).
  3. And that's it. A sketch of how such a fabric-based wrapper might look from the client side follows.
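To make the flow concrete, here is a minimal sketch of what a fabric-based wrapper call could look like on the client side (fabric 1.x API). The host, user, directory layout, and the `spark-submit` invocation are all assumptions; the linked wrapper is the authoritative version.

```python
# Minimal sketch of a fabric-based client wrapper; all names are assumptions.
from fabric.api import env, execute, put, run

env.user = "client1"  # hypothetical user created for this client on the server

def _ship_and_run(script, extra_files):
    # Copy the helper files and the main script into the user's "remote"
    # directory, where the server-side rewrite step picks them up.
    for path in extra_files:
        put(path, "remote/")
    put(script, "remote/")
    # Run the (rewritten) script on the cluster; the exact command is a guess.
    return run("spark-submit remote/%s" % script)

def ship_and_run(script, extra_files, host="192.168.1.1"):
    results = execute(_ship_and_run, script, extra_files, hosts=[host])
    return results[host]

# Hypothetical usage:
# output = ship_and_run("client_script.py",
#                       ["huang-ferrell-96.xml", "custom_simulator.py"])
```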