XSEDE Tutorial Part 2: SAGA
This page contains material for part 2 of the XSEDE 2012 Tutorial. It covers SAGA, the Simple API for Grid Applications, how it is installed and used to submit jobs to and transfer files across XSEDE resources.
Direct Link to the SAGA slides
In this hands-on session, we will work with a SAGA Python implementation called Bliss (from here on simply referred to as SAGA). SAGA is a lightweight Python package that implements parts of the OGF GFD.90 SAGA interface specification and provides plug-ins for different distributed middleware systems and services. SAGA-Bliss implements the most commonly used features of GFD.90 (job submission, file handling) and focuses on usability and simple deployment in real-world heterogeneous distributed computing environments and application scenarios.
SAGA currently provides support for the following backends:
- SSH - Allows job execution on remote hosts via SSH.
- PBS(+SSH) - (includes TORQUE). Provides local and remote access to PBS/Torque clusters.
- SGE(+SSH) - Provides local and remote access to Sun (Oracle) Grid Engine clusters.
- SFTP - Provides remote filesystem access via the SFTP protocol.
More details can be found on the SAGA Plugins page.
Additional information about SAGA can be found on the website: http://saga-project.github.com/bliss/. Comprehensive API documentation is available at http://saga-project.github.com/bliss/apidoc/.
Usernames and passwords for the tutorial accounts will be handed out at the beginning of the hands-on session.
Depending on your operating system:
- On Linux/Unix/Mac, please use ssh:
  ssh <your username>@lonestar.tacc.utexas.edu
- On Windows, please download PuTTY and connect to the host lonestar.tacc.utexas.edu with your username.
Once you're logged in, run the following command to make sure that SSH connections to localhost work:
ssh localhost /bin/hostname
Lonestar accounts use bash as the default shell. Two login nodes, login1 and login2, are visible to users. Your $HOME, $WORK and $SCRATCH directories are shared across all compute and login nodes; /tmp is not. Please use $HOME or $WORK for all your work. For larger simulations (more than 2k cores), please use $SCRATCH. More information is available in the Lonestar user guide.
The user environment on Lonestar can be easily controlled through modules. Use modules to load and unload software environment variables. To see the currently loaded modules, you can issue the command:
module list
To search for a particular module or package, use module spider <package name>.
Useful commands/aliases in the Lonestar user environment include cdw to change to the $WORK directory and cds to change to the $SCRATCH directory.
First you have to load the Python module into your environment on Lonestar. The default Python module on Lonestar is Python 2.7, which is fairly new and works well with both SAGA-Bliss and BigJob. To load the module, run the following command:
module load python
Next, you need a place where you can install software locally, so you can install the latest versions of SAGA-Bliss and BigJob. A small tool called virtualenv allows you to create a local Python software repository that behaves exactly like the global Python repository, with the only difference that you have write access to it. To create your local Python environment, run the following command (it downloads virtualenv with curl and then runs it via python):
curl --insecure -s https://raw.github.com/pypa/virtualenv/master/virtualenv.py | python - $HOME/tutorial
You need to activate your Python environment in order to make it work. Run the command below. It temporarily modifies your PATH so that python resolves to the interpreter in $HOME/tutorial/bin, which loads packages from $HOME/tutorial/lib/python2.7/site-packages/ instead of the system site-packages directory:
source $HOME/tutorial/bin/activate
Activating the virtualenv is very important. If you don't activate your virtual Python environment, the rest of this tutorial will not work. You can usually tell that your environment is activated properly if your bash command-line prompt starts with (tutorial).
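Besides looking at the prompt, you can also ask the interpreter itself. The following is a minimal sketch, not part of the tutorial material: the helper name in_virtualenv is made up for illustration, and it relies on the sys.real_prefix attribute set by classic virtualenv releases and on sys.base_prefix used by newer ones.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Classic virtualenv releases set sys.real_prefix on the interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv/venv releases make sys.base_prefix differ from sys.prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("virtualenv active: %s (prefix: %s)" % (in_virtualenv(), sys.prefix))
```

If the environment is active, the reported prefix should point into $HOME/tutorial.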
The last step is to add your newly created virtualenv to your $HOME/.bashrc so that any batch jobs that you submit have the same Python environment as you have on the login node. Add the following lines at the end of your $HOME/.bashrc file:
module load python
source $HOME/tutorial/bin/activate
Next, open $HOME/.profile and add the following line:
source .bashrc
The latest SAGA-Bliss Python module is available via the Python Package Index (PyPI). PyPI packages can be installed much like Linux deb or rpm packages, using a tool called pip (which stands for "pip installs packages"). Pip is installed by default in your virtualenv, so in order to install SAGA-Bliss, the only thing you have to do is this:
pip install bliss
You will see some downloading and unpacking action and if everything worked ok, the last two lines should look like this:
Successfully installed bliss paramiko-on-pypi pycrypto-on-pypi
Cleaning up...
To make sure that your installation works, run the following command to check if the SAGA-Bliss module can be imported by the interpreter (the output should be the version number of the bliss module):
python -c "import bliss; print bliss.version"
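If you prefer a check that does not abort with a traceback when the import fails, the same test can be wrapped in a small function. This is only a sketch: module_version is a hypothetical helper, and it assumes a module's version is exposed as either a version or a __version__ string attribute.

```python
import importlib


def module_version(name):
    """Return a module's version string, 'unknown' if it exposes none,
    or None if the module cannot be imported at all."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Try the two attribute names that are commonly used for version strings.
    for attr in ("version", "__version__"):
        value = getattr(module, attr, None)
        if isinstance(value, str):
            return value
    return "unknown"


for name in ("bliss", "json"):
    print("%s: %s" % (name, module_version(name)))
```

A result of None for bliss usually means that the virtualenv is not activated or that the pip installation did not succeed.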
One of the most important features of Bliss is the capability to submit jobs to local and remote queueing systems and resource managers.
The job submission and management capabilities of Bliss are packaged in the bliss.saga.job module (API Doc). Three classes are defined in this module:
- The job.Service class (API Doc) provides a handle to the resource manager, for example a remote PBS cluster.
- The job.Description class (API Doc) is used to describe the executable, arguments, environment and requirements (e.g., number of cores) of a new job.
- The job.Job class (API Doc) is a handle to a job associated with a job.Service. It is used to control (start, stop) the job and query its status (e.g., Running, Done).
In order to use the Bliss Job API in your scripts, we first need to import the Bliss module:
import bliss.saga as saga
Next, we create a job service object that represents a local or cluster resource. The job service takes a single URL as parameter. The URL parameter is passed to Bliss' plug-in mechanism and based on the URL scheme, a specific plug-in is selected to connect to the specified location. The URL is a way to tell Bliss what type of queueing system or middleware you want to use and where it is. For example:
js = saga.job.Service("sge://localhost")
will tell Bliss to use the SGE plug-in to connect to a local SGE cluster. In our case, this is the SGE queueing system running on Lonestar.
Once the job.Service object has been created, it can be used to create and start new jobs. To define a new job, a job.Description object needs to be created that contains information about the executable we want to run, the arguments that need to be passed to it, the environment that needs to be set and what requirements we have for our job. Here's an example:
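To make the idea of scheme-based plug-in selection more concrete, here is a minimal sketch of how such a lookup could work. It is an illustration only and not Bliss' actual implementation: the PLUGINS table, the select_plugin function and the "ssh" and "pbs" scheme names are assumptions made for this example.

```python
try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2

# Hypothetical registry for illustration; Bliss' real plug-in lookup may differ.
PLUGINS = {
    "ssh": "SSH job plug-in",
    "pbs": "PBS job plug-in",
    "sge": "SGE job plug-in",
    "sftp": "SFTP file plug-in",
}


def select_plugin(url):
    """Pick a plug-in description based on the URL scheme and return it with the host."""
    parts = urlparse(url)
    scheme = parts.scheme.lower()
    if scheme not in PLUGINS:
        raise ValueError("no plug-in registered for scheme '%s'" % scheme)
    return PLUGINS[scheme], parts.hostname


print(select_plugin("sge://localhost"))
print(select_plugin("sftp://localhost/tmp"))
```

The scheme says which middleware to talk to, and the host part says where it is, which is why the same script can target a different resource by changing only the URL.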
jd = saga.job.Description()
# requirements
jd.queue = "development"
jd.wall_time_limit = 1 # minutes
# environment, executable & arguments
jd.environment = {'MYOUTPUT':'"Hello from Bliss"'}
jd.executable = '/bin/echo'
jd.arguments = ['$MYOUTPUT']
# output options
jd.output = "my1stjob.stdout"
jd.error = "my1stjob.stderr"
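To get a feeling for what this job does once it runs, you can imitate it locally. The sketch below assumes that the job is started through a shell that expands $MYOUTPUT from the job environment; run_like_job is a hypothetical helper, and the extra quoting used in the job description is not reproduced here.

```python
import subprocess


def run_like_job(executable, arguments, environment):
    """Run a command through /bin/sh with only the given environment
    and return whatever it wrote to stdout."""
    command = " ".join([executable] + list(arguments))
    output = subprocess.check_output(command, shell=True, env=environment)
    return output.decode("utf-8")


# The shell replaces $MYOUTPUT with the value from the environment.
print(run_like_job("/bin/echo", ["$MYOUTPUT"], {"MYOUTPUT": "Hello from Bliss"}))
```

On the cluster, the same text ends up in the file named by jd.output instead of on your terminal.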
Now that you have understood the basics of a SAGA job, let's try to run the whole thing! In your $HOME directory, open a new file saga_example_1.py with your favorite editor (e.g., vim) and paste the following content:
import sys
import bliss.saga as saga

def main():
    try:
        # create a job service for lonestar
        js = saga.job.Service("sge://localhost")

        # describe our job
        jd = saga.job.Description()
        jd.queue = "development"
        jd.wall_time_limit = 1 # minutes
        jd.environment = {'MYOUTPUT':'"Hello from Bliss"'}
        jd.executable = '/bin/echo'
        jd.arguments = ['$MYOUTPUT']
        jd.output = "my1stjob.stdout"
        jd.error = "my1stjob.stderr"

        # create the job (state: New)
        myjob = js.create_job(jd)
        print "Job ID : %s" % (myjob.jobid)
        print "Job State : %s" % (myjob.get_state())
        print "\n...starting job...\n"

        # run the job
        myjob.run()
        print "Job ID : %s" % (myjob.jobid)
        print "Job State : %s" % (myjob.get_state())
        print "\n...waiting for job...\n"

        # wait for the job to either finish or fail
        myjob.wait()
        print "Job State : %s" % (myjob.get_state())
        print "Exitcode : %s" % (myjob.exitcode)

    except saga.Exception, ex:
        print "An error occurred during job execution: %s" % (str(ex))
        sys.exit(-1)

if __name__ == "__main__":
    main()
Save the file and execute it via the python interpreter (make sure your virtualenv is activated):
python saga_example_1.py
The output should look something like this:
Job ID : [sge://localhost]-[None]
Job State : saga.job.Job.New
...starting job...
Job ID : [sge://localhost]-[644240]
Job State : saga.job.Job.Pending
...waiting for job...
Job State : saga.job.Job.Done
Exitcode : None
Once the job has completed, you can have a look at the output file my1stjob.stdout.
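The job IDs printed above combine the job service URL with the ID assigned by the queueing system, which is None until the job has been submitted. If you want to pass the native ID to other tools, a small parser is enough. This is a sketch based only on the format visible in the sample output; split_job_id is a hypothetical helper and not part of the Bliss API.

```python
import re

# Matches IDs of the form "[<service url>]-[<native id>]" as seen in the output.
JOB_ID_PATTERN = re.compile(r"^\[(?P<service>.+)\]-\[(?P<native_id>.+)\]$")


def split_job_id(job_id):
    """Split a job ID string into (service URL, native ID or None)."""
    match = JOB_ID_PATTERN.match(job_id)
    if match is None:
        raise ValueError("unexpected job ID format: %r" % job_id)
    native_id = match.group("native_id")
    # A job that has not been submitted yet carries the literal text "None".
    return match.group("service"), (None if native_id == "None" else native_id)


print(split_job_id("[sge://localhost]-[644240]"))
print(split_job_id("[sge://localhost]-[None]"))
```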
Another important feature of SAGA is its (remote) file and directory handling capabilities. These capabilities are packaged in the bliss.saga.filesystem module (API Doc). Two main classes are defined in this module:
- The filesystem.File class (API Doc) provides a handle to a (remote) file.
- The filesystem.Directory class (API Doc) provides a handle to a (remote) directory.
Together, these two classes can be used to traverse and modify local and remote filesystems. Currently (v. 0.2.4), SAGA supports the SFTP protocol, but other protocol plug-ins are under development.
NOTE: For security reasons, SAGA does not support SSH authentication via plain username/password. In order to use the sftp plug-in with remote machines, it is hence necessary to set up public-key-based ssh-keychain access to the remote hosts you want to use. For this tutorial, we use localhost for the sake of simplicity.
In your $HOME directory, open a new file saga_example_2.py with your favorite editor (e.g., vim) and paste the following content:
import os, sys, getpass
import bliss.saga as saga

def main():
    try:
        # create a new subdirectory in /tmp/
        tmp = saga.filesystem.Directory("sftp://localhost/tmp")
        mydir = tmp.open_dir(getpass.getuser(), saga.filesystem.Create)

        # copy this python file to the newly created directory
        thisfile = saga.filesystem.File("sftp://localhost/"+os.path.abspath(__file__))
        thisfile.copy(str(mydir.get_url()))

        # copy another file
        motdfile = saga.filesystem.File("sftp://localhost/etc/motd")
        motdfile.copy(mydir.get_url())

        # list the directory content
        for entry in mydir.list():
            file = saga.filesystem.File(str(mydir.get_url())+"/"+entry)
            print "%s (%s bytes)" % (file.get_url(), file.get_size())

        # finally, remove the directory
        mydir.remove()

    except saga.Exception, ex:
        print "An error occurred during file operation: %s" % (str(ex))
        sys.exit(-1)

if __name__ == "__main__":
    main()
Save the file and execute it via the python interpreter (make sure your virtualenv is activated):
python saga_example_2.py
The output should look something like this:
sftp://localhost/tmp/train115//saga_example_2.py (1029 bytes)
sftp://localhost/tmp/train115//motd (1758 bytes)
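The double slash in the sample output presumably comes from appending "/" plus the entry name to a directory URL that already ends in a slash. It is harmless here, but if you want clean URLs you can join the two parts with a small helper. This is a sketch: join_url is a hypothetical function and not part of the Bliss API.

```python
def join_url(base, name):
    """Join a directory URL and an entry name with exactly one slash between them."""
    return base.rstrip("/") + "/" + name.lstrip("/")


print(join_url("sftp://localhost/tmp/train115/", "motd"))
print(join_url("sftp://localhost/tmp/train115", "saga_example_2.py"))
```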
In this example, we split up the calculation of a Mandelbrot set into several tiles, submit a job for each tile using the SAGA Job API, retrieve the tiles using the SAGA File API and stitch together the final image from the individual tiles. This example shows how SAGA can be used to create more complex application workflows that involve multiple aspects of the API.
In order for this example to work, we need to install an additional Python module, the Python Imaging Library (PIL). This is done via pip:
pip install PIL
Next, we need to download the Mandelbrot fractal generator itself. It is a very simple Python script that, if invoked on the command line, outputs all or part of a Mandelbrot fractal as a PNG image. Download the script into your $HOME directory:
curl --insecure -Os https://raw.github.com/saga-project/bliss/master/examples/advanced/mandelbrot/mandelbrot.py
You can give mandelbrot.py a test-drive locally by calculating a single-tiled 1024x1024 Mandelbrot fractal:
python mandelbrot.py 1024 1024 0 1024 0 1024 frac.png
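Splitting the fractal into tiles means calling the script several times with different pixel ranges. The sketch below computes the argument lists for a grid of tiles. It assumes, as the test-drive command suggests, that the arguments are image width, image height, x-begin, x-end, y-begin, y-end and output file name; tile_arguments and the tile file naming are illustrative choices.

```python
def tile_arguments(imgx, imgy, tilesx, tilesy):
    """Return one mandelbrot.py argument list per tile of a tilesx x tilesy grid."""
    jobs = []
    for x in range(tilesx):
        for y in range(tilesy):
            jobs.append([
                imgx, imgy,
                imgx // tilesx * x, imgx // tilesx * (x + 1),   # x range of this tile
                imgy // tilesy * y, imgy // tilesy * (y + 1),   # y range of this tile
                "tile_x%s_y%s.png" % (x, y),
            ])
    return jobs


for args in tile_arguments(8192, 8192, 2, 2):
    print("python mandelbrot.py " + " ".join(str(a) for a in args))
```

With a single tile, the computed ranges cover the whole image, which matches the test-drive command above.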
In your $HOME directory, open a new file saga_mandelbrot.py with your favorite editor (e.g., vim) and paste the following content:
import sys, time, os
import bliss.saga as saga
from PIL import Image

# the dimension (in pixel) of the whole fractal
imgx = 8192
imgy = 8192

# the number of tiles in X and Y direction
tilesx = 2
tilesy = 2

if __name__ == "__main__":
    try:
        # list that holds the jobs
        jobs = []

        # create a working directory in /scratch
        dirname = 'sftp://localhost/%s/mbrot/' % os.getenv('SCRATCH')
        workdir = saga.filesystem.Directory(dirname, saga.filesystem.Create)

        # copy the executable into our working directory
        mbexe = saga.filesystem.File('sftp://localhost/%s/mandelbrot.py' % os.getcwd())
        mbexe.copy(workdir.get_url())

        # the saga job service connects to and provides a handle
        # to a remote machine. In this case, it's an SGE cluster:
        jobservice = saga.job.Service('sge://localhost')

        for x in range(0, tilesx):
            for y in range(0, tilesy):
                # describe a single Mandelbrot job. we're using the
                # directory created above as the job's working directory
                outputfile = 'tile_x%s_y%s.png' % (x,y)
                jd = saga.job.Description()
                jd.queue = "development"
                jd.wall_time_limit = 10
                jd.total_cpu_count = 1
                jd.working_directory = workdir.get_url().path
                jd.executable = 'python'
                jd.arguments = ['mandelbrot.py', imgx, imgy,
                                (imgx/tilesx*x), (imgx/tilesx*(x+1)),
                                (imgy/tilesy*y), (imgy/tilesy*(y+1)),
                                outputfile]

                # create the job from the description
                # above, launch it and add it to the list of jobs
                job = jobservice.create_job(jd)
                job.run()
                jobs.append(job)
                print ' * Submitted %s. Output will be written to: %s' % (job.jobid, outputfile)

        # wait for all jobs to finish
        while len(jobs) > 0:
            # iterate over a copy of the list, since removing entries
            # from a list while looping over it skips elements
            for job in jobs[:]:
                jobstate = job.get_state()
                print ' * Job %s status: %s' % (job.jobid, jobstate)
                # compare by value rather than by identity
                if jobstate == saga.job.Job.Done:
                    jobs.remove(job)
            time.sleep(5)

        # copy image tiles back to our 'local' directory
        for image in workdir.list('*.png'):
            print ' * Copying %s/%s back to %s' % (workdir.get_url(), image, os.getcwd())
            workdir.copy(image, 'sftp://localhost/%s/' % os.getcwd())

        # stitch together the final image
        fullimage = Image.new('RGB',(imgx, imgy),(255,255,255))
        print ' * Stitching together the whole fractal: mandelbrot_full.png'
        for x in range(0, tilesx):
            for y in range(0, tilesy):
                partimage = Image.open('tile_x%s_y%s.png' % (x, y))
                fullimage.paste(partimage, (imgx/tilesx*x, imgy/tilesy*y, imgx/tilesx*(x+1), imgy/tilesy*(y+1)) )
        fullimage.save("mandelbrot_full.png", "PNG")
        sys.exit(0)

    except saga.Exception, ex:
        print 'Problem during execution: %s' % ex
        sys.exit(-1)

    except KeyboardInterrupt:
        # ctrl-c caught: try to cancel our jobs before we exit
        # the program, otherwise we'll end up with lingering jobs.
        for job in jobs:
            job.cancel()
        sys.exit(-1)
Save the file and execute it via the python interpreter (once again: make sure your virtualenv is activated and that your $HOME/.profile is prepared as described in section 2):
python saga_mandelbrot.py
The output should look something like this:
* Submitted [sge://localhost]-[652593]. Output will be written to: tile_x0_y0.png
* Submitted [sge://localhost]-[652594]. Output will be written to: tile_x0_y1.png
* Submitted [sge://localhost]-[652595]. Output will be written to: tile_x1_y0.png
* Submitted [sge://localhost]-[652596]. Output will be written to: tile_x1_y1.png
* Job [sge://localhost]-[652593] status: saga.job.Job.Pending
* Job [sge://localhost]-[652594] status: saga.job.Job.Pending
* Job [sge://localhost]-[652595] status: saga.job.Job.Pending
* Job [sge://localhost]-[652596] status: saga.job.Job.Pending
* Job [sge://localhost]-[652593] status: saga.job.Job.Running
* Job [sge://localhost]-[652594] status: saga.job.Job.Running
* Job [sge://localhost]-[652595] status: saga.job.Job.Running
* Job [sge://localhost]-[652596] status: saga.job.Job.Running
...
* Job [sge://localhost]-[652593] status: saga.job.Job.Done
* Job [sge://localhost]-[652594] status: saga.job.Job.Done
* Job [sge://localhost]-[652595] status: saga.job.Job.Done
* Job [sge://localhost]-[652596] status: saga.job.Job.Done
* Copying sftp://localhost//scratch/0000/train115/mbrot//tile_x0_y0.png back to /home1/0000/train115
* Copying sftp://localhost//scratch/0000/train115/mbrot//tile_x0_y1.png back to /home1/0000/train115
* Copying sftp://localhost//scratch/0000/train115/mbrot//tile_x1_y1.png back to /home1/0000/train115
* Copying sftp://localhost//scratch/0000/train115/mbrot//tile_x1_y0.png back to /home1/0000/train115
* Stitching together the whole fractal: mandelbrot_full.png
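The status lines above come from a polling loop that repeatedly asks every job for its state until none are left. Two details are easy to get wrong in such a loop: removing entries from a list while iterating over it, and waiting forever for a job that has failed. The sketch below shows one way to handle both, using stand-in objects so that it runs without a cluster. FakeJob, wait_for_all and the state names passed as plain strings are assumptions made for illustration, not the Bliss API.

```python
import time


class FakeJob(object):
    """Hypothetical stand-in for a job handle; reports 'Done' after a few polls."""

    def __init__(self, jobid, polls_until_done):
        self.jobid = jobid
        self._remaining = polls_until_done

    def get_state(self):
        if self._remaining > 0:
            self._remaining -= 1
            return "Running"
        return "Done"


def wait_for_all(jobs, interval=5, final_states=("Done", "Failed", "Canceled")):
    """Poll all jobs until each one has reached a final state; return {jobid: state}."""
    pending = list(jobs)
    finished = {}
    while pending:
        # Iterate over a copy so that removing from 'pending' does not skip entries.
        for job in list(pending):
            state = job.get_state()
            if state in final_states:
                finished[job.jobid] = state
                pending.remove(job)
        if pending:
            time.sleep(interval)
    return finished


result = wait_for_all([FakeJob("a", 0), FakeJob("b", 2)], interval=0)
print(result)
```

Treating failed and canceled jobs as final keeps the loop from spinning forever when a tile cannot be computed; the returned dictionary then tells you which tiles are missing.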
Last but not least, you can transfer the final PNG image (mandelbrot_full.png) to your local machine to have a look at it. The easiest way to do this is via scp. If you are working on Linux or MacOS, execute the following command on your local machine:
scp <your_lonestar_username>@lonestar.tacc.utexas.edu:mandelbrot_full.png .
If you are using Windows, you will have to use a third-party tool, for example WinSCP, to transfer the file. Use your image viewer of choice to display it.