Introduction to Bliss
NOTE: Please also have a look at the Known Problems And Solutions page for a list of common problems and pitfalls.
One of the most important features of Bliss is the capability to submit jobs to local and remote queueing systems and resource managers. This first example explains how.
The job submission and management capabilities of Bliss are packaged in the bliss.saga.job module (API Doc). Three classes are defined in this module:
- The job.Service class (API Doc) provides a handle to the resource manager, for example a remote PBS cluster.
- The job.Description class (API Doc) is used to describe the executable, arguments, environment and requirements (e.g., number of cores) of a new job.
- The job.Job class (API Doc) is a handle to a job associated with a job.Service. It is used to control (start, stop) the job and query its status (e.g., Running, Finished).
In order to use the Bliss Job API, we first need to import the Bliss module:
import bliss.saga as saga
Next, we create a job service object that represents a local or cluster resource. The job service takes a single URL as parameter. The URL parameter is passed to Bliss' plug-in mechanism and based on the URL scheme, a specific plug-in is selected to connect to the specified location. The URL is a way to tell Bliss what type of queueing system or middleware you want to use and where it is. For example:
js = saga.job.Service("pbs+ssh://india.futuregrid.org")
will tell Bliss to use the PBS over SSH plug-in to connect to a remote PBS cluster that runs on india.futuregrid.org. The latest version of Bliss supports the following plug-ins:
- fork://localhost Connects to a pseudo job.Service on the local machine: jobs submitted to a job.Service object instantiated with a fork://localhost URL will execute on the local machine.
- ssh://hostname Connects to a pseudo job.Service on a remote machine via SSH: jobs submitted to a job.Service object instantiated with an ssh://hostname URL will execute on the remote machine via the login shell.
- pbs://localhost Connects to a PBS cluster on the local machine: jobs submitted to a job.Service object instantiated with a pbs://localhost URL will be submitted to the queue specified in the job.Description.
- pbs+ssh://hostname Connects to a remote PBS cluster via SSH: jobs submitted to a job.Service object instantiated with a pbs+ssh://hostname URL will be submitted to the queue specified in the job.Description.
- sge://localhost Connects to a Sun Grid Engine (SGE) cluster on the local machine: jobs submitted to a job.Service object instantiated with an sge://localhost URL will be submitted to the queue specified in the job.Description.
- sge+ssh://hostname Connects to a remote SGE cluster via SSH: jobs submitted to a job.Service object instantiated with an sge+ssh://hostname URL will be submitted to the queue specified in the job.Description.
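To make the role of the URL scheme more concrete, here is a minimal sketch of scheme-based selection. It is not Bliss' actual plug-in mechanism: the KNOWN_SCHEMES table and the describe_service_url function are hypothetical names made up for this illustration, and only the schemes themselves are taken from the list above.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

# URL schemes from the list above, mapped to a short description.
# Illustrative only: this is NOT Bliss' internal plug-in registry.
KNOWN_SCHEMES = {
    "fork":    "local machine (pseudo job service)",
    "ssh":     "remote machine via SSH (pseudo job service)",
    "pbs":     "local PBS cluster",
    "pbs+ssh": "remote PBS cluster via SSH",
    "sge":     "local SGE cluster",
    "sge+ssh": "remote SGE cluster via SSH",
}

def describe_service_url(url):
    """Return (scheme, host, description) for a job service URL."""
    parsed = urlparse(url)
    if parsed.scheme not in KNOWN_SCHEMES:
        raise ValueError("no plug-in listed for scheme %r" % parsed.scheme)
    return parsed.scheme, parsed.hostname, KNOWN_SCHEMES[parsed.scheme]
```

The part before :// decides which kind of backend is used, and the part after it says where that backend lives. A URL with a scheme that is not in the table is rejected with a ValueError.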
Once the job.Service object has been created, it can be used to create and start new jobs. To define a new job, we create a job.Description object that contains information about the executable we want to run, the arguments we need to pass to it, the environment that needs to be set, and the requirements we have for our job. Here's an example:
jd = saga.job.Description()
# requirements
jd.queue = "development"
jd.wall_time_limit = 1 # minutes
# environment, executable & arguments
jd.environment = {'MYOUTPUT':'"Hello from Bliss"'}
jd.executable = '/bin/echo'
jd.arguments = ['$MYOUTPUT']
# output options
jd.output = "myjob.stdout"
jd.error = "myjob.01.stderr"
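With a job service (js) and a job description (jd) in hand, the remaining step is to create and start the job. The sketch below shows one way this could look. It assumes that the job.Service object offers a create_job() method and that the returned job handle offers run(), wait() and get_state(); these method names are assumptions here, so please check them against the API Doc. The run_and_wait helper itself is hypothetical and not part of Bliss.

```python
def run_and_wait(js, jd):
    """Create a job from description jd on service js, run it, and wait.

    ASSUMPTION: js offers create_job(), and the returned job handle offers
    run(), wait() and get_state(). Verify against the API Doc.
    """
    job = js.create_job(jd)   # the job exists but has not been started yet
    job.run()                 # hand the job over to the resource manager
    job.wait()                # block until the job has finished or failed
    return job.get_state()    # report the final state to the caller
```

Under these assumptions, calling run_and_wait(js, jd) with the objects created earlier would submit the echo job to the "development" queue and return its final state.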
It is generally a good idea to use proper error handling. Working in a distributed environment will inevitably cause problems due to the unreliable nature of distributed resources: a cluster might be down, a network link faulty, etc. Writing code that catches these errors, and ideally reacts to them dynamically, is crucial for the development of a usable distributed application or system.
Bliss provides a simple exception mechanism that is triggered every time an error is discovered at the plug-in level. Bliss calls should hence always be wrapped in a try block:
try:
    # bliss call(s), for example:
    js = saga.job.Service("pbs+ssh://india.futuregrid.org")
except saga.Exception as ex:
    print(ex)
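One simple way to react to an error dynamically is to retry the failing call a few times before giving up. The helper below is a generic sketch, not something Bliss provides: call_with_retries, its parameters and its default values are all hypothetical. The exception type is passed in by the caller, so nothing is assumed about Bliss beyond the saga.Exception class used above.

```python
import time

def call_with_retries(func, error_type, attempts=3, delay=5.0):
    """Call func(); if it raises error_type, wait `delay` seconds and retry.

    The last error is re-raised once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except error_type:
            if attempt == attempts:
                raise              # out of attempts: let the caller decide
            time.sleep(delay)      # give the resource time to recover
```

For example, call_with_retries(lambda: saga.job.Service("pbs+ssh://india.futuregrid.org"), saga.Exception) would try to connect up to three times. Whether retrying makes sense depends on the error: a cluster that is briefly unreachable may come back, whereas a wrong hostname will not.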
For debugging purposes, Bliss provides a logging mechanism that can be enabled by setting the environment variable SAGA_VERBOSE to a value between 1 (less verbose) and 5 (very verbose). In very verbose mode, Bliss produces a large number of log messages concerning the internals of the currently active plug-in. If an error is not propagated properly to the application via an exception, examining the logs can help to figure out what went wrong.
SAGA_VERBOSE=5 python myblissprog.py
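If the program is launched from another Python script rather than from the shell, the same variable can be placed into the child's environment. The verbose_env helper below is hypothetical; only the variable name SAGA_VERBOSE and the range 1 to 5 come from the description above.

```python
import os

def verbose_env(level, base=None):
    """Return a copy of the environment with SAGA_VERBOSE set to `level`.

    `base` defaults to the current process environment and is not modified.
    """
    level = int(level)
    if not 1 <= level <= 5:
        raise ValueError("SAGA_VERBOSE must be between 1 and 5")
    env = dict(os.environ if base is None else base)
    env["SAGA_VERBOSE"] = str(level)
    return env
```

A call such as subprocess.call([sys.executable, "myblissprog.py"], env=verbose_env(5)) would then be the scripted counterpart of the shell command above.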
Many of Bliss' plug-ins, like the PBS and SGE plug-ins, provide middleware access tunneled over SSH. For security reasons, Bliss (just like the SSH command-line utility) doesn't provide an option for hardcoded passwords.
In order to use plug-ins that allow SSH tunneling (xyz+ssh://), it is hence necessary to set up password-less ssh-keychain access to the remote hosts you want to use. Otherwise, you will end up with error messages like:
bliss.SSHJobPlugin(0x102054320) - ERROR - Couldn't run job because: Private key file is encrypted
or
bliss.PBSJobPlugin(0x10ebdacb0) - ERROR - Couldn't run job because: Permission denied (publickey,hostbased).
Most systems should come with keychain already installed. If not, a simple yum install keychain (RedHat-based systems), apt-get install keychain (Debian-based systems), or brew install keychain (Mac OS X via Homebrew) should do the trick.
If you're not familiar with SSH keys and authentication mechanisms at all, please refer to this tutorial for an introduction.
Assuming you have your public/private key pair stored in $HOME/.ssh/id_rsa, the following command will ask you for your SSH key's passphrase and add your key to the ssh-agent for subsequent password-less use:
$> keychain $HOME/.ssh/id_rsa
* keychain 2.7.1 ~ http://www.funtoo.org
* Found existing ssh-agent: 4175
* Adding 1 ssh key(s): /Users/oweidner/.ssh/id_rsa
Enter passphrase for /Users/oweidner/.ssh/id_rsa:
* ssh-add: Identities added: /Users/oweidner/.ssh/id_rsa
In order to use this identity, you simply source it into your environment:
$> source ~/.keychain/<your-hostname>-sh
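Before creating a job service with one of the +ssh plug-ins, it can be useful to check from Python whether an ssh-agent is visible at all. An ssh-agent advertises its socket through the SSH_AUTH_SOCK environment variable, and the keychain file sourced above exports that variable into the shell. The ssh_agent_visible function below is a hypothetical helper and only a rough pre-flight check: it does not verify that your key has actually been added to the agent.

```python
import os

def ssh_agent_visible(env=None):
    """Return True if the environment points at an ssh-agent socket.

    Only checks that SSH_AUTH_SOCK is set and non-empty; it does not
    contact the agent or check which keys it holds.
    """
    env = os.environ if env is None else env
    return bool(env.get("SSH_AUTH_SOCK"))
```

If this returns False in the shell from which you start your Bliss program, the keychain file has most likely not been sourced there yet.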