BigJob Tutorial Part 3: Simple Ensemble Example
This page is part of the BigJob Tutorial.
You might be wondering how to create your own BigJob script or how BigJob can be useful for your needs.
The first example, below, submits N jobs using BigJob. This is very useful if you are running many jobs using the same executable. Rather than submit each job individually to the queuing system and then wait for every job to become active and complete, you submit just one 'Big' job that reserves the number of cores needed to run all of your jobs. When this BigJob becomes active, your jobs are pulled by BigJob from the Redis server and executed.
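As a rough illustration of the core reservation described above, the following plain-Python sketch does the back-of-the-envelope arithmetic. It is not part of BigJob: the function names `cores_to_reserve` and `rounds_needed` are hypothetical, and the assumption that jobs run in equally sized rounds when the pilot is smaller than the ensemble is a simplification, not a description of BigJob's scheduler.

```python
def cores_to_reserve(number_jobs, cores_per_job=1):
    """Cores one pilot must reserve so that every job can run at the same time."""
    if number_jobs < 0 or cores_per_job < 1:
        raise ValueError("need number_jobs >= 0 and cores_per_job >= 1")
    return number_jobs * cores_per_job


def rounds_needed(number_jobs, pilot_cores, cores_per_job=1):
    """Rough number of rounds if the pilot holds fewer cores than the ensemble needs.

    Simplifying assumption: each round runs as many jobs as fit in the pilot.
    """
    slots = pilot_cores // cores_per_job
    if slots < 1:
        raise ValueError("the pilot is too small to run even one job")
    return -(-number_jobs // slots)  # ceiling division


if __name__ == "__main__":
    # The values used later in this tutorial: 4 single-core jobs, 12-core pilot.
    print(cores_to_reserve(4))
    print(rounds_needed(4, pilot_cores=12))
```

With the tutorial's values (4 single-core jobs and a 12-core pilot), the whole ensemble fits in the reservation at once.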
The example below demonstrates the mapping of a simple job (the executable is /bin/echo) using all of the parameters of a Compute Unit Description.
One of BigJob's features is application-level programmability: many of the parameters in each script are customizable and configurable. For the purposes of this tutorial, we would like to draw your attention to a few important parameters that may prevent this script from running if they are not modified. For a more complete description of the configurable parameters, please see the API documentation.
The code below uses fork://localhost as the service_url. The service URL specifies which type of queueing system or middleware you want to use and where it is located. The fork://localhost URL can be changed to a machine-specific URL, for example: sge://lonestar.tacc.utexas.edu. The following table explains the supported middleware on XSEDE and FutureGrid. Note: You WILL have to edit the examples for your own middleware or queueing system.
Supported Adaptors | Description | Information |
---|---|---|
fork | Submit jobs only on localhost head node. Password-less login to localhost is required. Example usage: fork://localhost | |
SSH | Submit jobs on target machine's head node. Password-less login to target machine is required. Example usage: ssh://eric1.loni.org | Submits jobs to a remote host via SSH |
PBS | Submit jobs to target machine's scheduling system. Password-less login to target machine is required. Example usage: Remote (over SSH): pbs+ssh://eric1.loni.org or Local: pbs://localhost | Interfaces with a PBS, PBS Pro or TORQUE scheduler locally or remotely via SSH |
SGE | Submit jobs to target machine's scheduling system. Password-less login to target machine is required. Example usage: Remote (over SSH): sge+ssh://lonestar.tacc.utexas.edu or Local: sge://localhost | Interfaces with a Sun Grid Engine (SGE) scheduler locally or remotely via SSH |
GRAM | Uses Globus to submit jobs. Globus certificates are required. Initiate a grid proxy (myproxy-logon) before executing the BigJob application. Example usage: gram://gatekeeper.ranger.tacc.teragrid.org:2119/jobmanager-sge | The Globus resource URLs of XSEDE machines are listed at https://www.xsede.org/wwwteragrid/archive/web/user-support/gram-gatekeepers-gateway.html |
Torque+GSISSH | Submit jobs using gsissh. Globus certificates are required. Initiate a grid proxy (myproxy-logon) before executing the BigJob application. Example usage: xt5torque+gsissh://gsissh.kraken.nics.xsede.org | The GSISSH resource URLs of XSEDE machines are listed at https://www.xsede.org/wwwteragrid/archive/web/user-support/gram-gatekeepers-gateway.html |
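To make the structure of these URLs concrete, here is a minimal sketch that splits a service URL into its adaptor, optional transport, and host. It uses only the Python 3 standard library and is not BigJob's own parsing code; the function name `describe_service_url` and the returned dictionary keys are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def describe_service_url(service_url):
    """Split a service URL such as pbs+ssh://eric1.loni.org into its parts."""
    parts = urlsplit(service_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError("expected <adaptor>://<host>, got %r" % service_url)
    # "pbs+ssh" -> adaptor "pbs", transport "ssh"; "fork" has no transport part.
    adaptor, _, transport = parts.scheme.partition("+")
    return {
        "adaptor": adaptor,
        "transport": transport or None,
        "host": parts.hostname,
        "port": parts.port,          # None when the URL carries no port
        "path": parts.path or None,  # e.g. "/jobmanager-sge" for GRAM
    }


if __name__ == "__main__":
    for url in ("fork://localhost",
                "pbs+ssh://eric1.loni.org",
                "gram://gatekeeper.ranger.tacc.teragrid.org:2119/jobmanager-sge"):
        print(url, describe_service_url(url))
```

A check like this can catch a mistyped URL before the pilot is submitted, but whether a given adaptor is actually available still depends on your BigJob installation.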
When using these scripts on XSEDE, the allocation parameter must be changed from the example value (XSEDE12-SAGA in the code below) to your project's allocation number. This parameter may not be necessary if you are using your local cluster.
The number_of_processes parameter refers to the number of cores used. If your machine does not have 12 cores per node, you will have to change this parameter. For example, if you are using your laptop, number_of_processes might be 2 or 4.
The queue parameter refers to the name of the queue on the submission machine. This may not be necessary on your local laptop, but a machine such as Lonestar has different queues within SGE. You must specify whether you wish to submit to the "development" queue or some other queue.
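The three parameters above can be gathered in one place, as in the following sketch. The helper `make_pilot_description` is hypothetical and not part of BigJob; it only builds a dictionary with the same keys used in this tutorial. Leaving out allocation and queue when they are not given is an assumption based on the remarks above that they may not be necessary on a local machine, and MY-ALLOCATION is a placeholder, not a real allocation.

```python
import multiprocessing
import os


def make_pilot_description(service_url, cores=None, allocation=None,
                           queue=None, walltime=10):
    """Build a pilot description dictionary with the keys used in this tutorial."""
    description = {
        "service_url": service_url,
        # Fall back to the local core count when no value is given.
        "number_of_processes": cores or multiprocessing.cpu_count(),
        "working_directory": os.path.join(os.path.expanduser("~"), "agent"),
        "walltime": walltime,
    }
    # Assumption: allocation and queue may be omitted when the target
    # machine does not need them (for example, a laptop using fork://).
    if allocation:
        description["allocation"] = allocation
    if queue:
        description["queue"] = queue
    return description


# A laptop with two cores and no queueing system.
laptop = make_pilot_description("fork://localhost", cores=2)

# A cluster with 12 cores per node; "MY-ALLOCATION" is a placeholder.
cluster = make_pilot_description("sge+ssh://lonestar.tacc.utexas.edu",
                                 cores=12, allocation="MY-ALLOCATION",
                                 queue="development")
```

Either dictionary could then be passed to create_pilot in the script below in place of the hard-coded description.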
In your $HOME directory, open a new file simple_ensembles.py with your favorite editor (e.g., vim) and paste the following content:
import os
import time
import sys
from pilot import PilotComputeService, ComputeDataService, State

### This is the number of jobs you want to run
NUMBER_JOBS=4
COORDINATION_URL = "redis://localhost"

if __name__ == "__main__":

    # Start the pilot (the 'Big' job) that reserves the cores
    pilot_compute_service = PilotComputeService(COORDINATION_URL)

    pilot_compute_description = { "service_url": "fork://localhost",
                                  "number_of_processes": 12,
                                  "allocation": "XSEDE12-SAGA",
                                  "queue": "development",
                                  "working_directory": os.getenv("HOME")+"/agent",
                                  "walltime":10
                                }

    pilot_compute_service.create_pilot(pilot_compute_description=pilot_compute_description)

    compute_data_service = ComputeDataService()
    compute_data_service.add_pilot_compute_service(pilot_compute_service)

    print ("Finished Pilot-Job setup. Submitting compute units")

    # submit compute units
    for i in range(NUMBER_JOBS):
        compute_unit_description = {
                "executable": "/bin/echo",
                "arguments": ["Hello","$ENV1","$ENV2"],
                "environment": ['ENV1=env_arg1','ENV2=env_arg2'],
                "number_of_processes": 1,
                "spmd_variation":"mpi",
                "output": "stdout.txt",
                "error": "stderr.txt",
                }
        compute_data_service.submit_compute_unit(compute_unit_description)

    # block until every compute unit has finished
    print ("Waiting for compute units to complete")
    compute_data_service.wait()

    # release the reserved resources
    print ("Terminate Pilot Jobs")
    compute_data_service.cancel()
    pilot_compute_service.cancel()
Execute the script with the following command:
python simple_ensembles.py
Where is my output?
Go into the working directory (in this case, $HOME/agent). You should see a directory named after the pilot-service that starts with bj- and is followed by a unique identifier for that BigJob. If you cd into that directory, you will see the compute unit directories. These directories start with sj- and are followed by a unique identifier. If you cd into one of these directories, you will find a stdout.txt and a stderr.txt file. stdout.txt should contain the results of the /bin/echo job. Please note that the names of stdout and stderr are configurable in the ComputeUnitDescription.
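Instead of changing into each directory by hand, the directory layout described above can be walked with a short script. The following is a minimal sketch using only the standard library; the function name `collect_outputs` is hypothetical, and it assumes the layout is exactly working directory, then bj-* directories, then sj-* directories, with the default stdout.txt file name.

```python
import glob
import os


def collect_outputs(working_directory, filename="stdout.txt"):
    """Return {path: text} for each compute unit output file under the pilot directories."""
    pattern = os.path.join(working_directory, "bj-*", "sj-*", filename)
    outputs = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            outputs[path] = handle.read()
    return outputs


if __name__ == "__main__":
    # Same working directory as in the tutorial script: $HOME/agent
    agent_dir = os.path.join(os.path.expanduser("~"), "agent")
    for path, text in collect_outputs(agent_dir).items():
        print(path)
        print(text.rstrip())
```

If you changed the output or error file names in the compute unit description, pass the new name as the filename argument.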
Back: [Tutorial Home](BigJob Tutorial) Next: BigJob Tutorial Part 4: Mandelbrot Example