Skip to content

BigJob Tutorial Part 3: Simple Ensemble Example

pradeepmantha edited this page Aug 22, 2012 · 16 revisions

This page is part of the BigJob Tutorial.

Overview

The below example submits N jobs using SAGA Pilot-Job. It demonstrates the mapping of a simple echo job using all of the parameters of a Compute Unit Description.

What types of workflows would this be useful for? Many jobs using the same executable.

In your $HOME directory, open a new file simple_ensembles.py with your favorite editor (e.g., vim) and paste the following content:

import os
import time
import sys
from pilot import PilotComputeService, ComputeDataService, State
	
### This is the number of jobs you want to run
NUMBER_JOBS=4
COORDINATION_URL = "redis://[email protected]:6379"

if __name__ == "__main__":

    pilot_compute_service = PilotComputeService(COORDINATION_URL)

    pilot_compute_description = { "service_url": "sge://localhost",
                                  "number_of_processes": 12,
                                  "allocation": "XSEDE12-SAGA",
                                  "queue": "development",                                      
                                  "working_directory": os.getenv("HOME")+"/agent",
                                  "walltime":10
                                }

    pilot_compute_service.create_pilot(pilot_compute_description=pilot_compute_description)

    compute_data_service = ComputeDataService()
    compute_data_service.add_pilot_compute_service(pilot_compute_service)

    print ("Finished Pilot-Job setup. Submitting compute units")

    # submit compute units
    for i in range(NUMBER_JOBS):
        compute_unit_description = {
                "executable": "/bin/echo",
                "arguments": ["Hello","$ENV1","$ENV2"],
                "environment": ['ENV1=env_arg1','ENV2=env_arg2'],
                "number_of_processes": 4,            
                "spmd_variation":"mpi",
                "output": "stdout.txt",
                "error": "stderr.txt",
                }    
        compute_data_service.submit_compute_unit(compute_unit_description)

    print ("Waiting for compute units to complete")
    compute_data_service.wait()

    print ("Terminate Pilot Jobs")
    compute_data_service.cancel()    
    pilot_compute_service.cancel()

Execute the script using command

python simple_ensembles.py

If you run the script, what do you get? You will have to go into the working directory( which is $HOME/agent in this case ), then the directory named after the pilot-service, and then the compute unit directories associated with that pilot-service.