BigJob Wiki

Introduction

The SAGA BigJob framework is a SAGA-based pilot job implementation. The Simple API for Grid Applications (SAGA) is a high-level, easy-to-use API for accessing distributed resources. SAGA BigJob supports a wide range of application types and is usable over a broad range of infrastructures, i.e., it is general-purpose, extensible, and interoperable. Unlike other common pilot job systems, SAGA BigJob (i) natively supports MPI jobs and (ii) works on a variety of back-end systems, generally reflecting the advantage of using a SAGA-based approach. The following figure gives an overview of the SAGA BigJob architecture.

Pilot-Job

Pilot-Jobs decouple workload submission from resource assignment. This results in a flexible execution model, which in turn enables the distributed scale-out of applications on multiple and possibly heterogeneous resources. It also allows jobs to be executed without queuing each job individually.
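The decoupling described above can be illustrated with a minimal, self-contained sketch. This is not the BigJob API: `ToyPilot`, `count_bases`, and the thread pool standing in for allocated cores are all hypothetical names chosen for illustration. The point is only that resources are acquired once, and tasks are then handed to the pilot rather than to the resource manager.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyPilot:
    """Hypothetical stand-in for a pilot (NOT the BigJob API)."""

    def __init__(self, slots):
        # Resource assignment happens once, here. In a real system this is
        # the single request that waits in the resource manager's queue.
        self._pool = ThreadPoolExecutor(max_workers=slots)

    def submit(self, func, *args):
        # Workload submission: tasks go to the pilot, not to the
        # resource manager, so they do not queue individually.
        return self._pool.submit(func, *args)

    def cancel(self):
        # Release the held resources after all submitted tasks finish.
        self._pool.shutdown(wait=True)


def count_bases(sequence):
    # Placeholder task; any callable would do.
    return len(sequence)


pilot = ToyPilot(slots=2)
futures = [pilot.submit(count_bases, s) for s in ["ACGT", "AC", "ACGTACGT"]]
results = [f.result() for f in futures]
pilot.cancel()
print(results)  # [4, 2, 8]
```

A real pilot job would hold nodes on a remote cluster instead of local threads, but the division of responsibilities is the same.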

Why do you need pilot-jobs?

The pilot job provides a container for many sub-jobs, i.e., applications submit these sub-jobs through the pilot-job and not through the resource manager. A major advantage of this approach is that the waiting time at the local resource manager, which usually contributes significantly to the overall time-to-completion, is paid once for the pilot instead of once per sub-job.
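A back-of-the-envelope model makes the effect on time-to-completion concrete. The sketch below assumes a deliberately simple scenario: without a pilot, jobs are submitted one after another and each waits in the queue; with a pilot, the queue wait occurs once and sub-jobs then run in waves on the pilot's slots. The function names and the numbers (in minutes) are illustrative assumptions, not measurements.

```python
import math


def direct_submission_time(n_jobs, queue_wait, runtime):
    """Each job is queued separately and jobs run one after another."""
    return n_jobs * (queue_wait + runtime)


def pilot_time(n_jobs, queue_wait, runtime, slots):
    """The pilot waits in the queue once, then runs sub-jobs in waves."""
    waves = math.ceil(n_jobs / slots)
    return queue_wait + waves * runtime


# Illustrative values only: 10 jobs, 30 min queue wait, 5 min runtime each.
print(direct_submission_time(10, 30, 5))  # 350
print(pilot_time(10, 30, 5, slots=1))     # 80
print(pilot_time(10, 30, 5, slots=4))     # 45
```

Under these assumptions the saving grows with the number of sub-jobs, since the queue wait term no longer scales with `n_jobs`.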

BigJob Architecture

An overview of the BigJob architecture can be found [here] (https://github.com/saga-project/BigJob/wiki/BigJob-Architecture).

How to Run BigJob on XSEDE

If you are a user or domain scientist and you want to run your computational workload on one of the following XSEDE machines, just follow the links below. The instructions explain how to set up your environment on these machines with a few simple commands to use a pre-installed version of BigJob. If you are planning to use any of these machines, we highly recommend following these instructions: using a pre-installed version of BigJob is much simpler and less error-prone than installing your own version.

Lonestar (TACC)
Kraken (NICS)
Ranger (TACC)
Trestles (SDSC)
Multiple Resources

How to Run BigJob on FutureGrid

If you are a user or domain scientist and you want to run your computational workload on one of the FutureGrid machines, just follow the link below. The instructions explain how to set up your environment on FutureGrid machines with a few simple commands to use a pre-installed version of BigJob.

[Instructions: BigJob on FutureGrid] (https://github.com/saga-project/BigJob/wiki/How-to-Run-BigJob-on-FutureGrid)

How to Run BigJob on Open Science Grid

In the context of the ExTENCI Project, we have developed experimental bindings that allow BigJob to access Open Science Grid's (OSG) glide-in WMS Condor pool. While these bindings are primarily used internally in application-specific Science Gateways (e.g., DARE-Cactus) to access OSG and XSEDE resources concurrently, it is possible to run BigJob directly via command-line on OSG resources.

[Instructions: BigJob on OSG] (https://github.com/saga-project/BigJob/wiki/How-to-Run-BigJob-on-OSG)

How to Run BigJob on LONI

If you are a user or domain scientist and you want to run your computational workload on one of the following LONI machines, just follow the links below. Running BigJob on LONI requires a valid grid certificate. Once you have received a grid certificate on one machine, you can use the same certificate on other LONI machines by copying it there. If you don't have a certificate, follow the instructions at [Requesting Grid Certificate] (https://docs.loni.org/wiki/Requesting_a_LONI_Grid_Certificate).

The instructions explain how to set up your environment on these machines with a few simple commands.

Eric
Louie
Oliver
Poseidon

BigJob Command Line Client

BigJob provides a command line client (pilot-cli) that is shipped with the Python package. Instructions on how to use pilot-cli can be found here.

BigJob Application Execution and Examples

For an overview of application execution with BigJob see:

[Application Execution] (https://github.com/saga-project/BigJob/wiki/Application-Execution-and-Examples)

Below is a "living" list of example scripts that show how BigJob is used with different applications and different types of workloads. Each example focuses on a particular type of application (e.g., BFAST or AMBER), but they are often representative of a larger class of applications (e.g., single-core, multi-core, MPI, coupled, uncoupled, ...).

[Example 1] (https://github.com/saga-project/BigJob/blob/master/examples/example_local_single.py) Running a single BigJob and a single sub-job on localhost.
Example 2: Running single-core, uncoupled BFAST genome matching workloads with BigJob
[Example 3] (https://github.com/saga-project/BigJob/blob/master/examples/example_local_single_filestaging.py) Using BigJob with File Staging. A guide describing the usage of file staging can be found here.

How to Install Your Own Version of BigJob

These instructions are targeted towards experienced users and system administrators who need to install their own version of BigJob on an HPC cluster either in user or in system space. Please follow the steps below if this is what you want to do!

API

The BigJob API defines two main classes: bigjob, representing the pilot, and subjob, representing an individual task. The API documentation can be found at:

API Documentation

The Pilot-API can be used as an alternative way for accessing BigJob:

[Pilot-API] (https://github.com/saga-project/BigJob/wiki/Pilot-API) (Alpha!)

Where to Get Help?

Check the [Frequently Asked Questions] (https://github.com/saga-project/BigJob/wiki/Frequently-Asked-Questions).

For questions and comments, please join the bigjob-users group.
