File Staging

drelu edited this page Mar 25, 2012 · 4 revisions

File Management and Staging

Managing files in distributed systems is a tedious task: differing paths, names, and file versions complicate distributed runs. Since version 0.3.38, BigJob (BJ) includes some basic file staging capabilities. For each big-job, a directory named with the id of the big-job is created:

Big-Job

<BIGJOB_WORKING_DIRECTORY>/bj-54aaba6c-32ec-11e1-a4e5-00264a13ca4c/
<BIGJOB_WORKING_DIRECTORY>/bj-3645d5e8-32ec-11e1-b346-00264a13ca4c/
<BIGJOB_WORKING_DIRECTORY>/bj-398e110a-32e9-11e1-ae24-00264a13ca4c/

Files can be staged to the BJ working directory using the filetransfer parameter of start_pilot_job:

import os

# Stage test.txt, located next to this script, into the big-job working directory
bj_filetransfers = ["ssh://" + os.path.dirname(os.path.abspath(__file__))
                    + "/test.txt > BIGJOB_WORK_DIR"]

bj.start_pilot_job( lrms_url,
                    None,
                    number_of_processes,
                    queue,
                    project,
                    workingdirectory,
                    userproxy,
                    walltime,
                    processes_per_node,
                    bj_filetransfers)

The stdout and stderr of the BJ agent are written to the big-job directory.
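
When several files need to be staged, the entries in the file transfer list all follow the same "source > target" pattern used above. The following is a minimal sketch of a helper that builds such entries; make_transfer and the file name input.dat are hypothetical and not part of BigJob, and the sketch assumes the source is a local file addressed through an ssh:// URL, as in the example on this page.

```python
import os


def make_transfer(local_path, target="BIGJOB_WORK_DIR"):
    """Build one 'source > target' staging entry (hypothetical helper)."""
    # Use an absolute path so the entry does not depend on the current directory.
    source = "ssh://" + os.path.abspath(local_path)
    return source + " > " + target


# One entry per file; input.dat is an illustrative second file.
bj_filetransfers = [make_transfer(name) for name in ("test.txt", "input.dat")]
```

The resulting list can be passed wherever the hand-written bj_filetransfers list is used. Passing target="SUBJOB_WORK_DIR" produces the corresponding entry for a sub-job.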

Sub-Jobs

For each sub-job a sub-directory is created in the directory of the parent BJ:

<BIGJOB_WORKING_DIRECTORY>/bj-54aaba6c-32ec-11e1-a4e5-00264a13ca4c/sj-55010912-32ec-11e1-a4e5-00264a13ca4c
<BIGJOB_WORKING_DIRECTORY>/bj-54aaba6c-32ec-11e1-a4e5-00264a13ca4c/sj-55153072-32ec-11e1-a4e5-00264a13ca4c

By default (i.e. if no working directory is specified in its job description), each sub-job is executed in its sub-job specific directory. If a working directory is specified, the sub-job is executed in that directory instead.
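
The rule for where a sub-job runs can be written down as a short sketch. The function subjob_execution_dir is hypothetical and only restates the rule described here; it is not BigJob's implementation, and the /work prefix in the usage lines stands in for <BIGJOB_WORKING_DIRECTORY>.

```python
import os


def subjob_execution_dir(bigjob_dir, subjob_id, working_directory=None):
    """Return the directory a sub-job is executed in (illustrative only)."""
    # An explicit working directory in the job description takes precedence.
    if working_directory:
        return working_directory
    # Otherwise the sub-job runs in its own directory below the parent big-job.
    return os.path.join(bigjob_dir, subjob_id)


bj_dir = "/work/bj-54aaba6c-32ec-11e1-a4e5-00264a13ca4c"
sj_id = "sj-55010912-32ec-11e1-a4e5-00264a13ca4c"

default_dir = subjob_execution_dir(bj_dir, sj_id)
custom_dir = subjob_execution_dir(bj_dir, sj_id, "/scratch/run1")
```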

Files can be staged to the sub-job directory by using the filetransfer attribute:

jd = description()
jd.executable = "/bin/cat"
jd.number_of_processes = "1"
jd.spmd_variation = "single"
jd.arguments = ["test.txt"]
jd.output = "stdout.txt"
jd.error = "stderr.txt"
jd.filetransfer = ["ssh://" + os.path.dirname(os.path.abspath(__file__)) 
                   + "/test.txt > SUBJOB_WORK_DIR"]
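
To make the format of a file transfer entry explicit, the sketch below splits an entry into its source and target and replaces the placeholder with a concrete directory. The function parse_transfer is hypothetical, handles only the two placeholders shown on this page (BIGJOB_WORK_DIR and SUBJOB_WORK_DIR), and is not meant to describe how BigJob processes these entries internally.

```python
from urllib.parse import urlparse


def parse_transfer(entry, bigjob_dir, subjob_dir):
    """Split 'source > target' and resolve the target placeholder (sketch)."""
    source, separator, target = entry.partition(">")
    if not separator:
        raise ValueError("expected 'source > target', got: " + entry)
    source, target = source.strip(), target.strip()

    # Map each placeholder used on this page to a concrete directory.
    placeholders = {
        "BIGJOB_WORK_DIR": bigjob_dir,
        "SUBJOB_WORK_DIR": subjob_dir,
    }
    if target not in placeholders:
        raise ValueError("unknown target placeholder: " + target)

    url = urlparse(source)
    # The scheme selects the transfer mechanism; the path names the source file.
    return url.scheme, url.path, placeholders[target]


scheme, path, destination = parse_transfer(
    "ssh:///home/user/test.txt > SUBJOB_WORK_DIR",
    "/work/bj-1",
    "/work/bj-1/sj-1",
)
```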

File Staging Adaptors

BigJob supports different file staging mechanisms: currently SSH and Globus Online. Details on the Globus Online support can be found here.