BigJob Tutorial Part 8: Automated Data Staging Example
melrom edited this page Sep 28, 2012
·
7 revisions
This page is part of the BigJob Tutorial.
This example demonstrates how to stage data together with an executable so that the executable can run successfully. The Pilot-API moves the required files into the executable's working directory, which is useful when the executable depends on input files.
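To make the idea of staging concrete, here is a minimal local sketch of what "moving data to the working directory" amounts to. It does not use BigJob and is not how the Pilot-API is implemented; the function name stage_and_run and its parameters are hypothetical, and it assumes Python 3.7 or later.

```python
import os
import shutil
import subprocess


def stage_and_run(input_file, work_dir, command):
    """Copy input_file into work_dir, then run command from inside work_dir.

    Hypothetical helper for illustration only; not part of BigJob.
    """
    os.makedirs(work_dir, exist_ok=True)
    # Stage the input so the executable can find it by its bare file name.
    staged_path = shutil.copy(input_file, work_dir)
    # Run with work_dir as the current directory so relative paths resolve there.
    result = subprocess.run(command, cwd=work_dir, capture_output=True, text=True)
    return staged_path, result.returncode, result.stdout
```

Calling stage_and_run with the path to test.txt, a scratch directory, and ["/bin/cat", "test.txt"] mirrors what each compute unit in this tutorial does: without the copy step, the relative name test.txt would not resolve inside the working directory.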
Create a test.txt file in your $HOME directory:
cat /etc/motd > $HOME/test.txt
In your $HOME directory, open a new file compute_data.py with your favorite editor (e.g., vim) and paste the following content:
import sys
import os
import time
import logging
from pilot import PilotComputeService, ComputeDataService, State
COORDINATION_URL = "redis://[email protected]:6379"
if __name__ == "__main__":
    pilot_compute_service = PilotComputeService(COORDINATION_URL)
    # create pilot job service and initiate a pilot job
    pilot_compute_description = { "service_url": "sge://localhost",
                                  "number_of_processes": 12,
                                  "allocation": "XSEDE12-SAGA",
                                  "queue": "development",
                                  "working_directory": os.getenv("HOME")+"/agent",
                                  "walltime":10,
                                }
    pilot_compute_service.create_pilot(pilot_compute_description=pilot_compute_description)
    compute_data_service = ComputeDataService()
    compute_data_service.add_pilot_compute_service(pilot_compute_service)
    print ("Finished Pilot-Job setup. Submitting compute units")
    # Submit 8 compute units
    for i in range(8):
        compute_unit_description = { "executable": "/bin/cat",
                                     "arguments": ["test.txt"],
                                     "number_of_processes": 1,
                                     "output": "stdout.txt",
                                     "error": "stderr.txt",
                                     "file_transfer": ["ssh://" + os.getenv("HOME") + "/test.txt > SUBJOB_WORK_DIR"]
                                   }
        compute_data_service.submit_compute_unit(compute_unit_description)
    print("Finished submission. Waiting for completion of CU")
    compute_data_service.wait()
    print ("Terminate Pilot Compute Service")
    compute_data_service.cancel()
    pilot_compute_service.cancel()
Execute the script using the command:
python compute_data.py
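Each file_transfer entry in the script is a single string of the form "source > target", where the target used here is the SUBJOB_WORK_DIR placeholder. The sketch below shows one way to build and take apart such strings when you want to stage several files. Both helpers are hypothetical, are not part of the Pilot-API, and make no claim about how BigJob itself parses the string.

```python
def build_file_transfer(source_url, target="SUBJOB_WORK_DIR"):
    """Return a 'source > target' string in the form used by the script.

    Hypothetical helper; the default target is the placeholder from the tutorial.
    """
    return source_url + " > " + target


def split_file_transfer(spec):
    """Split a 'source > target' string into a (source, target) tuple."""
    source, separator, target = spec.partition(" > ")
    # Reject strings that lack the separator or either side of it.
    if not separator or not source.strip() or not target.strip():
        raise ValueError("expected 'source > target', got: %r" % spec)
    return source.strip(), target.strip()
```

With helpers like these, the list passed as "file_transfer" could be produced by a list comprehension over several input paths instead of being written out by hand.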
Can you extend the script to use multiple Pilot-Jobs and see how the data is moved along with each compute unit?
Hint: use the mulitple_pilotjobs.py example.
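As a starting point for the exercise, one possible approach is to call create_pilot once per resource on the same PilotComputeService, reusing the description from the script and changing only the service URL. The sketch below is not the contents of the hinted example file. The helper name create_pilots is hypothetical, and it relies only on the create_pilot(pilot_compute_description=...) call already shown in the script.

```python
import copy


def create_pilots(pilot_compute_service, base_description, service_urls):
    """Create one pilot per service URL, reusing all other settings.

    Hypothetical helper; it assumes only that the service object offers
    create_pilot(pilot_compute_description=...) as used in the script.
    """
    pilots = []
    for url in service_urls:
        # Copy the description so the caller's dictionary is left unchanged.
        description = copy.deepcopy(base_description)
        description["service_url"] = url
        pilot = pilot_compute_service.create_pilot(pilot_compute_description=description)
        pilots.append(pilot)
    return pilots
```

In compute_data.py, this would replace the single create_pilot call; the rest of the script, including the file_transfer entry on each compute unit, can stay the same, so you can observe where test.txt ends up for each unit.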