gcd - Grand Central Dispatch
jxy/gcd
NAME
    gcd - Grand Central Dispatch

SYNOPSIS
    ./gcd
        Prints status and offers to submit jobs according to
        `gcd_CONF_PLAN'.

    ./scripts/r
        Universal job script that runs jobs (see section FILES UNDER
        JOBPATH).

DESCRIPTION
    It implements functions for dispatching long-running jobs, with a
    file-system-based lock for parallel status updates. When called
    directly, `gcd' shows the status of the jobs and suggests further
    actions.

FILES
    gcd
        The source code for system-independent functionality.

    conf
        Configuration.

            gcd_CONF_PLAN=(
                10 8192 1440 ProjectName "Path and/or Category"
                10 4096 360 ProjectName "Path and/or Category"
            )

        The array gcd_CONF_PLAN contains a list in sets of 5 items:
        the number of jobs to submit, the size (in number of nodes),
        the time (in minutes), the project name, and extra parameters
        for a preferred path and/or preferred categories.

        Setting `gcd_RESUBMIT' to 0 stops the job script from
        resubmitting itself after it finishes. Other global variables
        can be set here, but their names must not use the prefix
        `gcd'.

    util/*
        Scripts to be sourced that define useful platform-dependent
        utility functions.

    scripts/r
        The universal job script. It is designed to be submitted to a
        queuing system that can run a custom script inside the top gcd
        directory. It manages job availability and uses the files
        under JOBPATH to start and monitor jobs. See section FILES
        UNDER JOBPATH for details. Given platform support, it can also
        be run directly; see the files under the `test' directory.

    work/*
        Job managers have their stdout and stderr redirected to files
        here.

    jobs/*
        Paths to jobs, aka JOBPATH. Symlinks in practice.

    test/*
        For testing.

    test/scripts/test
        Run it from within its directory for simple tests.

FILES UNDER JOBPATH
    DESC
        Free-format job description.

    PROP
        Job properties related to scheduling, shared among all JOBIDs
        under this JOBPATH. The file is sourced as a shell script. It
        must only define the following variables.
            block_size=2048
            time_limit=7200
            repeat=1
            category="URGENT HMC"

        All are integers except the last, category, which is a string
        of space-separated categories. The job only fits block_size
        nodes. The job should finish within time_limit seconds. If
        repeat is nonzero, the job will continue to be available after
        finishing, regardless of its last return status. If time_limit
        is zero, it uses 5 percent more than the previous max run time
        plus 20 minutes. There is a special category, `URGENT', which
        gives jobs a higher priority.

    LIST
        All individual jobs under this JOBPATH. This file contains the
        list of JOBIDs, one per line. Any line that begins with `#' is
        ignored.

    JOB
        This file is sourced in the containing directory. It prepares
        the job and sets up variables related to running. It must
        define the following variables.

            rank_per_node=32
            omp_nthreads=2
            update_time=1200
            run=( full command to execute )
            output_path=stdout
            error_path=stderr

        The job will start by running

            "${run[@]}" >"$output_path" 2>"$error_path"

        and the output_path is checked for possible hangs. The
        output_path and error_path should be unique for each job run,
        even for jobs with repeat=1. If update_time is positive, the
        job is considered hung when it has not updated its output file
        in the last update_time seconds. If update_time is zero, it
        uses twice the previous max update_time plus 5 minutes or the
        time_limit, whichever is smaller.

        It can define additional variables that overwrite those
        defined in PROP. It can use variables given from the executing
        environment: gcd_JOBPATH, gcd_JOBID, gcd_JOBSIZE, gcd_JOBBLOCK.

    run/$JOBID.served
        This file is created when `fetch' serves this JOBID, and is
        removed when `return' gets this JOBID.

    run/$JOBID.stop
        This file is created when `return' gets this JOBID with zero
        RETURNCODE, and PROP has repeat=0.

    run/$JOBID.log
        `fetch' and `return' record the time when the job is served
        and returned. Job scripts record the idle time during runs.
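The LIST filtering rule and the two fallback rules for time_limit=0 and update_time=0 can be sketched in shell as follows. This is only an illustration under stated assumptions, not gcd's own code: the function names are hypothetical, "5 percent more than the previous max run time plus 20 minutes" is read as 1.05 times the previous maximum plus 1200 seconds, all times are taken to be in seconds, and skipping empty LIST lines is an assumption the text above does not state.

```shell
# Hypothetical helpers; none of these names are taken from gcd.

# Print the JOBIDs found in a LIST file, one per line.
# Lines beginning with '#' are ignored, as described under LIST.
# Skipping empty lines is an assumption.
list_jobids() (
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$1"
)

# Fallback for time_limit=0.
# $1: previous max run time, in seconds.
# Assumed reading: 1.05 * previous max + 20 minutes (integer arithmetic).
default_time_limit() (
    echo $(( $1 * 105 / 100 + 20 * 60 ))
)

# Fallback for update_time=0.
# $1: previous max update time, in seconds; $2: time_limit, in seconds.
# Twice the previous max plus 5 minutes, or time_limit, whichever is smaller.
default_update_time() (
    t=$(( 2 * $1 + 5 * 60 ))
    if [ "$t" -gt "$2" ]; then
        t=$2
    fi
    echo "$t"
)
```

Under these assumptions, `default_time_limit 6000` prints 7500, and `default_update_time 600 7200` prints 1500.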

TODO
    Find an exit strategy.

LICENSE
    MIT license. See file `LICENSE'.

HISTORY
    May, 2017, version 2.0
    February, 2017, version 1.0

BUG
    JOBID cannot begin with `#', nor can it contain the path separator
    `/'.
    Each job run should have unique output_path and error_path.
    The path to gcd cannot contain single quotes.
    Manual intervention required to stop jobs from self-replicating.
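The two JOBID restrictions listed under BUG can be checked before an entry is added to a LIST file. The function below is a minimal sketch with a hypothetical name; it is not part of gcd, and treating an empty JOBID as invalid is an added assumption.

```shell
# Hypothetical check; valid_jobid is an illustrative name, not part of gcd.
# Returns 0 if the JOBID in $1 satisfies the restrictions under BUG, else 1.
valid_jobid() {
    case $1 in
        '')    return 1 ;;  # empty JOBID: rejected by assumption
        '#'*)  return 1 ;;  # would be read as a comment line in LIST
        */*)   return 1 ;;  # contains the path separator
        *)     return 0 ;;
    esac
}
```

For example, `valid_jobid run001` succeeds, while `valid_jobid '#run001'` and `valid_jobid 'a/b'` both fail.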