Skip to content

jxy/gcd

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NAME
	gcd - Grand Central Dispatch

SYNOPSIS
	./gcd
		Prints status and offers to submit jobs according to
		`gcd_CONF_PLAN'.

	./scripts/r
		Universal job script that runs jobs (section FILES
	UNDER JOBPATH).

DESCRIPTION
	It implements functions for dispatching long running jobs,
	with a file system based lock for parallel status update.

	When called directly, `gcd' shows the status of the jobs and
	suggests further actions.

FILES
	gcd
		The source code for system independent functionalities.

	conf
		Configuration.

			gcd_CONF_PLAN=(
			10 8192 1440 ProjectName "Path and/or Category"
			10 4096 360 ProjectName "Path and/or Category"
			)

		The array, gcd_CONF_PLAN, contains a list in sets
		of 5 items: the number of jobs to submit, the size
		(in number of nodes), the time (in minutes), the
		project name, and extra parameters for a preferred
		path and / or preferred categories.  Setting
		`gcd_RESUBMIT' to 0 stops the job script to resubmit
		itself after finishes.  Other global variables can be
		set here, but only without the prefix `gcd'.

	util/*
		Scripts to be sourced that defines useful platform
		dependent utility functions.

	scripts/r
		The universal job scripts.  It is designed to be
		submitted to a queuing system that can run custom
		script inside the top gcd directory.  It manages
		job availability and uses the files under JOBPATH
		to start and monitor jobs.  See section FILES UNDER
		JOBPATH for details.  Given platform support, it can
		also be run directly, see files under `test' directory.

	work/*
		Job managers have their stdout and stderr redirected
		to files here.

	jobs/*
		Paths to jobs, aka JOBPATH.  Symlinks in practice.

	test/*
		For testing.

	test/scripts/test
		Run it under its directory for simple tests.

FILES UNDER JOBPATH
	DESC
		Free format job description.

	PROP
		Job properties related to scheduling, shared
		among all JOBID under this JOBPATH.  The file is
		sourced as a shell script.  It must only define
		the following variables.

			block_size=2048
			time_limit=7200
			repeat=1
			category="URGENT HMC"

		All are integers, except for the last, category is
		a string, representing space separated categories.
		The job only fits block_size nodes.  The job should
		finish within time_limit seconds.  If repeat is
		nonzero, the job will continue to be available after
		finishing regardless of its last return status.
		If time_limit is zero, it uses 5 percent more
		than the previous max run time plus 20 minutes.
		There is a special category, `URGENT', with which
		jobs would get a higher priority.

	LIST
		All individual jobs under this JOBPATH.  This file
		contains the list of JOBID, one per line.  Any line
		that begins with `#' is ignored.

	JOB
		This file is sourced under the containing directory.
		It prepares the job, and set up variables related
		to running.  It must define the following variables.

			rank_per_node=32
			omp_nthreads=2
			update_time=1200
			run=( full command to execute )
			output_path=stdout
			error_path=stderr

		The job will start by running

			"${run[@]}" >"$output_path" 2>"$error_path"

		and the output_path is checked for possible hangs.
		The output_path and error_path should be unique for
		each job run, even for jobs with repeat=1.  We
		consider the job hangs if it hasn't updated its
		output file in the last update_time seconds, if
		update_time is positive.  If update_time is zero,
		it uses twice of the previous max update_time plus
		5 minutes or the time_limit, whichever is smaller.
		It can define additional variables that overwrites
		those defined in PROP.  It can use variables
		given from the executing environment: gcd_JOBPATH,
		gcd_JOBID, gcd_JOBSIZE, gcd_JOBBLOCK.

	run/$JOBID.served
		This file is created when `fetch' serves this JOBID,
		and is removed when `return' gets this JOBID.

	run/$JOBID.stop
		This file is created when `return' gets this JOBID
		with zero RETURNCODE, and PROP has repeat=0.

	run/$JOBID.log
		`fetch' and `return' record the time when the job
		is served and returned.  Job scripts record the
		idle time during runs.

TODO
	Find an exit strategy.

LICENSE
	MIT license.  See file `LICENSE'.

HISTORY
	May, 2017, version 2.0
	February, 2017, version 1.0

BUG
	JOBID cannot begin with `#', nor can it contain the path
	separator `/'.

	Each job run should have unique output_path and error_path.

	The path to gcd cannot contain single quotes.

	Manual intervention required to stop jobs from self-replicating.

About

gcd - Grand Central Dispatch

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages