Shepherd is designed to manage local workflows that involve both actions that run to completion and services that persist until explicitly stopped. It tracks the internal states of services by analyzing log outputs and file contents, allowing it to manage complex workflows with ease. Shepherd treats services as tasks, enabling them to be seamlessly integrated into larger distributed workflows. By monitoring the internal states of services, Shepherd ensures that dependencies are met and services are gracefully shut down when required.
Shepherd is particularly useful for applications that require persistent services as part of a traditional task-based workflow. For example, consider a scenario where a web server should start only after the database service has completed its initialization and is ready to handle queries. The database does not perform a single action that completes; instead, it reaches an internal state that should dynamically trigger the launch of the web server. Shepherd can monitor the database service's logs for a message indicating successful startup. Upon detecting this state, Shepherd triggers the initiation of the web server, ensuring efficient workflow execution. Moreover, Shepherd wraps the entire service workflow into a single task that terminates upon completion, making it easy to integrate into larger task-based workflows.
-
Services as Tasks: Shepherd treats persistent services as first-class tasks within a workflow, enabling them to be seamlessly integrated into traditional task-based workflow managers.
-
Dependency Management: Shepherd allows tasks (both actions and services) to start based on the internal states of other services or actions, enabling complex state-based dependencies. It supports
any
andall
dependency modes, allowing for flexible dependency configurations. -
State Monitoring: Shepherd continuously monitors the internal state of each application by analyzing standard output and the contents of any specified files. It updates the state in real-time based on specific keywords found in these files. This allows Shepherd to manage workflows with dynamic state transitions.
-
Graceful Startup and Shutdown: Shepherd ensures services start only when their dependencies are met and provides controlled shutdown mechanisms to prevent data loss and ensure system stability.
-
Robust Logging and Visualization: Shepherd generates comprehensive logs and state transition data, which can be used to create visualizations of the workflow execution, aiding in performance analysis and debugging.
-
Integration with Larger Workflows: By encapsulating service workflows into single tasks, Shepherd enables easy integration with larger distributed workflow managers like Makeflow, enhancing workflow flexibility and reliability.
The Shepherd tool manages program execution through a series of defined states, ensuring dependencies are met and states are recorded. Every program has default states (Initialized
, Started
, and Final
) and can have optional user-defined states. Programs transition from Initialized
to Started
once dependencies are satisfied, then move through user-defined states. Actions return Action Success
on a zero return code and Action Failure
otherwise, while services transition to Service Failure
if they stop unexpectedly. Any program receiving a stop signal is marked as Stopped
, and all programs ultimately transition to a Final
state, reflecting their execution outcome.
To install Shepherd, clone the repository and install using pip:
git clone https://github.com/cooperative-computing-lab/shepherd.git
cd shepherd
pip install .
Optionally, create a virtual environment before installing Shepherd to avoid conflicts with other Python packages:
python3 -m venv venv
source venv/bin/activate
pip install .
Shepherd simplifies complex application workflows. Here’s a simple example to demonstrate how to use Shepherd for
scheduling dependent programs. In this example, we have two shell scripts: program1.sh
and program2.sh
. program2
should start only after program1
has successfully completed its execution.
Create two shell scripts named program1.sh
and program2.sh
with the following content:
#!/bin/bash
echo "Starting program..."
sleep 5
echo "Program completed"
Make sure to make the scripts executable:
chmod +x program1.sh program2.sh
Create a Shepherd configuration file named shepherd-config.yml
with the following content:
tasks:
program1:
command: "./program1.sh"
program2:
command: "./program2.sh"
dependency:
items:
program1: "action_success" # Start program2 only after program1 succeeds
output:
state_times: "state_transition_times.json"
max_run_time: 60 # Optional: Limit total runtime to 60 seconds
Run Shepherd with the configuration file:
shepherd -c shepherd-config.yml
If you are running the python source, then run
python3 shepherd.py -c shepherd-config.yml
If you are running shepherd executable, then run
shepherd -c shepherd-config.yml
With this simple configuration, Shepherd will:
- Execute
program1.sh
. - Monitor the internal states of the program.
- Start
program2.sh
only afterprogram1.sh
succeeds. - Create
state_transition_times.json
, which will look similar to this:
{
"my_service": {
"initialized": 0.19824790954589844,
"started": 0.19916582107543945,
"ready": 5.461252927780151,
"stopped": 60.36078095436096,
"final": 60.36192083358765
},
"my_action": {
"initialized": 0.20003700256347656,
"started": 5.466093063354492,
"action_success": 10.657495021820068,
"final": 10.658098936080933
}
}
Shepherd can monitor standard output (stdout) or any other file to detect user-defined states. These states can then be used as dependencies for other programs. This feature allows you to define complex workflows based on custom application states.
Suppose you have a service that becomes 'ready' after some initialization, and other tasks depend on it being ready.
Example Service script (service.sh
):
#!/bin/bash
echo "Service is starting..."
sleep 5
echo "Service is ready"
tail -f /dev/null # Keep the service running
Action script (action.sh
):
#!/bin/bash
echo "Action is running..."
sleep 5
echo "Action completed"
Make sure to make the scripts executable:
chmod +x service.sh action.sh
Below is a Shepherd configuration file that monitors the standard output of the service script to detect the 'ready' state. The action script starts only after the service is ready.
tasks:
my_service:
type: "service"
command: "./service.sh"
state:
log:
ready: "Service is ready"
my_action:
type: "action"
command: "./action.sh"
dependency:
items:
my_service: "ready"
output:
state_times: "state_transition_times.json"
max_run_time: 60
- Shepherd starts the service script
service.sh
. - Shepherd monitors the standard output of the service script for the message "Service is ready".
- Once the service is ready, Shepherd starts the action script
action.sh
. - Shepherd creates
state_transition_times.json
with the state transition times.
{
"my_service": {
"initialized": 0.25,
"started": 0.26,
"ready": 5.35,
"stopped": 65.00,
"final": 65.01
},
"my_action": {
"initialized": 5.36,
"started": 5.37,
"action_success": 10.40,
"final": 10.41
}
}
Shepherd uses a YAML configuration file to define the workflow. Here are some key configuration options:
Tasks are defined under the tasks section. Each task can be an action or a service:
- Action: A task that runs to completion and exits.
- Service: A persistent application that runs until explicitly stopped.
The default type is action. Here is an example configuration with an action and a service:
tasks:
my_action:
type: "action"
command: "python process_data.py"
my_service:
type: "service"
command: "start_server.sh"
Dependencies specify when a task should start, based on the states of other tasks.
- Mode: Specifies whether all dependencies must be met (
all
, the default) or any one (any
).
tasks:
task2:
type: "action"
command: "./task2.sh"
dependency:
mode: "all"
items:
task1: "action_success"
task3: "ready"
Shepherd can monitor standard output or files to detect user-defined states. This allows you to control the workflow based on custom application states.
Example of monitoring standard output:
tasks:
my_program:
command: "./my_program.sh"
state:
log:
ready: "program is ready"
complete: "program is completed"
Example of monitoring a file:
tasks:
my_task:
type: "action"
command: "./my_task.sh"
state:
file:
path: "/tmp/my_task_state.log"
states:
completed: "Task completed successfully"
Shepherd can generate output files containing state transition times and other logs. You can specify the output file paths in the configuration:
output:
state_times: "state_transition_times.json"
stdout: "stdout.log"
stderr: "stderr.log"
Shepherd can be configured to stop all tasks based on specific conditions, such as a stop signal, maximum runtime, or success criteria:
- Stop Signal: A file that, when created, triggers a controlled shutdown.
- Max Run Time: The maximum runtime for the entire workflow.
- Success Criteria: Criteria that, when met, trigger a shutdown.
stop_signal: "/tmp/shepherd_stop"
max_run_time: 120
success_criteria:
mode: "all"
items:
task1: "action_success"
task2: "action_success"
Only one of these conditions needs to be met for Shepherd to initiate a controlled shutdown.