vine: tune parameter immediate-recovery (#3517)
* vine: tune parameter immediate-recovery

Submit recovery tasks for temporary files as soon as their worker is
lost.

Use case: topEFT sorts tasks according to how large they are when
constructing accumulation tasks. This means that it takes a long time
to discover that a result that was expensive to compute was lost. With
this change, recovery tasks are created as soon as the worker is gone.
Since all tmp files are used as inputs at least once, and topEFT removes
tmp files as soon as they are used, this change saves time at the
workflow rampdown.

* format
btovar authored Sep 26, 2023
1 parent 4998b05 commit 358f6e6
Showing 3 changed files with 16 additions and 2 deletions.
5 changes: 3 additions & 2 deletions doc/manuals/taskvine/index.md
@@ -2374,11 +2374,12 @@ change.
| hungry-minimum | Smallest number of waiting tasks in the manager before declaring it hungry | 10 |
| hungry-minimum-factor | Queue is hungry if number of waiting tasks is less than hungry-minimum-factor x (number of workers) | 2 |
| ramp-down-heuristic | If set to 1 and there are more workers than tasks waiting, then tasks are allocated all the free resources of a worker large enough to run them. If monitoring watchdog is not enabled, then this heuristic has no effect. | 0 |
| immediate-recovery | If set to 1, create recovery tasks for temporary files as soon as their worker disconnects. Otherwise, create recovery tasks only if the temporary files are used as input when trying to dispatch another task. | 0 |
| monitor-interval | Maximum number of seconds between resource monitor measurements. If less than 1, use default. | 5 |
| resource-submit-multiplier | Assume that workers have `resource x resource-submit-multiplier` available.<br> This overcommits resources at the worker, causing tasks that cannot be immediately executed to be sent to workers.<br>The extra tasks wait at the worker until resources become available. | 1 |
| wait-for-workers | Do not schedule any tasks until `wait-for-workers` are connected. | 0 |
-| max-retrievals | Sets the max number of tasks to retrievals per manager wait(). If less than 1, the manager prefers to retrievals all completed tasks before dispatching new tasks to workers. | 1 |
-| worker-retrievals | If 1, retrievals all completed tasks from a worker when retrieving results, even if going above the parameter max-retrievals . Otherwise, if 0, retrieve just one task before deciding to dispatch new tasks or connect new workers. | 1 |
+| max-retrievals | Sets the max number of tasks to retrieve per manager wait(). If less than 1, the manager prefers to retrieve all completed tasks before dispatching new tasks to workers. | 1 |
+| worker-retrievals | If 1, retrieve all completed tasks from a worker when retrieving results, even if going above the parameter max-retrievals . Otherwise, if 0, retrieve just one task before deciding to dispatch new tasks or connect new workers. | 1 |


=== "Python"
10 changes: 10 additions & 0 deletions taskvine/src/manager/vine_manager.c
@@ -148,6 +148,8 @@ static void release_all_workers(struct vine_manager *q);
static void vine_manager_send_library_to_workers(struct vine_manager *q, const char *name, time_t stoptime);
static void vine_manager_send_libraries_to_workers(struct vine_manager *q, time_t stoptime);

static int vine_manager_check_inputs_available(struct vine_manager *q, struct vine_task *t);

static void delete_worker_file(
        struct vine_manager *q, struct vine_worker_info *w, const char *filename, int flags, int except_flags);

@@ -793,6 +795,11 @@ static void cleanup_worker(struct vine_manager *q, struct vine_worker_info *w)

reap_task_from_worker(q, w, t, VINE_TASK_READY);

// recreate inputs lost
if (q->immediate_recovery) {
    vine_manager_check_inputs_available(q, t);
}

vine_task_clean(t);

itable_firstkey(w->current_tasks);
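
The difference between the two recovery policies in this hunk can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `toy_file`, `toy_task`, `toy_check_inputs_available`, and `toy_cleanup_worker` are hypothetical stand-ins invented for illustration, not the TaskVine types or functions, and the replica counting is a simplification of whatever bookkeeping the manager really does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical temporary file: how many workers still hold a replica,
 * and whether a recovery task has already been queued for it. */
struct toy_file {
    int replicas;
    int recovery_queued;
};

/* Hypothetical task: nothing more than a list of input files. */
struct toy_task {
    struct toy_file **inputs;
    size_t n_inputs;
};

/* Returns 1 if every input still has a replica somewhere. For each input
 * without one, queues a recovery task (at most once per file) by bumping
 * *recoveries, and returns 0. */
static int toy_check_inputs_available(struct toy_task *t, int *recoveries)
{
    assert(t != NULL && recoveries != NULL);
    int all_available = 1;
    for (size_t i = 0; i < t->n_inputs; i++) {
        struct toy_file *f = t->inputs[i];
        if (f->replicas > 0)
            continue;
        all_available = 0;
        if (!f->recovery_queued) {
            f->recovery_queued = 1;
            (*recoveries)++;
        }
    }
    return all_available;
}

/* Called when a worker is lost. With immediate_recovery set, inputs are
 * checked right away; otherwise nothing happens until a later dispatch
 * attempt calls toy_check_inputs_available on its own. */
static void toy_cleanup_worker(int immediate_recovery, struct toy_task **lost,
                               size_t n_lost, int *recoveries)
{
    for (size_t i = 0; i < n_lost; i++) {
        /* The task would return to the ready queue here (omitted). */
        if (immediate_recovery)
            toy_check_inputs_available(lost[i], recoveries);
    }
}
```

In this model, a lost file is noticed at cleanup time when the flag is set; with the flag cleared it goes unnoticed until some later dispatch attempt performs the same check, which is the delay the commit message describes.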
@@ -5130,6 +5137,9 @@ int vine_tune(struct vine_manager *q, const char *name, double value)
} else if (!strcmp(name, "ramp-down-heuristic")) {
    q->ramp_down_heuristic = MAX(0, (int)value);

} else if (!strcmp(name, "immediate-recovery")) {
    q->immediate_recovery = !!((int)value);

} else if (!strcmp(name, "file-source-max-transfers")) {
    q->file_source_max_transfers = MAX(1, (int)value);
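
Note that the new branch normalizes its argument differently from its neighbor: `!!((int)value)` collapses any nonzero integer to 1, while `MAX(0, (int)value)` only clamps negatives to 0. The sketch below reproduces just those two branches against a hypothetical `toy_manager` struct so the behavior can be checked in isolation. `toy_tune`, `toy_manager`, and `TOY_MAX` are names made up for this example, and the `-1` return for an unknown name is an assumption of the sketch, not something shown in the hunk.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the two manager fields touched by these branches. */
struct toy_manager {
    int ramp_down_heuristic;
    int immediate_recovery;
};

#define TOY_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Mirrors the two branches shown in the hunk. Returns 0 on success and,
 * as an assumption of this sketch, -1 when the name is not recognized. */
static int toy_tune(struct toy_manager *m, const char *name, double value)
{
    assert(m != NULL && name != NULL);
    if (!strcmp(name, "ramp-down-heuristic")) {
        /* Negative values clamp to 0; positive values pass through. */
        m->ramp_down_heuristic = TOY_MAX(0, (int)value);
    } else if (!strcmp(name, "immediate-recovery")) {
        /* The double is truncated to int first, then reduced to 0 or 1. */
        m->immediate_recovery = !!((int)value);
    } else {
        return -1;
    }
    return 0;
}
```

Because the cast truncates toward zero before the double negation, a value such as 0.5 leaves the flag off, while any value whose integer part is nonzero (including negative ones) turns it on.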

3 changes: 3 additions & 0 deletions taskvine/src/manager/vine_manager.h
@@ -190,6 +190,9 @@ struct vine_manager {
    int proportional_whole_tasks; /* If true, round-up proportions to whole number of tasks. */
    int ramp_down_heuristic; /* If true, and there are more workers than tasks waiting, then tasks are allocated all the free resources of a worker large enough to run them.
                                If monitoring watchdog is not enabled, then this heuristic has no effect. */
    int immediate_recovery; /* If true, recovery tasks for tmp files are created as soon as the worker that had them
                               disconnects. Otherwise, create them only when a task needs them as inputs (this is
                               the default). */
    double resource_submit_multiplier; /* Factor to permit overcommitment of resources at each worker. */
    double bandwidth_limit; /* Artificial limit on bandwidth of manager<->worker transfers. */
    int disk_avail_threshold; /* Ensure this minimum amount of available disk space. (in MB) */
