Vine: Measure and Tune Dispatch Rates #3284
@colinthomas-z80 please summarize here what you have found and keep some running notes as you go.
We have observed that null-task throughput declines as larger batches of tasks are submitted and processed by a single worker. Since the number of waiting tasks does not appear to influence the cost of scheduling, we will look further into this behavior in Work Queue and TaskVine, as well as in the context of Parsl, where high task throughput is desirable.
Separate from the cost of scheduling tasks onto workers, we have observed that an exceptionally large waiting queue causes the manager to spend unnecessary time iterating over tasks to schedule when it should be fetching results so that workers become available again. With 10k+ tasks submitted, the manager dispatches tasks to all workers, yet continues to iterate through the 10k waiting tasks trying to find one that will fit the now-busy workers. In throughput tests, and perhaps some practical workloads, the dispatched tasks finish before the manager is even done iterating through the list. It would be more effective for the manager to retrieve those completed tasks and perform other bookkeeping rather than scan tasks that cannot be scheduled anyway.

A simple test that only attempts to schedule the task at the head of the list retains much better throughput as the queue grows. However, that would severely limit how well workers can be packed when tasks are diverse (e.g. 3-core and 1-core tasks submitted to run in parallel). Making a quick global judgement about the resources available in the cluster versus the requirements of every queued task is probably not feasible. One practical method is to attempt scheduling a fixed number of tasks, and if none succeed, assume no workers are available and go fetch results instead. Considering 100 tasks per pass shows good throughput results; the implications for other aspects of the workflow still need to be studied.
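A minimal sketch of that bounded-attempt heuristic as a toy model in plain C (the names `MAX_ATTEMPTS`, `try_dispatch`, and the simplified task/worker structs are illustrative, not the actual manager code):

```c
#include <stdio.h>

#define MAX_ATTEMPTS 100
#define NTASKS 10000
#define NWORKERS 16

struct task   { int cores; int dispatched; };
struct worker { int free_cores; };

/* Does some worker have room for this task? If so, claim the cores. */
static int try_dispatch(struct task *t, struct worker *workers, int nworkers)
{
    for (int w = 0; w < nworkers; w++) {
        if (workers[w].free_cores >= t->cores) {
            workers[w].free_cores -= t->cores;
            t->dispatched = 1;
            return 1;
        }
    }
    return 0;
}

/* Examine at most MAX_ATTEMPTS waiting tasks; if nothing could be placed,
 * the caller treats the cluster as saturated and goes back to fetching
 * results instead of scanning the rest of the queue. */
static int schedule_pass(struct task *tasks, int ntasks,
                         struct worker *workers, int nworkers)
{
    int attempts = 0, dispatched = 0;
    for (int i = 0; i < ntasks && attempts < MAX_ATTEMPTS; i++) {
        if (tasks[i].dispatched) continue;
        attempts++;
        dispatched += try_dispatch(&tasks[i], workers, nworkers);
    }
    return dispatched;
}

int main(void)
{
    static struct task tasks[NTASKS];
    static struct worker workers[NWORKERS];

    for (int i = 0; i < NTASKS; i++)   tasks[i].cores = 1;
    for (int w = 0; w < NWORKERS; w++) workers[w].free_cores = 4;

    /* 16 workers x 4 cores can hold 64 one-core tasks; the pass stops
     * after 100 attempts rather than scanning all 10,000 waiting tasks. */
    printf("dispatched %d tasks this pass\n",
           schedule_pass(tasks, NTASKS, workers, NWORKERS));
    return 0;
}
```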
A slight variation: keep a cursor into the list and examine ~100 tasks before going back through the main loop. Then, on the next pass, pick up where you left off. That way you eventually make it through the entire list, just not all in one scheduling pass.
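A sketch of that cursor variant under the same toy assumptions (the persistent `cursor` and the `try_place` stub are hypothetical, not the list_rotate code that was later merged):

```c
#include <stdio.h>
#include <stddef.h>

#define BATCH 100
#define NTASKS 10000

struct task { int cores; int dispatched; };

/* Stand-in for "does some worker have room for this task right now?" */
static int try_place(struct task *t) { (void)t; return 0; }

/* Persistent cursor: the next pass resumes where this one stopped, so
 * repeated passes eventually cover the whole queue even though each
 * pass only examines BATCH tasks. */
static size_t cursor = 0;

static void schedule_pass(struct task *tasks, size_t ntasks)
{
    size_t limit = ntasks < BATCH ? ntasks : BATCH;
    for (size_t examined = 0; examined < limit; examined++) {
        struct task *t = &tasks[cursor];
        if (!t->dispatched)
            t->dispatched = try_place(t);
        cursor = (cursor + 1) % ntasks;   /* pick up here next time */
    }
}

int main(void)
{
    static struct task tasks[NTASKS];
    schedule_pass(tasks, NTASKS);
    printf("next pass starts at task %zu\n", cursor);
    return 0;
}
```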
Alternate discussed today: add
list_rotate implemented and merged into Work Queue. The TaskVine equivalent is pending.
Further main loop optimizations for WQ are in #3380. This applies the list-rotate method to the expire_waiting_tasks routine, as it was previously applied to send_one_task. expire_waiting_tasks is rather expensive on its own, and it likely runs unnecessarily during workflows where tasks are not given deadlines.
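One hedged sketch of how those unnecessary expiration scans could be avoided, assuming the manager can cheaply track how many waiting tasks actually carry a deadline (the struct layout, counter, and field names below are hypothetical, not the existing work_queue internals):

```c
#include <stddef.h>
#include <time.h>

/* All names here are hypothetical, not the existing work_queue fields. */
struct task { time_t end_time; /* 0 means "no deadline" */ };

struct queue_state {
    struct task **waiting;   /* waiting tasks (simplified to an array) */
    size_t nwaiting;
    size_t nwith_deadline;   /* ++ when a deadlined task is submitted,
                                -- when it is dispatched or expired */
};

static void expire_waiting_tasks(struct queue_state *q, time_t now)
{
    /* Common case in deadline-free workflows: skip the O(n) scan entirely. */
    if (q->nwith_deadline == 0)
        return;

    for (size_t i = 0; i < q->nwaiting; i++) {
        struct task *t = q->waiting[i];
        if (t->end_time && t->end_time <= now) {
            /* mark the task expired and hand the result back to the application */
        }
    }
}
```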
Running the manager through gprof shows that the majority of time is spent in work_queue_get_stats, which is a logging mechanism. It may be called multiple times during a single pass of the main loop, and each invocation makes three full passes over the task list (roughly 3n operations).
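To illustrate why that is costly, here is a hypothetical comparison (none of these names are the real work_queue_get_stats internals): a stats routine that rescans the task list on every call, versus counters that are adjusted only when a task changes state.

```c
#include <stddef.h>

/* Hypothetical illustration only; not the real work_queue_get_stats code. */

enum task_state { WAITING, RUNNING, DONE };

struct task  { enum task_state state; };
struct stats { size_t waiting, running, done; };

/* Scan-based version: three full passes over n tasks on every call,
 * which is what makes frequent stats calls show up so heavily in gprof. */
static struct stats stats_by_scanning(const struct task *tasks, size_t n)
{
    struct stats s = {0, 0, 0};
    for (size_t i = 0; i < n; i++) if (tasks[i].state == WAITING) s.waiting++;
    for (size_t i = 0; i < n; i++) if (tasks[i].state == RUNNING) s.running++;
    for (size_t i = 0; i < n; i++) if (tasks[i].state == DONE)    s.done++;
    return s;
}

/* Counter-based alternative: O(1) per query; the counters are adjusted
 * once at each task state transition instead of rescanning the list.
 * (A newly submitted task would also need an initial increment.) */
static struct stats counters;

static size_t *slot(enum task_state s)
{
    return s == WAITING ? &counters.waiting :
           s == RUNNING ? &counters.running : &counters.done;
}

static void task_change_state(struct task *t, enum task_state to)
{
    (*slot(t->state))--;
    (*slot(to))++;
    t->state = to;
}
```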
Indeed! But have a look at
#3380 is a nice tune-up for Work Queue. Now let's focus on "doing it right" in TaskVine with some more fundamental changes.
@colinthomas-z80 after thinking about this a little bit: please remove that shortcut, and let's keep the others in place.
@colinthomas-z80 please summarize here what sort of dispatch rates we have now after your latest work.
Fixed
Now that we have fast serverless tasks running in TaskVine+Parsl, the scheduling of tasks is likely to be the performance constraint if we intend to have millions of tasks running on thousands of nodes. Let's understand the performance of the current TaskVine scheduler and, if necessary, see what algorithmic improvements can be made to get dispatch rates that are reasonably stable with respect to the number of tasks and workers.