
Vine: Measure and Tune Dispatch Rates #3284

Closed · 3 tasks done
dthain opened this issue Apr 20, 2023 · 15 comments

dthain commented Apr 20, 2023

Now that we have fast serverless tasks running in TaskVine+Parsl, the scheduling of tasks is likely to be the performance constraint if we intend to run millions of tasks on thousands of nodes. Let's understand the performance of the current TaskVine scheduler and, if necessary, see what algorithmic improvements can be made to achieve dispatch rates that are reasonably stable with respect to the number of tasks and workers.

  • Create a simple dispatch benchmark for null tasks.
  • Write up a high level pseudo-code description of the current scheduling loop.
  • Propose some improvements to our approach.

dthain commented May 30, 2023

@colinthomas-z80 please summarize here what you have found and keep some running notes as you go.

colinthomas-z80 commented

  • Added entries to the performance log and vine_profile_dispatch to plot accumulated scheduling time.
  • Wrote high-level pseudocode of the scheduling loop, identifying the number of workers as the main complexity-increasing factor.
  • Profiled the scheduling loop with gprof to identify hot spots.
  • Optimized common functions in rmsummary for TaskVine and Work Queue.
  • Optimized the scheduling loop for SCHEDULE_FILES.

colinthomas-z80 commented

We have observed that null-task throughput declines as larger batches of tasks are submitted and processed by a single worker. Since the number of waiting tasks does not appear to influence the cost of scheduling, we will investigate this behavior further in Work Queue and TaskVine, as well as in the context of Parsl, where high task throughput is desirable.

colinthomas-z80 commented

Separate from the issue of scheduling tasks to workers, we have observed that an exceptionally large waiting-task queue causes the manager to spend unnecessary time iterating through tasks to schedule when it should instead be fetching results so that workers become available again.

That is to say, with 10k+ tasks submitted, the manager dispatches tasks to all workers, yet continues to iterate through the 10k tasks trying to find one that will fit the busy workers. In throughput tests, and perhaps in some practical cases, the tasks finish before the manager has even finished iterating through the list. It would therefore be more effective for the manager to retrieve completed tasks and perform other bookkeeping rather than iterate through tasks that cannot be scheduled.

A simple test in which we only attempt to schedule the waiting task at the head of the list shows much better retained throughput as the task queue grows. However, this would severely limit how effectively workers can be packed in the case of diverse tasks (e.g. 3-core and 1-core tasks submitted to run in parallel). Making a quick judgment about the resources available in the cluster versus the task requirements in the queue is perhaps not possible.

One possible method is to attempt scheduling a fixed number of tasks, and if none succeed, assume that no workers are available and that results should be fetched. This method shows good throughput results when considering 100 tasks per pass. The implications for other aspects of the workflow still need to be studied.
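The fixed-attempt heuristic might be sketched roughly as follows (illustrative C, not the actual manager code; `task_fits_some_worker` is a hypothetical stand-in for the real worker-matching logic):

```c
/* Minimal sketch of the bounded-attempt heuristic: try to schedule at
 * most MAX_ATTEMPTS waiting tasks per pass; if none can be placed,
 * give up so the caller can fetch results instead. */

#include <stdbool.h>

#define MAX_ATTEMPTS 100

struct task { int cores; };

/* Hypothetical stand-in for worker matching: here, no worker ever has
 * more than 4 free cores. */
static bool task_fits_some_worker(const struct task *t) {
    return t->cores <= 4;
}

/* Scan at most MAX_ATTEMPTS tasks from the waiting list; return the
 * index of the first schedulable task, or -1 if none was found within
 * the window. */
static int find_schedulable_task(struct task *waiting, int nwaiting) {
    int limit = nwaiting < MAX_ATTEMPTS ? nwaiting : MAX_ATTEMPTS;
    for (int i = 0; i < limit; i++) {
        if (task_fits_some_worker(&waiting[i])) return i;
    }
    return -1; /* assume no workers available; fetch results instead */
}
```

The key property is that the cost per scheduling pass is bounded by MAX_ATTEMPTS rather than by the total queue length, so throughput no longer degrades as the waiting queue grows.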


dthain commented Jun 5, 2023

A slight variation: just keep a cursor in the list, and examine ~100 before going back through the main loop. Then, next time, pick up where you left off. That way, you eventually make it through the entire list, just not all in one scheduling pass.
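That cursor variation could be sketched like this (illustrative C with hypothetical names; the dispatch attempt itself is elided):

```c
/* Sketch of the cursor idea: keep a persistent index into the waiting
 * list and examine ~WINDOW entries per scheduling pass, resuming where
 * the previous pass left off, so the whole list is eventually covered
 * across passes rather than in one. */

#define WINDOW 100

static int cursor = 0; /* persists across scheduling passes */

/* Visit up to WINDOW tasks starting at the cursor, wrapping around;
 * returns the number of tasks examined this pass. */
static int schedule_pass(int ntasks) {
    if (ntasks <= 0) return 0;
    int examined = 0;
    for (int i = 0; i < WINDOW && examined < ntasks; i++, examined++) {
        int idx = (cursor + i) % ntasks;
        (void)idx; /* ...attempt to dispatch waiting task idx here... */
    }
    cursor = (cursor + examined) % ntasks;
    return examined;
}
```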


dthain commented Jun 6, 2023

Alternative discussed today: add a list_rotate operation that moves the head of the list to the tail; then you can just consider one item at a time and keep going, up to N items.

@colinthomas-z80 colinthomas-z80 moved this to In Progress in TaskVine Phase 2 Jun 6, 2023
colinthomas-z80 commented

list_rotate implemented and merged into Work Queue. TaskVine equivalent pending.
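For reference, the head-to-tail rotation can be sketched on a simple singly linked list (illustrative only; the real cctools list structure and API differ):

```c
/* Sketch of a head-to-tail rotation: move the head node to the tail so
 * that repeated single-item scheduling attempts eventually visit every
 * waiting task without rescanning the whole list each pass. */

#include <stddef.h>

struct node { int value; struct node *next; };
struct list { struct node *head; struct node *tail; };

static void list_rotate(struct list *l) {
    if (!l->head || l->head == l->tail) return; /* 0 or 1 items: no-op */
    struct node *old_head = l->head;
    l->head = old_head->next;   /* second node becomes the new head   */
    old_head->next = NULL;      /* old head becomes the new tail      */
    l->tail->next = old_head;
    l->tail = old_head;
}
```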

colinthomas-z80 commented

Further main-loop optimizations for WQ in #3380. This applies the list-rotate method in the expire_tasks routine, as it was previously applied to send_one_task. expire_waiting_tasks is by itself rather expensive, and it likely often runs unnecessarily in workflows where tasks are not given deadlines.
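One way to avoid the unnecessary runs entirely is to keep a running count of deadline-bearing tasks, so the common no-deadline workflow pays nothing per pass. A minimal sketch, with illustrative names rather than the actual Work Queue code:

```c
/* Sketch: count tasks that carry deadlines as they are submitted (and
 * decrement on completion), then skip the expiration scan entirely
 * whenever that count is zero. */

#include <time.h>

static int tasks_with_deadline = 0; /* maintained at submit/complete time */

/* Returns the number of tasks expired this pass. */
static int expire_waiting_tasks(time_t now) {
    if (tasks_with_deadline == 0) {
        return 0; /* nothing can expire; skip the list scan */
    }
    /* ...otherwise walk the waiting list and expire tasks whose
     * deadline is before `now`... */
    (void)now;
    return 0;
}
```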


colinthomas-z80 commented Jun 20, 2023

Running the manager through gprof shows that the majority of time is spent in work_queue_get_stats, which is a logging mechanism. It may be called multiple times during a single pass of the main loop, and each invocation performs a 3n iteration over the task list.

```
            0.01   22.53   25000/50005       add_task_report [13]
            0.01   22.53   25005/50005       log_queue_stats [14]
[7]         64.7    0.02   45.06   50005     work_queue_get_stats [7]

            4.48   40.22  150015/150015      work_queue_get_stats [7]
[8]         64.2    4.48   40.22  150015     task_state_count [8]
           24.83    0.00 1875300018/2871551875  itable_nextkey [9]
            2.36   12.79 1875150003/2871071303  task_state_is [12]
            0.23    0.00  150015/312725      itable_firstkey [28]
```


dthain commented Jun 21, 2023

Indeed! But have a look at task_state_count, which likely does a troubling amount of work...
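A common fix for this pattern is to maintain per-state counters incrementally on every state transition, turning the O(n) table scan into an O(1) lookup. A minimal sketch, with illustrative enum and function names rather than the actual Work Queue ones:

```c
/* Sketch: instead of scanning the whole task table on every
 * task_state_count call, keep a counter per state and update it at each
 * transition, so the count query becomes O(1). */

enum task_state { STATE_WAITING, STATE_RUNNING, STATE_DONE, STATE_MAX };

static int state_count[STATE_MAX]; /* updated on every transition */

/* Record a newly submitted task in its initial state. */
static void register_task(enum task_state s) {
    state_count[s]++;
}

/* Move a task between states, keeping the counters consistent. */
static void change_task_state(enum task_state *current, enum task_state next) {
    state_count[*current]--;
    state_count[next]++;
    *current = next;
}

/* O(1) replacement for the scanning count. */
static int task_state_count(enum task_state s) {
    return state_count[s];
}
```

The trade-off is that every code path that changes a task's state must go through change_task_state, or the counters drift out of sync with the table.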


dthain commented Jun 21, 2023

#3380 is a nice tune-up for Work Queue. Now let's focus on "doing it right" in TaskVine with some more fundamental changes.


dthain commented Jun 22, 2023

@colinthomas-z80 after thinking about this a little, I don't believe the last_waiting_task and last_retrieved_task shortcuts are safe under all conditions. If some other action causes the task to be changed or removed from the data structure (e.g. removing a task), it will result in a crash.

Please remove that shortcut, and let's keep the others in place.


dthain commented Jul 10, 2023

#3387


dthain commented Jul 10, 2023

@colinthomas-z80 please summarize here what sort of dispatch rates we have now after your latest work.


dthain commented Jan 29, 2024

Fixed
