
WQ: Further mainloop optimizations #3380

Merged: 7 commits, Jun 21, 2023

Conversation

colinthomas-z80
Contributor

No description provided.

@benclifford

I tried this PR (commit fa92953) on my laptop with parsl-perf --time 300, and I still see only 42 tasks/second near the start of a run. This is with around 75000 tasks in the queue at the start. I think that's not a noticeable change for this particular measurement compared to the current cctools master commit. Here's a plot of TASK RUNNING events for that run.

(as I've mentioned before, ignore spikes near the beginning, as that is parsl-perf calibrating itself)

[plot: TASK RUNNING events over time for this run]

@colinthomas-z80
Contributor Author

@benclifford This change improved my performance in the range of 10-20k tasks in the queue, but yes, it still does not hold up at larger queue sizes. I believe the remaining issue is in the way we retrieve tasks.

@colinthomas-z80
Contributor Author

Task retrieval has shaped up a bit; now the suspect seems to be logging overhead, as seen in the profile in #3284.

@dthain
Member

dthain commented Jun 20, 2023

Looks like this is making progress but maybe not ready to merge yet.
Let me know when you think it's ready.

@colinthomas-z80
Contributor Author

@dthain Before this is done, I wanted to hear your thoughts on adding some sort of "high throughput" mode that disables the performance log.

@dthain
Member

dthain commented Jun 21, 2023

Some quick thoughts -

  • Ideally, the performance log shouldn't impose an enormous overhead. Check whether we are doing something unnecessarily expensive when writing to the performance log, such as calling fflush/fsync on every event (a sketch follows this list).
  • The performance log should only be enabled if specifically requested by vine_enable_perf_log. Perhaps the parsl-taskvine executor has that on by default.
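
For reference, a minimal sketch of the buffering pattern the first bullet alludes to: write each performance-log line through stdio buffering and flush only on a coarse interval, so an individual event never pays for an fflush or fsync. The names perf_log_write and last_flush and the 15-second threshold are illustrative, not taken from work_queue.c.

  #include <stdio.h>
  #include <time.h>

  static FILE  *perf_log   = NULL;  /* opened only if logging was requested */
  static time_t last_flush = 0;

  /* Append one line to the performance log without forcing it to disk. */
  static void perf_log_write(const char *line)
  {
      if(!perf_log) return;            /* log disabled: do nothing */

      fputs(line, perf_log);
      fputc('\n', perf_log);

      time_t now = time(NULL);
      if(now - last_flush >= 15) {     /* flush at most every 15 seconds */
          fflush(perf_log);
          last_flush = now;
      }
  }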

@colinthomas-z80
Contributor Author

@dthain It is not necessarily writing the performance log that is expensive, but rather collecting the information about waiting/ready tasks when the queue is large. We do an unnecessarily expensive operation collecting that information, which could perhaps be reduced to a single full pass over the task list; even that will still be a problem as the queue size grows.

This pertains to Work Queue specifically at the moment. Even if the performance log is not enabled, we still collect the information before we break out of log_queue_stats.
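
As an illustration of reducing that collection to one traversal, here is a self-contained sketch that tallies every task state in a single pass over a task list instead of scanning the list once per state. The struct and enum below are simplifications for the example, not the actual work_queue.c data structures.

  #include <stdint.h>

  enum { STATE_READY, STATE_RUNNING, STATE_WAITING_RETRIEVAL, STATE_MAX };

  struct task {
      int          state;
      struct task *next;
  };

  /* Count the tasks in every state with one walk of the list. */
  static void count_task_states(const struct task *head, uint64_t counts[STATE_MAX])
  {
      for(int i = 0; i < STATE_MAX; i++) counts[i] = 0;

      for(const struct task *t = head; t; t = t->next) {
          counts[t->state]++;
      }
  }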

@btovar
Member

btovar commented Jun 21, 2023

@colinthomas-z80 An alternative is to increase the interval at which stats are written (e.g., in work_queue.c change it to 15*ONE_SECOND) and remove the forced write at line 7090 (change it to log_queue_stats(q, 0) and remove the enclosing if).
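
A rough, self-contained sketch of that rate-limiting idea: skip the expensive stats collection entirely unless the interval has elapsed or the caller forces a write. The names queue_stats_state and now_usecs and the 15-second constant are illustrative; the real work_queue.c keeps this state inside the queue structure.

  #include <stdint.h>
  #include <sys/time.h>

  #define ONE_SECOND 1000000  /* timestamps in microseconds */

  struct queue_stats_state {
      uint64_t time_last_log;  /* when the last stats line was written */
  };

  static uint64_t now_usecs(void)
  {
      struct timeval tv;
      gettimeofday(&tv, NULL);
      return (uint64_t) tv.tv_sec * ONE_SECOND + tv.tv_usec;
  }

  static void log_queue_stats(struct queue_stats_state *s, int force)
  {
      uint64_t now = now_usecs();

      /* Unless forced, do nothing if the logging interval has not elapsed,
         so the expensive stats collection is skipped entirely. */
      if(!force && now - s->time_last_log < 15 * (uint64_t) ONE_SECOND)
          return;

      /* ... collect queue statistics and write one log line here ... */

      s->time_last_log = now;
  }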

@dthain
Member

dthain commented Jun 21, 2023

Ok, so our guiding principles here are:

  • Make minimal changes to WQ to meet needs and keep it working.
  • Put our main efforts into TaskVine going forward.

So, if the underlying problem is that we are inefficiently iterating over the entire ready queue in order to count task states when writing the performance log, then I think we should do the following:

1 - Make a few safe and easy modifications to WQ: just traverse the list once, and do it less frequently.
2 - Modify TaskVine to properly track the count of tasks in each state. Since all task state changes (wisely) go through change_task_state, it ought to be straightforward to update some integer counters there, as long as we are disciplined to ensure that there is no other way to change the state (a sketch follows this list).
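
A minimal sketch of point 2: funnel every state transition through one chokepoint and update per-state counters there, so the performance log can be written without touching the task list. The enum values and names below follow the discussion but are illustrative, not the actual TaskVine code, which keeps such counters in the manager structure rather than in globals.

  typedef enum {
      TASK_READY = 0,
      TASK_RUNNING,
      TASK_WAITING_RETRIEVAL,
      TASK_DONE,
      TASK_STATE_MAX
  } task_state_t;

  struct task {
      task_state_t state;
      /* ... */
  };

  static int state_counts[TASK_STATE_MAX];

  /* The only function allowed to modify t->state; the counters stay exact
     as long as no other code path changes the state. */
  static void change_task_state(struct task *t, task_state_t new_state)
  {
      state_counts[t->state]--;
      state_counts[new_state]++;
      t->state = new_state;
  }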

@btovar
Member

btovar commented Jun 21, 2023

Regarding point 2, I think that's how it was in WQ 1, and why we changed it to the current implementation in WQ 2. As the code evolved, it became really hard to maintain the state twice (i.e., the real state and the stats state). I think point 1 (iterating the list once and doing it less frequently) may go a long way.

@dthain
Member

dthain commented Jun 21, 2023

Let's proceed with doing the easy thing for WQ and see how far it gets us.

I'm willing to try the more complex thing for TV if it is necessary to meet our performance goals.

@colinthomas-z80
Contributor Author

Ready to merge if there are no further concerns.

(resolved review thread on work_queue/src/work_queue.c)
@dthain
Member

dthain commented Jun 21, 2023

What's the performance effect of this change?

@dthain
Member

dthain commented Jun 21, 2023

Ok, how much does this help performance?

@colinthomas-z80
Contributor Author

@dthain Although it is subjective depending on the hardware, this gives me a peak of around 500 tasks/sec with 12k tasks in the queue. At queue sizes up to 60k-70k I see around 200 tasks/sec. The most I have tested is 110k tasks, which resulted in an average of 160 tasks/sec. So the drop-off is much less severe than before.

Before, we were seeing roughly 300 tasks/sec with 10k in the queue, and less than 100 tasks/sec with 20k+ in the queue.

Once I start working on TaskVine I hope to have some more visual depictions of the performance.

(two outdated, resolved review threads on work_queue/src/work_queue.c)
@colinthomas-z80
Contributor Author

colinthomas-z80 commented Jun 21, 2023

This PR:

  ==== Iteration 1 ====
  Will run 10 tasks to target 120 seconds runtime
  Submitting tasks / invoking apps
  warning: using plain-text when communicating with workers.
  warning: use encryption with a key and cert when creating the manager.
  All 10 tasks submitted ... waiting for completion
  Submission took 0.008 seconds = 1235.690 tasks/second
  Runtime: actual 4.432s vs target 120s
  Tasks per second: 2.256
  ==== Iteration 2 ====
  Will run 270 tasks to target 120 seconds runtime
  Submitting tasks / invoking apps
  All 270 tasks submitted ... waiting for completion
  Submission took 0.105 seconds = 2569.648 tasks/second
  Runtime: actual 2.376s vs target 120s
  Tasks per second: 113.648
  ==== Iteration 3 ====
  Will run 13637 tasks to target 120 seconds runtime
  Submitting tasks / invoking apps
  All 13637 tasks submitted ... waiting for completion
  Submission took 5.136 seconds = 2654.964 tasks/second
  Runtime: actual 26.247s vs target 120s
  Tasks per second: 519.573
  ==== Iteration 4 ====
  Will run 62348 tasks to target 120 seconds runtime
  Submitting tasks / invoking apps
  All 62348 tasks submitted ... waiting for completion
  Submission took 25.965 seconds = 2401.229 tasks/second
  Runtime: actual 325.878s vs target 120s
  Tasks per second: 191.323
  Cleaning up DFK
  The end

@colinthomas-z80
Contributor Author

Compared to the current cctools master:

  ==== Iteration 1 ====
  Will run 10 tasks to target 120 seconds runtime
  Submitting tasks / invoking apps
  warning: using plain-text when communicating with workers.
  warning: use encryption with a key and cert when creating the manager.
  All 10 tasks submitted ... waiting for completion
  Submission took 0.008 seconds = 1261.672 tasks/second
  Runtime: actual 4.412s vs target 120s
  Tasks per second: 2.266
  ==== Iteration 2 ====
  Will run 271 tasks to target 120 seconds runtime
  Submitting tasks / invoking apps
  All 271 tasks submitted ... waiting for completion
  Submission took 0.088 seconds = 3077.867 tasks/second
  Runtime: actual 2.372s vs target 120s
  Tasks per second: 114.231
  ==== Iteration 3 ====
  Will run 13707 tasks to target 120 seconds runtime
  Submitting tasks / invoking apps
  All 13707 tasks submitted ... waiting for completion
  Submission took 4.890 seconds = 2802.854 tasks/second
  Runtime: actual 91.802s vs target 120s
  Tasks per second: 149.310
  Cleaning up DFK
  The end

@dthain
Member

dthain commented Jun 21, 2023

Looking good! Ready to merge now?

@colinthomas-z80
Contributor Author

Yes, I believe so.

@dthain merged commit c2c386d into cooperative-computing-lab:master on Jun 21, 2023