WQ: Further mainloop optimizations #3380
Conversation
I tried this PR (commit fa92953) on my laptop with parsl-perf --time 300, and I still see that I get only 42 tasks/second near the start of a run. This is with around 75,000 tasks in the queue at the start. I think that's not a noticeable change for this particular measurement compared to the current cctools master commit. Here's a plot of TASK RUNNING events for that run. (As I've mentioned before, ignore the spikes near the beginning; that is parsl-perf calibrating itself.)
@benclifford This change improved my performance in the range of 10-20k tasks in the queue, but yes, it still does not hold up at larger queue sizes. I believe the remaining issue is in the way we retrieve tasks.
Task retrieval has shaped up a bit; now the suspect seems to be logging overhead, as seen in the profile on #3284.
Looks like this is making progress but maybe not ready to merge yet.
@dthain Before this is done, I wanted to hear your thoughts on adding some sort of "high throughput" mode that disables the performance log.
Some quick thoughts -
@dthain It is not necessarily writing the performance log that is expensive, but rather collecting the information about waiting/ready tasks when the queue is large. We do an unnecessarily expensive operation collecting that information, which could perhaps be reduced to a single full iteration of the task list, though that will still present a problem as the queue size grows. This pertains specifically to Work Queue at the moment. Even if the performance log is not specified, we still collect the information before we break out of the main loop.
@colinthomas-z80 An alternative is to increase the interval at which stats are written (e.g., in work_queue.c, change it to 15*ONE_SECOND) and remove the forced write at line 7090 (change it to log_queue_stats(q, 0) and remove the enclosing if).
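A minimal sketch of that suggestion, assuming a simplified work_queue struct; ONE_SECOND, the struct fields, and the body of log_queue_stats here are stand-ins modeled on the names above, not the exact work_queue.c internals:

```c
/* Sketch: write queue stats on a longer interval instead of forcing a
 * write on every pass through the main loop. Illustrative only. */

#include <time.h>

#define ONE_SECOND 1
#define STATS_INTERVAL (15 * ONE_SECOND)  /* was effectively one second */

struct work_queue {
	time_t last_stats_write;
	/* ... real queue state elided ... */
};

/* force != 0 writes unconditionally; force == 0 respects the interval. */
static void log_queue_stats(struct work_queue *q, int force)
{
	time_t now = time(NULL);
	if (!force && (now - q->last_stats_write) < STATS_INTERVAL)
		return; /* too soon: skip the expensive stats collection */
	q->last_stats_write = now;
	/* ... gather task counts and append a line to the performance log ... */
}

/* In the main loop, call without forcing so the interval check applies
 * (i.e., log_queue_stats(q, 0) with no enclosing if, as suggested above). */
static void mainloop_tick(struct work_queue *q)
{
	log_queue_stats(q, 0);
}
```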
Ok, so our guiding principles here are:
So, if the underlying problem is that we are inefficiently iterating over the entire ready queue in order to count task states when writing the performance log, then I think we should do the following:
1 - Make a few safe and easy modifications to WQ: just traverse the list once, and do it less frequently.
2 - If that is not enough, try the more complex approach of maintaining the task-state counts incrementally rather than recomputing them.
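For point 1, a sketch of the single-pass idea; the list, task, and state types here are illustrative stand-ins rather than the actual Work Queue data structures:

```c
/* Sketch of "traverse the list once": count tasks in every state with one
 * pass over the task list, instead of one scan per state. */

#include <string.h>

enum task_state { STATE_WAITING, STATE_READY, STATE_RUNNING, STATE_DONE, STATE_MAX };

struct task {
	enum task_state state;
	struct task *next;
};

struct state_counts {
	int count[STATE_MAX];
};

/* One O(n) walk produces every per-state count needed for the stats log. */
static struct state_counts count_task_states(const struct task *head)
{
	struct state_counts c;
	memset(&c, 0, sizeof(c));
	for (const struct task *t = head; t; t = t->next)
		c.count[t->state]++;
	return c;
}
```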
Regarding point 2, I think that's how it was in wq 1, and why we changed it to the current implementation in wq 2. As code evolved, it became really hard to maintain the state twice (i.e., the real state and the stats state). I think point 1 (iterating the list once and doing it less frequently) may go a long way.
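For reference, the "maintain the state twice" pattern being described is roughly the following; the names are illustrative, not the historical wq 1 code:

```c
/* Sketch of incremental state counts: each state transition updates a
 * per-state counter, so writing stats is O(1). The maintenance burden noted
 * above is that every code path changing a task's state must go through
 * this one helper to keep the counters consistent. */

enum task_state { STATE_WAITING, STATE_READY, STATE_RUNNING, STATE_DONE, STATE_MAX };

struct task {
	enum task_state state;
};

struct queue_stats {
	int count[STATE_MAX];
};

/* The single choke point for state changes keeps the stats counters in
 * sync with the real task state. */
static void change_task_state(struct queue_stats *s, struct task *t, enum task_state new_state)
{
	s->count[t->state]--;
	s->count[new_state]++;
	t->state = new_state;
}
```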
Let's proceed with doing the easy thing for WQ and see how far it gets us. I'm willing to try the more complex thing for TV if it is necessary to meet our performance goals.
Ready to merge if there are no further concerns.
What's the performance effect of this change?
Ok, how much does this help performance?
@dthain Although it is subjective depending on the hardware, this gives me a peak of around 500 tasks/sec with 12k tasks in the queue. At queue sizes up to 60k-70k I get around 200 tasks/sec. The most I have tested is 110k tasks, which resulted in an average of 160 tasks/sec, so the drop-off is much less severe than before. Previously we were seeing around 300 tasks/sec with 10k tasks in the queue and less than 100 tasks/sec with 20k+ in the queue. Once I start working on TaskVine I hope to have some more visual depictions of the performance.
This PR: [performance plot]
Compared to cctools current: [performance plot]
Looking good! Ready to merge now? |
Yes I believe so |