ParallelUnbalancedWork for efficient unbalanced parallel loops #7787

Open
wants to merge 14 commits into
base: master
@benaadams benaadams commented Nov 21, 2024

Changes

  • Added the ParallelUnbalancedWork class to efficiently execute parallel loops, handling unbalanced workloads.
  • Implemented static For methods to support parallel execution with and without thread-local data, including initialization and finalization functions.
  • Utilized thread pooling and a shared counter (SharedCounter) to distribute iterations among threads dynamically.
  • Aimed to optimize performance when the workload per iteration is uneven, giving better resource utilization and shorter execution time, rather than leaving the main thread waiting (.Wait) for background threads to complete.
  • Fewer allocations than Parallel.For

| Method | Mean | Error | StdDev | Ratio | Allocated | Alloc Ratio |
|---|---:|---:|---:|---:|---:|---:|
| ParallelFor | 543.9 ms | 12.13 ms | 18.15 ms | 1.00 | 19.21 KB | 1.00 |
| ParallelForEach | 488.5 ms | 12.73 ms | 17.43 ms | 0.90 | 23.12 KB | 1.20 |
| UnbalancedParallel | 413.4 ms | 1.15 ms | 1.73 ms | 0.76 | 5.00 KB | 0.26 |
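
For readers who have not looked at the diff, the shared-counter distribution described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's `ParallelUnbalancedWork` code: the names `SharedCounterLoop`, `LoopState` and `Drain` are hypothetical, and the assumption that the calling thread acts as one of the workers is inferred from the description. Each worker claims the next index with `Interlocked.Increment`, so a thread that finishes a cheap iteration immediately picks up more work instead of idling on a pre-assigned range.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of shared-counter work distribution.
// Not the PR's ParallelUnbalancedWork code; all names here are made up.
public static class SharedCounterLoop
{
    // State shared by every worker of one loop invocation.
    private sealed class LoopState
    {
        public int Next;            // next iteration index to hand out
        public int PendingWorkers;  // queued pool workers that have not finished
        public readonly int End;    // exclusive upper bound
        public readonly Action<int> Body;
        // Not disposed, for brevity.
        public readonly ManualResetEventSlim Done = new ManualResetEventSlim(false);

        public LoopState(int start, int end, int pendingWorkers, Action<int> body)
        {
            Next = start;
            End = end;
            PendingWorkers = pendingWorkers;
            Body = body;
        }
    }

    // Assumes toExclusive is well below int.MaxValue, so the counter cannot wrap.
    public static void For(int fromInclusive, int toExclusive, Action<int> body)
    {
        if (toExclusive <= fromInclusive) return;

        // Never start more workers than there are iterations.
        long count = (long)toExclusive - fromInclusive;
        int workers = (int)Math.Min(Environment.ProcessorCount, count);
        var state = new LoopState(fromInclusive, toExclusive, workers - 1, body);

        // Queue workers - 1 pool items; the calling thread is the last worker.
        for (int w = 0; w < workers - 1; w++)
        {
            ThreadPool.QueueUserWorkItem(static s =>
            {
                try { Drain(s); }
                finally
                {
                    // The last pool worker to finish releases the caller.
                    if (Interlocked.Decrement(ref s.PendingWorkers) == 0) s.Done.Set();
                }
            }, state, preferLocal: false);
        }

        // Exceptions thrown by body are not handled in this sketch.
        Drain(state);
        if (workers > 1) state.Done.Wait();
    }

    // Claims one index at a time until the range is exhausted.
    private static void Drain(LoopState state)
    {
        while (true)
        {
            int i = Interlocked.Increment(ref state.Next) - 1;
            if (i >= state.End) return;
            state.Body(i);
        }
    }
}
```

A call such as `SharedCounterLoop.For(0, items.Length, i => Process(items[i]))` (with `items` and `Process` standing in for the caller's data and work) would run every index exactly once, in no particular order.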

Types of changes

What types of changes does your code introduce?

  • Optimization

Testing

Requires testing

  • No


@LukaszRozmej LukaszRozmej left a comment

Can you provide some kind of benchmark?

- Introduced the `ParallelUnbalancedWork` class to efficiently execute parallel loops over a range of integers, handling unbalanced workloads.
- Added static `For` methods to support parallel execution with and without thread-local data, including initialization and finalization functions.
- Utilized thread pooling and a shared counter (`SharedCounter`) to distribute iterations among threads dynamically.
- Implemented internal classes (`BaseData`, `Data`, and `InitProcessor<TLocal>`) to manage shared state and thread synchronization.
- Aimed to optimize performance in scenarios where the workload per iteration is uneven, ensuring better resource utilization and reduced execution time.
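
The thread-local variant mentioned in the list above (initialization function, loop body, finalization function) could look roughly like this. It is a sketch under assumptions: the signature is modeled loosely on the `localInit` / `localFinally` overloads of `Parallel.For`, and `SharedCounterLoopWithLocal`, `LoopState<TLocal>` and `Drain` are hypothetical names that do not correspond to the PR's `BaseData`, `Data` or `InitProcessor<TLocal>` classes. Each worker creates its local value once, threads it through every iteration it claims, and hands it to the finalizer when the range is exhausted.

```csharp
using System;
using System.Threading;

// Hypothetical sketch; names and signatures are assumptions, not the PR's code.
public static class SharedCounterLoopWithLocal
{
    private sealed class LoopState<TLocal>
    {
        public int Next;            // next iteration index to hand out
        public int PendingWorkers;  // queued pool workers that have not finished
        public readonly int End;    // exclusive upper bound
        public readonly Func<TLocal> Init;
        public readonly Func<int, TLocal, TLocal> Body;
        public readonly Action<TLocal> Finally;
        // Not disposed, for brevity.
        public readonly ManualResetEventSlim Done = new ManualResetEventSlim(false);

        public LoopState(int start, int end, int pendingWorkers, Func<TLocal> init,
            Func<int, TLocal, TLocal> body, Action<TLocal> @finally)
        {
            Next = start;
            End = end;
            PendingWorkers = pendingWorkers;
            Init = init;
            Body = body;
            Finally = @finally;
        }
    }

    // Assumes toExclusive is well below int.MaxValue, so the counter cannot wrap.
    public static void For<TLocal>(int fromInclusive, int toExclusive,
        Func<TLocal> init, Func<int, TLocal, TLocal> body, Action<TLocal> @finally)
    {
        if (toExclusive <= fromInclusive) return;

        long count = (long)toExclusive - fromInclusive;
        int workers = (int)Math.Min(Environment.ProcessorCount, count);
        var state = new LoopState<TLocal>(
            fromInclusive, toExclusive, workers - 1, init, body, @finally);

        // Queue workers - 1 pool items; the calling thread is the last worker.
        for (int w = 0; w < workers - 1; w++)
        {
            ThreadPool.QueueUserWorkItem(static s =>
            {
                try { Drain(s); }
                finally
                {
                    if (Interlocked.Decrement(ref s.PendingWorkers) == 0) s.Done.Set();
                }
            }, state, preferLocal: false);
        }

        // Exceptions thrown by the callbacks are not handled in this sketch.
        Drain(state);
        if (workers > 1) state.Done.Wait();
    }

    // One local value per worker: created once, updated per claimed index,
    // and passed to the finalizer exactly once.
    private static void Drain<TLocal>(LoopState<TLocal> state)
    {
        TLocal local = state.Init();
        try
        {
            while (true)
            {
                int i = Interlocked.Increment(ref state.Next) - 1;
                if (i >= state.End) break;
                local = state.Body(i, local);
            }
        }
        finally
        {
            state.Finally(local);
        }
    }
}
```

The point of the local value is that per-iteration work touches only thread-private state; synchronization (for example an `Interlocked.Add` into a shared total) happens once per worker in the finalizer rather than once per iteration.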
@benaadams benaadams mentioned this pull request Dec 2, 2024
3 tasks
@benaadams

| Method | Mean | Error | StdDev | Ratio | Allocated | Alloc Ratio |
|---|---:|---:|---:|---:|---:|---:|
| ParallelFor | 543.9 ms | 12.13 ms | 18.15 ms | 1.00 | 19.21 KB | 1.00 |
| ParallelForEach | 488.5 ms | 12.73 ms | 17.43 ms | 0.90 | 23.12 KB | 1.20 |
| UnbalancedParallel | 413.4 ms | 1.15 ms | 1.73 ms | 0.76 | 5.00 KB | 0.26 |
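
As a reading aid for the table above: the Ratio and Alloc Ratio columns are each row's value divided by the baseline row (ParallelFor, shown as 1.00). The helper below is a hypothetical illustration of that arithmetic, not part of the PR or of the benchmark harness.

```csharp
using System;

// Hypothetical helper that reproduces the table's ratio columns.
public static class BenchmarkRatios
{
    // Ratio of a measurement to the baseline row, rounded to two decimals.
    public static double Ratio(double value, double baseline) =>
        Math.Round(value / baseline, 2);
}
```

With the table's numbers, `BenchmarkRatios.Ratio(413.4, 543.9)` gives 0.76 (about 24% less time than ParallelFor) and `BenchmarkRatios.Ratio(5.00, 19.21)` gives 0.26 (about 74% less memory allocated).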
