Much of the work on synchronization (#13) between command groups (latest commit 654ed42) didn't produce the expected results. Because of my lack of understanding, kernels weren't being launched asynchronously, so the synchronization made no difference.
I wanted to do this with out-of-order (OoO) queue execution, but I don't think I can rely on that, since support for it is apparently optional in OpenCL implementations. A better course of action seems to be to use a separate queue for each command group. In most cases a command group contains a single kernel, so this ensures asynchronous execution. What we then also need is synchronization between command groups, which is already in pretty good shape.
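To make the "one queue per command group, plus synchronization between groups" idea concrete, here is a minimal Python model of it. It is not OpenCL code and not the project's implementation: each hypothetical `CommandGroup` gets its own worker thread standing in for its own queue, and a `threading.Event` stands in for the event a dependent group waits on. It also assumes that dependencies are inferred from buffer accesses (read-after-write, write-after-write, write-after-read); `CommandGroup`, `Scheduler` and `demo` are illustrative names only.

```python
import threading


class CommandGroup:
    """Hypothetical stand-in for a command group: one kernel plus the buffers it touches."""

    def __init__(self, name, kernel, reads=(), writes=()):
        self.name = name
        self.kernel = kernel
        self.reads = set(reads)
        self.writes = set(writes)
        # Plays the role of the event signalled when the group's kernel finishes.
        self.done = threading.Event()


class Scheduler:
    """Hypothetical scheduler: one worker thread ("queue") per submitted command group."""

    def __init__(self):
        self.last_writer = {}  # buffer name -> command group that wrote it last
        self.readers = {}      # buffer name -> command groups that read it since that write
        self.threads = []

    def submit(self, cg):
        deps = set()
        for buf in cg.reads | cg.writes:
            writer = self.last_writer.get(buf)
            if writer is not None:
                deps.add(writer)                    # read-after-write / write-after-write
        for buf in cg.writes:
            deps.update(self.readers.get(buf, ()))  # write-after-read
        # Record this group's accesses so later submissions depend on it.
        for buf in cg.reads:
            self.readers.setdefault(buf, []).append(cg)
        for buf in cg.writes:
            self.last_writer[buf] = cg
            self.readers[buf] = []
        # A dedicated thread per group: independent groups never wait on each other.
        thread = threading.Thread(target=self._run, args=(cg, deps))
        self.threads.append(thread)
        thread.start()

    @staticmethod
    def _run(cg, deps):
        try:
            for dep in deps:
                dep.done.wait()  # cross-queue synchronization: wait on the dependency's event
            cg.kernel()
        finally:
            cg.done.set()  # signal even on failure so dependent groups cannot hang

    def wait_all(self):
        for thread in self.threads:
            thread.join()


def demo():
    """Two independent command groups, then one that reads both of their outputs."""
    log = []
    lock = threading.Lock()
    # Both independent kernels must reach the barrier before either continues,
    # so A and B only get logged if they really run concurrently.
    barrier = threading.Barrier(2, timeout=5)

    def independent(tag):
        def kernel():
            barrier.wait()
            with lock:
                log.append(tag)
        return kernel

    def dependent():
        with lock:
            log.append("C")

    sched = Scheduler()
    sched.submit(CommandGroup("A", independent("A"), writes={"buf_a"}))
    sched.submit(CommandGroup("B", independent("B"), writes={"buf_b"}))
    sched.submit(CommandGroup("C", dependent, reads={"buf_a", "buf_b"}, writes={"buf_c"}))
    sched.wait_all()
    return log


print(demo())  # prints ['A', 'B', 'C'] or ['B', 'A', 'C']
```

The barrier is what makes this a test of concurrency rather than just ordering: if A and B were serialized on one in-order queue, neither could pass the barrier. In OpenCL terms the analogue would presumably be one `cl_command_queue` per command group, with the dependencies' events passed as the event wait list of `clEnqueueNDRangeKernel`, but that mapping is an assumption here.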
This is proving hard to test. So far I've only succeeded with test4, after first disabling debugging (92bd424), setting N to 32M, running on the GPU and running four instances of the program at the same time. I tested it first with buffers set to blocking and then with them set to non-blocking.
I guess this is because even 128 MB (32M × 4 B) is still quite fast to copy.
As hinted at in the previous comment, the latest commit on asynchronous kernels (19fd892) seems to do the trick: kernels get executed concurrently, even though the current tests can't really demonstrate it. New tests are needed, but as far as this issue is concerned, the goal has been reached.