Copying values in parallel #277
Comments
I agree that this code looks fine and shouldn't have any race conditions (since
ocl::current_device().info() outputs:
Using the CPU backend, it still works perfectly when I comment out the second OpenMP directive. With the directive left in, I get this error:
I should also mention that I am using column-major dense matrices. The debugger stops here just before the segfault (SIGSEGV):
It stops at the return statement. Changing to row-major does not help. I have not tried the CUDA backend because I have custom OpenCL kernels elsewhere that I need for things like element_max().
I'm trying to write an operator that copies all the values from cell i-1 to cell i in parallel. I don't think this should cause any race conditions, unless I'm missing something. This is basically what the code looks like:
If I remove the second omp directive it works fine and passes my tests, but with it I get NaNs in my matrix. Is it not possible to do something like this quickly? This is indirectly related to #228.
Actual code can be found here: https://github.com/qalshidi/comfi/blob/master/operators.cpp
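For reference, here is a minimal sketch of the shift pattern described above, written against a plain std::vector rather than the matrix types used in the linked file. The name shift_right is hypothetical and this is not the code from operators.cpp. It assumes the source and destination are separate buffers, which is what makes the iterations independent; an in-place loop of the form a[i] = a[i-1] would carry a dependency between iterations and would not be safe to parallelize this way.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the linked repository): returns a copy of
// `src` shifted by one cell, so that out[i] == src[i - 1] for i >= 1.
// out[0] keeps the value of src[0].
std::vector<double> shift_right(const std::vector<double>& src) {
    std::vector<double> out(src);  // separate destination buffer
    const long n = static_cast<long>(src.size());

    // Every iteration reads only from `src` and writes only to its own
    // out[i], so no two iterations touch the same writable element.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 1; i < n; ++i) {
        out[i] = src[i - 1];
    }
    return out;
}
```

Calling shift_right on {1, 2, 3, 4} gives {1, 1, 2, 3}. Whether the same pattern is safe with library matrix objects inside the parallel region depends on the library and is not something this sketch shows.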