Copying values in parallel #277

Open
qalshidi opened this issue May 23, 2019 · 3 comments

Comments

@qalshidi

qalshidi commented May 23, 2019

I'm trying to write an operator that copies all the values from cells i-1 to cells i in parallel. I don't think there should be any race conditions, unless I'm missing something. This is basically what the code looks like:

#pragma omp parallel for schedule(dynamic)
for(uint j = 0; j < ctx.nz(); j++) {
  im1[j] = viennacl::range(inds(0, j, ctx), inds(ctx.nx()-2, j, ctx)+1);
  i[j] = viennacl::range(inds(1, j, ctx), inds(ctx.nx()-1, j, ctx)+1);
}
#pragma omp parallel for
for(uint j = 0; j < ctx.nz(); j++) {
  viennacl::project(xn_im1, i[j], eq) = viennacl::project(xn, im1[j], eq);
}

If I remove the second omp directive, it works and passes my tests, but with it I get NaNs in my matrix. Is there no way to get something like this done quickly? This is indirectly related to #228.

Actual code can be found here: https://github.com/qalshidi/comfi/blob/master/operators.cpp

@karlrupp
Collaborator

I agree that this code looks fine and shouldn't have any race conditions (since xn and xn_im1 are different objects). Which backend are you using? Does the problem show up with the conventional CPU backend? I'm not sure whether CUDA and OpenCL are thread-safe within the same context.
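One way to test the thread-safety hypothesis is to keep the index computation parallel but issue the copies from a single thread. The following is only a sketch on plain host memory: `std::vector<double>` stands in for the ViennaCL matrices, `IndexRange` stands in for `viennacl::range`, and the `inds` layout and the `shift_copy` name are assumptions made for illustration, not the code from the linked repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for viennacl::range: a half-open index interval [start, stop).
struct IndexRange {
  std::size_t start;
  std::size_t stop;
};

// Hypothetical layout: cell (i, j) of an nx-by-nz grid, contiguous in i.
inline std::size_t inds(std::size_t i, std::size_t j, std::size_t nx) {
  return i + j * nx;
}

// Copies the value of cell (i-1, j) in src into cell (i, j) of dst.
inline void shift_copy(const std::vector<double>& src, std::vector<double>& dst,
                       std::size_t nx, std::size_t nz) {
  assert(nx >= 2 && src.size() == nx * nz && dst.size() == nx * nz);
  std::vector<IndexRange> im1(nz), i(nz);

  // Phase 1: pure index arithmetic, each j writes only its own slot,
  // so this loop is safe to run in parallel.
  #pragma omp parallel for
  for (long j = 0; j < static_cast<long>(nz); ++j) {
    im1[j] = {inds(0, j, nx), inds(nx - 2, j, nx) + 1};
    i[j]   = {inds(1, j, nx), inds(nx - 1, j, nx) + 1};
  }

  // Phase 2: perform the copies from one thread only. With a real backend,
  // this is where calls into a shared context or queue would be serialized.
  for (std::size_t j = 0; j < nz; ++j) {
    const std::size_t len = im1[j].stop - im1[j].start;
    for (std::size_t k = 0; k < len; ++k) {
      dst[i[j].start + k] = src[im1[j].start + k];
    }
  }
}
```

If the serialized version is correct and the fully parallel one is not, that points at concurrent use of shared backend state rather than at the index ranges themselves.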

@qalshidi
Author

qalshidi commented May 24, 2019

ocl::current_device().info() outputs:

Device Info:
Name: GeForce GTX 970
Vendor: NVIDIA Corporation
Type: GPU
Available: 1
Max Compute Units: 13
Max Work Group Size: 1024
Global Mem Size: 4234018816
Local Mem Size: 49152
Local Mem Type: 1
Host Unified Memory: 0

@qalshidi
Author

qalshidi commented May 24, 2019

With the CPU backend, everything still works perfectly when the second OpenMP directive is commented out. With the directive enabled, I get this error:

double free or corruption (top)
ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=1 pid=3154 comm="comfi" exe="/home/qusai/Documents/Code/comfi/build-comfi-Desktop-Debug/comfi/comfi" sig=11 res=1
11:21:44: The program has unexpectedly finished.

I should probably mention that I am using column-major dense matrices. The debugger stops here, just before the segfault (SIGSEGV):

/** @brief A tag for column-major storage of a dense matrix. */
struct column_major
{
  typedef column_major_tag         orientation_category;

  /** @brief Returns the memory offset for entry (i,j) of a dense matrix.
  *
  * @param i   row index
  * @param j   column index
  * @param num_rows  number of entries per row (including alignment)
  */
  static vcl_size_t mem_index(vcl_size_t i, vcl_size_t j, vcl_size_t num_rows, vcl_size_t /* num_cols */)
  {
    return i + j * num_rows;
  }
};

It stops at the return statement. Changing to row-major does not help. I have not tried the CUDA backend because I have custom OpenCL kernels elsewhere that I need for things like element_max().
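For reference, the offset formula in the quoted struct can be checked in isolation. The sketch below repeats the same arithmetic with an added bounds assertion. The name `col_major_index` and the assertion are additions for illustration and are not part of ViennaCL.

```cpp
#include <cassert>
#include <cstddef>

// Same formula as the quoted column_major::mem_index: consecutive rows of
// one column are adjacent in memory, and columns are num_rows apart.
inline std::size_t col_major_index(std::size_t i, std::size_t j,
                                   std::size_t num_rows, std::size_t num_cols) {
  // Illustrative guard: an out-of-range (i, j) would yield an offset
  // outside the num_rows * num_cols buffer.
  assert(i < num_rows && j < num_cols);
  return i + j * num_rows;
}
```

Since the function only does integer arithmetic, a debugger stopping at its return statement may just be the last frame reached before a fault elsewhere, though that is a guess and not a diagnosis.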
