Copying values in parallel #277

qalshidi · 2019-05-23T19:18:16Z

I'm trying to write an operator that copies all the values from cells i-1 to cells i in parallel. I don't think this should cause any race conditions, unless I'm missing something. This is roughly what the code looks like:

#pragma omp parallel for schedule(dynamic)
for(uint j = 0; j < ctx.nz(); j++) {
  im1[j] = viennacl::range(inds(0, j, ctx), inds(ctx.nx()-2, j, ctx)+1);
  i[j] = viennacl::range(inds(1, j, ctx), inds(ctx.nx()-1, j, ctx)+1);
}
#pragma omp parallel for
for(uint j = 0; j < ctx.nz(); j++) {
  viennacl::project(xn_im1, i[j], eq) = viennacl::project(xn, im1[j], eq);
}

If I remove the second omp directive, it works and passes my tests, but with it I get NaNs in my matrix. Is it not possible to do something like this quickly? This is indirectly related to #228.
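For reference, the intended operation can be sketched on plain std::vector storage, without ViennaCL. This is only an illustration under assumptions: flat_index is a hypothetical stand-in for inds(i, j, ctx), whose real mapping may differ, and shift_from_im1 is an illustrative name, not part of the actual code. It shows why the copy itself is race-free when source and destination are separate buffers: each iteration touches only its own row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for inds(i, j, ctx): nx cells per row, nz rows,
// stored row by row. The real index mapping may be different.
inline std::size_t flat_index(std::size_t i, std::size_t j, std::size_t nx) {
  return j * nx + i;
}

// Returns a copy of src in which cell (i, j) holds src's value at (i-1, j)
// for every i >= 1. Cells with i == 0 keep their original value.
std::vector<double> shift_from_im1(const std::vector<double>& src,
                                   std::size_t nx, std::size_t nz) {
  assert(src.size() == nx * nz);
  std::vector<double> dst(src);
  // Iteration j reads only row j of src and writes only row j of dst,
  // so the iterations do not depend on each other.
  #pragma omp parallel for
  for (long j = 0; j < static_cast<long>(nz); ++j) {
    const std::size_t row = static_cast<std::size_t>(j);
    for (std::size_t i = 1; i < nx; ++i) {
      dst[flat_index(i, row, nx)] = src[flat_index(i - 1, row, nx)];
    }
  }
  return dst;
}
```

If a plain-buffer version like this behaves correctly while the ViennaCL version does not, that would point at the library calls inside the loop rather than at the data access pattern.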

Actual code can be found here: https://github.com/qalshidi/comfi/blob/master/operators.cpp

karlrupp · 2019-05-23T23:24:15Z

I agree that this code looks fine and shouldn't have any race conditions (since xn and xn_im1 are different objects). Which backend are you using? Does the problem show up with the conventional CPU backend? I'm not sure whether CUDA and OpenCL are thread-safe within the same context.

qalshidi · 2019-05-24T14:37:16Z

ocl::current_device().info() outputs:

Device Info:
Name: GeForce GTX 970
Vendor: NVIDIA Corporation
Type: GPU
Available: 1
Max Compute Units: 13
Max Work Group Size: 1024
Global Mem Size: 4234018816
Local Mem Size: 49152
Local Mem Type: 1
Host Unified Memory: 0

qalshidi · 2019-05-24T15:38:07Z

With the CPU backend, it still works perfectly when I comment out the second OpenMP directive. With the directive left in, I get this error:

double free or corruption (top)
ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=1 pid=3154 comm="comfi" exe="/home/qusai/Documents/Code/comfi/build-comfi-Desktop-Debug/comfi/comfi" sig=11 res=1
11:21:44: The program has unexpectedly finished.

I should probably mention that I am using column-major dense matrices. The debugger stops here just before the segfault (SIGSEGV):

/** @brief A tag for column-major storage of a dense matrix. */
struct column_major
{
  typedef column_major_tag         orientation_category;

  /** @brief Returns the memory offset for entry (i,j) of a dense matrix.
  *
  * @param i   row index
  * @param j   column index
  * @param num_rows  number of entries per row (including alignment)
  */
  static vcl_size_t mem_index(vcl_size_t i, vcl_size_t j, vcl_size_t num_rows, vcl_size_t /* num_cols */)
  {
    return i + j * num_rows;
  }
};

It stops at the return statement. Changing to row-major does not help. I have not tried the CUDA backend because I rely on custom OpenCL kernels elsewhere for things like element_max().
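As a small worked example of the offset arithmetic in the snippet above: col_major_index below repeats the same formula, and index_in_buffer is a hypothetical helper (not part of ViennaCL) added only for illustration. The formula takes its arguments by value and dereferences nothing, so a bad (i, j) pair does not fail at this point. It just yields an offset that aliases another entry or lands past the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>

// Same arithmetic as column_major::mem_index: entries of one column are
// contiguous, and consecutive columns are num_rows entries apart.
inline std::size_t col_major_index(std::size_t i, std::size_t j,
                                   std::size_t num_rows) {
  return i + j * num_rows;
}

// Hypothetical helper: true if (i, j) is a valid entry of a
// num_rows x num_cols matrix.
inline bool index_in_buffer(std::size_t i, std::size_t j,
                            std::size_t num_rows, std::size_t num_cols) {
  return i < num_rows && j < num_cols;
}
```

For a 4 x 3 matrix, entry (2, 1) maps to offset 2 + 1 * 4 = 6. An out-of-range row index such as (4, 0) maps to offset 4, which is the same slot as the valid entry (0, 1), so an index error of this kind can corrupt data silently instead of crashing where it happens.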
