orcuda: working with 2D arrays #42

coti · 2021-10-27T00:29:16Z

Hello,

When I use a 2D array of, say, M rows and N columns, I declare it as follows:

  decl static complex_double A[M*N] = random;

And my loop is as follows:

  for(i=0; i<=m-1; i++)
    for(j=0; j<=n-1; j++) {
      A[i*n+j] *= alpha;
    }
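Since A is a flat buffer in row-major order, element (i, j) lives at index i*n+j and each row occupies n consecutive elements, so any run of consecutive rows is one contiguous range. A minimal host-side sketch of the loop above, assuming plain double in place of complex_double (flat_index and scale_matrix are hypothetical helper names, not orcuda output):

```c
#include <assert.h>
#include <stddef.h>

/* Flat index of element (i, j) in a row-major array with n columns. */
static size_t flat_index(size_t i, size_t j, size_t n) {
    assert(j < n);               /* column must be inside the row */
    return i * n + j;
}

/* Scale every element of a flat m x n array, mirroring the loop above
   (double is used instead of complex_double to stay self-contained). */
static void scale_matrix(double *a, size_t m, size_t n, double alpha) {
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            a[flat_index(i, j, n)] *= alpha;
}
```

Row i therefore starts at flat_index(i, 0, n) = i*n, which is why any row-based offset into A has to be multiplied by n.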

The generated code allocates and registers an area of size M*N, which is good:

  cudaMalloc(&dev_A,M *N*sizeof(complex_double));
  cudaHostRegister(A,M *N*sizeof(complex_double),cudaHostRegisterPortable);

It then sends the array to the device over multiple streams, splitting it into one chunk per stream. Since my matrix holds M*N elements, with nstreams streams you would expect a chunk size of M*N/nstreams. Instead, the chunk size is computed from the number of rows only:

  int chunklen=m/nstreams;
  int chunkrem=m%nstreams;

and the copy is done as follows:

  for (istream=0; istream<nstreams; istream++ ) {
    soffset=istream*chunklen;
    cudaMemcpyAsync(dev_A+soffset,A+soffset,
                   chunklen*sizeof(complex_double),cudaMemcpyHostToDevice,stream[istream]);
  }
  if (chunkrem!=0) {
    soffset=istream*chunklen;  
    cudaMemcpyAsync(dev_A+soffset,A+soffset,
                    chunkrem*sizeof(complex_double),cudaMemcpyHostToDevice,stream[istream]);
  }

The same issue exists when the result is copied from the device to the host.
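To make the size mismatch concrete, here is a host-only sketch that replays the generated copy logic, assuming memcpy as a stand-in for cudaMemcpyAsync, double in place of complex_double, and no real streams (simulate_generated_copy is a hypothetical name). It returns how many of the M*N elements reach the destination:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Replays the generated chunking (chunklen = m / nstreams) on the host.
   memcpy stands in for cudaMemcpyAsync; streams are not modelled.
   Returns the number of elements of the m*n buffer that were copied. */
static size_t simulate_generated_copy(size_t m, size_t n, size_t nstreams) {
    assert(nstreams > 0);
    size_t total = m * n;
    double *src = malloc(total * sizeof *src);
    double *dst = calloc(total, sizeof *dst);   /* starts as all 0.0 */
    if (!src || !dst) { free(src); free(dst); return 0; }
    for (size_t k = 0; k < total; k++) src[k] = 1.0;

    size_t chunklen = m / nstreams;   /* based on rows only, not m*n */
    size_t chunkrem = m % nstreams;
    size_t istream, soffset;
    for (istream = 0; istream < nstreams; istream++) {
        soffset = istream * chunklen;
        memcpy(dst + soffset, src + soffset, chunklen * sizeof *src);
    }
    if (chunkrem != 0) {
        soffset = istream * chunklen;
        memcpy(dst + soffset, src + soffset, chunkrem * sizeof *src);
    }

    size_t copied = 0;                /* count elements that arrived */
    for (size_t k = 0; k < total; k++) copied += (dst[k] == 1.0);
    free(src);
    free(dst);
    return copied;
}
```

For example, with m = 10, n = 4 and 3 streams, only the first 10 of the 40 elements are copied: the chunks add up to exactly m elements, whatever the value of n.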

So I see two possible fixes:

either the chunk size is computed from M*N (i.e. M*N/nstreams) instead of M,
or each stream sends chunklen*N elements, at an offset equal to istream*chunklen*N.
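Both fixes can be sketched on the host in the same way, again assuming memcpy in place of cudaMemcpyAsync and double in place of complex_double; the function names are hypothetical and streams are not modelled. chunked_copy_flat follows the first option and chunked_copy_rows the second; both send the leftover part after the loop:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `rows` whole rows of width n, starting at row `first_row`.
   The offset and the length are both scaled by n (second option). */
static void copy_rows(double *dst, const double *src,
                      size_t first_row, size_t rows, size_t n) {
    size_t soffset = first_row * n;
    memcpy(dst + soffset, src + soffset, rows * n * sizeof *src);
}

/* Second option: split the m rows between streams, moving whole rows. */
static void chunked_copy_rows(double *dst, const double *src,
                              size_t m, size_t n, size_t nstreams) {
    assert(nstreams > 0);
    size_t chunklen = m / nstreams;
    size_t chunkrem = m % nstreams;
    for (size_t istream = 0; istream < nstreams; istream++)
        copy_rows(dst, src, istream * chunklen, chunklen, n);
    if (chunkrem != 0)                /* leftover rows after the loop */
        copy_rows(dst, src, nstreams * chunklen, chunkrem, n);
}

/* First option: ignore rows and split the flat m*n buffer directly. */
static void chunked_copy_flat(double *dst, const double *src,
                              size_t m, size_t n, size_t nstreams) {
    assert(nstreams > 0);
    size_t total = m * n;
    size_t chunklen = total / nstreams;
    size_t chunkrem = total % nstreams;
    for (size_t istream = 0; istream < nstreams; istream++)
        memcpy(dst + istream * chunklen, src + istream * chunklen,
               chunklen * sizeof *src);
    if (chunkrem != 0)                /* leftover elements after the loop */
        memcpy(dst + nstreams * chunklen, src + nstreams * chunklen,
               chunkrem * sizeof *src);
}
```

With m = 10, n = 4 and 3 streams, both variants reproduce the full 40-element source. The row split keeps every chunk aligned on row boundaries, while the flat split keeps the leftover below nstreams elements rather than nstreams rows.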

My reproducer and the generated result are attached.

_repro_complex.c.txt
repro_complex.c.txt
