orcuda: working with 2D arrays #42

Open
coti opened this issue Oct 27, 2021 · 0 comments

Hello,

When I use a 2D array of, say, M rows and N columns, I declare it as follows:

decl static complex_double A[M*N] = random;

And my loop is as follows:

  for(i=0; i<=m-1; i++)
    for(j=0; j<=n-1; j++) {
      A[i*n+j] *= alpha;
    }

The generated code allocates and registers an area of size M*N, which is good:

  cudaMalloc(&dev_A,M *N*sizeof(complex_double));
  cudaHostRegister(A,M *N*sizeof(complex_double),cudaHostRegisterPortable);

The generated code then sends the array to the device over multiple streams, splitting it into chunks, one per stream. Since my matrix has M*N elements, with nstreams streams I would expect a chunk length of M*N/nstreams. Instead, the chunk size is computed from the number of rows only:

  int chunklen=m/nstreams;
  int chunkrem=m%nstreams;

With this chunk size, the copies cover only m elements in total instead of m*n:

  for (istream=0; istream<nstreams; istream++ ) {
    soffset=istream*chunklen;
    cudaMemcpyAsync(dev_A+soffset,A+soffset,
                   chunklen*sizeof(complex_double),cudaMemcpyHostToDevice,stream[istream]);
  }
  if (chunkrem!=0) {
    soffset=istream*chunklen;  
    cudaMemcpyAsync(dev_A+soffset,A+soffset,
                    chunkrem*sizeof(complex_double),cudaMemcpyHostToDevice,stream[istream]);
  }

The same issue exists when the result is copied from the device to the host.
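
To make the shortfall concrete, here is a small host-only sketch (no CUDA calls) that replays the chunklen/chunkrem arithmetic quoted above and counts how many elements the copies cover. The function names are hypothetical and only for illustration; they are not part of the generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Replays the chunk arithmetic of the generated transfer code, which is
   based on m only, and returns the number of elements it covers.
   (Hypothetical helper, not part of the generated code.) */
size_t elements_covered_by_generated_code(size_t m, size_t nstreams)
{
    assert(nstreams > 0);
    size_t chunklen = m / nstreams;
    size_t chunkrem = m % nstreams;
    size_t covered = 0;
    for (size_t istream = 0; istream < nstreams; istream++)
        covered += chunklen;   /* one cudaMemcpyAsync of chunklen per stream */
    if (chunkrem != 0)
        covered += chunkrem;   /* trailing remainder copy */
    return covered;
}

/* Number of elements the array actually holds. */
size_t elements_in_array(size_t m, size_t n)
{
    return m * n;
}
```

For example, with m = 4, n = 3 and nstreams = 2, the first function returns 4 while the array holds 12 elements, so only the first row and one element of the second row would reach the device.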

So, I see two possibilities to fix this:

  • either the chunk size is computed from M*N, i.e. chunklen = M*N/nstreams
  • or we send chunklen*N elements per stream, using an offset equal to istream*chunklen*N
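
The two options can be sketched on the host side as follows. This is only an illustration of the offset arithmetic, under the assumption that A is stored contiguously in row-major order as in my loop; `chunk_t`, `plan_flat`, `plan_rows` and `plan_total` are hypothetical names, and the actual `cudaMemcpyAsync` calls are left out.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous transfer: element offset into A and element count. */
typedef struct {
    size_t offset;
    size_t count;
} chunk_t;

/* Option 1: split the flat range of m*n elements across the streams.
   Writes at most nstreams+1 chunks into out; returns how many were written. */
size_t plan_flat(size_t m, size_t n, size_t nstreams, chunk_t *out)
{
    assert(nstreams > 0);
    size_t total = m * n;
    size_t chunklen = total / nstreams;
    size_t chunkrem = total % nstreams;
    size_t k = 0;
    for (size_t istream = 0; istream < nstreams; istream++) {
        out[k].offset = istream * chunklen;
        out[k].count = chunklen;
        k++;
    }
    if (chunkrem != 0) {              /* leftover elements, if any */
        out[k].offset = nstreams * chunklen;
        out[k].count = chunkrem;
        k++;
    }
    return k;
}

/* Option 2: keep chunklen = m / nstreams (counted in rows) and scale
   both the offset and the length by n. */
size_t plan_rows(size_t m, size_t n, size_t nstreams, chunk_t *out)
{
    assert(nstreams > 0);
    size_t chunklen = m / nstreams;
    size_t chunkrem = m % nstreams;
    size_t k = 0;
    for (size_t istream = 0; istream < nstreams; istream++) {
        out[k].offset = istream * chunklen * n;
        out[k].count = chunklen * n;
        k++;
    }
    if (chunkrem != 0) {              /* leftover rows, if any */
        out[k].offset = nstreams * chunklen * n;
        out[k].count = chunkrem * n;
        k++;
    }
    return k;
}

/* Sum of the element counts in a plan; it should equal m*n. */
size_t plan_total(const chunk_t *chunks, size_t nchunks)
{
    size_t total = 0;
    for (size_t i = 0; i < nchunks; i++)
        total += chunks[i].count;
    return total;
}
```

In both cases the chunks add up to m*n elements. The second option keeps whole rows together in each chunk, while the first one balances the streams more evenly when m is not a multiple of nstreams.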

My reproducer and the generated result are attached.

_repro_complex.c.txt
repro_complex.c.txt
