orcuda: working on submatrices #43

coti · 2021-10-27T00:45:39Z

Hello,

I have a 2D matrix of size M*LDA and I want to work on a submatrix of size M*N.

My matrix is declared as follows:

    decl static complex_double A[M*LDA] = random;

and my loop is:

    for (i=0; i<=m-1; i++)
      for (j=0; j<=n-1; j++) {
        A[i*lda+j] *= alpha;
      }

The generated code allocates an area of size M*LDA and copies all the memory from the host to the device (when we are using a single stream), which is good:

  cudaMalloc(&dev_A,M *LDA*sizeof(complex_double));
  cudaMemcpy(dev_A,A,M *LDA*sizeof(complex_double),cudaMemcpyHostToDevice);

However, the generated kernel does not receive lda as a parameter, although its body uses it to index A:

__global__ void orcu_kernel161(const int n, const int m, double alpha, complex_double* A) {
  const int tid=blockIdx.x*blockDim.x+threadIdx.x;
  const int gsize=gridDim.x*blockDim.x;
  int j;
  for (int i=tid; i<=m-1; i+=gsize) {
    for (j=0; j<=n-1; j++ ) {
      A[i*lda+j]=A[i*lda+j]*alpha;
    }
  }
}

Therefore, the generated code does not compile, because lda is undeclared inside the kernel.
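
For comparison, here is a minimal sketch of the behavior the kernel would need, written as plain C so it can be checked without a GPU. It is not Orio's output: the function name scale_submatrix, the typedef for complex_double, and passing tid and gsize as ordinary arguments (standing in for the thread index and grid size) are all assumptions made for illustration. The only point is that lda has to appear in the parameter list alongside n and m.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Assumption: complex_double is a double-precision complex type. */
typedef double complex complex_double;

/*
 * Scale the leading m x n block of a row-major matrix whose rows are
 * lda elements apart. tid and gsize emulate the thread index and grid
 * size used by the generated kernel; tid = 0, gsize = 1 is a serial run.
 */
void scale_submatrix(int tid, int gsize, int n, int m, int lda,
                     double alpha, complex_double *A)
{
    assert(gsize > 0 && lda >= n);
    for (int i = tid; i <= m - 1; i += gsize) {
        for (int j = 0; j <= n - 1; j++) {
            /* lda, not n, is the distance between consecutive rows. */
            A[(size_t)i * lda + j] *= alpha;
        }
    }
}
```

With m = 2, n = 2 and lda = 3, only the first two entries of each row are scaled and the padding column is left untouched.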

My reproducer and some generated code are attached.

repro_submat_complex.c.txt
__orio_perftest1.cu.txt
