The generated code allocates an area of size M*LDA and copies all the memory from the host to the device (when we are using a single stream), which is good:
However, the generated kernel does not take the leading dimension `lda` as a parameter, although it uses it:
```cuda
__global__ void orcu_kernel161(const int n, const int m, double alpha, complex_double* A) {
  const int tid=blockIdx.x*blockDim.x+threadIdx.x;
  const int gsize=gridDim.x*blockDim.x;
  int j;
  for (int i=tid; i<=m-1; i+=gsize) {
    for (j=0; j<=n-1; j++ ) {
      A[i*lda+j]=A[i*lda+j]*alpha;
    }
  }
}
```
Therefore, the code cannot compile.
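For comparison, here is a minimal host-side sketch in plain C of what the loop is expected to compute when `lda` is passed explicitly. It is only an illustration under stated assumptions: the name `scale_submatrix` is hypothetical, C99 `double complex` stands in for `complex_double`, and this is not the code Orio generates.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper (not part of the generated code): scales the m x n
 * submatrix of a row-major array whose rows are lda elements apart.
 * lda is an explicit parameter here, which is what the generated kernel lacks.
 * double complex is an assumed stand-in for complex_double. */
void scale_submatrix(int n, int m, int lda, double alpha, double complex *A)
{
    for (int i = 0; i <= m - 1; i++) {
        for (int j = 0; j <= n - 1; j++) {
            /* Same indexing as the kernel body: row stride is lda, not n. */
            A[(size_t)i * lda + j] = A[(size_t)i * lda + j] * alpha;
        }
    }
}
```

With `lda` in the parameter list, the expression `A[i*lda+j]` resolves, and the elements in columns `n` to `lda-1` of each row are left untouched.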
My reproducer and some generated code are attached.
Hello,
I have a 2D matrix of size `M*LDA` and I want to work on a submatrix of size `M*N`.
My matrix is declared as follows:
and my loop is:
repro_submat_complex.c.txt
__orio_perftest1.cu.txt