The generated code sends the array to the device over multiple streams, splitting it into chunks that are divided among the streams. Since my matrix has M*N elements, with nstreams streams you would expect a chunk size of M*N/nstreams. Instead, the chunk size is:
```c
int chunklen=m/nstreams;
int chunkrem=m%nstreams;
```
and the copies are issued as follows:
```c
/* one chunk of chunklen elements per stream */
for (istream=0; istream<nstreams; istream++ ) {
    soffset=istream*chunklen;
    cudaMemcpyAsync(dev_A+soffset,A+soffset,
        chunklen*sizeof(complex_double),cudaMemcpyHostToDevice,stream[istream]);
}
/* leftover elements; after the loop, istream == nstreams */
if (chunkrem!=0) {
    soffset=istream*chunklen;
    cudaMemcpyAsync(dev_A+soffset,A+soffset,
        chunkrem*sizeof(complex_double),cudaMemcpyHostToDevice,stream[istream]);
}
```
The same issue exists when the result is copied from the device to the host.
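A quick host-only way to measure the shortfall is to replay the loop above with `memcpy` standing in for `cudaMemcpyAsync`. This is a hypothetical sketch, not part of the generated code: `replay_generated_copy` is an invented helper, `double` replaces `complex_double`, and a zero-filled host buffer `dev` stands in for `dev_A`. It counts how many of the m*n elements actually arrive.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical replay of the generated host-to-device loop.
   memcpy stands in for cudaMemcpyAsync and double for complex_double.
   Returns the number of elements of A (size m*n) that reached dev. */
static size_t replay_generated_copy(int m, int n, int nstreams)
{
    assert(m > 0 && n > 0 && nstreams > 0);
    size_t total = (size_t)m * (size_t)n;
    double *A = malloc(total * sizeof *A);
    double *dev = calloc(total, sizeof *dev); /* zero-filled "device" buffer */
    assert(A != NULL && dev != NULL);
    for (size_t i = 0; i < total; i++)
        A[i] = (double)(i + 1); /* nonzero, so copied elements are detectable */

    /* Same arithmetic as the generated code: chunks derived from m only. */
    int chunklen = m / nstreams;
    int chunkrem = m % nstreams;
    int istream, soffset;
    for (istream = 0; istream < nstreams; istream++) {
        soffset = istream * chunklen;
        memcpy(dev + soffset, A + soffset, chunklen * sizeof(double));
    }
    if (chunkrem != 0) {
        soffset = istream * chunklen;
        memcpy(dev + soffset, A + soffset, chunkrem * sizeof(double));
    }

    /* Count the elements whose value made it into the "device" buffer. */
    size_t arrived = 0;
    for (size_t i = 0; i < total; i++)
        if (dev[i] == A[i])
            arrived++;
    free(A);
    free(dev);
    return arrived;
}
```

For example, `replay_generated_copy(10, 4, 3)` returns 10 rather than 40: the chunks cover only the first m elements in total, whatever the value of N.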
So, I see two possibilities to fix this:

- either compute the chunk size from M*N, i.e. M*N/nstreams elements per stream,
- or send chunklen*N elements per stream, using an offset equal to istream*chunklen*N.
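Both options can be sketched on the host in the same way. This is a hypothetical illustration rather than a patch to the generator: `chunked_copy`, `copies_everything` and the `by_rows` flag are invented names, `double` stands in for `complex_double`, and in the generated code each `memcpy` would be the corresponding `cudaMemcpyAsync` on `stream[istream]`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the two proposed fixes, replayed on the host.
   by_rows == 0: fix 1, split all m*n elements among the streams.
   by_rows != 0: fix 2, split the m rows; each row holds n elements. */
static void chunked_copy(double *dst, const double *src,
                         int m, int n, int nstreams, int by_rows)
{
    assert(m > 0 && n > 0 && nstreams > 0);
    size_t unit  = by_rows ? (size_t)n : 1;              /* elements per unit */
    size_t count = by_rows ? (size_t)m : (size_t)m * n;  /* units to split */
    size_t chunklen = count / nstreams;                  /* units per stream */
    size_t chunkrem = count % nstreams;
    int istream;
    size_t soffset;
    for (istream = 0; istream < nstreams; istream++) {
        soffset = istream * chunklen * unit; /* istream*chunklen*N for fix 2 */
        memcpy(dst + soffset, src + soffset, chunklen * unit * sizeof(double));
    }
    if (chunkrem != 0) {
        soffset = istream * chunklen * unit;
        memcpy(dst + soffset, src + soffset, chunkrem * unit * sizeof(double));
    }
}

/* Returns 1 if every element of an m*n array survives the chunked copy. */
static int copies_everything(int m, int n, int nstreams, int by_rows)
{
    size_t total = (size_t)m * (size_t)n;
    double *A = malloc(total * sizeof *A);
    double *dev = calloc(total, sizeof *dev);
    assert(A != NULL && dev != NULL);
    for (size_t i = 0; i < total; i++)
        A[i] = (double)(i + 1);
    chunked_copy(dev, A, m, n, nstreams, by_rows);
    int ok = memcmp(A, dev, total * sizeof *A) == 0;
    free(A);
    free(dev);
    return ok;
}
```

Splitting by rows keeps each stream's chunk aligned to row boundaries, which is presumably what matters if the kernels launched on each stream also partition the work by rows; splitting by elements is simpler but can cut a row across two streams. The same offsets and lengths would apply to the device-to-host copies.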
My reproducer and the generated result are attached.
Hello,

When I use a 2D array of size, say, M lines by N columns, I declare it as follows:

And my loop is as follows:

The generated code allocates and registers an area of size M*N, which is good.
_repro_complex.c.txt
repro_complex.c.txt