Part of the problem seems to be that you're trying to write a serial algorithm on the GPU: your outer loop has ti.loop_config(serialize=True). For it to be fast on the GPU, your outer loop needs to be the parallel one.
Other things that can help (though I don't think they are relevant here) include using data that already resides on the GPU to avoid transfers, e.g. a torch tensor on the device avoids copying.
import taichi as ti
import numpy as np

ti.init(arch=ti.gpu)

benchmark = True
N = 15000

if benchmark:
    a_numpy = np.random.randint(0, 100, N, dtype=np.int32)
    b_numpy = np.random.randint(0, 100, N, dtype=np.int32)
else:
    a_numpy = np.array([0, 1, 0, 2, 4, 3, 1, 2, 1], dtype=np.int32)
    b_numpy = np.array([4, 0, 1, 4, 5, 3, 1, 2], dtype=np.int32)

f = ti.field(dtype=ti.i32, shape=(N + 1, N + 1))

@ti.kernel
def compute_lcs(a: ti.types.ndarray(), b: ti.types.ndarray()) -> ti.i32:
    len_a, len_b = a.shape[0], b.shape[0]
    ti.loop_config(serialize=True)
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            f[i, j] = ti.max(f[i - 1, j - 1] + (a[i - 1] == b[j - 1]), ti.max(f[i - 1, j], f[i, j - 1]))
    return f[len_a, len_b]

print(compute_lcs(a_numpy, b_numpy))
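One possible way to expose parallelism in this recurrence is to walk the table by anti-diagonals: every cell with the same i + j depends only on cells from the two previous anti-diagonals, so the cells within one anti-diagonal are independent of each other. The sketch below is plain Python rather than Taichi, and the function names `lcs_row_major` and `lcs_wavefront` are made up for illustration. It only checks that the reordered loops give the same answer as the row-by-row version. In a Taichi port, the inner loop over `i` would be the candidate for the parallel outermost kernel loop, with the loop over `d` staying serial; that mapping is an assumption and has not been benchmarked here.

```python
def lcs_row_major(a, b):
    """Reference version: same loop order as the serialized kernel."""
    n, m = len(a), len(b)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + (a[i - 1] == b[j - 1]),
                          f[i - 1][j], f[i][j - 1])
    return f[n][m]


def lcs_wavefront(a, b):
    """Same recurrence, visited one anti-diagonal (d = i + j) at a time."""
    n, m = len(a), len(b)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(2, n + m + 1):          # serial: diagonal d needs d-1 and d-2
        i_lo = max(1, d - m)               # keeps j = d - i <= m
        i_hi = min(n, d - 1)               # keeps j = d - i >= 1
        for i in range(i_lo, i_hi + 1):    # iterations are independent of each other
            j = d - i
            f[i][j] = max(f[i - 1][j - 1] + (a[i - 1] == b[j - 1]),
                          f[i - 1][j], f[i][j - 1])
    return f[n][m]


if __name__ == "__main__":
    a = [0, 1, 0, 2, 4, 3, 1, 2, 1]
    b = [4, 0, 1, 4, 5, 3, 1, 2]
    # Both traversal orders fill the same table, so the results agree.
    print(lcs_row_major(a, b) == lcs_wavefront(a, b))
```

The trade-off is that early and late anti-diagonals are short, so the amount of parallel work per step varies between 1 and min(len(a), len(b)) cells.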
The following screenshot is from a run that does not use the CPU:
<img width="528" alt="image" src="https://github.com/user-attachments/assets/03d62b88-11b2-4a14-a4b3-