
Why is it so much slower with a GPU turned on than without? #8589

Open
xzlinux opened this issue Sep 11, 2024 · 1 comment
Labels
question Question on using Taichi

Comments


xzlinux commented Sep 11, 2024

import taichi as ti
import numpy as np

ti.init(arch=ti.gpu)

benchmark = True
N = 15000
if benchmark:
    a_numpy = np.random.randint(0, 100, N, dtype=np.int32)
    b_numpy = np.random.randint(0, 100, N, dtype=np.int32)
else:
    a_numpy = np.array([0, 1, 0, 2, 4, 3, 1, 2, 1], dtype=np.int32)
    b_numpy = np.array([4, 0, 1, 4, 5, 3, 1, 2], dtype=np.int32)

f = ti.field(dtype=ti.i32, shape=(N + 1, N + 1))

@ti.kernel
def compute_lcs(a: ti.types.ndarray(), b: ti.types.ndarray()) -> ti.i32:
    len_a, len_b = a.shape[0], b.shape[0]
    ti.loop_config(serialize=True)
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            f[i, j] = ti.max(f[i - 1, j - 1] + (a[i - 1] == b[j - 1]),
                             ti.max(f[i - 1, j], f[i, j - 1]))
    return f[len_a, len_b]

print(compute_lcs(a_numpy, b_numpy))
[screenshot: timing output with the GPU backend]
The following is the result when the GPU is not enabled:
[screenshot: timing output]

oliver-batchelor (Contributor) commented:

Part of the problem seems to be that you're trying to write a serial algorithm on the GPU: your outer loop has ti.loop_config(serialize=True), and Taichi only parallelizes the outermost loop of a kernel. For it to be fast on the GPU, your outer loop needs to be the parallel one.
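As a minimal sketch (my own illustration, not something from the thread or an official Taichi recipe) of what that restructuring could look like for this LCS recurrence: cells on the same anti-diagonal i + j = d are independent, so each diagonal can be filled by one parallel kernel launch while the host loops over diagonals in order. The field names and the diagonal decomposition are assumptions made for the example.

import taichi as ti
import numpy as np

ti.init(arch=ti.gpu)

N = 15000
a_numpy = np.random.randint(0, 100, N, dtype=np.int32)
b_numpy = np.random.randint(0, 100, N, dtype=np.int32)

# Device-resident copies of the inputs so they are transferred only once.
a = ti.field(ti.i32, shape=N)
b = ti.field(ti.i32, shape=N)
f = ti.field(ti.i32, shape=(N + 1, N + 1))
a.from_numpy(a_numpy)
b.from_numpy(b_numpy)

@ti.kernel
def fill_diagonal(d: ti.i32, len_a: ti.i32, len_b: ti.i32):
    # The outermost range-for of a kernel is the one Taichi parallelizes;
    # every cell on the diagonal i + j == d can be written independently.
    for i in range(ti.max(1, d - len_b), ti.min(len_a, d - 1) + 1):
        j = d - i
        f[i, j] = ti.max(f[i - 1, j - 1] + (a[i - 1] == b[j - 1]),
                         ti.max(f[i - 1, j], f[i, j - 1]))

def compute_lcs_diagonal(len_a: int, len_b: int) -> int:
    # Diagonals still have to be processed in order, so this loop stays on
    # the host; only the work within each diagonal runs in parallel.
    for d in range(2, len_a + len_b + 1):
        fill_diagonal(d, len_a, len_b)
    return f[len_a, len_b]

print(compute_lcs_diagonal(N, N))

Note that launching one kernel per diagonal still pays per-launch overhead, so for a DP with a long sequential dependency chain like this one, the CPU backend may simply remain the better fit.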

Another thing that can help (though I don't think it's relevant here) is to use data that already resides on the GPU to avoid transfers; e.g. passing a torch tensor that is already on the device can avoid copying.
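To illustrate that point with a hedged sketch (assuming torch is installed and Taichi is running on the CUDA backend), a Taichi kernel can take a CUDA tensor through a ti.types.ndarray() argument, so the data does not make a round trip through host memory; the kernel and tensor names below are just placeholders:

import taichi as ti
import torch

ti.init(arch=ti.cuda)

@ti.kernel
def double_in_place(x: ti.types.ndarray()):
    # Outermost loop is parallelized; the tensor is read and written
    # directly in GPU memory.
    for i in range(x.shape[0]):
        x[i] = x[i] * 2

x = torch.arange(10, device="cuda", dtype=torch.float32)  # already on the GPU
double_in_place(x)  # no explicit host <-> device copies in user code
print(x)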
