
Performance gain degradation in GpuKMeans vs Spark ml KMeans API #78

Open · a-agrz opened this issue Feb 20, 2018 · 2 comments
a-agrz commented Feb 20, 2018

Hello!

I used the GpuKMeans example to write a KMeans application over the MNIST dataset, which contains 60,000 images. For comparison, I ran the org.apache.spark.ml.clustering KMeans API over the same dataset.

These are the results I got for 20 iterations:
CPU: 16.54 s
GPU: 41.1 s
That is a 2.48× slowdown.

So how can I make the KMeans application run faster on the GPU?

P.S.: These results were obtained in a spark-shell running on 40 cores and an M60 GPU.
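For reference, the CPU baseline presumably looks something like the standard spark.ml clustering call sketched below; the libsvm input path, the "features" column produced by that loader, and k = 10 are assumptions for illustration only.

```scala
import org.apache.spark.ml.clustering.KMeans

// Hypothetical: MNIST loaded as a DataFrame with a "features" vector column,
// e.g. via the libsvm data source bundled with Spark.
val mnist = spark.read.format("libsvm").load("data/mnist.libsvm")

// k = 10 (one cluster per digit class) is an assumption; 20 iterations
// matches the timings reported above.
val kmeans = new KMeans().setK(10).setMaxIter(20).setSeed(1L)
val model = kmeans.fit(mnist)
println(model.clusterCenters.length)
```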

Best regards

aguerzaa

@josiahsams (Member) commented:
Are you using mini-batches during training?
GPUEnabler works best when the entire dataset fits in GPU memory and cacheGpu is used between transformations, so that data movement between the GPU and CPU is kept to a minimum.
Data movement is quite costly and can easily overtake the computational gain we get from running on the GPU.
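A minimal sketch of that caching pattern with GPUEnabler's RDD API, modeled on the project's README examples; the kernel name, PTX resource path, and the dummy map stages are hypothetical and only illustrate keeping intermediate data resident on the GPU with cacheGpu between transformations.

```scala
import com.ibm.gpuenabler.CUDARDDImplicits._
import com.ibm.gpuenabler.CUDAFunction

// Hypothetical PTX module bundled as a resource on the classpath.
val ptxURL = getClass.getResource("/GpuEnablerExamples.ptx")

// Wrap a named kernel from the PTX module; the argument orders
// follow the GPUEnabler README examples.
val mapFunction = new CUDAFunction(
  "multiplyBy2",   // kernel name inside the PTX (hypothetical)
  Array("this"),   // input column order
  Array("this"),   // output column order
  ptxURL)

val data = sc.parallelize(1 to 1000000, 40)

// cacheGpu() keeps the partition data resident in GPU memory, so the
// second mapExtFunc reuses it instead of copying to and from the host
// between stages.
val onGpu = data.cacheGpu()
val step1 = onGpu.mapExtFunc((x: Int) => 2 * x, mapFunction).cacheGpu()
val step2 = step1.mapExtFunc((x: Int) => 2 * x, mapFunction)
println(step2.count())

// Release GPU memory once the iterations are done.
step1.unCacheGpu()
onGpu.unCacheGpu()
```

Without cacheGpu, an iterative job like KMeans would re-copy the training data to the GPU on every pass, which is consistent with the slowdown reported above.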

@a-agrz (Author) commented Feb 21, 2018 via email
