
Performance gain degradation in GpuKMeans vs Spark ml KMeans API #78

Open · a-agrz opened this issue Feb 20, 2018 · 2 comments
a-agrz commented Feb 20, 2018

Hello!

I used the GpuKMeans example to write a KMeans application over the MNIST dataset, which contains 60,000 images. For comparison, I ran the org.apache.spark.ml.clustering KMeans API over the same dataset.

These are the results I got for 20 iterations:
CPU: 16.54 s
GPU: 41.1 s
That is a 2.48× slowdown.

So how can I make the KMeans application run faster on the GPU?

P.S.: These results were obtained in a spark-shell running on 40 cores and an M60 GPU.
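For reference, the CPU baseline presumably looks something like the standard spark.ml clustering call sketched below; the libsvm input path, the "features" column produced by that loader, and k = 10 are assumptions for illustration only.

```scala
import org.apache.spark.ml.clustering.KMeans

// Hypothetical: MNIST loaded as a DataFrame with a "features" vector column,
// e.g. via the libsvm data source bundled with Spark.
val mnist = spark.read.format("libsvm").load("data/mnist.libsvm")

// k = 10 (one cluster per digit class) is an assumption; 20 iterations
// matches the timings reported above.
val kmeans = new KMeans().setK(10).setMaxIter(20).setSeed(1L)
val model = kmeans.fit(mnist)
println(model.clusterCenters.length)
```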

Best regards

aguerzaa

@josiahsams (Member) commented:
Are you using mini-batches during training?
GPUEnabler works best when the entire dataset fits in GPU memory and cacheGpu is used between transformations, so that data movement between the GPU and CPU is kept to a minimum.
Data movement is quite costly and can easily overtake the computational gain we get from running on the GPU.
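A minimal sketch of that caching pattern with GPUEnabler's RDD API, modeled on the project's README examples; the kernel name, PTX resource path, and the dummy map stages are hypothetical and only illustrate keeping intermediate data resident on the GPU with cacheGpu between transformations.

```scala
import com.ibm.gpuenabler.CUDARDDImplicits._
import com.ibm.gpuenabler.CUDAFunction

// Hypothetical PTX module bundled as a resource on the classpath.
val ptxURL = getClass.getResource("/GpuEnablerExamples.ptx")

// Wrap a named kernel from the PTX module; the argument orders
// follow the GPUEnabler README examples.
val mapFunction = new CUDAFunction(
  "multiplyBy2",   // kernel name inside the PTX (hypothetical)
  Array("this"),   // input column order
  Array("this"),   // output column order
  ptxURL)

val data = sc.parallelize(1 to 1000000, 40)

// cacheGpu() keeps the partition data resident in GPU memory, so the
// second mapExtFunc reuses it instead of copying to and from the host
// between stages.
val onGpu = data.cacheGpu()
val step1 = onGpu.mapExtFunc((x: Int) => 2 * x, mapFunction).cacheGpu()
val step2 = step1.mapExtFunc((x: Int) => 2 * x, mapFunction)
println(step2.count())

// Release GPU memory once the iterations are done.
step1.unCacheGpu()
onGpu.unCacheGpu()
```

Without cacheGpu, an iterative job like KMeans would re-copy the training data to the GPU on every pass, which is consistent with the slowdown reported above.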

@a-agrz (Author) commented Feb 21, 2018 via email
