GPU Memory cost too much with platoon #84
Comments
Is it CPU or GPU memory? How do you see that 4x difference?
How many GPUs are used in parallel?
Normally, it should not use more memory on the GPU, but it can use more memory on the CPU depending on how you use it: each process/GPU uses extra CPU memory.
It's GPU memory. I used the command "nvidia-smi" to see the GPU memory usage.
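For reference, a minimal sketch of measuring per-GPU memory programmatically with nvidia-smi's CSV query mode. The query flags are nvidia-smi's standard ones; the parser is exercised on a canned sample string so it runs without a GPU, and the sample values are made up for illustration.

```python
import subprocess

def query_gpu_memory():
    """Return a list of used-memory values in MiB, one entry per GPU.

    Requires nvidia-smi (the NVIDIA driver) to be installed.
    """
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True)
    return parse_memory_csv(out)

def parse_memory_csv(text):
    # nvidia-smi prints one line per GPU; with noheader,nounits each
    # line is a bare integer number of MiB.
    return [int(line.strip()) for line in text.splitlines() if line.strip()]

# Canned sample from a hypothetical 2-GPU machine, so the parsing
# can be checked without nvidia-smi installed:
sample = "5021\n312\n"
print(parse_memory_csv(sample))  # -> [5021, 312]
```

Polling this in a loop during training makes it easier to see exactly when the extra memory gets allocated.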
I meet the same problem, and for me it is worse: I get an "out of memory" error, so my NMT system cannot train with platoon at all. Have you solved this problem? Thanks for your help.
The problem may come from NCCL and pygpu. I find that Theano built with NCCL and pygpu costs much more memory than the previous version.
Yes, the higher memory cost is caused by the new back-end of Theano. We prefer to use
I wrote a neural machine translation system with platoon.
The batch size is 80 and I sync every 10 mini-batches.
I found that the memory cost is about 4 times larger than for the same system without platoon.
Does someone else have the same experience?
I have also tested the "lstm" example, which costs about 5 GB of memory with batch size 16 and hidden size 1024.
Could someone else help me find the problem?
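As a sanity check on the 5 GB figure, here is a rough back-of-envelope estimate of what a single-layer LSTM of that size actually needs for weights and per-timestep activations. The input dimension and sequence length are assumptions for illustration (the thread does not state them), but under any reasonable values the model itself is tens of MiB, which suggests the multi-GB usage comes from the back-end's allocator or extra per-process contexts rather than the model.

```python
# Back-of-envelope memory estimate for a single-layer LSTM in float32.
# All sizes below are illustrative assumptions, not the lstm example's
# actual configuration.
BYTES = 4          # float32
hidden = 1024      # hidden size from the report
inp = 1024         # assumed input projection size (equal to hidden)
batch = 16         # batch size from the report
seq_len = 100      # assumed sequence length

# 4 gates, each with input weights, recurrent weights, and a bias.
params = 4 * (inp * hidden + hidden * hidden + hidden)

# Activations kept for backprop: hidden and cell state per timestep.
activations = seq_len * batch * hidden * 2

print(f"parameters: {params:,} (~{params * BYTES / 2**20:.0f} MiB)")
print(f"rough total: {(params + activations) * BYTES / 2**20:.0f} MiB")
```

Even doubling this for gradients and optimizer state stays far below 5 GB, so profiling where the remainder is allocated (allocator preallocation, duplicated contexts per worker) is the natural next step.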