This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

Sometimes, torch-ipc cannot start successfully #26

Open
chienlinhuang1116 opened this issue Jul 5, 2016 · 1 comment

Comments

@chienlinhuang1116

Hi,

I want to train on 6 GPUs, which starts 6 luajit jobs. However, sometimes only 5 of them start. For now, I restart the training whenever this happens. Do you have any idea what causes this?

Thank you,
Chien-Lin

@chienlinhuang1116
Author

Hi,

We found that the cause is the following error: "/ipc/DiscoveredTree.lua:15: ERROR: (/home/chienh/big/twitter/torch-ipc/src/cliser.c, 318): (9, Bad file descriptor)".

This error only happens when the server is busy with other jobs. Do you have any idea why?

Thank you,
Chien-Lin
