I'm currently running a system with Ubuntu 22, an i7-8700, 16 GB of RAM, and a GTX 1080 with 8 GB of VRAM. I keep running into this issue:

```
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1024,2048] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
  [[{{node gradients_1/global/qvalues/rnn/while/basic_lstm_cell/MatMul_grad/MatMul_1}} = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](gradients_1/global/qvalues/rnn/while/basic_lstm_cell/MatMul_grad/MatMul_1/StackPopV2, gradients_1/global/qvalues/rnn/while/basic_lstm_cell/split_grad/concat)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```
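For reference, this is roughly how I understand the hint about `report_tensor_allocations_upon_oom` (just a sketch against the TF 1.x API; the op and feed names are placeholders, not the ones from this repo):

```python
import tensorflow as tf

# Ask TensorFlow to report the list of live tensor allocations if an OOM occurs.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Placeholder names: whatever the training loop actually runs would go here.
# sess.run(train_op, feed_dict=feed_dict, options=run_options)
```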
I've tried using fewer resources in the `ray.init()` call and have tried turning down `NUM_META_AGENTS`, but it still seems to be too much for my system.
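Concretely, this is roughly what I've been trying (the numbers are just examples I picked for my 16 GB machine, not values from the repo's config):

```python
import ray

# Example caps only: limit Ray's worker CPUs and object-store memory so the
# TensorFlow graph has more headroom on the host.
ray.init(
    num_cpus=2,                        # fewer concurrent workers
    object_store_memory=2 * 1024**3,   # ~2 GB object store instead of the default
)
```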
I will eventually be deploying on a much larger system, but since compute time is expensive I'd like to have it running at a small scale first to test my changes.