Scaling down PRIMAL2 for testing #12

JacksonArthurClark opened this issue Jun 23, 2023 · 0 comments

I'm currently running a system with Ubuntu 22, an i7-8700, 16 GB of RAM, and a GTX 1080 with 8 GB of VRAM. I keep running into this issue:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1024,2048] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
[[{{node gradients_1/global/qvalues/rnn/while/basic_lstm_cell/MatMul_grad/MatMul_1}} = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](gradients_1/global/qvalues/rnn/while/basic_lstm_cell/MatMul_grad/MatMul_1/StackPopV2, gradients_1/global/qvalues/rnn/while/basic_lstm_cell/split_grad/concat)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
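
If I'm reading the trace correctly, the failed allocation is on device:CPU:0 with the cpu allocator, so it's the 16 GB of system RAM being exhausted rather than the GPU's VRAM. Following the hint at the end of the trace, this is roughly how I'd turn on the allocation report; it's a minimal TF 1.x sketch with a stand-in graph, not PRIMAL2's actual training call:

```python
import tensorflow as tf  # TF 1.x session API, as the traceback suggests

# Ask TensorFlow to list the live tensor allocations whenever an OOM is raised.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Stand-in graph; in PRIMAL2 this would be the real training/gradient op.
x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.reduce_sum(tf.layers.dense(x, 8))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The extra options only change what gets logged if the run fails with OOM.
    sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]}, options=run_options)
```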

I've tried requesting fewer resources in the Python ray.init() call and have tried turning down NUM_META_AGENTS, but it still seems to be too much for my system.
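
For reference, this is roughly the scaled-down configuration I've been trying. The ray.init() arguments are the standard Ray options for capping CPU/GPU/object-store usage on a single machine; NUM_META_AGENTS is the repo's constant, but the other knobs below are just illustrative names, not PRIMAL2's actual parameters:

```python
import ray

# Cap what Ray may schedule on this box (numbers are examples for a 6-core i7 / 16 GB machine).
ray.init(
    num_cpus=4,                       # leave headroom for the driver and the OS
    num_gpus=1,                       # single GTX 1080
    object_store_memory=2 * 1024**3,  # shrink the shared object store to ~2 GB
)

# Scaled-down run settings: NUM_META_AGENTS is from PRIMAL2,
# the rest are placeholder names for whatever controls batch/episode size.
NUM_META_AGENTS = 1      # one meta-agent instead of several parallel workers
BATCH_SIZE = 32          # smaller rollout batches per update
MAX_EPISODE_LENGTH = 64  # shorter episodes while testing
```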

I will eventually deploy on a much larger system, but since compute time is expensive I'd like to get it running at a small scale first so I can test my changes before deploying.
