Scaling down PRIMAL2 for testing #12

JacksonArthurClark opened this issue Jun 23, 2023 · 0 comments

I'm currently running a system with Ubuntu 22, an i7-8700, 16 GB of RAM, and a GTX 1080 with 8 GB of VRAM. I keep running into this issue:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1024,2048] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
[[{{node gradients_1/global/qvalues/rnn/while/basic_lstm_cell/MatMul_grad/MatMul_1}} = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](gradients_1/global/qvalues/rnn/while/basic_lstm_cell/MatMul_grad/MatMul_1/StackPopV2, gradients_1/global/qvalues/rnn/while/basic_lstm_cell/split_grad/concat)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
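
If I'm reading the trace correctly, the failed allocation is on device:CPU:0 with the cpu allocator, so it's the 16 GB of system RAM being exhausted rather than the GPU's VRAM. Following the hint at the end of the trace, this is roughly how I'd turn on the allocation report; it's a minimal TF 1.x sketch with a stand-in graph, not PRIMAL2's actual training call:

```python
import tensorflow as tf  # TF 1.x session API, as the traceback suggests

# Ask TensorFlow to list the live tensor allocations whenever an OOM is raised.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Stand-in graph; in PRIMAL2 this would be the real training/gradient op.
x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.reduce_sum(tf.layers.dense(x, 8))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The extra options only change what gets logged if the run fails with OOM.
    sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]}, options=run_options)
```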

I've tried requesting fewer resources in the Python ray.init() call and have tried turning down NUM_META_AGENTS, but it still seems to be too much for my system.
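
For reference, this is roughly the scaled-down configuration I've been trying. The ray.init() arguments are the standard Ray options for capping CPU/GPU/object-store usage on a single machine; NUM_META_AGENTS is the repo's constant, but the other knobs below are just illustrative names, not PRIMAL2's actual parameters:

```python
import ray

# Cap what Ray may schedule on this box (numbers are examples for a 6-core i7 / 16 GB machine).
ray.init(
    num_cpus=4,                       # leave headroom for the driver and the OS
    num_gpus=1,                       # single GTX 1080
    object_store_memory=2 * 1024**3,  # shrink the shared object store to ~2 GB
)

# Scaled-down run settings: NUM_META_AGENTS is from PRIMAL2,
# the rest are placeholder names for whatever controls batch/episode size.
NUM_META_AGENTS = 1      # one meta-agent instead of several parallel workers
BATCH_SIZE = 32          # smaller rollout batches per update
MAX_EPISODE_LENGTH = 64  # shorter episodes while testing
```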

I will eventually deploy on a much larger system, but since compute time is expensive I'd like to get it running at a small scale first so I can test my changes before deploying.
