In multiple GPUs: RuntimeError: vk::PhysicalDevice::createDeviceUnique: ErrorInitializationFailed #20 #47

Open
DelinQu opened this issue Oct 31, 2024 · 7 comments

@DelinQu
DelinQu commented Oct 31, 2024

Hi 👋, I hit the error "RuntimeError: vk::PhysicalDevice::createDeviceUnique: ErrorInitializationFailed" (the same one as in #20) when starting the environment on a non-zero device. The simulator works well on the default CUDA device 0, but it fails on every other device (1, 2, 3, 4, 5, 6, 7):

I tried the solutions from #20 and haosulab/ManiSkill#73, but they don't work for me.
image

@xuanlinli17
Collaborator

xuanlinli17 commented Oct 31, 2024

You can try ManiSkill3 version of the Bridge envs. @StoneT2000 will migrate the Google Robot envs later.

I think a fix for ManiSkill2 is to set DISPLAY="" CUDA_VISIBLE_DEVICES=x python {}
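
A minimal sketch of the same idea from Python, for anyone launching runs from a script rather than the shell. It only builds the environment variables named above (`DISPLAY` and `CUDA_VISIBLE_DEVICES`); the helper name `headless_gpu_env` is hypothetical and not part of ManiSkill or SimplerEnv.

```python
import os


def headless_gpu_env(gpu_index, base_env=None):
    """Return a copy of the environment with DISPLAY emptied and one GPU exposed.

    Hypothetical helper: mirrors `DISPLAY="" CUDA_VISIBLE_DEVICES=x python {}`.
    """
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    # Copy so the caller's environment (or os.environ) is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = ""                           # empty DISPLAY, as in the suggested fix
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)  # expose only the chosen GPU
    return env


env = headless_gpu_env(3, base_env={"PATH": "/usr/bin"})
print(env["DISPLAY"] == "", env["CUDA_VISIBLE_DEVICES"])
```

The returned dict can be passed as `env=` to `subprocess.run` together with whatever command you normally use to start the evaluation. Whether this resolves the Vulkan error depends on the machine, as the rest of this thread shows.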

@xuanlinli17

xuanlinli17 commented Oct 31, 2024

See, e.g., haosulab/ManiSkill#79 (from the old ManiSkill2).

@StoneT2000
Collaborator

For ManiSkill3, the fix proposed by xuanlin should work.

@DelinQu
Author

DelinQu commented Nov 1, 2024

Thanks for your replies. DISPLAY is already unset in https://github.com/simpler-env/SimplerEnv/blob/d55e19162be86794875839725fd484b768e25873/simpler_env/main_inference.py#L21C2-L21C31, so I have no idea why it doesn't work for me. I will migrate the environment to ManiSkill3. Does that change the evaluation results compared with ManiSkill2? I need to keep the evaluation fair.

@xuanlinli17
Collaborator

It should be very similar.

@DelinQu
Author

DelinQu commented Nov 1, 2024

Weird. SIMPLER starts up successfully if I run CUDA_VISIBLE_DEVICES=x,0 python {}. The memory usage and utilization of CUDA:0 stay almost zero, but it seems to be critical for setting up the ManiSkill2 environments:

image
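
For reference, a small sketch of how that device list could be built programmatically. It only reproduces the workaround reported above (target GPU first, physical GPU 0 appended) and does not explain why GPU 0 has to be visible; the function name `visible_devices_with_gpu0` is hypothetical.

```python
def visible_devices_with_gpu0(target_gpu):
    """Build a CUDA_VISIBLE_DEVICES value of the form "x,0".

    Hypothetical helper: lists the target GPU first and keeps physical
    GPU 0 visible as a second entry, as in the workaround above.
    """
    if target_gpu < 0:
        raise ValueError("target_gpu must be non-negative")
    if target_gpu == 0:
        return "0"  # GPU 0 is already the target, nothing to append
    return f"{target_gpu},0"


print(visible_devices_with_gpu0(3))
```

Note that CUDA renumbers the visible devices in the order listed, so with a value such as "3,0" the process sees physical GPU 3 as cuda:0 and physical GPU 0 as cuda:1.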

@StoneT2000
Collaborator

If you only need to measure the success / partial success rate in the main study (not the texture randomization ablations, which have not been ported over), I'd just recommend moving to ManiSkill3. It has better support and fewer bugs related to rendering and GPUs.
