Add support for GPU acceleration in Windows (only NVIDIA validated) #476

Open
wants to merge 2 commits into base: humble-devel

Conversation

dpascualhe
Collaborator

@dpascualhe dpascualhe commented Dec 15, 2024

We have been able to achieve GPU acceleration for NVIDIA GPUs in Windows when launching from within WSL2. The user needs to have a valid installation of WSL2 + CUDA and Docker Desktop, and the Docker container must be launched from within the WSL2 terminal. Some extra flags are required in the docker run command:

docker run -it --gpus all -v /usr/lib/wsl:/usr/lib/wsl -e LD_LIBRARY_PATH=/usr/lib/wsl/lib --device /dev/dri -p 7164:7164 -p 6080:6080 -p 1108:1108 -p 7163:7163 jderobot/robotics-academy:latest

Output from the script:

--- GPU acceleration info ---
GPUs found:
        24d7:00:00.0 3D controller [0302]: Microsoft Corporation Basic Render Driver [1414:008e]
INFO: GPU candidates found at /dev/dri/card0 (keeping /dev/dri/card0).
INFO: 'nvidia-smi' available. Most likely an NVIDIA GPU in WSL.
-----------------------------

Oddly enough, all GPUs seem to be visible within WSL, but they are presented as Microsoft devices, so the actual vendor information can't be accessed. We'll have to settle for the 'Microsoft' vendor for now. The new set_dri_name.sh adds the Microsoft vendor as a last resort, checks whether any card is available in /dev/dri, and keeps the first one. It also checks whether nvidia-smi can be run from within the container, which would mean that the selected GPU is likely to be NVIDIA. In dual-GPU systems I have not been able to access Intel GPUs, so further testing would be required in that regard.
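For readers who want a feel for the fallback described above, here is a minimal sketch. It is not the actual set_dri_name.sh from this PR: the function names first_dri_card and describe_wsl_gpu, and the optional directory argument, are hypothetical and exist only for illustration.

```shell
# Hypothetical sketch of the fallback logic; NOT the real set_dri_name.sh.

# Print the name of the first card node under the given directory
# (default: /dev/dri). Returns non-zero if no card node exists.
first_dri_card() {
    local dri_dir="${1:-/dev/dri}"
    local card
    for card in "$dri_dir"/card*; do
        # An unmatched glob stays literal, so test for existence explicitly.
        if [ -e "$card" ]; then
            basename "$card"
            return 0
        fi
    done
    return 1
}

# Report the vendor to use inside WSL. "microsoft" is the last-resort
# answer; a working nvidia-smi only hints that the GPU is probably NVIDIA.
describe_wsl_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "microsoft (most likely NVIDIA behind WSL)"
    else
        echo "microsoft"
    fi
}
```

Glob expansion is sorted, so card0 wins over card1 when both exist, which matches the "keep the first one" behaviour described in the comment.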

@dpascualhe dpascualhe linked an issue Dec 15, 2024 that may be closed by this pull request
@dpascualhe dpascualhe marked this pull request as ready for review December 15, 2024 20:49
@javizqh
Collaborator

javizqh commented Dec 15, 2024

Sadly I cannot test this in Windows. If someone else can test it, it would be appreciated

@dduro2020
Collaborator

I have the same problem, I can't test it in Windows

@dpascualhe
Collaborator Author

Maybe @codezerro can help?

@codezerro

@dpascualhe how can I help you?

@dpascualhe
Collaborator Author

dpascualhe commented Dec 17, 2024

@codezerro we need to validate that the new version of the GPU selection introspection script in this PR works as expected in Windows environments. Given that it is just a single file, the easiest way for me to test it is to run the latest RoboticsBackend container, copy the updated file into it, and then manually launch the entrypoint.

Instructions (all of them must be run from a WSL2 environment with working CUDA drivers; check that you can run nvidia-smi from within the WSL2 environment first):

  1. Start the database:
docker run --hostname my-postgres --name academy_db -d -e POSTGRES_DB=academy_db -e POSTGRES_USER=user-dev -e POSTGRES_PASSWORD=robotics-academy-dev -e POSTGRES_PORT=5432 -p 5432:5432 jderobot/robotics-database:latest
  2. Once it is running, start RoboticsAcademy, overriding the entrypoint so that we can get a shell inside the container:
docker run --rm -it $(nvidia-smi >/dev/null 2>&1 && echo "--gpus all" || echo "") -v /usr/lib/wsl:/usr/lib/wsl -e LD_LIBRARY_PATH=/usr/lib/wsl/lib --device /dev/dri -p 6080:6080 -p 1108:1108 -p 7163:7163 -p 7164:7164 --link academy_db --entrypoint /bin/bash jderobot/robotics-academy:latest
  3. From a second WSL2 terminal, copy the updated set_dri_name.sh into the container:
docker cp set_dri_name.sh <robotics backend container id>:set_dri_name.sh
  4. From within the Docker container, run the entrypoint:
./entrypoint.sh
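
Before running the steps above, it may help to confirm the WSL2 prerequisites in one go. The following is only a sketch under assumptions: the function names is_wsl and preflight are hypothetical, and the check relies on WSL kernels reporting "microsoft" in /proc/version. The paths are the ones mounted in the docker run command above, passed as optional arguments so the function can also be tried outside WSL.

```shell
# Hypothetical pre-flight check; not part of RoboticsAcademy.

# Succeeds if the given kernel version file (default /proc/version)
# mentions "microsoft", as WSL kernels do.
is_wsl() {
    grep -qi microsoft "${1:-/proc/version}" 2>/dev/null
}

# Print one line per missing prerequisite; return non-zero if any is missing.
# Arguments (all optional): version file, WSL library dir, DRI dir.
preflight() {
    local failed=0
    is_wsl "${1:-/proc/version}" || { echo "not running inside WSL"; failed=1; }
    [ -d "${2:-/usr/lib/wsl/lib}" ] || { echo "WSL GPU libraries not found"; failed=1; }
    [ -d "${3:-/dev/dri}" ] || { echo "no DRI directory"; failed=1; }
    command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not on PATH"; failed=1; }
    return "$failed"
}
```

Running preflight with no arguments inside the WSL2 terminal should print nothing and return 0 when everything is in place.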

Things we need to check:

  • The GPU info logged by the script (first traces when running the entrypoint).
  • Access an exercise with Gazebo (e.g., follow line) and check the GPU status (top right).
  • Run nvidia-smi from another terminal and check that gzserver process is listed (while running follow line).
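
The last check can be scripted instead of read by eye. This is a small sketch; gpu_process_listed is a hypothetical helper name, and it simply looks for a process name in whatever nvidia-smi text is piped to it.

```shell
# Hypothetical helper: succeeds if the given process name appears in the
# nvidia-smi output supplied on stdin.
gpu_process_listed() {
    grep -q -- "$1"
}

# Example (run while the follow line exercise is open):
#   nvidia-smi | gpu_process_listed gzserver && echo "gzserver is using the GPU"
```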

(@javizqh @dduro2020 maybe this is a weird process? let me know if you think there's a more straightforward approach)

@javizqh
Collaborator

javizqh commented Dec 17, 2024

If you are not building a RADI to test, I cannot think of an easier way to test it

@codezerro

@dpascualhe give me some time.

@codezerro

@dpascualhe first attempt was not very good. I faced some technical issues. I'll update you when I get some results.

@dpascualhe
Collaborator Author

> @dpascualhe first attempt was not very good. I faced some technical issues. I'll update you when I get some results.

Issues due to the WSL2+CUDA+Docker environment or RoboticsAcademy?

@codezerro

@dpascualhe I did it successfully, but some tasks are unclear to me. You can use my system.

@dpascualhe
Collaborator Author

From @codezerro's PC (Windows 11 + CUDA 12.7 + WSL with Ubuntu 24.04), after following the instructions above:

  • gzserver and gzclient appear as processes when running nvidia-smi command.
  • MICROSOFT appears as GPU vendor.
  • RT Factor goes up to 1.
  • GPU introspection script output:
-- GPU acceleration info ---
GPUs found:
5dc0:00:00.0 3D controller [0302]: Microsoft Corporation Device [1414:008e]
INFO: GPU candidates found at /dev/dri/card0 (keeping /dev/dri/card0).
INFO: 'nvidia-smi' available. Most likely an NVIDIA GPU in WSL.
GPU selected:
5dc0:00:00.0 3D controller [0302]: Microsoft Corporation Device [1414:008e]
DRI_VENDOR: microsoft
DRI_NAME: card0
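
When comparing runs across machines, the selected vendor and card can be pulled out of a saved copy of this output. The sketch below assumes the "KEY: value" layout shown above; log_value is a hypothetical helper name, not something provided by the entrypoint.

```shell
# Hypothetical helper: print the value of the first "KEY: value" line
# found in the log text supplied on stdin.
log_value() {
    awk -v key="$1" '
        { sub(/^[ \t]+/, "") }               # ignore leading indentation
        index($0, key ":") == 1 {            # line starts with "KEY:"
            sub(/^[^:]*:[ \t]*/, "")         # drop the key and separator
            print
            exit
        }
    '
}

# Example: log_value DRI_VENDOR < saved_entrypoint_output.txt
```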

A screenshot for further proof:
[Screenshot: 2025-01-03_10-31]

@javizqh, from my side, I consider the changes in this PR sufficiently validated for merging (bearing in mind that our testing pool of Windows/WSL/CUDA configurations has been small).

@javizqh
Collaborator

javizqh commented Jan 4, 2025

Have you checked that it still works fine in Linux @dpascualhe? If so I will merge it.

Development

Successfully merging this pull request may close these issues.

Add GPU acceleration support for Windows
4 participants