Nvidia passthrough not working for TrueNAS 24.04.0 (Dragonfish) #127
Comments
I had the same problem, @dalgibbard's solution fixed it for me. I did an
Glad to hear you were able to get it working @dalgibbard. I found the same error message mentioned in: NVIDIA/libnvidia-container#224. But I'm leaving investigating the root cause and implementing a fix for this issue up to the community. I don't have an Nvidia GPU in my system and I'm not looking forward to another blind, back-and-forth, trial-and-error fixing process akin to #4.
I still couldn't get Immich to use the graphics card. To figure out what was going on I did `jlmkr edit docker`, made changes, saved, then `jlmkr restart docker`.
When trying to deploy Immich I get this. *Note: when I had this in the config it wouldn't start the docker jail, so I removed it from the config and it worked again.
You shouldn't need all these bind mounts; the only one you need is
Thanks, will update accordingly.
I created a new jail with Nvidia GPU passthrough enabled and mounted the suggested directory as
When I attempt to use the card in Plex, I get the following errors:
I did
Here is the
Jail config:
Plex docker compose file:
I can see the library is there. Any ideas?
Did you install the Nvidia runtime and configure Plex to use it?
Yes, just got it working. I missed setting the NVIDIA_* vars in my env file. After that it fires up. So, in addition to the above, I had to do the following:
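For anyone else who hits this, the variables in question are the standard NVIDIA container runtime ones. A minimal sketch (values shown are common defaults, not necessarily what your setup needs):

```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=all        # expose all GPUs to the container
  - NVIDIA_DRIVER_CAPABILITIES=all    # or a subset, e.g. "compute,video,utility"
```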
Working Plex compose.yml:
Working
Moogle here. Glad it helped, gratz :}
Thank you for pointing me in the right direction. 😊
No problem. By the way, may I ask how you got ffmpeg installed? I need that for my Jellyfin setup to get it to work. I think it's at this location: jailmaker/jails/docker/rootfs/usr/lib/jellyfin-ffmpeg/ffmpeg
I'll take a look tomorrow. I'll grab the compose file and the docker image, spin it up, and see if I can get it to recognize the card.
No problem, I would really appreciate it when you are able to do so. I think the graphics card part is set up correctly, but it's how to install ffmpeg that I don't know, taking jailmaker into account. In jailmaker we have this: /mnt/tank/jailmaker/jails/docker/rootfs/usr/lib, and in Jellyfin the ffmpeg points to
but that jellyfin-ffmpeg/ffmpeg does not exist at that location. So I thought I'd have to do `jlmkr shell docker`, then install the repo and the custom Jellyfin ffmpeg, but I don't know how. I'm using Debian Bookworm for the jailmaker docker jail.
You shouldn't really be modifying the contents of a jail from outside the jail itself, nor should you be modifying the contents of a docker container image from outside the container itself - it's just asking for trouble. Which Docker image are you using for Jellyfin? I can see there are 3 available, and I'd be surprised if they don't come with ffmpeg already installed, especially as Jellyfin has its own fork. Looking at the
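If you want to check for yourself, one quick way from inside the jail is something like the following (the container name is an assumption, and `/usr/lib/jellyfin-ffmpeg` is where the Jellyfin images usually place their ffmpeg fork):

```sh
# Assumes your compose service/container is named "jellyfin"
docker exec -it jellyfin /usr/lib/jellyfin-ffmpeg/ffmpeg -version
```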
I use linuxserver Jellyfin, but when I enable Nvidia hardware acceleration it doesn't work. I think ffmpeg doesn't get detected, so it exits. My error message
How do you do this?
Tried adding this to environment but it still didn't work:
I think you're going down the wrong track. The error message you linked to doesn't say FFMPEG isn't present, it says FFMPEG exited with an error (possibly due to transcoding, but not necessarily). I am 99.99% certain that FFMPEG is installed and isn't the issue. So you tried
You should get the same result, it's just running

EDIT: Looking at your docker compose script, you don't appear to have set
Something like that. You also don't need all that stuff about reserving the GPU, either. You can remove all of the stuff in
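Roughly like this; a sketch, not your exact file (image and service name are assumptions):

```yaml
services:
  jellyfin:
    image: lscr.io/linuxserver/jellyfin:latest
    runtime: nvidia    # replaces the whole deploy/resources/reservations block
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```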
Thanks Neo. Now I have a better angle to look at. I used Dockge, entered bash, did the command, but got this
Did a Google search, went to `jlmkr shell docker` then `docker exec -it bash`, which is pretty much the same as the Dockge route earlier. Not sure how to sh into the container to be able to run that command.
No no, the command is to be run on the jail, not within the container. It'll spin up a new container that only displays that output. So `jlmkr shell docker` then `sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi`.
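Spelled out step by step:

```sh
# On the TrueNAS host: enter the docker jail
jlmkr shell docker

# Inside the jail: spin up a throwaway Ubuntu container and run nvidia-smi in it
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```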
But I think my jail for docker uses Debian, so isn't ubuntu incorrect? Confused; I deployed docker using the Debian docker template.
Yeah, thought so. That doesn't look right since I'm not using Ubuntu.
The container image is based on Ubuntu; that's normal in the docker world. It's fine. Containers can be based on other operating systems like Arch and will happily run on any docker host.
Well, after running that it says this:
Okay good, so update your docker compose file to add `runtime: nvidia`.
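It sits at the top level of the service definition, not under environment; something like this (service name is just a placeholder):

```yaml
services:
  jellyfin:
    runtime: nvidia    # a service-level key, sibling of image/environment
```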
How do I add `runtime: nvidia`? Is that under environment? I use Dockge, so `runtime=nvidia`?
I did a search and found this:
I'll try.
Yeah, I did give an example in this post, but you might have missed it.
Tried it; still didn't work. Went to look at other stuff to troubleshoot.
hm...
Nice! I don't know that you need the runtime because it's defined in the docker daemon config, but it can't hurt to have it.
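For reference, "defined in the docker daemon config" means something along these lines in `/etc/docker/daemon.json` (these are the usual nvidia-container-toolkit values, not copied from this jail):

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```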
Just to be safe, I did add it to my compose.yml, but it didn't affect it either way. The big one is the
Mine is the same as yours in that location.
Thanks for the update.
Would be great if anyone could focus on getting Nvidia passthrough working again without having to do workarounds and provide a PR, because this issue is getting too long IMHO. 😅
Lol yeah, can we take the unrelated issues offline?
I have a custom config for Nvidia that I'm testing now. I'll push it soon with example instructions on how to deploy.
I can confirm that the config is working out of the box with Nvidia. I just created a jail and booted Plex on it, and HW transcoding is working. I'll submit it shortly.
This is the PR, I'm going to work on adding documentation now as well.
@dasunsrule32 reports not running into the "file too short" error on SCALE 24.04 when using the pre-release version of jailmaker from the develop branch:

I don't think I added anything in particular to fix the "file too short" error though...
@Jip-Hop I'll get a PR raised in a bit; it's caused by the mounts for Nvidia common, unrelated to the template/init changes etc. TL;DR: the current method for locating the Nvidia modules to bind mount should use the parent dir of the nvidia-common folder, else it misses some of the libs/modules. I.e. what this issue was actually about all along, before people went on huge tangents 🤣
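To illustrate the difference (paths below are examples; exact filenames depend on your driver version):

```sh
# Current approach: bind-mount only the individual files this prints
nvidia-container-cli list

# Fix: bind-mount the parent directory those libraries live in, so sibling
# libs/modules the container expects are also present, e.g.:
#   --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current
```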
But isn't it strange the "file too short" error doesn't occur for some users on SCALE 24.04?
The old nvidia issue may also provide some helpful insights as to how the current method of passing through came to be. Specifically this comment: #4 (comment).
It'll depend on the application being run, I suspect (my application is CUDA-based for example, so it needs certain libs that might not be mounted, compared to, say, NVENC), plus any manual implementation they have already added. I myself can attest to multiple custom bind mounts already :)
PR for review: #165
I suspect that this is valid when running the Nvidia container runtime, but not when running GPU workloads directly in the jail? If we're already mounting a few files from that dir anyway, I don't see any harm in extending that to the whole directory, personally. FWIW, I'm running both container-based and non-container-based workloads with this change, and both are working correctly.
Re-raised PR against develop branch: #166
This is a good approach. I might suggest using `awk`.
Yeah... I guess. But in this case we're iterating through like 10 lines of output, so I figured the grep implementation is more readable than awk here. I could test the perf difference, but given the volume of text, it's very likely to be minimal, if any. The equivalent awk implementation would look like this, I guess:
Which is... incomprehensible to most, lol.
I don't have time to look at this properly this week, but I prefer a pure python implementation. I think we're pretty close to solving this cleanly :)
"Pure python" implementation is possible (ignoring the subprocess calls, but at least getting rid of the 'shell' requirement), by pre-running the subtractive command and storing that, and then subtracting it from the existing list. Since that's the preference, I'll sort this in a bit :) |
Is anyone able to test and confirm that this issue is fixed in: https://github.com/dalgibbard/jailmaker/blob/issue-127-nvidia-passthrough/jlmkr.py? I'd like to test it myself but can't without an Nvidia GPU. Would be great to get additional confirmation before creating a new release. Please create a new jail with Nvidia passthrough enabled. Don't manually add any additional mounts. Thanks again @dalgibbard for reporting this issue and providing a PR.
So I just gave this a test and hit an error even starting the jail:
I used the v1.4.1 version of the script in the branch you linked and the docker template from the same branch. The only thing I changed in the template was setting

However, I am not sure if this is related to the nvidia changes, as I tried again with another new jail using the v1.4.1 script, without setting
EDIT: I reverted back to v1.2.1 of jlmkr and it created a jail just fine. Let me see where the breaking change came in.
This is a sysctl setting that needs to be tweaked: https://forums.truenas.com/t/linux-jails-sandboxes-containers-with-jailmaker/417/142?u=dasunsrule32
Ahhh, that's useful to know! I'll make that tweak and try again.
Yup, can confirm the v1.4.1 script works a treat! I installed the jail using the docker template with

I then shelled into the jail and ran

I then ran
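For anyone repeating the test, the whole sequence from this thread amounts to roughly the following (jail name and config path are assumptions; check the flags against your jailmaker version):

```sh
# Create and start a fresh jail from the docker template with GPU passthrough enabled
./jlmkr.py create --start --config templates/docker/config docker

# Shell into the jail, then confirm containers can see the GPU
./jlmkr.py shell docker
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```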
Confirming all these steps worked for me too!
* Fix Nvidia Passthrough, closing #127
* Mount libraries parent directory
* Use the dynamic library path from the existing code
Anyone here able to help debug issue

Latest version of jailmaker (1.1.5)

As per title; in Dragonfish 24.04.0, Nvidia passthrough seems to be broken. `nvidia-smi` is working fine on the host, but inside the container it gives:

Seems the script uses `nvidia-container-cli list` to find the Nvidia files which need mounting, but the container expects files outside of this list. Note that this list doesn't include the file the container is expecting.

Adding a manual mount to my jail's config for `--bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current` resolved it though; not sure if that's a good idea or not, but it works at least :)
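For reference, the shape of that workaround in the jail's config file (treat as a sketch; merge with any existing args rather than replacing them):

```ini
# Extra flags jailmaker passes to systemd-nspawn
systemd_nspawn_user_args=--bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current
```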