
eGPU not enabled after running script #6

Open
Faraclas opened this issue Dec 30, 2022 · 13 comments

Comments

@Faraclas

I have an RTX 3060 Ti inside a Razer Core X enclosure. It works great when I (dual) boot into Windows, but there is no display in Linux (Gentoo, systemd, GNOME, Wayland).

I took a look at the service status:

○ all-ways-egpu.service - Configure eGPU as primary under Wayland desktops
     Loaded: loaded (/etc/systemd/system/all-ways-egpu.service; enabled; preset: disabled)
     Active: inactive (dead) since Fri 2022-12-30 10:27:22 EST; 39s ago
   Duration: 75ms
    Process: 1000 ExecStart=all-ways-egpu boot (code=exited, status=0/SUCCESS)
   Main PID: 1000 (code=exited, status=0/SUCCESS)
        CPU: 29ms

Dec 30 10:27:22 gentoo systemd[1]: Started Configure eGPU as primary under Wayland desktops.
Dec 30 10:27:22 gentoo all-ways-egpu[1011]: find: ‘/sys/class/drm/card[0-9]*/card[0-9]*-*/../device/driver’: No such file or directory
Dec 30 10:27:22 gentoo all-ways-egpu[1000]: No eGPU detected
Dec 30 10:27:22 gentoo all-ways-egpu[1014]: /usr/bin/all-ways-egpu: line 199: echo: write error: No such device
Dec 30 10:27:22 gentoo all-ways-egpu[1014]: /usr/bin/all-ways-egpu: line 205: /sys/bus/pci/drivers/i915/unbind: Permission denied
Dec 30 10:27:22 gentoo all-ways-egpu[1014]: /usr/bin/all-ways-egpu: line 206: /sys/bus/pci/devices/0000:0000:00:02.0/remove: No such file or directory
Dec 30 10:27:22 gentoo all-ways-egpu[1014]: /usr/bin/all-ways-egpu: line 212: echo: write error: No such device
Dec 30 10:27:22 gentoo systemd[1]: all-ways-egpu.service: Deactivated successfully.

Regarding the first error (find: ... No such file or directory), I verified that the following paths exist:

elias@gentoo ~ $ ls /sys/class/drm/card0/card0-DP-3/device/device/driver/module/drivers/
pci:i915
elias@gentoo ~ $ ls /sys/class/drm/card1/card1-DP-5/device/device/driver/module/drivers/
pci:nvidia  pci:nvidia-nvswitch

I'm happy to help debug this to get it working. Thank you for your scripts!

@ewagner12
Owner

Hi, I'm on holiday away from my eGPU until January 3rd, so I'll be able to test this more after that. Just from your description, though, it seems like a similar issue to #5, where the glob is not expanding properly. Are you using bash or a different shell?
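The find error in the first log prints the pattern `/sys/class/drm/card[0-9]*/card[0-9]*-*/../device/driver` literally, which is what a shell does when a glob is passed unexpanded (for example, quoted) or matches nothing. As a debugging aid, here is a minimal POSIX sh sketch, not taken from all-ways-egpu, that expands a similar glob and prints each DRM card with its bound driver. The function name `list_drm_drivers` is hypothetical, and it assumes the usual sysfs layout in which `/sys/class/drm/cardN/device` links to the card's PCI device, as the paths listed above suggest.

```sh
#!/bin/sh
# Hypothetical debugging helper (not part of all-ways-egpu): print each DRM
# card and the kernel driver bound to its PCI device.
# Usage: list_drm_drivers [drm_class_dir]   (default: /sys/class/drm)
list_drm_drivers() {
	drm_dir=${1:-/sys/class/drm}
	# The glob is deliberately unquoted so the shell expands it. If nothing
	# matches, sh leaves the pattern as literal text, so check it exists.
	for card in "$drm_dir"/card[0-9]*; do
		[ -e "$card" ] || continue
		name=${card##*/}
		case $name in
			*-*) continue ;; # skip connector entries such as card0-DP-3
		esac
		[ -e "$card/device/driver" ] || continue
		# The driver link points at e.g. /sys/bus/pci/drivers/i915.
		driver=$(basename "$(readlink -f "$card/device/driver")")
		printf '%s %s\n' "$name" "$driver"
	done
}

list_drm_drivers "$@"
```

Reading these sysfs links does not require root. On the machine described above one would expect lines along the lines of `card0 i915` and `card1 nvidia`, though that is an expectation, not captured output.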

@Faraclas
Author

Faraclas commented Dec 31, 2022 via email

@Faraclas
Author

Faraclas commented Jan 3, 2023

After reading issue #5, I took a look at my script, and it seems that change is already included.

Line 229:
```sh
EGPU_DETECT=0

for CARD in $(lspci -d ::0300 | cut -c -7); do
	set -- /sys/bus/pci/devices/0000:"$CARD"
	for BOOT_VGA_PATH in "$@"; do
		if grep -q "$CARD" < "$USER_IDS_DIR"/egpu-bus-ids; then
			echo "$BOOT_VGA_PATH"  | tee -a "$USER_IDS_DIR"/bind-paths
			mount -n --bind -o ro "$USER_IDS_DIR"/1  "$BOOT_VGA_PATH"/boot_vga
			EGPU_DETECT=1
		else
			if grep -q "1" < "${BOOT_VGA_PATH}"/boot_vga; then
				echo "$BOOT_VGA_PATH"  | tee -a "$USER_IDS_DIR"/bind-paths
				mount -n --bind -o ro "$USER_IDS_DIR"/0 "$BOOT_VGA_PATH"/boot_vga
			fi
		fi
	done
done
```
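The loop above chooses the primary GPU by bind-mounting a read-only file over each device's `boot_vga` attribute: the eGPU gets `"$USER_IDS_DIR"/1`, and any other device currently reporting 1 gets `"$USER_IDS_DIR"/0`. Judging by the names, those files presumably contain `1` and `0`; that is an assumption. A quick way to see what the desktop will read after the service runs is to print every `boot_vga` value. The sketch below is hypothetical (`show_boot_vga` is not part of the project) and assumes the standard `/sys/bus/pci/devices` layout, where the kernel generally exposes `boot_vga` only on VGA-class devices.

```sh
#!/bin/sh
# Hypothetical check: print the boot_vga value reported for each PCI device
# that has one. If the eGPU was made primary, it should read 1 and the iGPU 0.
# Usage: show_boot_vga [pci_devices_dir]   (default: /sys/bus/pci/devices)
show_boot_vga() {
	pci_dir=${1:-/sys/bus/pci/devices}
	for dev in "$pci_dir"/*; do
		# Devices without a readable boot_vga attribute are skipped.
		[ -r "$dev/boot_vga" ] || continue
		printf '%s boot_vga=%s\n' "${dev##*/}" "$(cat "$dev/boot_vga")"
	done
}

show_boot_vga "$@"
```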

@ewagner12
Owner

Ok so a couple of things here.

First off, I noticed that the script is trying to find the file "/sys/bus/pci/devices/0000:0000:00:02.0/remove", which fails because there's an extra "0000:". Did you use the guided setup, or did you enter the bus IDs manually? If you enter the IDs manually, they should be in a form like "00:02.0".
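Because the same device can be written as `00:02.0` or, with its PCI domain, as `0000:00:02.0`, code that always prefixes `0000:` produces paths like the `0000:0000:00:02.0` one in the log. A minimal sketch of a tolerant conversion follows; `pci_sysfs_path` is a hypothetical helper, not the project's code or its fix, and it assumes domain 0000 whenever no domain is given.

```sh
#!/bin/sh
# Hypothetical helper (not from all-ways-egpu): turn a PCI bus ID written
# either as "00:02.0" or with a domain, "0000:00:02.0", into its sysfs path,
# so the domain prefix is never added twice.
pci_sysfs_path() {
	id=$1
	case $id in
		*:*:*) ;;           # already has a domain, e.g. 0000:00:02.0
		*:*) id=0000:$id ;; # bus:device.function only; assume domain 0000
		*) echo "unrecognized PCI ID: $id" >&2; return 1 ;;
	esac
	printf '/sys/bus/pci/devices/%s\n' "$id"
}
```

For example, `pci_sysfs_path 00:02.0` and `pci_sysfs_path 0000:00:02.0` both print `/sys/bus/pci/devices/0000:00:02.0`.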

Second, a note on how this script works: Method 2 is the recommended method, and if it works for you, you don't need to set up internal bus IDs to remove. Did you try just setting the eGPU as primary with Method 2, without entering any internal GPU IDs to remove?

Lastly, I also took a look at the other issues you're seeing here and I believe I worked out the issues causing the output you're seeing with this part of the script. I'm in the process of testing these on my end to make sure they work correctly and I'll let you know when I push these changes to the github repo.

Hopefully these changes will fix this issue.

@Faraclas
Author

Faraclas commented Jan 5, 2023 via email

@ewagner12
Owner

You're correct that you don't need to do anything to undo Method 2, and what you describe is what you should try once the changes are pushed.

One reason Method 2 didn't work could be that the guided setup was giving it the wrong IDs in the first place. To help debug this, could you post the output of lspci?

The systemd prompt at login is expected if you answer yes to both prompts during setup. There are two different systemd services: one runs before the display manager starts and is supposed to remove the iGPU, and one runs after login and can re-enable the iGPU. With the GNOME Wayland desktop, that lets you get a picture on the laptop screen while still keeping the eGPU as primary.

@Faraclas
Author

Faraclas commented Jan 5, 2023

$ lspci
0000:00:00.0 Host bridge: Intel Corporation 11th Gen Core Processor Host Bridge/DRAM Registers (rev 01)
0000:00:02.0 VGA compatible controller: Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics] (rev 01)
0000:00:04.0 Signal processing controller: Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant (rev 01)
0000:00:06.0 System peripheral: Intel Corporation RST VMD Managed Controller
0000:00:07.0 PCI bridge: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #0 (rev 01)
0000:00:07.2 PCI bridge: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #2 (rev 01)
0000:00:08.0 System peripheral: Intel Corporation GNA Scoring Accelerator module (rev 01)
0000:00:0a.0 Signal processing controller: Intel Corporation Tigerlake Telemetry Aggregator Driver (rev 01)
0000:00:0d.0 USB controller: Intel Corporation Tiger Lake-LP Thunderbolt 4 USB Controller (rev 01)
0000:00:0d.2 USB controller: Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0 (rev 01)
0000:00:0d.3 USB controller: Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #1 (rev 01)
0000:00:0e.0 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller
0000:00:12.0 Serial controller: Intel Corporation Tiger Lake-LP Integrated Sensor Hub (rev 20)
0000:00:14.0 USB controller: Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 xHCI Host Controller (rev 20)
0000:00:14.2 RAM memory: Intel Corporation Tiger Lake-LP Shared SRAM (rev 20)
0000:00:15.0 Serial bus controller: Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0 (rev 20)
0000:00:15.1 Serial bus controller: Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1 (rev 20)
0000:00:16.0 Communication controller: Intel Corporation Tiger Lake-LP Management Engine Interface (rev 20)
0000:00:19.0 Serial bus controller: Intel Corporation Tiger Lake-LP Serial IO I2C Controller #4 (rev 20)
0000:00:19.1 Serial bus controller: Intel Corporation Tiger Lake-LP Serial IO I2C Controller #5 (rev 20)
0000:00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20)
0000:00:1d.0 PCI bridge: Intel Corporation Device a0b3 (rev 20)
0000:00:1e.0 Communication controller: Intel Corporation Tiger Lake-LP Serial IO UART Controller #0 (rev 20)
0000:00:1f.0 ISA bridge: Intel Corporation Tiger Lake-LP LPC Controller (rev 20)
0000:00:1f.3 Multimedia audio controller: Intel Corporation Tiger Lake-LP Smart Sound Technology Audio Controller (rev 20)
0000:00:1f.4 SMBus: Intel Corporation Tiger Lake-LP SMBus Controller (rev 20)
0000:00:1f.5 Serial bus controller: Intel Corporation Tiger Lake-LP SPI Controller (rev 20)
0000:39:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] (rev 02)
0000:3a:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] (rev 02)
0000:3a:01.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] (rev 02)
0000:3a:02.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] (rev 02)
0000:3a:03.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] (rev 02)
0000:3a:04.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] (rev 02)
0000:4d:00.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02)
0000:4e:01.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02)
0000:4f:00.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3060 Ti Lite Hash Rate] (rev a1)
0000:4f:00.1 Audio device: NVIDIA Corporation GA104 High Definition Audio Controller (rev a1)
0000:71:00.0 Network controller: Qualcomm QCA6390 Wireless Network Adapter (rev 01)
0000:72:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5260 PCI Express Card Reader (rev 01)
10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller (rev 01)
10000:e1:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
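One detail in this output may be relevant: every address carries a domain prefix (`0000:`), most likely because the NVMe controller behind Intel VMD sits in domain `10000`, and lspci only suppresses domain numbers on machines that have nothing but domain 0. With a prefix present, a fixed-width cut such as the `cut -c -7` in the loop quoted earlier yields `0000:00` rather than `00:02.0`. The hypothetical `vga_bus_ids` function below takes the whole first field instead, so it works whether or not domains are shown; it is a sketch, not the script's code.

```sh
#!/bin/sh
# Hypothetical sketch: read lspci output on stdin and print the bus ID of
# every VGA-class controller, with or without a domain prefix.
# Example: lspci | vga_bus_ids
vga_bus_ids() {
	# The bus ID is the first whitespace-separated field; taking the whole
	# field avoids assuming a fixed width such as "cut -c -7".
	awk '/ VGA compatible controller: / { print $1 }'
}
```

Run as `lspci -D | vga_bus_ids`, where `-D` makes lspci always show domains, this would be expected to print full addresses such as `0000:00:02.0` and `0000:4f:00.0` on this machine.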


@ewagner12
Owner

@Faraclas
The changes were just pushed in commit 7de0f7c
If you could download the latest GitHub version, re-run the setup, and see whether you have any issues, that would be great!

@Faraclas
Author

Faraclas commented Jan 6, 2023

@ewagner12 Thank you for the changes. I cloned the repo, ran the install command, and then used the guided setup. When I tried to boot (with the eGPU connected), I got stuck at the boot screen and never made it to the GDM login. I was able to boot into the system with the eGPU powered off.

@ewagner12
Owner

OK, can you post the output of the all-ways-egpu status command?

@Faraclas
Author

Faraclas commented Jan 6, 2023

I can try again. However, I found out a few things about my system that might make a difference.

  • I am not sure why, but files I tried to add to /etc/udev/rules.d/ do not seem to have any effect.
  • In fact, all of the rules are in /usr/lib/udev/rules.d/, and one in particular is of interest.
  • The most interesting is 61-gdm.rules:
    • With the eGPU turned on, it turns out that I am actually logging into X.
    • With the eGPU turned off, I am in Wayland.
    • Line 55 onward seems to be what switches to X. I checked all of the conditions manually, and indeed they would all trigger the jump to gdm_disable_wayland.
    • To test, I commented out the RUN+= commands (lines 131 and 135). When I booted, the display connected to the NVIDIA card did turn on, but the screen stayed blank. Both the laptop screen and another monitor connected to the Intel GPU were stuck on what looked like the boot message screen.
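For reference, udev reads rules files from `/usr/lib/udev/rules.d`, `/run/udev/rules.d` and `/etc/udev/rules.d`, only considers files whose names end in `.rules`, and lets a file in `/etc/udev/rules.d` replace a packaged file with the same name; a same-named symlink to `/dev/null` there disables the packaged file entirely (see udev(7)). That allows changing 61-gdm.rules without editing `/usr/lib`, where package updates would overwrite the change. The helper below is a hypothetical sketch of that copy step, with the directories as optional arguments so it can be tried on scratch directories first.

```sh
#!/bin/sh
# Hypothetical helper: make a local, editable copy of a packaged udev rules
# file. A file in /etc/udev/rules.d takes precedence over a file with the
# same name in /usr/lib/udev/rules.d and survives package updates.
# Usage: override_udev_rule NAME [src_dir] [dst_dir]
override_udev_rule() {
	name=$1
	src_dir=${2:-/usr/lib/udev/rules.d}
	dst_dir=${3:-/etc/udev/rules.d}
	case $name in
		*.rules) ;;
		*) echo "udev only reads files ending in .rules: $name" >&2; return 1 ;;
	esac
	[ -f "$src_dir/$name" ] || { echo "not found: $src_dir/$name" >&2; return 1; }
	# Refuse to clobber an existing override.
	if [ -e "$dst_dir/$name" ]; then
		echo "already overridden: $dst_dir/$name" >&2
		return 1
	fi
	mkdir -p "$dst_dir" && cp "$src_dir/$name" "$dst_dir/$name"
}
```

As root, `override_udev_rule 61-gdm.rules`, then edit the copy in `/etc/udev/rules.d` and run `udevadm control --reload` (or reboot) for the change to be picked up.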

@ewagner12
Owner

If I had to guess, I would say the iGPU is being removed correctly, but the NVIDIA card is not being picked up by X/Wayland for whatever reason. If that's the case, here are some things I would try based on my experience with this:

  • remove the splash kernel parameter
  • choose "n" to the option "Attempt to re-enable these iGPU/initially disabled devices after boot" during setup or don't setup iGPU to be removed
  • try turning on autologin to bypass GDM and go straight to the desktop
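For the first suggestion, the `splash` parameter normally comes from the bootloader configuration (with GRUB, typically `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by regenerating `grub.cfg`). After rebooting, `/proc/cmdline` shows the running kernel's command line, so it tells you whether the change took effect. The hypothetical helper below checks it; it treats the command line as space-separated words and matches either `param` or `param=value` exactly.

```sh
#!/bin/sh
# Hypothetical check: succeed if a kernel parameter (default: splash) is on
# the running kernel's command line.
# Usage: has_kernel_param [param] [cmdline_file]   (default: /proc/cmdline)
has_kernel_param() (
	param=${1:-splash}
	cmdline=${2:-/proc/cmdline}
	set -f # disable globbing while splitting the command line into words
	for word in $(cat "$cmdline"); do
		case $word in
			"$param" | "$param"=*) exit 0 ;; # exits this subshell only
		esac
	done
	exit 1
)
```

For example: `has_kernel_param splash && echo 'splash is still set'`.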

@ewagner12
Owner

FYI I just pushed commit 618fd62 which improves Method 1 removal reliability and sometimes prevents black screens at least on my end. So you may want to try the latest git again and see if anything changes for you.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants