Haiku OS guest support #61

Open
10 tasks done
LekKit opened this issue Dec 3, 2022 · 183 comments
Labels
discussion (Debate for improvement), enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@LekKit
Owner

LekKit commented Dec 3, 2022

Milestones, progress

  • Out-of-tree Haiku boots using M-mode haiku_loader under RVVM with ATA drive
  • Nightly Haiku EFI boots through U-Boot with NVMe (Unstable framebuffer, app_server crashes, NVMe in polling mode)
  • Input devices with upstream Haiku drivers support (See Absolute positioned input device support #58)
  • Network card with upstream Haiku drivers support (RTL8169, but need stable tap_user first)
  • Verify other utility devices (RTC, syscon) work. Are goldfish or DS1742 RTCs supported in Haiku?
  • Fix NVMe IRQ loss with Haiku EFI (Doesn't happen in QEMU)
  • Fix Haiku EFI bootloader crashing with new U-Boot efifb (Somehow works now, weird fluke?...)
  • Fix Haiku EFI bootloader crashing in SMP (Doesn't happen in QEMU)
  • NVMe support in M-Mode haiku_loader
  • Fix display server crashing upon late userspace init in Haiku EFI (Unrelated to RVVM?)
@LekKit LekKit added enhancement New feature or request help wanted Extra attention is needed labels Dec 3, 2022
@LekKit
Owner Author

LekKit commented Dec 3, 2022

@X547 As far as I see, NVMe IRQ loss doesn't happen when running your Haiku build with ATA & haiku_loader.riscv.
It doesn't timeout or anything, ATA & NVMe coexist happily, both drive partitions are enumerated, no complaints about NVMe polling, etc. Perhaps there are some non-upstream changes that fix it.
There is a small issue with haiku_loader.riscv however. If ATA isn't the first device on the PCI bus, it crashes, i.e. NVMe cannot be attached before ATA right now.

Another interesting thing, EFI framebuffer seems to work properly now with nightly images & new U-Boot. I don't know what fixed it, I tried older RVVM commits but it still works... Maybe it was just a fluke or some local issue, huh.

Would be happy if you can verify these.

u-boot.bin.zip

image

@X547
Contributor

X547 commented Dec 3, 2022

If ATA isn't the first device on the PCI bus, it crashes, i.e. NVMe cannot be attached before ATA right now.

The ATA MMIO address is currently hardcoded in both the boot loader and the kernel. The kernel ATA driver needs refactoring because it currently assumes that register addresses are 16-bit.

@LekKit
Owner Author

LekKit commented Dec 3, 2022

If ATA isn't the first device on the PCI bus, it crashes, i.e. NVMe cannot be attached before ATA right now.

The ATA MMIO address is currently hardcoded in both the boot loader and the kernel. The kernel ATA driver needs refactoring because it currently assumes that register addresses are 16-bit.

I hope we can just ignore all of this and get NVMe running instead. Where can I find haiku_loader.riscv sources? I could try writing a simple NVMe driver, why not (Fine deal imo, since you're working on I2C HID).

@X547
Contributor

X547 commented Dec 3, 2022

Where can I find haiku_loader.riscv sources? I could try writing a simple NVMe driver, why not (Fine deal imo, since you're working on I2C HID).

It is here: https://github.com/haiku/haiku/blob/master/src/system/boot/platform/riscv/devices.cpp#L33. An NvmeBlockDevice would need to be added there. I have unpublished boot loader PCI bus code.

@X547
Contributor

X547 commented Dec 3, 2022

My Haiku RVVM branch: https://github.com/X547/haiku/tree/rvvm2.

WIP NVMe boot loader driver: https://github.com/X547/haiku/blob/e717045595ebbd71a30731bc57c96a5d1a68ef52/src/system/boot/platform/riscv/NvmeBlockDevice.cpp.

@LekKit
Owner Author

LekKit commented Dec 3, 2022

I'm not sure I properly understand how to build Haiku
../../buildtools/jam/jam0 -j16 -q @minimum-mmc

Asked for riscv target boot platform 
Unknown path to handle adding to image 
don't know how to make @minimal-mmc
...patience...
...found 1 target(s)...
...can't find 1 target(s)...

@X547
Contributor

X547 commented Dec 3, 2022

I'm not sure I properly understand how to build Haiku

You need to configure the build first. This will build GCC for the riscv64 target. This assumes that the current directory contains haiku and buildtools.

mkdir -p generated.riscv64
cd generated.riscv64
../configure -j4 --build-cross-tools riscv64 --cross-tools-source ../../buildtools --distro-compatibility official

@X547
Contributor

X547 commented Dec 3, 2022

../../buildtools/jam/jam0 -j16 -q @minimum-mmc
don't know how to make @minimal-mmc

Typo? The correct target is @minimum-mmc.

@LekKit
Owner Author

LekKit commented Dec 4, 2022

You need to configure the build first

I did that already using this guide https://www.haiku-os.org/guides/building/compiling-riscv64

Typo? The correct target is @minimum-mmc.

I tried many, none worked (With the same error)

@LekKit
Owner Author

LekKit commented Dec 4, 2022

Improved upstream ATA in 43aeba3, at least it's no longer a security hellhole (3 CWEs fixed, lol). Merged your API changes so you no longer need to hack on it each time.

Is it worth adding some kind of -ata option for those drives in upstream?

@X547
Contributor

X547 commented Dec 4, 2022

I tried many, none worked (With the same error)

https://www.haiku-os.org/guides/building/pre-reqs

<jam-install-command>

To install jam you can use one of two commands: The first requires administrative privilege, as jam will be installed to ‘/usr/local/bin/’

    sudo ./jam0 install
    ./jam0 -sBINDIR=$HOME/bin install

@X547
Contributor

X547 commented Dec 4, 2022

Is it worth adding some kind of -ata option for those drives in upstream?

Ideally, it would be nice to have an option to specify the drive type for each image independently.

@LekKit
Owner Author

LekKit commented Dec 4, 2022

Is it worth adding some kind of -ata option for those drives in upstream?

Ideally, it would be nice to have an option to specify the drive type for each image independently.

Yes, it's just a convention that -i/-image means "Just give me any kind of storage that is preferred".
ATA is kind of deprecated because I see little use for it in context of a RISC-V system (Outside of Haiku bootloader, and even this is temporary), and because it isn't maintained well.
It's not like I'm against this device, but no one is gonna implement missing features / non-critical fixes for it any more. I only ran a bit of fuzzing/coverage because I don't want to put my users under security risk for using it, and because someone had to do it.

I have no plans for more storage devices currently. That's why I don't really know what I should do with the CLI.

@X547
Contributor

X547 commented Dec 4, 2022

Did you solve the Haiku build problem? What Haiku source version are you using? What happens if you run jam @minimum-raw kernel?

@LekKit
Owner Author

LekKit commented Dec 4, 2022

Did you solve the Haiku build problem? What Haiku source version are you using? What happens if you run jam @minimum-raw kernel?

Using your Haiku fork, rvvm2 branch
Figured the jam issue, thanks. There are some compilation errors tho

../src/system/boot/platform/efi/arch/riscv64/arch_traps.cpp: In function 'void WriteSstatus(uint64_t)':
../src/system/boot/platform/efi/arch/riscv64/arch_traps.cpp:50:30: error: no matching function for call to 'SstatusReg::SstatusReg(uint64_t&)'
   50 |         SstatusReg status(val);
      |                              ^

@X547
Contributor

X547 commented Dec 4, 2022

Figured the jam issue, thanks. There are some compilation errors tho

Fixed, source updated.

@X547
Contributor

X547 commented Dec 4, 2022

Functional configuration:

  • RVVM: X547@9579da6
  • Haiku: X547/haiku@dc7f664
  • Haiku build: jam -q -j4 @minimum-raw haiku-minimum.image
  • Run: rvvm -mem 512M -res 1024x768 --rv64 objects/haiku/riscv64/release/system/boot/riscv/haiku_loader.riscv --image haiku-minimum.image

@LekKit
Owner Author

LekKit commented Dec 4, 2022

Functional configuration:

Hurray, I now can at least try it since we have working i2c hid and stuff... Feels great (Tho perf could be better, I'm currently losing to QEMU here perhaps. Haiku uses floats a lot, right?)

Will proceed to NVMe bootloader driver

@LekKit
Owner Author

LekKit commented Dec 4, 2022

Hmm, sometimes I2C HID deadlocks apparently. Pretty rare to spot but I've seen these 2 times already.

WARN: Possible deadlock at src/devices/hid-mouse.c@97
WARN: The lock was previously held at src/devices/hid-mouse.c@213
WARN: Version: RVVM v0.5-8e8f200-git
WARN: Attempting to recover execution...
 * * * * * * *

WARN: Possible deadlock at src/devices/i2c-hid.c@155
WARN: The lock was previously held at src/devices/i2c-hid.c@318
WARN: Version: RVVM v0.5-8e8f200-git
WARN: Attempting to recover execution...
 * * * * * * *

@LekKit
Owner Author

LekKit commented Dec 4, 2022

Verify other utility devices (RTC, syscon) work. Are goldfish or DS1742 RTCs supported in Haiku?

Syscon doesn't seem to work. Powering off the system from the guest leaves me with some win2000-vibe message "It's now safe to turn off the computer" and the machine never actually powers down.

Should be trivial to implement, syscon is just a single mmio register with specific values for poweroff/reset. This is also used in QEMU and on SiFive boards AFAIK.
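To illustrate how small such a device is on the emulator side, here is a minimal sketch of a syscon write handler. It assumes the QEMU virt convention, where writing 0x5555 to the register powers off and 0x7777 resets, with the command in the low 16 bits. The names syscon_mmio_write and machine_state_t are hypothetical and are not part of RVVM's device API.

```c
#include <stddef.h>
#include <stdint.h>

/* Command values assumed from the QEMU virt machine's syscon-compatible device. */
#define SYSCON_POWEROFF 0x5555u
#define SYSCON_RESET    0x7777u

/* Hypothetical result type: what the machine should do after the write. */
typedef enum {
    MACHINE_RUNNING,
    MACHINE_POWEROFF,
    MACHINE_RESET
} machine_state_t;

/*
 * Hypothetical MMIO write handler. `offset` is relative to the start of the
 * syscon region; only the single register at offset 0 is implemented.
 */
static machine_state_t syscon_mmio_write(size_t offset, uint32_t value)
{
    if (offset != 0)
        return MACHINE_RUNNING;

    /* The command lives in the low 16 bits; upper bits are ignored here. */
    switch (value & 0xFFFFu) {
    case SYSCON_POWEROFF:
        return MACHINE_POWEROFF;
    case SYSCON_RESET:
        return MACHINE_RESET;
    default:
        return MACHINE_RUNNING;
    }
}
```

On the guest side, a driver would only need to find the register through the FDT and store one of these values to it.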

@X547
Contributor

X547 commented Dec 5, 2022

Haiku currently supports shutdown and RTC through HTIF commands. The RTC HTIF interface is my own extension, and it works only in my TinyEMU fork.

@LekKit
Owner Author

LekKit commented Dec 5, 2022

Haiku currently supports shutdown and RTC through HTIF commands. The RTC HTIF interface is my own extension, and it works only in my TinyEMU fork.

I can implement that as well probably?

@X547
Contributor

X547 commented Dec 5, 2022

I can implement that as well probably?

I think that it is better to implement more standard interfaces.

HTIF commands currently used by Haiku:

  • (device: 0, cmd: 0, arg: 1): shutdown.
  • (device: 1, cmd: 1, arg: ch): print char ch.
  • (device: 2, cmd: 0, arg: 0): get UNIX time in microseconds.

Executing HTIF command:

// host-target interface
struct HtifRegs
{
	uint32 toHostLo;
	uint32 toHostHi;
	uint32 fromHostLo;
	uint32 fromHostHi;
};

uint64
HtifCmd(uint32 device, uint8 cmd, uint32 arg)
{
	if (gHtifRegs == 0)
		return 0;

	uint64 htifTohost = ((uint64)device << 56)
		+ ((uint64)cmd << 48) + arg;
	gHtifRegs->toHostLo = htifTohost % ((uint64)1 << 32);
	gHtifRegs->toHostHi = htifTohost / ((uint64)1 << 32);
	return (uint64)gHtifRegs->fromHostLo
		+ ((uint64)gHtifRegs->fromHostHi << 32);
}

FDT compatible value: ucb,htif0.
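For the emulator side, the packing done by HtifCmd above can be reversed as in the following sketch. It assumes only the layout visible in that code: device in bits 63..56, cmd in bits 55..48, and the argument in the remaining low bits. The names htif_request_t and htif_decode are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded form of one guest-to-host HTIF request. */
typedef struct {
    uint8_t  device; /* bits 63..56 of the tohost value */
    uint8_t  cmd;    /* bits 55..48 of the tohost value */
    uint64_t arg;    /* bits 47..0 of the tohost value  */
} htif_request_t;

/* Rebuild the 64-bit tohost value from its two 32-bit halves and split it. */
static htif_request_t htif_decode(uint32_t to_host_lo, uint32_t to_host_hi)
{
    uint64_t raw = ((uint64_t)to_host_hi << 32) | to_host_lo;
    htif_request_t req;

    req.device = (uint8_t)(raw >> 56);
    req.cmd    = (uint8_t)(raw >> 48);
    req.arg    = raw & 0xFFFFFFFFFFFFULL;
    return req;
}
```

With this decoding, the three commands listed above arrive as (0, 0, 1) for shutdown, (1, 1, ch) for printing a character, and (2, 0, 0) for the time query. Note that the guest writes the low half first, so a device model would presumably act on the write to the high half.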

@LekKit
Owner Author

LekKit commented Dec 5, 2022

I think that it is better to implement more standard interfaces.

Yeah, syscon and goldfish rtc were implemented just because they match basic QEMU machine. I would prefer emulating hardware from real RV boards (current PLIC/CLINT/UART/I2C-OC/NVMe fall into this category well) or just some generic common hardware (simple-fb perhaps counts?.. My RPi uses this driver as well).
As I see HTIF is a basic interface for debugging FPGA boards, right? That's great if it's part of official spec, I've just never seen it for some reason.

@LekKit
Owner Author

LekKit commented Dec 5, 2022

Haiku dd reports incorrect transfer speed (0.0 or -0.0!), crashes the kernel. No meaningful backtrace, perhaps I should build Haiku with -fno-omit-frame-pointer.
As for "does it happen outside RVVM" I dunno, should be checked soon. Values -0.0 suspiciously look like some FPU-related trouble.
Don't mind the terrible read speeds, my host laptop HDD is really that slow

vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x303231353237313e, ip 0x3eabcd2226, write 0, user 1, exec 0, thread 0x1a8
PANIC: thread_hit_serious_debug_event
Welcome to Kernel Debugging Land...
Thread 424 "dd" running on CPU 0
Stack:
FP: 0x0
kdebug> bt
Stack:
FP: 0xffffffc006664870
FP: 0xffffffc005c58298, PC: 0xffffffc0000ca8a6 <kernel_riscv64> invoke_debugger_command.localalias + 170
FP: 0x0, PC: 0x100000040 0x100000040
kdebug> 

image
image

@X547
Contributor

X547 commented Dec 5, 2022

PANIC: thread_hit_serious_debug_event

This is a userland process crash. It can be continued with the es command. This kernel panic usually doesn't happen. It was temporarily added here (https://github.com/X547/haiku/blob/dc7f6642ec87bfaa4d9731a91e57fdbf0c0e7cc0/src/system/kernel/debug/user_debugger.cpp#L816) for debugging purposes.

dd seems to be miscompiled. The crash happens on all RISC-V platforms (all supported emulators and real hardware).

@LekKit
Owner Author

LekKit commented Feb 25, 2024

Is it upstream or some branch?

Latest nightly image from https://download.haiku-os.org/nightly-images/riscv64/

I fixed it in my device_manager2 branch by switching to mimalloc allocator.

Ah, I see

@LekKit
Owner Author

LekKit commented Mar 4, 2024

I see there is an experimental stateful kernel event API in Haiku, and kqueue emulation in libbsd (https://review.haiku-os.org/c/haiku/+/6746).

I tried compiling RVVM with a kqueue-based networking event implementation, and it seems to work fairly well without issues (but it needs a downgrade to hrev57545 because of https://dev.haiku-os.org/ticket/18327).

This brings support for an unlimited number of connections / port forwards and better performance.

image

Is there any way I can check for its presence at compile time & runtime to add support for this without breaking compatibility with older Haiku versions?

@X547
Contributor

X547 commented Mar 4, 2024

Is there any way I can check for its presence at compile time & runtime to add support for this without breaking compatibility with older Haiku versions?

I think nothing better than detecting the presence of the function symbol in libbsd.so is possible.
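A minimal sketch of such runtime detection, using the standard dlopen/dlsym calls. The helper name library_has_symbol is hypothetical, and the symbol name "kqueue" used in the usage note is an assumption about what libbsd.so exports.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if `library` can be loaded and exports `symbol`.
 * A missing library is treated the same as a missing symbol, so callers
 * can fall back to another event backend in either case.
 */
static bool library_has_symbol(const char *library, const char *symbol)
{
    void *handle = dlopen(library, RTLD_LAZY);
    if (handle == NULL)
        return false;

    bool found = dlsym(handle, symbol) != NULL;
    dlclose(handle);
    return found;
}
```

A caller could check library_has_symbol("libbsd.so", "kqueue") once at startup and select the kqueue backend only when it returns true. To actually call the function without a link-time dependency, the handle would need to stay open and the pointer from dlsym would be used instead of a direct call.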

@LekKit
Owner Author

LekKit commented Mar 7, 2024

some non-zero percentage of my keyboard inputs are lost when using RVVM and typing input into a guest

Should be fixed now (But the design will be further improved).

non-working r8169 ethernet (as noted in #118)

Should be also fixed as #118 is closed, but I don't have a proper Haiku image to test this fully.
My self-built image lacks netcat, sshd, tcpdump, and any other means to test/debug this. Ping works, MAC is detected. FreeBSD guest has fully working network (If you run kldload if_re, it doesn't seem to load driver by default).
Running pkgman full-sync in Haiku errors out with "Operation not supported".
Currently I don't have a fast enough host to build better Haiku image so if anyone can upload or make I2C-HID work on upstream nightly RISC-V images I could test this further.

Upstreamed new RTC device (Dallas DS1742) which is present on some boards; Implemented SiFive GPIO device & GPIO API.

@diversys

diversys commented Mar 7, 2024

Running pkgman full-sync in Haiku errors out with "Operation not supported".

This is likely because by default Haiku builds without SSL support. You need to specify HAIKU_BUILD_FEATURE_SSL = 1; in haiku/build/jam/UserBuildConfig.

@X547
Contributor

X547 commented Mar 7, 2024

Implemented SiFive GPIO device & GPIO API.

It would be more useful to have VisionFive 2 GPIO. SiFive Unmatched is effectively end of life.

@LekKit
Owner Author

LekKit commented Mar 7, 2024

It would be more useful to have VisionFive 2 GPIO. SiFive Unmatched is effectively end of life.

Will try to implement it; I just needed any GPIO controller that already works well for a side project related to RVVM. I also expect SiFive GPIO to be present in more boards in the future (MilkV Oasis?).

I also had a look at gpio-poweroff FDT device again (As you suggested in #60), and it seems we completely missed the point of it. It is not a mechanism to signal shutdown to OS, but instead for OS to power off the board (Aka replacement for syscon).

@LekKit
Owner Author

LekKit commented Mar 8, 2024

Can't build the device_manager2 branch: it seems to be trying to download packages from localhost when building minimum-mmc or nightly-mmc.

DownloadLocatedFile1 download/libsolv-0.3.0_haiku_2014_12_22-1-riscv64.hpkg 
--2024-03-08 12:56:15--  http://localhost/HaikuPorts/2f7e3a987492461768abfbe75df5d9dbcaa553636988dcd0e8eb44707fead985/packages/libsolv-0.3.0_haiku_2014_12_22-1-riscv64.hpkg
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:80... failed: Connection refused.
Connecting to localhost (localhost)|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-03-08 12:56:15 ERROR 404: Not Found.

Perhaps some CI for those WIP Haiku versions is needed? Many complications arise each time I try to build them; upstream Haiku still doesn't have working I2C HID in RVVM and has memory corruptions.

@Begasus

Begasus commented Mar 15, 2024

Can't build the device_manager2 branch: it seems to be trying to download packages from localhost when building minimum-mmc or nightly-mmc.

This is a known issue (it has happened to me here too). Try replacing the corresponding file in your source tree (after making a backup) with this one: https://github.com/haiku/haiku/blob/master/build/jam/repositories/HaikuPorts/riscv64

@LekKit LekKit added the discussion Debate for improvement label Mar 25, 2024
@LekKit
Owner Author

LekKit commented Mar 26, 2024

net_server is very CPU hungry for whatever reason, passing -nonet to RVVM fixes this

image

@X547
Contributor

X547 commented Mar 26, 2024

Try killing it; it will restart automatically, which may fix the problem.

It may be a bug in net_server DHCP handling and/or obscure behavior of RVVM DHCP emulation.

@LekKit
Owner Author

LekKit commented Apr 20, 2024

Crash in radeon_hd driver when trying to VFIO passthrough an AMD RX580 GPU to Haiku hrev57708 guest

Haiku UART log
radeon_hd: init_hardware
radeon_hd: init_driver
radeon_hd: init_driver: GPU(0) Radeon RX 470/480, revision = 0x0
radeon_hd: publish_devices
radeon_hd: find_device
loaded driver /boot/system/add-ons/kernel/drivers/dev/graphics/radeon_hd
ati: init_hardware() - no supported devices
framebuffer: init_hardware()
radeon_hd: device_open: open(name = graphics/radeon_hd_000400)
radeon_hd: card(0): radeon_hd_init: called
radeon_hd: radeon_hd_init: card(0): Radeon Polaris 10 1002:67DF
radeon_hd: radeon_hd_init: Error: found 0MB video ram, using PCI bar size...
radeon_hd: radeon_hd_init: mapping a frame buffer of 256MB out of 0MB video ram
radeon_hd: framebuffer paddr: 0x50000000
radeon_hd: frambuffer vaddr: 0xffffffc023000000
radeon_hd: frambuffer size: 0x10000000
radeon_hd: card(0): radeon_hd_getbios: called
radeon_hd: mapAtomBIOSACPI: seeking AtomBIOS from ACPI
module: Search for bus_managers/acpi/v1 failed.
radeon_hd: radeon_hd_getbios: AtomBIOS not found using active method 0 at 0x0
radeon_hd: mapAtomBIOS: seeking AtomBIOS @ 0x50000000 [size: 0x40000]
radeon_hd: mapAtomBIOS: BIOS signature incorrect @ 0x50000000 (0)
radeon_hd: radeon_hd_getbios: AtomBIOS not found using active method 1 at 0x50000000
radeon_hd: radeon_hd_getbios: No base found at PCI ROM BAR
radeon_hd: radeon_hd_getbios: AtomBIOS not found using active method 2 at 0x0
radeon_hd: radeon_hd_getbios: Active AtomBIOS search failed.
radeon_hd: card(0): radeon_hd_getbios_ni: called
radeon_hd: radeon_hd_getbios_ni: No AtomBIOS location found at PCI ROM BAR
radeon_hd: radeon_hd_init: Can't find an AtomBIOS rom! Trying shadow rom...
radeon_hd: mapAtomBIOS: seeking AtomBIOS @ 0xC0000 [size: 0x20000]
PANIC: Unexpected exception occurred in kernel mode!
Welcome to Kernel Debugging Land...
Thread 480 "app_server" running on CPU 3
Stack:
FP: 0xffffffc00251e610
FP: 0xffffffc00251e630, PC: 0xffffffc0021497a8 <kernel_riscv64> arch_debug_call_with_fault_handler + 32
FP: 0xffffffc00251e680, PC: 0xffffffc0020d2268 <kernel_riscv64> debug_call_with_fault_handler.localalias + 120
FP: 0xffffffc00251e710, PC: 0xffffffc0020d349e <kernel_riscv64> _ZL20kernel_debugger_loopPKcS0_Pvi + 272
FP: 0xffffffc00251e790, PC: 0xffffffc0020d3744 <kernel_riscv64> _ZL24kernel_debugger_internalPKcS0_Pvi + 130
FP: 0xffffffc00251e7d0, PC: 0xffffffc0020d3a4a <kernel_riscv64> panic + 78
FP: 0xffffffc00251e900, PC: 0xffffffc00214a9da <kernel_riscv64> STrap + 588
FP: 0xffffffc00251ea20, PC: 0xffffffc002148660 <kernel_riscv64> SVec + 96
STrap(exception loadAccessFault)
  sstatus: (ie: {}, pie: {s}, spp: s, fs: dirty, xs: off, sum: 0, mxr: 0, uxl: 2, sd: 1)
  stval: 0xffffffc00267d000
   ra: 0xffffffc002676110   t6: 0x0000000000000004   sp: 0xffffffc00251ea20   gp: 0x0000000000000000
   tp: 0xffffffc0077068c0   t0: 0xffffffc002177938   t1: 0xffffffc00207d4dc   t2: 0x0000000000000020
   t5: 0x0000000000000002   s1: 0xffffffc007195150   a0: 0xffffffc00267d000   a1: 0x0000000000010000
   a2: 0x0000000000000000   a3: 0xffffffffffffffff   a4: 0x00000000000019d7   a5: 0x0000000000000055
   a6: 0xffffffc00214f062   a7: 0xffffffc0022512a0   s2: 0x0000000000000000   s3: 0x000000000000186c
   s4: 0x0000000000020000   s5: 0x0000000000020000   s6: 0x00000000000c0000   s7: 0x0000000000000000
   s8: 0x0000000000000000   s9: 0x0000000000000000  s10: 0xffffffffffffffff  s11: 0xffffffc007195150
   t3: 0xffffffc0020a4ee8   t4: 0x0000000000060000   fp: 0xffffffc00251ea90  epc: 0xffffffc002676120
FP: 0xffffffc00251ea90, PC: 0xffffffc002676120 </boot/system/add-ons/kernel/drivers/dev/graphics/radeon_hd> find_device + 836
FP: 0xffffffc00251eb50, PC: 0xffffffc002676d2c </boot/system/add-ons/kernel/drivers/dev/graphics/radeon_hd> _Z14radeon_hd_initR11radeon_info + 2324
FP: 0xffffffc00251eb80, PC: 0xffffffc0026760a4 </boot/system/add-ons/kernel/drivers/dev/graphics/radeon_hd> find_device + 712
FP: 0xffffffc00251ecc0, PC: 0xffffffc0020e68d0 <kernel_riscv64> _ZL10devfs_openP9fs_volumeP8fs_vnodeiPPv + 132
FP: 0xffffffc00251ed00, PC: 0xffffffc00210a68c <kernel_riscv64> _ZL10open_vnodeP5vnodeib + 42
FP: 0xffffffc00251ed40, PC: 0xffffffc00210f81e <kernel_riscv64> _ZL9file_openiPcib + 88
FP: 0xffffffc00251eda0, PC: 0xffffffc002115de6 <kernel_riscv64> _user_open + 116
FP: 0xffffffc00251ede0, PC: 0xffffffc0020a757e <kernel_riscv64> syscall_dispatcher + 3364
FP: 0xffffffc00251eed0, PC: 0xffffffc00214ab14 <kernel_riscv64> STrap + 902
FP: 0xffffffc00251eff0, PC: 0xffffffc002148728 <kernel_riscv64> SVecU + 120
STrap(exception uEcall)
  sstatus: (ie: {}, pie: {s}, spp: u, fs: dirty, xs: off, sum: 0, mxr: 0, uxl: 2, sd: 1)
  stval: 0x0
   ra: 0x0000001da03cf94c   t6: 0x0000000d795406d0   sp: 0x00000038e797d990   gp: 0x0000000000000000
   tp: 0x00000038e797e000   t0: 0x0000000000000070   t1: 0x0000001da03befec   t2: 0xffffffffffffffff
   t5: 0x0000000d795407c8   s1: 0x0000000000000000   a0: 0xffffffffffffffff   a1: 0x00000038e797da00
   a2: 0x0000000000000002   a3: 0x0000000000000000   a4: 0xffffffffffffffff   a5: 0xffffffff8000000d
   a6: 0x0000000000000001   a7: 0x0000000000000002   s2: 0x0000000d7951ed10   s3: 0x00000038e797da00
   s4: 0x0000000d7951ed10   s5: 0x0000000d794d9990   s6: 0x0000000000000000   s7: 0x0000000000000000
   s8: 0x0000000000000000   s9: 0x0000000000000000  s10: 0x0000000000000000  s11: 0x0000000000000000
   t3: 0x0000001da03c90d0   t4: 0x0000000000000040   fp: 0x00000038e797d9c0  epc: 0x0000001da03c90d8
FP: 0x38e797d9c0, PC: 0x1da03c90d8 <libroot.so> _kern_open + 8
FP: 0x38e797de30, PC: 0x3b92ea278e <_APP_> _ZThn88_N14AppFontManagerD0Ev + 9082
FP: 0x38e797de60, PC: 0x3b92ea29aa <_APP_> _ZThn88_N14AppFontManagerD0Ev + 9622
FP: 0x38e797deb0, PC: 0x3b92e6b40a <_APP_> _ZN13ScreenManager15_AddHWInterfaceEP11HWInterface + 70
FP: 0x38e797df30, PC: 0x3b92e6b6ce <_APP_> _ZN13ScreenManagerC2Ev + 80
FP: 0x38e797df70, PC: 0x3b92e550e8 <_APP_> _ZN9AppServerC2EPi + 202
FP: 0x38e797dfa0, PC: 0x3b92e54326 <_APP_> main + 38
FP: 0x38e797dfd0, PC: 0x3b92e5466e <_APP_> _start + 58
FP: 0x38e797e000, PC: 0x2d03d9ddfa </boot/system/runtime_loader> 0xddfa
FP: 0x0, PC: 0x370a640248 <commpage> commpage_thread_exit + 0
kdebug>

Note it mentions failure to get VBIOS from PCI ROM BAR, but PCI ROM BAR is for some reason "disabled" in host lspci and isn't available to VFIO API.
This seems common for VFIO GPU passthrough, but Linux manages to handle it properly.

0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Device 0519
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 62
        IOMMU group: 15
        Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=2M]
        Region 4: I/O ports at e000 [size=256]
        Region 5: Memory at fce00000 (32-bit, non-prefetchable) [size=256K]
-       Expansion ROM at 000c0000 [disabled] [size=128K]
        Kernel driver in use: vfio-pci
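For context on the "BIOS signature incorrect" line in the UART log: a PCI expansion ROM image starts with the two bytes 0x55 0xAA, so a candidate VBIOS can be rejected cheaply before anything else is parsed. The sketch below shows only that generic check on a buffer that has already been read safely. The helper name option_rom_signature_ok is hypothetical, and whether radeon_hd performs exactly this test is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Check the generic PCI expansion ROM header signature (0x55, 0xAA).
 * `rom` must point to memory that is safe to read; this does not cover
 * the case where the ROM region itself cannot be accessed.
 */
static bool option_rom_signature_ok(const uint8_t *rom, size_t size)
{
    if (rom == NULL || size < 2)
        return false;

    return rom[0] == 0x55 && rom[1] == 0xAA;
}
```

In the log above the panic is a load access fault that appears right after the driver starts probing the legacy shadow ROM at 0xC0000, so the read itself faults before any signature comparison could reject the data.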

@X547
Contributor

X547 commented Apr 21, 2024

but Linux manages to handle it properly.

Any ideas how? AtomBIOS is critical for Haiku radeon_hd driver operation.

@LekKit
Owner Author

LekKit commented Apr 22, 2024

but Linux manages to handle it properly.

Any ideas how? AtomBIOS is critical for Haiku radeon_hd driver operation.

Here is amdgpu log from RISC-V Linux on the same VM & host setup

[   11.931998] [drm] amdgpu kernel modesetting enabled.
[   11.934998] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x10DC:0xEBA1 0x00).
[   11.934998] [drm] register mmio base: 0x50300000
[   11.934998] [drm] register mmio size: 262144
[   11.935998] [drm] add ip block number 0 <vi_common>
[   11.935998] [drm] add ip block number 1 <gmc_v8_0>
[   11.935998] [drm] add ip block number 2 <tonga_ih>
[   11.935998] [drm] add ip block number 3 <gfx_v8_0>
[   11.936998] [drm] add ip block number 4 <sdma_v3_0>
[   11.936998] [drm] add ip block number 5 <powerplay>
[   11.936998] [drm] add ip block number 6 <dm>
[   11.936998] [drm] add ip block number 7 <uvd_v6_0>
[   11.936998] [drm] add ip block number 8 <vce_v3_0>
[   11.936998] amdgpu 0000:00:03.0: ROM [??? 0x00000000 flags 0x20000000]: can't assign; bogus alignment
[   12.108998] amdgpu 0000:00:03.0: amdgpu: Fetched VBIOS from ROM
[   12.109998] amdgpu: ATOM BIOS: 115-D000PIL-100
[   12.115998] [drm] UVD is enabled in VM mode
[   12.115998] [drm] UVD ENC is enabled in VM mode
[   12.115998] [drm] VCE enabled in VM mode
[   12.116998] amdgpu 0000:00:03.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[   12.116998] amdgpu 0000:00:03.0: amdgpu: PCIE atomic ops is not supported
[   12.117998] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[   12.122998] amdgpu 0000:00:03.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[   12.122998] amdgpu 0000:00:03.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[   12.122998] [drm] Detected VRAM RAM=8192M, BAR=256M
[   12.122998] [drm] RAM width 256bits GDDR5
[   12.125998] [drm] amdgpu: 8192M of VRAM memory ready
[   12.125998] [drm] amdgpu: 3968M of GTT memory ready.
[   12.126998] [drm] GART: num cpu pages 65536, num gpu pages 65536
[   12.131998] [drm] PCIE GART of 256M enabled (table at 0x000000F400800000).
[   12.139998] [drm] Chained IB support enabled!
[   12.167998] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[   12.195998] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[   12.219998] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[   12.358998] [drm] Display Core v3.2.266 initialized on DCE 11.2
[   12.455998] [drm] UVD and UVD ENC initialized successfully.
[   12.558998] [drm] VCE initialized successfully.
[   12.559998] amdgpu 0000:00:03.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[   12.586998] amdgpu 0000:00:03.0: amdgpu: Using BACO for runtime pm
[   12.597998] [drm] Initialized amdgpu 3.57.0 20150101 for 0000:00:03.0 on minor 0
[   12.669998] fbcon: Deferring console take-over
[   12.669998] amdgpu 0000:00:03.0: [drm] fb0: amdgpudrmfb frame buffer device

Apparently PCI ROM availability depends on CSM (Legacy BIOS features) but my host system boots from EFI and has CSM disabled in the firmware options. This is why PCI ROM BAR is "disabled" in lspci and isn't available to VFIO.

Linux amdgpu driver has multiple other ways to get VBIOS, in case of RVVM guest it dumps ROM using this method:
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c#L439

Here is what it does on a different x86 machine (Reads from VFCT which is part of ACPI tables):

ACPI + amdgpu log
[    0.003568] ACPI: Early table checksum verification disabled
[    0.003570] ACPI: RSDP 0x00000000BCC36014 000024 (v02 ALASKA)
[    0.003572] ACPI: XSDT 0x00000000BCC35728 0000E4 (v01 ALASKA A M I    01072009 AMI  01000013)
[    0.003575] ACPI: FACP 0x00000000BC40C000 000114 (v06 ALASKA A M I    01072009 AMI  00010013)
[    0.003578] ACPI: DSDT 0x00000000BC3D8000 0068F9 (v02 ALASKA A M I    01072009 INTL 20190509)
[    0.003580] ACPI: FACS 0x00000000BCC30000 000040
[    0.003582] ACPI: SSDT 0x00000000BC41B000 00AFC0 (v02 GBT    GSWApp   00000001 INTL 20190509)
[    0.003583] ACPI: SSDT 0x00000000BC412000 008CE9 (v02 AMD    AmdTable 00000002 MSFT 04000000)
[    0.003585] ACPI: SSDT 0x00000000BC40E000 003D7C (v02 AMD    AMD AOD  00000001 INTL 20190509)
[    0.003586] ACPI: SSDT 0x00000000BC40D000 000221 (v02 ALASKA CPUSSDT  01072009 AMI  01072009)
[    0.003587] ACPI: FIDT 0x00000000BC405000 00009C (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.003589] ACPI: MCFG 0x00000000BC404000 00003C (v01 ALASKA A M I    01072009 MSFT 00010013)
[    0.003590] ACPI: HPET 0x00000000BC403000 000038 (v01 ALASKA A M I    01072009 AMI  00000005)
[    0.003592] ACPI: IVRS 0x00000000BC402000 0001A4 (v02 AMD    AmdTable 00000001 AMD  00000001)
[    0.003593] ACPI: FPDT 0x00000000BC401000 000044 (v01 ALASKA A M I    01072009 AMI  01000013)
[    0.003594] ACPI: VFCT 0x00000000BC3F2000 00E284 (v01 ALASKA A M I    00000001 AMD  33504F47)
[    0.003596] ACPI: TPM2 0x00000000BC3F1000 00004C (v04 ALASKA A M I    00000001 AMI  00000000)
[    0.003597] ACPI: PCCT 0x00000000BC3F0000 00006E (v02 AMD    AmdTable 00000001 AMD  00000001)
[    0.003598] ACPI: SSDT 0x00000000BC3EB000 004133 (v02 AMD    AmdTable 00000001 AMD  00000001)
[    0.003600] ACPI: CRAT 0x00000000BC3EA000 000F10 (v01 AMD    AmdTable 00000001 AMD  00000001)
[    0.003601] ACPI: CDIT 0x00000000BC3E9000 000029 (v01 AMD    AmdTable 00000001 AMD  00000001)
[    0.003603] ACPI: SSDT 0x00000000BC3E8000 00068E (v02 AMD    ArticDGP 00000001 INTL 20190509)
[    0.003604] ACPI: SSDT 0x00000000BC3E6000 001522 (v02 AMD    ArticTPX 00000001 INTL 20190509)
[    0.003605] ACPI: SSDT 0x00000000BC3E5000 000788 (v02 AMD    ArticNOI 00000001 INTL 20190509)
[    0.003607] ACPI: SSDT 0x00000000BC3E1000 003A23 (v02 AMD    ArticN   00000001 INTL 20190509)
[    0.003608] ACPI: WSMT 0x00000000BC3E0000 000028 (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.003609] ACPI: APIC 0x00000000BC3DF000 00015E (v04 ALASKA A M I    01072009 AMI  00010013)
[    0.003611] ACPI: SSDT 0x00000000BC40A000 00147F (v02 AMD    ArticC   00000001 INTL 20190509)
[    0.003612] ACPI: SSDT 0x00000000BC409000 0000BF (v01 AMD    AmdTable 00001000 INTL 20190509)
[    0.003613] ACPI: Reserving FACP table memory at [mem 0xbc40c000-0xbc40c113]
[    0.003614] ACPI: Reserving DSDT table memory at [mem 0xbc3d8000-0xbc3de8f8]
[    0.003614] ACPI: Reserving FACS table memory at [mem 0xbcc30000-0xbcc3003f]
[    0.003615] ACPI: Reserving SSDT table memory at [mem 0xbc41b000-0xbc425fbf]
[    0.003615] ACPI: Reserving SSDT table memory at [mem 0xbc412000-0xbc41ace8]
[    0.003616] ACPI: Reserving SSDT table memory at [mem 0xbc40e000-0xbc411d7b]
[    0.003616] ACPI: Reserving SSDT table memory at [mem 0xbc40d000-0xbc40d220]
[    0.003616] ACPI: Reserving FIDT table memory at [mem 0xbc405000-0xbc40509b]
[    0.003617] ACPI: Reserving MCFG table memory at [mem 0xbc404000-0xbc40403b]
[    0.003617] ACPI: Reserving HPET table memory at [mem 0xbc403000-0xbc403037]
[    0.003618] ACPI: Reserving IVRS table memory at [mem 0xbc402000-0xbc4021a3]
[    0.003618] ACPI: Reserving FPDT table memory at [mem 0xbc401000-0xbc401043]
[    0.003618] ACPI: Reserving VFCT table memory at [mem 0xbc3f2000-0xbc400283]
[    0.003619] ACPI: Reserving TPM2 table memory at [mem 0xbc3f1000-0xbc3f104b]
[    0.003619] ACPI: Reserving PCCT table memory at [mem 0xbc3f0000-0xbc3f006d]
[    0.003620] ACPI: Reserving SSDT table memory at [mem 0xbc3eb000-0xbc3ef132]
[    0.003620] ACPI: Reserving CRAT table memory at [mem 0xbc3ea000-0xbc3eaf0f]
[    0.003621] ACPI: Reserving CDIT table memory at [mem 0xbc3e9000-0xbc3e9028]
[    0.003621] ACPI: Reserving SSDT table memory at [mem 0xbc3e8000-0xbc3e868d]
[    0.003621] ACPI: Reserving SSDT table memory at [mem 0xbc3e6000-0xbc3e7521]
[    0.003622] ACPI: Reserving SSDT table memory at [mem 0xbc3e5000-0xbc3e5787]
[    0.003622] ACPI: Reserving SSDT table memory at [mem 0xbc3e1000-0xbc3e4a22]
[    0.003623] ACPI: Reserving WSMT table memory at [mem 0xbc3e0000-0xbc3e0027]
[    0.003623] ACPI: Reserving APIC table memory at [mem 0xbc3df000-0xbc3df15d]
[    0.003623] ACPI: Reserving SSDT table memory at [mem 0xbc40a000-0xbc40b47e]
[    0.003624] ACPI: Reserving SSDT table memory at [mem 0xbc409000-0xbc4090be]

...


[    7.848686] amdgpu 0000:09:00.0: amdgpu: Fetched VBIOS from VFCT
[    7.848689] amdgpu: ATOM BIOS: 113-EXT90440-100

It might be possible to implement a workaround and read PCI ROM from RVVM using another method instead of VFIO, then pass it to the guest as PCI ROM BAR, but I don't know how feasible it will be yet.

@LekKit
Owner Author

LekKit commented Apr 22, 2024

I guess this is what is used for my GPU (which is gfx8.0) and most modern ones:

https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/vi.c#L635

A few other AMD GPU families use a different ROM dump method, namely Southern Islands (SI), Sea Islands (CIK), and some iGPU variations:

https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/si.c#L1306
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/cik.c#L1012, etc

@X547
Contributor

X547 commented May 3, 2024

Got this and a freeze after attempting to download a file hosted on the host Haiku with wget in the guest Haiku.

WARN: Possible deadlock at tap_user.c@1029
WARN: The lock was previously held at tap_user.c@585
WARN: Version: RVVM v0.7-cfb4333
WARN: Attempting to recover execution...
 * * * * * * *

@LekKit
Owner Author

LekKit commented May 3, 2024

Got this and a freeze after attempting to download a file hosted on the host Haiku with wget in the guest Haiku.

Did it eventually unfreeze, and if so did it continue the download? Or if it was otherwise stuck, were those messages repeated or only reported once?

WARN: The lock was previously held at tap_user.c@585

Interesting. Looking at the code in question, there isn't any real deadlock potential, but it may invoke a handful of syscall variations under this lock, and one of them getting stuck could cause this.

I know this is not ideal (locking should preferably only happen around structure manipulation and not heavyweight syscalls), but a finer-grained locking design was postponed to avoid introducing new subtle sync bugs.
This entire design worked fine, but only under the assumption that non-blocking syscalls are actually non-blocking and return immediately. If Haiku ever gets stuck in connect() or send() on a socket that was explicitly made non-blocking, then I could see how this happened. I will soon check whether that's possible.

WARN: Version: RVVM v0.7-cfb4333

What upstream revision is that based on?

@X547
Contributor

X547 commented May 3, 2024

Did it eventually unfreeze, and if so did it continue the download? Or if it was otherwise stuck, were those messages repeated or only reported once?

It is stuck forever and the same message repeats.

What upstream revision is that based on?

c4cc337 with minor Haiku window changes.

@LekKit
Owner Author

LekKit commented May 3, 2024

It is stuck forever and the same message repeats.

Does it happen before wget reports connection success, or at a later stage?

@LekKit
Owner Author

LekKit commented May 3, 2024

@X547 Okay, I discovered a weird kinda-bug, kinda-disagreement in how RVVM & Haiku understand the POSIX API.

Look at this: 087bd6b.
It's exactly what introduced this problem, but only for Haiku, and it was intended to be an optimization (setting the nonblocking flag takes 2 syscalls with fcntl(), but only one with ioctl()).

Historically, ioctl(FIONBIO) existed before the fcntl(O_NONBLOCK) flag, and it was for sockets only. On all modern POSIX systems that I touched (Linux, FreeBSD, etc.) they map to the same flag internally in the kernel, but on Haiku setting ioctl(FIONBIO) still doesn't enable O_NONBLOCK from the fcntl() perspective, and connect() to a non-responding address blocks forever.
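
A minimal sketch for checking this on a given host (the helper names are made up for illustration, and it assumes FIONBIO is exposed through sys/ioctl.h, which is not guaranteed on every system): set non-blocking mode through either path, then ask fcntl(F_GETFL) whether O_NONBLOCK is visible.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable non-blocking mode with a single ioctl() syscall */
static bool set_nonblock_ioctl(int fd)
{
    int enable = 1;
    return ioctl(fd, FIONBIO, &enable) == 0;
}

/* Enable non-blocking mode with the two-syscall fcntl() sequence */
static bool set_nonblock_fcntl(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

/* Report whether the fcntl() view of the descriptor shows O_NONBLOCK */
static bool is_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags >= 0 && (flags & O_NONBLOCK) != 0;
}
```

On Linux, is_nonblock() reports true after either setter. Going by this thread, on Haiku it was expected to stay false after set_nonblock_ioctl() at the time of writing.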

With upstream ioctl() way of setting nonblocking:
image

Reverted 087bd6b:
image

I don't know why connect() blocks forever in your case, but apparently the host is unreachable or you made a typo.

@LekKit
Owner Author

LekKit commented May 3, 2024

I should probably patch this and enable the ioctl(FIONBIO) path only for OSes that are known to handle it properly, so this won't happen in the future.

But it should also be looked at from the Haiku side. I found this ioctl() trick in nginx, so it's not just my gimmick, and it could mean that nginx and other software are subtly broken on Haiku.

@waddlesplash

I looked at Haiku's source code. FIONBIO on a socket does set the SO_NONBLOCK socket option, so I don't know why connect() would block here. Indeed, O_NONBLOCK won't be set when FIONBIO is used, and that should be fixed, but the relevant option does seem to be set.

@LekKit
Owner Author

LekKit commented May 3, 2024

It seems I mixed things up a bit, my bad. Indeed, the only Haiku bug is that FIONBIO has no effect on O_NONBLOCK, and the RVVM ioctl() code is also buggy. The RVVM breakage wasn't observed on Linux/FreeBSD since they have another optimization: passing SOCK_NONBLOCK to socket() to create a non-blocking socket from the get-go...
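
A rough sketch of that kind of optimization with a portable fallback. This is not the code from the RVVM tree, and `socket_nonblock` is a hypothetical name. Where SOCK_NONBLOCK is available the socket is created non-blocking in one syscall; otherwise the fcntl() path is used.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a socket that is non-blocking from the start */
static int socket_nonblock(int domain, int type, int protocol)
{
    int fd = -1;
#ifdef SOCK_NONBLOCK
    /* One syscall: the kernel hands back an already non-blocking socket */
    fd = socket(domain, type | SOCK_NONBLOCK, protocol);
    if (fd >= 0) return fd;
    /* Fall through in case the running kernel rejected the flag */
#endif
    fd = socket(domain, type, protocol);
    if (fd < 0) return -1;

    /* Fallback: read the current flags, then set O_NONBLOCK explicitly */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because the fast path never touches ioctl() or fcntl(), a bug in the fallback code only shows up on hosts that lack SOCK_NONBLOCK, which is consistent with the breakage not being seen on Linux/FreeBSD.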

UPD: Should be fixed by aaf8995

@LekKit
Owner Author

LekKit commented Jun 19, 2024

net_server is very CPU hungry for whatever reason, passing -nonet to RVVM fixes this

Try to kill it, it will auto-restart and maybe fix problem.

It may be a bug in net_server DHCP handling and/or obscure behavior of RVVM DHCP emulation.

Bisecting the issue leads to f791aa5. So somehow the tiny difference in clocksource precision made net_server very CPU hungry.

This is one of those optimizations I purposely made at the beginning of the v0.7-git development cycle, to determine whether it breaks anything during this version's development. However, I am not sure that net_server is very nice for behaving this way, since all other guests and pieces of software work well.

I'll try to inspect how exactly timings are impacted by that RVVM commit and report back - this is probably also very host-dependent. Perhaps this net_server CPU-hungry behavior could even be reproduced on v0.6 when the host is wack.

@X547
Contributor

X547 commented Jun 19, 2024

However, I am not sure that net_server is very nice for behaving this way, since all other guests and pieces of software work well.

It is very likely that the bug is on the Haiku side and RVVM behavior is fine. Anyway, the DHCP client should not crash when it receives malformed data.

@LekKit
Owner Author

LekKit commented Jun 19, 2024

It is very likely that the bug is on the Haiku side and RVVM behavior is fine. Anyway, the DHCP client should not crash when it receives malformed data.

Well, DHCP handling probably has nothing to do with it. Something in net_server likely uses timers/sleeps the wrong way, or does something where precision matters (us-precise delays, etc). I can't tell without knowing the code, so I'll look at that soon, but I see that it never shows the DHCP notification "establishing internet connection" when the CPU wastage happens, so perhaps it never even reaches that point.

I checked, and can confirm that RVVM timer clocksource never jumps back when using CLOCK_MONOTONIC_COARSE. I dunno what other guarantees a host clocksource can portably provide honestly.

This net_server guest issue can be reliably reproduced on any POSIX host by applying this RVVM patch:

diff --git a/src/rvtimer.c b/src/rvtimer.c
index 0c17c44..06c4031 100644
--- a/src/rvtimer.c
+++ b/src/rvtimer.c
@@ -74,6 +74,9 @@ uint64_t rvtimer_clocksource(uint64_t freq)
 {
     struct timespec now = {0};
     clock_gettime(CHOSEN_POSIX_CLOCK, &now);
+
+    now.tv_nsec -= now.tv_nsec % 1000000;
+
     return (now.tv_sec * freq) + (now.tv_nsec * freq / 1000000000ULL);
 }
 

This drops RISC-V timer precision to 1ms. Note that 1ms is not that bad a precision, and it is even the best available on some hosts (x86 machines without TSC/APIC, etc). Commit f791aa5 simply extends this list.
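
To make the effect concrete, here is a standalone sketch of the same arithmetic outside of RVVM. The function names and the 10 MHz timer frequency used below are assumptions for illustration only.

```c
#include <stdint.h>
#include <time.h>

/* Convert a timestamp into timer ticks at the given frequency (Hz) */
static uint64_t ticks_from_timespec(struct timespec ts, uint64_t freq)
{
    return ((uint64_t)ts.tv_sec * freq)
         + ((uint64_t)ts.tv_nsec * freq / 1000000000ULL);
}

/* Same conversion, with the nanosecond part rounded down to 1ms steps */
static uint64_t ticks_from_timespec_1ms(struct timespec ts, uint64_t freq)
{
    ts.tv_nsec -= ts.tv_nsec % 1000000;
    return ticks_from_timespec(ts, freq);
}
```

With an assumed 10 MHz timer, a timestamp of 1 s + 1234567 ns gives 10012345 ticks at full precision but 10010000 ticks after truncation, and every reading within the same millisecond returns that same value. A guest that polls the timer waiting for a sub-millisecond delay to elapse would therefore see time stand still and then jump by 10000 ticks.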

@LekKit
Owner Author

LekKit commented Jun 21, 2024

I reverted to CLOCK_MONOTONIC in dcd391a due to this issue and some guest timing/scheduling degradation across the board.

If you want to debug the net_server issue in Haiku, you can keep using the above RVVM patch, which still reproduces the problem.

Labels
discussion Debate for improvement enhancement New feature or request help wanted Extra attention is needed
Projects
Status: Testing
Development

No branches or pull requests

6 participants