Illegal instruction exception on vector instructions in Linux user-mode programs #60

Closed
gaoyichuan opened this issue Oct 25, 2024 · 6 comments


@gaoyichuan

I'm running Saturn+Shuttle on a custom FPGA platform and have successfully booted mainline Linux 6.1.114. Programs without vector instructions work fine.

However, many vector programs raise an illegal-instruction exception on the first vector instruction executed after vset{i}vl{i}. All of these programs run normally on Spike. The failing asm sequence is below (it is the start of the vaadd_vv-0 test case from riscv-vector-tests, compiled with MODE=user, if relevant).

0000000080000000 <_start>:
    80000000:	00000297          	auipc	t0,0x0
    80000004:	00c28293          	addi	t0,t0,12 # 8000000c <_start+0xc>
    80000008:	8282                	jr	t0
    8000000a:	0001                	nop
    8000000c:	003072d7          	vsetvli	t0,zero,e8,m8,tu,mu
    80000010:	5e003057          	vmv.v.i	v0,0     # <- illegal instruction raised here
    80000014:	5e003457          	vmv.v.i	v8,0
    80000018:	5e003857          	vmv.v.i	v16,0
    8000001c:	5e003c57          	vmv.v.i	v24,0
    80000020:	00a05073          	csrwi	vxrm,0
    80000024:	0000f517          	auipc	a0,0xf
    80000028:	0dc50513          	addi	a0,a0,220 # 8000f100 <testdata>
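To make the encodings above easier to check, here is a small decoder for the two instruction formats involved, written from the field layouts in the RISC-V V specification. The function names are hypothetical helpers, not part of riscv-vector-tests or any toolchain. It also computes VLMAX = LMUL × VLEN / SEW, which is what vsetvli writes to rd when rs1 is zero and rd is not.

```python
LMUL_NAMES = {0: "m1", 1: "m2", 2: "m4", 3: "m8", 5: "mf8", 6: "mf4", 7: "mf2"}


def bits(word: int, hi: int, lo: int) -> int:
    """Extract the inclusive bit field word[hi:lo]."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_vsetvli(word: int) -> dict:
    """Decode a 32-bit vsetvli (OP-V opcode, funct3 = 111, bit 31 = 0)."""
    assert bits(word, 6, 0) == 0x57 and bits(word, 14, 12) == 0b111
    assert bits(word, 31, 31) == 0, "not a vsetvli"
    vtype = bits(word, 30, 20)  # zimm[10:0] is the requested vtype
    return {
        "rd": bits(word, 11, 7),
        "rs1": bits(word, 19, 15),
        "sew": 8 << bits(vtype, 5, 3),  # vsew encodes SEW = 8 * 2**vsew
        "lmul": LMUL_NAMES.get(bits(vtype, 2, 0), "reserved"),
        "ta": bool(bits(vtype, 6, 6)),  # False = tu (tail undisturbed)
        "ma": bool(bits(vtype, 7, 7)),  # False = mu (mask undisturbed)
    }


def decode_opivi(word: int) -> dict:
    """Decode an OPIVI (vector-immediate) instruction such as vmv.v.i."""
    assert bits(word, 6, 0) == 0x57 and bits(word, 14, 12) == 0b011
    simm5 = bits(word, 19, 15)
    if simm5 & 0x10:  # sign-extend the 5-bit immediate
        simm5 -= 32
    return {
        "funct6": bits(word, 31, 26),  # 0b010111 = vmerge/vmv group
        "vm": bits(word, 25, 25),      # 1 = unmasked, as vmv.v.i requires
        "vs2": bits(word, 24, 20),
        "simm5": simm5,
        "vd": bits(word, 11, 7),
    }


def vlmax(vlen: int, sew: int, lmul: int) -> int:
    """VLMAX for an integer LMUL: LMUL * VLEN / SEW."""
    return lmul * vlen // sew


print(decode_vsetvli(0x003072D7))
print(decode_opivi(0x5E003057))
print(vlmax(512, 8, 8))
```

For 0x003072d7 this yields rd = 5 (t0), rs1 = 0 (zero), e8, m8, tu, mu, and for 0x5e003057 it yields vd = v0 with immediate 0. With e8/m8, VLMAX equals VLEN, so the t0 = 0x200 in the register dump is consistent with a 512-bit VLEN.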

Linux kernel dmesg output for this exception:

# ./vaadd_vv-0
[ 1122.705050] vaadd_vv-0[145]: unhandled signal 4 code 0x1 at 0x0000000080000010 in vaadd_vv-0[80000000+e000]
[ 1122.737900] CPU: 0 PID: 145 Comm: vaadd_vv-0 Not tainted 6.1.114 #2
[ 1122.758400] Hardware name: ucb-bar,chipyard (DT)
[ 1122.773450] epc : 0000000080000010 ra : 0000002abe650ef4 sp : 0000003fc7295d10
[ 1122.796550]  gp : 0000002abe6efd90 tp : 0000003fb1498780 t0 : 0000000000000200
[ 1122.819750]  t1 : 0000002abe60ae8c t2 : 0000002abe6ca118 s0 : 0000002abe6f1670
[ 1122.842850]  s1 : 0000002abe6f1648 a0 : 0000000000000000 a1 : 0000002abe6f1670
[ 1122.865950]  a2 : 0000002abe6f1680 a3 : 7f7f7f7f7f7f7f7f a4 : 0000000000008000
[ 1122.889200]  a5 : 000000000000002f a6 : 2f2f2f2f2f2f2f2f a7 : 00000000000000dd
[ 1122.912200]  s2 : 0000002abe6f1680 s3 : 0000002abe6e6841 s4 : 0000002abe6c9f18
[ 1122.935250]  s5 : 0000002abe6efec8 s6 : 0000000000000008 s7 : 0000000000000000
[ 1122.958300]  s8 : 0000002abe6f1648 s9 : 0000000000000000 s10: 0000000000000000
[ 1122.981650]  s11: 0000002abe6f1668 t3 : 0000003fb15583d8 t4 : 8cff8b8c85d1cfd2
[ 1123.004700]  t5 : 0000000000000018 t6 : 000000000000003c
[ 1123.021850] status: 8000000200004620 badaddr: 000000005e003057 cause: 0000000000000002
Illegal instruction
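The register dump can be read with the standard RV64 `sstatus` layout from the RISC-V privileged specification. On Linux, signal 4 is SIGILL and code 0x1 is ILL_ILLOPC; `cause` 2 is an illegal-instruction exception; and `badaddr` (stval) holds the faulting instruction bits, 0x5e003057, which is the vmv.v.i. The sketch below decodes a few `status` fields; the helper names are hypothetical.

```python
VS_NAMES = {0: "Off", 1: "Initial", 2: "Clean", 3: "Dirty"}
SCAUSE_NAMES = {2: "illegal instruction"}  # only the exception code seen here


def field(value: int, hi: int, lo: int) -> int:
    """Extract the inclusive bit field value[hi:lo]."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_sstatus(value: int) -> dict:
    """Decode selected RV64 sstatus fields (privileged spec layout)."""
    return {
        "SD": field(value, 63, 63),    # summary: some FS/VS/XS state is dirty
        "UXL": field(value, 33, 32),   # 2 = 64-bit user mode
        "FS": field(value, 14, 13),    # FP state: 0 Off .. 3 Dirty
        "VS": VS_NAMES[field(value, 10, 9)],
        "SPP": field(value, 8, 8),     # 0 = trap was taken from U-mode
        "SPIE": field(value, 5, 5),
    }


print(decode_sstatus(0x8000000200004620))
print(SCAUSE_NAMES[0x2])
```

For this dump it reports VS = Dirty and SPP = 0 (trap from U-mode). Keep in mind that this is the value saved at trap entry, after the vsetvli had already executed, so it does not necessarily show what VS was when the vsetvli itself issued.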

Upon further inspection, I found that if I insert 5 or more 16-bit NOP instructions between the vsetvli at 0x8000000c and the vmv.v.i at 0x80000010, the program runs normally without an exception. Also, later vector instructions in the same program do not raise the exception; only the first vector instruction fetched does. The modified asm looks like this:

    8000000c:	003072d7          	vsetvli	t0,zero,e8,m8,tu,mu
    80000010:	0001                	nop    # <- injected NOPs
    80000012:	0001                	nop
    80000014:	0001                	nop
    80000016:	0001                	nop
    80000018:	0001                	nop
    8000001a:	5e003057          	vmv.v.i	v0,0
    8000001e:	5e003457          	vmv.v.i	v8,0
    80000022:	5e003857          	vmv.v.i	v16,0
    80000026:	5e003c57          	vmv.v.i	v24,0

I tested this on both GENV256D128ShuttleConfig and REFV512D512ShuttleConfig, and the results appear to be the same. My guess is that the Shuttle core does not correctly handle the dependency between status.VS and subsequent vector instructions, since the added NOPs could break this dependency. Saturn+Rocket (REFV256D128RocketConfig), by contrast, works normally.
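The NOP observation fits a missing-bypass hazard: a CSR update made by one instruction only becomes visible to decode some slots later, so a closely following instruction sees the stale value. The toy model below is a hypothetical illustration of that idea, not Shuttle's pipeline. It assumes that the vset does not check VS (the suspected bug) but marks VS Dirty, and that the update lands a fixed `latency` slots later; the value 6 is chosen only so the toy reproduces the "5 or more NOPs" observation.

```python
VS_OFF, VS_DIRTY = 0, 3


def first_trap(program, latency, vs_initial=VS_OFF):
    """Return the slot of the first trapping instruction, or None.

    Hypothetical toy model, NOT Shuttle's actual pipeline:
    - each entry in `program` takes one decode slot;
    - "vset" does not check VS (the suspected bug) and marks VS Dirty;
    - that update is only visible `latency` slots after the vset;
    - any other vector instruction ("vec") traps if it sees VS == Off.
    """
    vs = vs_initial
    pending = []  # list of (slot at which the write lands, new VS value)
    for slot, insn in enumerate(program):
        landed = [v for when, v in pending if when <= slot]
        if landed:
            vs = landed[-1]  # apply the most recent write that has landed
        pending = [(w, v) for w, v in pending if w > slot]
        if insn == "vset":
            pending.append((slot + latency, VS_DIRTY))
        elif insn == "vec" and vs == VS_OFF:
            return slot
    return None


print(first_trap(["vset", "vec"], latency=6))
print(first_trap(["vset"] + ["nop"] * 5 + ["vec"], latency=6))
```

The first call traps at slot 1 (the vmv.v.i right after the vset), while the padded program runs to completion, mirroring the behavior described above.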

However, since I'm running on an FPGA platform, capturing detailed waveforms is somewhat tricky, so I'm filing this issue to ask for any help or ideas. Thanks a lot for this awesome project!

@jerryz123
Contributor

Thanks for reporting this. Oddly, I wasn't able to reproduce it, but I suspect your analysis might be onto something.

What is odd is that status.VS should already be enabled (not 0, i.e. not Off) before the vsetvl in userspace Linux. That leaves vtype.vill as the dependency that could cause an illegal instruction, but I can confirm that in my testing vtype.vill is forwarded correctly to a vector instruction following a vset.

Can you clarify what versions of Shuttle and Saturn you are using?

@gaoyichuan
Author

I'm currently using Chipyard 1.13.0 with the bundled Saturn (4ed795b) and Shuttle (b431fecd). My setup includes some modifications for running on an FPGA (an added MMIO and slave port, a unified clock frequency, etc.), and I also use a custom Verilog top module to connect to the SoC, but I don't think these are relevant to this issue.

saturn-issue60.zip

Attached are some waveforms I captured on the FPGA. vill does not seem to be set around the decode stage of vmv.v.i, but I forgot to add the status register to the ILA, so I'm not sure how helpful they are. If more signal traces are needed, I can capture them (though I'll need to re-synthesize the design).

@jerryz123
Copy link
Contributor

If this were a vill dependency, it should be reproducible with a simple bare-metal code example:

   101b0:	80f072d7          	vsetvl	t0,zero,a5
   101b4:	003072d7          	vsetvli	t0,zero,e8,m8,tu,mu
   101b8:	5e003057          	vmv.v.i	v0,0

The first instruction sets vill, and the second clears it. I was unable to reproduce the failure with this example; the forwarding of the updated vtype appears to be correct.
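For reference, a vset sets vill when the requested vtype is not supported, and vl is then set to 0. The sketch below is a simplified model of that check, assuming ELEN = 64 and an implementation that accepts exactly the combinations the V specification requires (SEW ≤ LMUL × ELEN for fractional LMUL) and nothing reserved. Real implementations may reject more, and the value of a5 in the example above is not given, so the function name and sample vtype values are illustrative only.

```python
def vtype_is_illegal(vtype: int, elen: int = 64) -> bool:
    """Simplified model: would requesting this vtype set vill?

    Assumption: the implementation supports every non-reserved setting
    with SEW <= ELEN and, for fractional LMUL, SEW <= LMUL * ELEN.
    """
    vlmul = vtype & 0b111
    vsew = (vtype >> 3) & 0b111
    if vtype >> 8:               # reserved bits above vma must be zero
        return True
    if vlmul == 0b100 or vsew >= 0b100:  # reserved LMUL / SEW encodings
        return True
    sew = 8 << vsew
    if sew > elen:
        return True
    if vlmul >= 0b101:           # fractional LMUL: 101=1/8, 110=1/4, 111=1/2
        denom = 1 << (8 - vlmul)
        if sew * denom > elen:   # i.e. SEW > LMUL * ELEN
            return True
    return False


print(vtype_is_illegal(0x003))  # e8, m8, tu, mu (the vsetvli in this issue)
print(vtype_is_illegal(0x004))  # vlmul = 100 is reserved
```

The second call models a vsetvl like the first instruction in the example, which leaves vill set until the following vsetvli requests a legal vtype.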

Can you add the vill signals to your ILA? Thanks

@gaoyichuan
Copy link
Author

Thanks for testing. I added vill and a number of vector-related signals to the ILA, and captured two waveforms for comparison: one with the original code (fails) and one with the NOP-padded code (succeeds).

vill is not set at all in either waveform, so I don't think it caused the exception. However, in the failing waveform the CSRFile repeatedly reports read_illegal and vector_illegal, so that may be the cause.

Waveforms: vaadd_vv_waveforms.zip, you can have a look at them, thanks!

@jerryz123
Copy link
Contributor

Interesting: it looks like mstatus.VS is set to Off while this code runs. So the real bug is that the first vset should have trapped, but the implementation did not trap.

The fixes for Shuttle and Rocket are here: chipsalliance/rocket-chip#3692, ucb-bar/shuttle#7
I think the kernel has to be compiled with vector support so that VS is set to 1 (Initial) in user code.
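The rule at the heart of the fix can be stated compactly. Per the V specification, any vector instruction, including vset{i}vl{i}, raises an illegal-instruction exception when mstatus.VS is Off, while vill only affects instructions that depend on vtype. The sketch below is a simplified model of that rule (it ignores vsstatus under the hypervisor extension); the names are hypothetical.

```python
from enum import Enum


class VS(Enum):
    OFF = 0
    INITIAL = 1
    CLEAN = 2
    DIRTY = 3


def raises_illegal(vs: VS, vill: bool, depends_on_vtype: bool) -> bool:
    """Simplified model: does a vector instruction trap as illegal?"""
    if vs is VS.OFF:
        return True   # every vector instruction traps, vset included
    if vill and depends_on_vtype:
        return True   # e.g. vmv.v.i while vtype.vill is set
    return False      # vset itself never depends on the old vtype


# The vsetvli at 0x8000000c with VS Off: should trap (the fixed behavior).
print(raises_illegal(VS.OFF, vill=False, depends_on_vtype=False))
# The vmv.v.i once the kernel enables vector (VS = Initial): no trap.
print(raises_illegal(VS.INITIAL, vill=False, depends_on_vtype=True))
```

Under this rule, the program in this issue should fault at the vsetvli rather than at the vmv.v.i whenever VS is Off.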

@gaoyichuan
Copy link
Author

Thanks for the fix! I also updated the kernel to 6.11.4 with CONFIG_RISCV_ISA_V=y and CONFIG_RISCV_ISA_V_DEFAULT_ENABLE=y, and everything works fine now. I'm closing the issue.
