Interface to vector units #3599

Merged on May 17, 2024 (64 commits)

Commits
0b1ef9c
Add vector-unit interface
jerryz123 Aug 23, 2023
cb4e742
Suppress vconfig for id-xcpt
jerryz123 Aug 31, 2023
ad37acb
Add set_vconfig to vector-unit
jerryz123 Sep 27, 2023
bff48a8
Propagate request size/cmd to TLB resp
jerryz123 Sep 27, 2023
d8f6caf
Add SimpleHellaCacheIF mask
jerryz123 Oct 2, 2023
f51bca4
Vector CSR data hazard
jerryz123 Nov 28, 2023
9dc08fe
Pass vxrm to vector impl
jerryz123 Nov 28, 2023
80dffc8
Merge commit '50adbdb' into ifv
jerryz123 Dec 27, 2023
12139be
scalar read, rm
dpgrubb13 Dec 7, 2023
cd4b38b
Simplify vector-fpu integration
jerryz123 Dec 28, 2023
bf79222
add vector FP exceptions
dpgrubb13 Dec 16, 2023
66bd400
Fix scalar FP to vector
jerryz123 Dec 28, 2023
48602b9
Vector trap-check should block younger exceptions
jerryz123 Jan 3, 2024
28bbca5
Add vector ll scalar wb interface
jerryz123 Jan 3, 2024
a68cfc1
Fix vector-to-scalar trace
jerryz123 Jan 3, 2024
fbd0fb8
Add vector/fp interface
jerryz123 Jan 8, 2024
6d5b054
Vec should kill in-flight dcache
dpgrubb13 Jan 22, 2024
af11ed4
StoreGen supported maxSize > dat.length
jerryz123 Jan 22, 2024
56a4da7
Set vector killm for all killm cases
jerryz123 Jan 23, 2024
203fc1f
Fix vsetvl
jerryz123 Jan 23, 2024
5930109
Add diplomatic node to rocket vector unit
Dec 18, 2023
de696aa
Fix set vstart
jerryz123 Jan 25, 2024
3172ee6
Support swap12 in fpu external interface
jerryz123 Jan 25, 2024
fb3c8dd
Add scalar FPU-to-vector support
dpgrubb13 Jan 26, 2024
0292f13
Remove dontCare from fpuOpt
jerryz123 Jan 26, 2024
0bff786
Fix shared FPU for divSqrt ops
jerryz123 Jan 29, 2024
5bcdcb2
Merge commit '749a3ea' into ifv
jerryz123 Jan 29, 2024
ef2876c
Pass vconfig to vec-decode
jerryz123 Feb 12, 2024
f3951e7
Fix vstart bypassing
jerryz123 Feb 12, 2024
eb141eb
Fix vector interface gating in FPU
jerryz123 Feb 20, 2024
072bc41
Fix vsetvl with rs1=x0
jerryz123 Feb 29, 2024
5a17c56
Allow pulling out full output from iterative imul
jerryz123 Feb 29, 2024
f8105ce
Add full_data to pipelined-mul-unit
jerryz123 Feb 29, 2024
4996086
Merge commit '8026b6b' into ifv
jerryz123 Mar 1, 2024
6e554f3
Add tlb_port to NBDcache as well
jerryz123 Mar 12, 2024
d8afe64
Improve NBDCache performance
jerryz123 Mar 12, 2024
e47a188
Add req.no_resp to ScratchpadSlavePort
jerryz123 Mar 12, 2024
174a4b9
add mem.req.no_resp to rocc examples
jerryz123 Mar 12, 2024
1e9fef1
Fix vector integration
jerryz123 Mar 13, 2024
527560a
Merge remote-tracking branch 'origin/dev' into ifv
jerryz123 Mar 18, 2024
0b2d940
Support v-impls which issue vconfig to backend
jerryz123 Mar 20, 2024
28bf141
Fix TLFragmenter assert
jerryz123 Mar 20, 2024
ea3d882
Only connect vector dcache port if requested by VU
jerryz123 Mar 20, 2024
ea35ee2
Remove DebugROB requires
jerryz123 Mar 20, 2024
c2651f4
Fix debug rob for some vector units
jerryz123 Mar 20, 2024
84409c8
Fix s1_data when coreDataBits > xLen
jerryz123 Mar 20, 2024
db35cb8
Merge remote-tracking branch 'origin/dev' into ifv
jerryz123 Mar 21, 2024
73ee196
Update LazyRoCC blackbox
jerryz123 Mar 21, 2024
3c888a2
Add message to usingVector require
jerryz123 Mar 21, 2024
b8d59a0
Fix ll_resp not writing into FPU
jerryz123 Mar 26, 2024
7a12ab9
Add store_pending notification bit to DCache
jerryz123 Mar 28, 2024
b60568c
Fix RoCCBlackbox
jerryz123 Mar 28, 2024
80ff66a
Store pending bit should include IOMSHRs
jerryz123 Mar 29, 2024
8ff7929
Avoid structural hazard-induced nacks on external fpu reqs
jerryz123 Mar 30, 2024
521a1d1
Set fp sboard for vector writes into fpregfile
jerryz123 Mar 30, 2024
b613fbe
Fix DelayQueue
jerryz123 Apr 16, 2024
3bd6174
Merge remote-tracking branch 'origin/dev' into ifv
jerryz123 Apr 19, 2024
cc1395b
Merge remote-tracking branch 'origin/dev' into ifv
jerryz123 Apr 23, 2024
0b556a1
Set TLRAM setName based on devName
jerryz123 May 1, 2024
4bd4675
Decode vector insns as illegal when vill
jerryz123 May 14, 2024
10bc824
Fix vlMax computation
jerryz123 May 14, 2024
724974d
setvl should use new vtype to compute vlMax
jerryz123 May 14, 2024
22cc8aa
Don't gate of ctrl.vec with vill
jerryz123 May 14, 2024
d92922a
Fix vector debug trace
jerryz123 May 17, 2024
2 changes: 2 additions & 0 deletions src/main/resources/vsrc/RoccBlackBox.v
@@ -119,6 +119,7 @@ module RoccBlackBox
  input rocc_mem_s2_xcpt_ae_ld,
  input rocc_mem_s2_xcpt_ae_st,
  input rocc_mem_ordered,
+ input rocc_mem_store_pending,
  input rocc_mem_perf_acquire,
  input rocc_mem_perf_release,
  input rocc_mem_perf_grant,
@@ -159,6 +160,7 @@ module RoccBlackBox
  output [fLen:0] rocc_fpu_req_bits_in1,
  output [fLen:0] rocc_fpu_req_bits_in2,
  output [fLen:0] rocc_fpu_req_bits_in3,
+ output rocc_fpu_req_bits_vec,
  output rocc_fpu_resp_ready,
  input rocc_fpu_resp_valid,
  input [fLen:0] rocc_fpu_resp_bits_data,
5 changes: 3 additions & 2 deletions src/main/scala/rocket/AMOALU.scala
@@ -10,6 +10,7 @@ import org.chipsalliance.cde.config.Parameters
  class StoreGen(typ: UInt, addr: UInt, dat: UInt, maxSize: Int) {
  val size = Wire(UInt(log2Up(log2Up(maxSize)+1).W))
  size := typ
+ val dat_padded = dat.pad(maxSize*8)
Review comment (Member):

Why do this padding?

Reply (Contributor Author):

maxSize is the width of the memory word in bytes. In configurations with a wide DCache, maxSize*8 can exceed the width of dat, since dat is only xLen bits wide. The padding is needed to handle that case.
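To make the wide-DCache case concrete, here is a minimal plain-Scala sketch (no Chisel dependency) of the genData recursion. `Bits`, its methods, and the `usePadding` flag are hypothetical stand-ins introduced only for illustration. The model assumes that an out-of-range bit extract is an error and that every Mux input gets built regardless of `size`, as would happen during hardware elaboration. It is written as top-level definitions, so it needs Scala 3 or a script runner.

```scala
import scala.util.Try

// Hypothetical fixed-width unsigned value, standing in for a hardware UInt.
final case class Bits(width: Int, value: BigInt) {
  require(value >= 0 && value.bitLength <= width, "value does not fit in width")

  // Zero-extend to at least `w` bits (modelled on the pad call in the diff).
  def pad(w: Int): Bits = if (w <= width) this else Bits(w, value)

  // Extract bits hi..lo inclusive. Assumption: out-of-range indices are an error.
  def apply(hi: Int, lo: Int): Bits = {
    require(lo >= 0 && hi >= lo && hi < width, s"bits ($hi, $lo) out of range for width $width")
    val w = hi - lo + 1
    Bits(w, (value >> lo) & ((BigInt(1) << w) - 1))
  }

  // Concatenate `n` copies of this value.
  def fill(n: Int): Bits =
    (1 until n).foldLeft(this)((acc, _) => Bits(acc.width + width, (acc.value << width) | value))
}

// Ceiling log2 with a minimum result of 1.
def log2Up(n: Int): Int = if (n <= 1) 1 else 32 - Integer.numberOfLeadingZeros(n - 1)

// Software model of the genData recursion; `size` is log2 of the access size in bytes.
def genData(size: Int, dat: Bits, maxSize: Int, usePadding: Boolean): Bits = {
  val src = if (usePadding) dat.pad(maxSize * 8) else dat
  val lg  = log2Up(maxSize)
  def go(i: Int): Bits =
    if (i >= lg) src
    else {
      // Both inputs of the Mux are built eagerly, whatever `size` selects.
      val replicated = src((8 << i) - 1, 0).fill(1 << (lg - i))
      val rest       = go(i + 1)
      if (size == i) replicated else rest
    }
  go(0)
}

// xLen = 64 data feeding a 32-byte (256-bit) memory word.
val dat      = Bits(64, BigInt("1122334455667788", 16))
val padded   = genData(3, dat, 32, usePadding = true)       // four copies of dat
val unpadded = Try(genData(3, dat, 32, usePadding = false)) // fails at the 128-bit slice
```

In this model the unpadded variant fails even for an 8-byte store, because the 16-byte branch slices bits 127..0 out of a 64-bit value. With padding, every branch produces a full 256-bit word.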

  def misaligned: Bool =
  (addr & ((1.U << size) - 1.U)(log2Up(maxSize)-1,0)).orR

@@ -24,8 +25,8 @@ class StoreGen(typ: UInt, addr: UInt, dat: UInt, maxSize: Int) {
  }

  protected def genData(i: Int): UInt =
- if (i >= log2Up(maxSize)) dat
- else Mux(size === i.U, Fill(1 << (log2Up(maxSize)-i), dat((8 << i)-1,0)), genData(i+1))
+ if (i >= log2Up(maxSize)) dat_padded
+ else Mux(size === i.U, Fill(1 << (log2Up(maxSize)-i), dat_padded((8 << i)-1,0)), genData(i+1))

  def data = genData(0)
  def wordData = genData(2)
2 changes: 2 additions & 0 deletions src/main/scala/rocket/CSR.scala
@@ -245,6 +245,7 @@ class CSRDecodeIO(implicit p: Parameters) extends CoreBundle {
  val fp_illegal = Output(Bool())
  val vector_illegal = Output(Bool())
  val fp_csr = Output(Bool())
+ val vector_csr = Output(Bool())
  val rocc_illegal = Output(Bool())
  val read_illegal = Output(Bool())
  val write_illegal = Output(Bool())
@@ -914,6 +915,7 @@ class CSRFile(
  io_dec.fp_illegal := io.status.fs === 0.U || reg_mstatus.v && reg_vsstatus.fs === 0.U || !reg_misa('f'-'a')
  io_dec.vector_illegal := io.status.vs === 0.U || reg_mstatus.v && reg_vsstatus.vs === 0.U || !reg_misa('v'-'a')
  io_dec.fp_csr := decodeFast(fp_csrs.keys.toList)
+ io_dec.vector_csr := decodeFast(vector_csrs.keys.toList)
  io_dec.rocc_illegal := io.status.xs === 0.U || reg_mstatus.v && reg_vsstatus.xs === 0.U || !reg_misa('x'-'a')
  val csr_addr_legal = reg_mstatus.prv >= CSR.mode(addr) ||
  usingHypervisor.B && !reg_mstatus.v && reg_mstatus.prv === PRV.S.U && CSR.mode(addr) === PRV.H.U
3 changes: 2 additions & 1 deletion src/main/scala/rocket/DCache.scala
@@ -93,7 +93,7 @@ class DCache(staticIdForMetadataUseOnly: Int, val crossing: ClockCrossingType)(i

  class DCacheTLBPort(implicit p: Parameters) extends CoreBundle()(p) {
  val req = Flipped(Decoupled(new TLBReq(coreDataBytes.log2)))
- val s1_resp = Output(new TLBResp)
+ val s1_resp = Output(new TLBResp(coreDataBytes.log2))
  val s2_kill = Input(Bool())
  }

@@ -926,6 +926,7 @@ class DCacheModule(outer: DCache) extends HellaCacheModule(outer) {
  val s1_isSlavePortAccess = s1_req.no_xcpt
  val s2_isSlavePortAccess = s2_req.no_xcpt
  io.cpu.ordered := !(s1_valid && !s1_isSlavePortAccess || s2_valid && !s2_isSlavePortAccess || cached_grant_wait || uncachedInFlight.asUInt.orR)
+ io.cpu.store_pending := (cached_grant_wait && isWrite(s2_req.cmd)) || uncachedInFlight.asUInt.orR

  val s1_xcpt_valid = tlb.io.req.valid && !s1_isSlavePortAccess && !s1_nack
  io.cpu.s2_xcpt := Mux(RegNext(s1_xcpt_valid), s2_tlb_xcpt, 0.U.asTypeOf(s2_tlb_xcpt))
3 changes: 0 additions & 3 deletions src/main/scala/rocket/DebugROB.scala
@@ -29,7 +29,6 @@ class WidenedTracedInstruction extends Bundle {
  // These is not synthesizable, they use a C++ blackbox to implement the
  // write-back reordering
  class DebugROBPushTrace(implicit val p: Parameters) extends BlackBox with HasBlackBoxResource with HasCoreParameters {
- require(traceHasWdata && (vLen max xLen) <= 512)
  val io = IO(new Bundle {
  val clock = Input(Clock())
  val reset = Input(Bool())
@@ -45,7 +44,6 @@ class DebugROBPushTrace(implicit val p: Parameters) extends BlackBox with HasBla

  class DebugROBPushWb(implicit val p: Parameters) extends BlackBox
  with HasBlackBoxResource with HasCoreParameters {
- require(traceHasWdata && (vLen max xLen) <= 512)
  val io = IO(new Bundle {
  val clock = Input(Clock())
  val reset = Input(Bool())
@@ -59,7 +57,6 @@ class DebugROBPushWb(implicit val p: Parameters) extends BlackBox
  }

  class DebugROBPopTrace(implicit val p: Parameters) extends BlackBox with HasBlackBoxResource with HasCoreParameters {
- require(traceHasWdata && (vLen max xLen) <= 512)
  val io = IO(new Bundle {
  val clock = Input(Clock())
  val reset = Input(Bool())
1 change: 1 addition & 0 deletions src/main/scala/rocket/HellaCache.scala
@@ -187,6 +187,7 @@ class HellaCacheIO(implicit p: Parameters) extends CoreBundle()(p) {
  val s2_gpa_is_pte = Input(Bool())
  val uncached_resp = tileParams.dcache.get.separateUncachedResp.option(Flipped(Decoupled(new HellaCacheResp)))
  val ordered = Input(Bool())
+ val store_pending = Input(Bool()) // there is a store in a store buffer somewhere
  val perf = Input(new HellaCachePerfEvents())

  val keep_clock_enabled = Output(Bool()) // should D$ avoid clock-gating itself?
1 change: 1 addition & 0 deletions src/main/scala/rocket/HellaCacheArbiter.scala
@@ -63,6 +63,7 @@ class HellaCacheArbiter(n: Int)(implicit p: Parameters) extends Module
  io.requestor(i).s2_gpa := io.mem.s2_gpa
  io.requestor(i).s2_gpa_is_pte := io.mem.s2_gpa_is_pte
  io.requestor(i).ordered := io.mem.ordered
+ io.requestor(i).store_pending := io.mem.store_pending
  io.requestor(i).perf := io.mem.perf
  io.requestor(i).s2_nack := io.mem.s2_nack && s2_id === i.U
  io.requestor(i).s2_nack_cause_raw := io.mem.s2_nack_cause_raw
10 changes: 10 additions & 0 deletions src/main/scala/rocket/IDecode.scala
@@ -44,6 +44,7 @@ class IntCtrlSigs(aluFn: ALUFN = ALUFN())(implicit val p: Parameters) extends Bu
  val fence = Bool()
  val amo = Bool()
  val dp = Bool()
+ val vec = Bool()

  def default: List[BitPat] =
  // jal renf1 fence.i
@@ -433,6 +434,15 @@ class D64Decode(aluFn: ALUFN = ALUFN())(implicit val p: Parameters) extends Deco
  FCVT_D_LU-> List(Y,Y,N,N,N,N,N,Y,A2_X, A1_RS1, IMM_X, DW_X, aluFn.FN_X, N,M_X, N,N,N,Y,N,N,N,CSR.N,N,N,N,Y))
  }

+ class VCFGDecode(aluFn: ALUFN = ALUFN())(implicit val p: Parameters) extends DecodeConstants
+ {
+ val table: Array[(BitPat, List[BitPat])] = Array(
+ VSETVLI -> List(Y,N,N,N,N,N,N,Y,A2_X, A1_X, IMM_X, DW_X, aluFn.FN_X, N,M_X, N,N,N,N,N,N,Y,CSR.N,N,N,N,N),
+ VSETIVLI -> List(Y,N,N,N,N,N,N,N,A2_X, A1_X, IMM_X, DW_X, aluFn.FN_X, N,M_X, N,N,N,N,N,N,Y,CSR.N,N,N,N,N),
+ VSETVL -> List(Y,N,N,N,N,N,Y,Y,A2_X, A1_X, IMM_X, DW_X, aluFn.FN_X, N,M_X, N,N,N,N,N,N,Y,CSR.N,N,N,N,N))
+ }
+
+
  class RoCCDecode(aluFn: ALUFN = ALUFN())(implicit val p: Parameters) extends DecodeConstants
  {
  val table: Array[(BitPat, List[BitPat])] = Array(
6 changes: 6 additions & 0 deletions src/main/scala/rocket/NBDcache.scala
@@ -56,6 +56,7 @@ class IOMSHR(id: Int)(implicit edge: TLEdgeOut, p: Parameters) extends L1HellaCa
  val mem_access = Decoupled(new TLBundleA(edge.bundle))
  val mem_ack = Flipped(Valid(new TLBundleD(edge.bundle)))
  val replay_next = Output(Bool())
+ val store_pending = Output(Bool())
  })

  def beatOffset(addr: UInt) = addr.extract(beatOffBits - 1, wordOffBits)
@@ -119,6 +120,7 @@ class IOMSHR(id: Int)(implicit edge: TLEdgeOut, p: Parameters) extends L1HellaCa
  io.resp.bits.data_word_bypass := loadgen.wordData
  io.resp.bits.store_data := req.data
  io.resp.bits.replay := true.B
+ io.store_pending := state =/= s_idle && isWrite(req.cmd)

  when (io.req.fire) {
  req := io.req.bits
@@ -335,6 +337,7 @@ class MSHRFile(implicit edge: TLEdgeOut, p: Parameters) extends L1HellaCacheModu
  val probe_rdy = Output(Bool())
  val fence_rdy = Output(Bool())
  val replay_next = Output(Bool())
+ val store_pending = Output(Bool())
  })

  // determine if the request is cacheable or not
@@ -443,6 +446,8 @@ class MSHRFile(implicit edge: TLEdgeOut, p: Parameters) extends L1HellaCacheModu
  TLArbiter.lowestFromSeq(edge, io.mem_acquire, mshrs.map(_.io.mem_acquire) ++ mmios.map(_.io.mem_access))
  TLArbiter.lowestFromSeq(edge, io.mem_finish, mshrs.map(_.io.mem_finish))

+ io.store_pending := sdq_val =/= 0.U || mmios.map(_.io.store_pending).orR
+
  io.resp <> resp_arb.io.out
  io.req.ready := Mux(!cacheable,
  mmio_rdy,
@@ -1051,6 +1056,7 @@ class NonBlockingDCacheModule(outer: NonBlockingDCache) extends HellaCacheModule
  io.cpu.resp.bits.data_word_bypass := loadgen.wordData
  io.cpu.resp.bits.data_raw := s2_data_word
  io.cpu.ordered := mshrs.io.fence_rdy && !s1_valid && !s2_valid
+ io.cpu.store_pending := mshrs.io.store_pending
  io.cpu.replay_next := (s1_replay && s1_read) || mshrs.io.replay_next

  val s1_xcpt_valid = dtlb.io.req.valid && !s1_nack
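For readers following the store_pending plumbing in NBDcache.scala, the reduction can be sketched in plain Scala as below. This is only a software model of the two boolean expressions in the diff. `IOMSHRModel`, `mshrFileStorePending`, and the use of an `Int` bitmask for `sdq_val` are hypothetical names and simplifications, not part of the PR.

```scala
// Hypothetical snapshot of one IOMSHR: whether it is idle, and whether the
// request it holds is a write.
final case class IOMSHRModel(idle: Boolean, isWrite: Boolean) {
  // Models: io.store_pending := state =/= s_idle && isWrite(req.cmd)
  def storePending: Boolean = !idle && isWrite
}

// Models: io.store_pending := sdq_val =/= 0.U || mmios.map(_.io.store_pending).orR
// `sdqVal` is treated as a bitmask of occupied store-data-queue entries (assumption).
def mshrFileStorePending(sdqVal: Int, mmios: Seq[IOMSHRModel]): Boolean =
  sdqVal != 0 || mmios.exists(_.storePending)

// An in-flight MMIO load alone does not raise the bit; an in-flight MMIO store does.
val loadOnly  = mshrFileStorePending(0, Seq(IOMSHRModel(idle = false, isWrite = false)))
val mmioStore = mshrFileStorePending(0, Seq(IOMSHRModel(idle = false, isWrite = true)))
```

In this model the bit tracks writes only, which differs from `ordered`, where any in-flight access clears the signal.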