Ara integration in Cheshire (1) #112

Draft: wants to merge 9 commits into base: main
13 changes: 12 additions & 1 deletion Bender.lock
@@ -14,6 +14,17 @@ packages:
dependencies:
- apb
- register_interface
ara:
revision: 3dad93de70c6bb20c4c0b20780d96d4243f94136
version: null
source:
Git: https://github.com/pulp-platform/ara.git
dependencies:
- apb
- axi
- common_cells
- cva6
- tech_cells_generic
axi:
revision: ac5deb3ff086aa34b168f392c051e92603d6c0e2
version: 0.39.2
@@ -92,7 +103,7 @@ packages:
Git: https://github.com/pulp-platform/common_verification.git
dependencies: []
cva6:
- revision: 9338c2ca7cf1a47aef54322f89ce867825c3c8d5
+ revision: 99ae53bde1a94b90c1d9bbbe7fe272a9336200a6
version: null
source:
Git: https://github.com/pulp-platform/cva6.git
3 changes: 2 additions & 1 deletion Bender.yml
@@ -22,7 +22,8 @@ dependencies:
clint: { git: "https://github.com/pulp-platform/clint.git", version: 0.2.0 }
common_cells: { git: "https://github.com/pulp-platform/common_cells.git", version: 1.33.0 }
common_verification: { git: "https://github.com/pulp-platform/common_verification.git", version: 0.2.0 }
- cva6: { git: "https://github.com/pulp-platform/cva6.git", rev: pulp-v1.0.0 }
+ cva6: { git: "https://github.com/pulp-platform/cva6.git", rev: pulp-v1 }
Review comment (Collaborator @paulsc96, Oct 11, 2024):

It would be better to make a new release here; pointing to a moving branch head is not a good idea.

Reply (Author):

I have opened a second PR that will use tagged commits on the main branches.

ara: { git: "https://github.com/pulp-platform/ara.git", rev: mp/cva6-pulpv1/rebase }
Review comment (Collaborator):

This should also be merged and released before this PR is merged.

Reply (Author):

I have opened a second PR that will use tagged commits on the main branches.

iDMA: { git: "https://github.com/pulp-platform/iDMA.git", version: 0.5.1 }
irq_router: { git: "https://github.com/pulp-platform/irq_router.git", version: 0.0.1-beta.1 }
opentitan_peripherals: { git: "https://github.com/pulp-platform/opentitan_peripherals.git", version: 0.4.0 }
2 changes: 1 addition & 1 deletion cheshire.mk
@@ -58,7 +58,7 @@ chs-clean-deps:
######################

CHS_NONFREE_REMOTE ?= [email protected]:pulp-restricted/cheshire-nonfree.git
- CHS_NONFREE_COMMIT ?= f731b17
+ CHS_NONFREE_COMMIT ?= dc0a4e4c

CHS_PHONY += chs-nonfree-init
chs-nonfree-init:
1,221 changes: 603 additions & 618 deletions docs/img/arch.svg
14 changes: 14 additions & 0 deletions docs/um/arch.md
@@ -7,6 +7,7 @@ Cheshire is highly configurable; available features and resources depend on its
- **Cores**:
- Up to 31 Linux-capable CVA6 cores with self-invalidation-based coherence
- A RISC-V debug module with JTAG transport
- An Ara RISC-V vector accelerator

- **Peripherals**:
- Various standard IO interfaces (UART, I2C, SPI, and GPIOs)
@@ -135,6 +136,19 @@ Cheshire defaults on using CVA6 with hypervisor and CLIC support enabled; RV32 c

Each CVA6 core is a standalone AXI4 manager at the crossbar. Coherence is maintained through a self-invalidation scheme and RISC-V atomics are handled through a custom, user-channel-based AXI4 extension. For the latter, we wrap the cores and other managers to give each a default user channel assignment and, for atomics-capable managers, a unique ID on a slice of user bits.

### Ara Vector Accelerator

[Ara](https://github.com/pulp-platform/ara) is a RISC-V V vector coprocessor tightly coupled with CVA6. Ara can be instantiated in Cheshire to enable RISC-V V support. Ara exposes the following parameters:

| Parameter | Type / Range | Description |
| ------------------------ | ------------ | --------------------------------------------------------------- |
| `Ara` | `bit` | Enable the Ara Vector Accelerator |
| `AraNrLanes` | `byte_bt` | Number of parallel vector lanes in Ara |
| `AraVlen` | `word_bt` | RISC-V V VLEN parameter (default vector register length in bit) |
| `AraParMemReq` | `byte_bt` | Number of possible outstanding memory requests from Ara |

Ara has a private AXI memory port whose data width is converted to 64 bit to match the current L2 memory bandwidth. So far, we have tested 2-lane Ara instances with `VLEN = 2048` and observed no performance loss. Higher lane counts instantiate a memory port wider than 64 bit per cycle, but its throughput is bottlenecked by the current memory bandwidth of 64 bit per cycle.
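
The relation between lane count and port width can be sketched as follows. This is a minimal illustration, not part of Cheshire: the package, function, and module names are hypothetical, the `32 * AraNrLanes` relation is taken from the `AraDataWideWidth` parameter in `hw/cheshire_soc.sv`, and the 64-bit interconnect width is assumed from the paragraph above.

```systemverilog
// Sketch only: all names in this file are hypothetical and not part of Cheshire.
package ara_port_sketch_pkg;

  // Native AXI data width of Ara's memory port in bits, assuming the
  // `32 * Cfg.AraNrLanes` relation used for AraDataWideWidth in hw/cheshire_soc.sv.
  function automatic int unsigned ara_axi_data_width(input int unsigned nr_lanes);
    return 32 * nr_lanes;
  endfunction

  // Ratio between Ara's native port width and the interconnect data width.
  // A value above 1 means the data width converter has to downsize the port,
  // so Ara can consume at most 1/ratio of its native memory bandwidth.
  function automatic int unsigned ara_downsize_ratio(
    input int unsigned nr_lanes,
    input int unsigned axi_data_width
  );
    int unsigned wide = ara_axi_data_width(nr_lanes);
    return (wide > axi_data_width) ? wide / axi_data_width : 1;
  endfunction

endpackage

module ara_port_sketch_tb;
  import ara_port_sketch_pkg::*;

  // Assumed interconnect data width in bits (64 bit per cycle, as stated above)
  localparam int unsigned AssumedAxiDataWidth = 64;

  initial begin
    // Print the native width and downsize ratio for 2, 4, 8, and 16 lanes
    for (int unsigned lanes = 2; lanes <= 16; lanes *= 2) begin
      $display("lanes=%0d native_width=%0d ratio=%0d", lanes,
               ara_axi_data_width(lanes),
               ara_downsize_ratio(lanes, AssumedAxiDataWidth));
    end
    $finish;
  end
endmodule
```

Under these assumptions, a 2-lane instance has a native 64-bit port that already matches the interconnect, while a 4-lane instance has a 128-bit port that is halved by the width converter.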

### Interconnect

The interconnect is composed of a main [AXI4](https://github.com/pulp-platform/axi) crossbar with AXI5 atomic operations (ATOPs) support and an auxiliary [Regbus](https://github.com/pulp-platform/register_interface) demultiplexer providing access to numerous peripherals and configuration interfaces. The Regbus has a static data width of 32 bit.
16 changes: 14 additions & 2 deletions hw/cheshire_pkg.sv
@@ -138,6 +138,7 @@ package cheshire_pkg;
bit Clic;
bit IrqRouter;
bit BusErr;
bit Ara;
// Parameters for Debug Module
jtag_idcode_t DbgIdCode;
dw_bt DbgMaxReqs;
@@ -196,6 +197,10 @@ package cheshire_pkg;
aw_bt AxiRtNumAddrRegions;
bit AxiRtCutPaths;
bit AxiRtEnableChecks;
// Parameters for Ara
byte_bt AraNrLanes;
word_bt AraVlen;
byte_bt AraParMemReq;
} cheshire_cfg_t;

//////////////////
@@ -295,6 +300,7 @@ package cheshire_pkg;
typedef struct packed {
aw_bt [2**MaxCoresWidth-1:0] cores;
aw_bt dbg;
aw_bt ara;
aw_bt dma;
aw_bt slink;
aw_bt vga;
@@ -308,6 +314,7 @@ package cheshire_pkg;
int unsigned i = 0;
for (int j = 0; j < cfg.NumCores; j++) begin ret.cores[i] = i; i++; end
ret.dbg = i;
if (cfg.Ara) begin i++; ret.ara = i; end
if (cfg.Dma) begin i++; ret.dma = i; end
if (cfg.SerialLink) begin i++; ret.slink = i; end
if (cfg.Vga) begin i++; ret.vga = i; end
@@ -499,9 +506,9 @@ package cheshire_pkg;
XF8ALT : 0,
RVA : 1,
RVB : 0,
- RVV : 0,
+ RVV : cfg.Ara,
  RVC : 1,
- RVH : 1,
+ RVH : ~cfg.Ara,
Review comment (Collaborator):

If there are complex parameterization constraints affecting the rest of the system (which is generally okay), we need to talk them over offline and document them carefully.

Reply (Author):

Yes, we can discuss this offline!

RVZCB : 1,
XFVec : 0,
CvxifEn : 0,
@@ -612,6 +619,7 @@ package cheshire_pkg;
Clic : 0,
IrqRouter : 0,
BusErr : 1,
Ara : 0,
// Debug
DbgIdCode : CheshireIdCode,
DbgMaxReqs : 4,
@@ -669,6 +677,10 @@ package cheshire_pkg;
AxiRtWBufferDepth : 16,
AxiRtNumAddrRegions : 2,
AxiRtCutPaths : 1,
// Ara
AraNrLanes : 2,
AraVlen : 2048,
AraParMemReq : 4,
// All non-set values should be zero
default: '0
};
128 changes: 126 additions & 2 deletions hw/cheshire_soc.sv
@@ -556,6 +556,26 @@ module cheshire_soc import cheshire_pkg::*; #(

// TODO: Implement X interface support

// Accelerator ports
acc_pkg::accelerator_req_t acc_req;
acc_pkg::accelerator_resp_t acc_resp;

// CVA6-Ara memory consistency
logic acc_cons_en;
logic [Cfg.AddrWidth-1:0] inval_addr;
logic inval_valid;
logic inval_ready;

// Pack invalidation interface into acc interface
acc_pkg::accelerator_resp_t acc_resp_pack;
always_comb begin : pack_inval
acc_resp_pack = acc_resp;
acc_resp_pack.inval_valid = inval_valid;
acc_resp_pack.inval_addr = inval_addr;
inval_ready = acc_req.inval_ready;
acc_cons_en = acc_req.acc_cons_en;
end

`CHESHIRE_TYPEDEF_AXI_CT(axi_cva6, addr_t, cva6_id_t, axi_data_t, axi_strb_t, axi_user_t)

localparam config_pkg::cva6_cfg_t Cva6Cfg = gen_cva6_cfg(Cfg);
@@ -606,6 +626,8 @@ module cheshire_soc import cheshire_pkg::*; #(
.axi_w_chan_t ( axi_cva6_w_chan_t ),
.b_chan_t ( axi_cva6_b_chan_t ),
.r_chan_t ( axi_cva6_r_chan_t ),
.cvxif_req_t ( acc_pkg::accelerator_req_t ),
.cvxif_resp_t ( acc_pkg::accelerator_resp_t ),
.noc_req_t ( axi_cva6_req_t ),
.noc_resp_t ( axi_cva6_rsp_t )
) i_core_cva6 (
@@ -626,8 +648,8 @@ module cheshire_soc import cheshire_pkg::*; #(
.clic_kill_req_i ( clic_irq_kill_req ),
.clic_kill_ack_o ( clic_irq_kill_ack ),
.rvfi_probes_o ( ),
- .cvxif_req_o ( ),
- .cvxif_resp_i ( '0 ),
+ .cvxif_req_o ( acc_req ),
+ .cvxif_resp_i ( acc_resp_pack ),
.noc_req_o ( core_out_req ),
.noc_resp_i ( core_out_rsp )
);
@@ -747,6 +769,105 @@ module cheshire_soc import cheshire_pkg::*; #(
.mst_req_o ( axi_in_req[AxiIn.cores[i]] ),
.mst_resp_i ( axi_in_rsp[AxiIn.cores[i]] )
);

// Generate Ara RVV vector processor if enabled
if (Cfg.Ara) begin : gen_ara
// Configure Ara with the right AXI id width
typedef logic [Cfg.AxiMstIdWidth-1:0] ara_id_t;
// Default Ara AXI data width
localparam int unsigned AraDataWideWidth = 32 * Cfg.AraNrLanes;
typedef logic [AraDataWideWidth -1 : 0] axi_ara_wide_data_t;
typedef logic [AraDataWideWidth/8 -1 : 0] axi_ara_wide_strb_t;
`AXI_TYPEDEF_ALL(
axi_ara_wide, addr_t, ara_id_t, axi_ara_wide_data_t, axi_ara_wide_strb_t, axi_user_t)
axi_ara_wide_req_t axi_ara_wide_req_inval, axi_ara_wide_req;
axi_ara_wide_resp_t axi_ara_wide_resp_inval, axi_ara_wide_resp;

axi_mst_req_t axi_ara_narrow_req;
axi_mst_rsp_t axi_ara_narrow_resp;

ara #(
.NrLanes ( Cfg.AraNrLanes ),
.VLEN ( Cfg.AraVlen ),
.AxiDataWidth ( AraDataWideWidth ),
.AxiAddrWidth ( Cfg.AddrWidth ),
.axi_ar_t ( axi_ara_wide_ar_chan_t ),
.axi_r_t ( axi_ara_wide_r_chan_t ),
.axi_aw_t ( axi_ara_wide_aw_chan_t ),
.axi_w_t ( axi_ara_wide_w_chan_t ),
.axi_b_t ( axi_ara_wide_b_chan_t ),
.axi_req_t ( axi_ara_wide_req_t ),
.axi_resp_t ( axi_ara_wide_resp_t )
) i_ara (
.clk_i ( clk_i ),
.rst_ni ( rst_ni ),
.scan_enable_i ( 1'b0 ),
.scan_data_i ( 1'b0 ),
.scan_data_o ( /* Unused */ ),
.acc_req_i ( acc_req ),
.acc_resp_o ( acc_resp ),
.axi_req_o ( axi_ara_wide_req ),
.axi_resp_i ( axi_ara_wide_resp )
);

// Issue invalidations to CVA6 L1D$
axi_inval_filter #(
.MaxTxns ( Cfg.AraParMemReq ),
.AddrWidth ( Cfg.AddrWidth ),
.L1LineWidth( ariane_pkg::DCACHE_LINE_WIDTH/8 ),
.aw_chan_t ( axi_ara_wide_aw_chan_t ),
.req_t ( axi_ara_wide_req_t ),
.resp_t ( axi_ara_wide_resp_t )
) i_ara_axi_inval_filter (
.clk_i ( clk_i ),
.rst_ni ( rst_ni ),
.en_i ( acc_cons_en ),
.slv_req_i ( axi_ara_wide_req ),
.slv_resp_o ( axi_ara_wide_resp ),
.mst_req_o ( axi_ara_wide_req_inval ),
.mst_resp_i ( axi_ara_wide_resp_inval ),
.inval_addr_o ( inval_addr ),
.inval_valid_o( inval_valid ),
.inval_ready_i( inval_ready )
);

// Convert from AraDataWideWidth (axi_ara_wide) to Cfg.AxiDataWidth (axi_ara_narrow)
axi_dw_converter #(
.AxiSlvPortDataWidth ( AraDataWideWidth ),
.AxiMstPortDataWidth ( Cfg.AxiDataWidth ),
.AxiMaxReads ( Cfg.AraParMemReq ),
.AxiAddrWidth ( Cfg.AddrWidth ),
.AxiIdWidth ( Cfg.AxiMstIdWidth ),
.aw_chan_t ( axi_ara_wide_aw_chan_t ),
.mst_w_chan_t ( axi_mst_w_chan_t ),
.slv_w_chan_t ( axi_ara_wide_w_chan_t ),
.b_chan_t ( axi_ara_wide_b_chan_t ),
.ar_chan_t ( axi_ara_wide_ar_chan_t ),
.mst_r_chan_t ( axi_mst_r_chan_t ),
.slv_r_chan_t ( axi_ara_wide_r_chan_t ),
.axi_mst_req_t ( axi_mst_req_t ),
.axi_mst_resp_t ( axi_mst_rsp_t ),
.axi_slv_req_t ( axi_ara_wide_req_t ),
.axi_slv_resp_t ( axi_ara_wide_resp_t )
) i_ara_axi_dw_converter (
.clk_i ( clk_i ),
.rst_ni ( rst_ni ),
.slv_req_i ( axi_ara_wide_req_inval ),
.slv_resp_o ( axi_ara_wide_resp_inval ),
.mst_req_o ( axi_ara_narrow_req ),
.mst_resp_i ( axi_ara_narrow_resp )
);

// Assign to crossbar input/master
assign axi_in_req[AxiIn.ara] = axi_ara_narrow_req;
assign axi_ara_narrow_resp = axi_in_rsp[AxiIn.ara];

end else begin : gen_no_ara
// Tie-to-safe the Ara-related signals
assign acc_resp = '0;
assign inval_valid = '0;
assign inval_addr = '0;
end
mp-17 marked this conversation as resolved.
end

/////////////////////////
@@ -1729,4 +1850,7 @@ module cheshire_soc import cheshire_pkg::*; #(
// TODO: many other things I most likely forgot
// TODO: check that LLC only exists if its output is connected (the reverse is allowed)

if (Cfg.Ara && (NumIntHarts > 1))
$error("Ara is only compatible with a single-core architecture.");

endmodule
12 changes: 11 additions & 1 deletion target/sim/src/tb_cheshire_pkg.sv
@@ -16,11 +16,21 @@ package tb_cheshire_pkg;
return ret;
endfunction

// A dedicated Ara config
function automatic cheshire_cfg_t gen_cheshire_ara_cfg();
cheshire_cfg_t ret = DefaultCfg;
ret.Ara = 1;
ret.AraNrLanes = 2;
ret.AraVlen = 2048;
return ret;
endfunction

// Number of Cheshire configurations
- localparam int unsigned NumCheshireConfigs = 32'd2;
+ localparam int unsigned NumCheshireConfigs = 32'd3;

// Assemble a configuration array indexed by a numeric parameter
localparam cheshire_cfg_t [NumCheshireConfigs-1:0] TbCheshireConfigs = {
gen_cheshire_ara_cfg(), // 2: Ara-enabled configuration
gen_cheshire_rt_cfg(), // 1: RT-enabled configuration
DefaultCfg // 0: Default configuration
};
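
The configuration array above lists its entries from the highest index down, because the leftmost element of a packed concatenation lands at the highest index. The following is a minimal standalone sketch of that ordering; it assumes a hypothetical two-field struct `mini_cfg_t` in place of `cheshire_cfg_t`, and none of its names exist in Cheshire.

```systemverilog
// Sketch only: all names in this file are hypothetical and not part of Cheshire.
package cfg_order_sketch_pkg;

  // Hypothetical two-field stand-in for cheshire_cfg_t
  typedef struct packed {
    bit Rt;
    bit Ara;
  } mini_cfg_t;

  localparam mini_cfg_t MiniDefaultCfg = '{Rt: 1'b0, Ara: 1'b0};

  // Derive each variant from the default, as gen_cheshire_ara_cfg() does
  function automatic mini_cfg_t gen_mini_rt_cfg();
    mini_cfg_t ret = MiniDefaultCfg;
    ret.Rt = 1'b1;
    return ret;
  endfunction

  function automatic mini_cfg_t gen_mini_ara_cfg();
    mini_cfg_t ret = MiniDefaultCfg;
    ret.Ara = 1'b1;
    return ret;
  endfunction

  localparam int unsigned NumMiniConfigs = 32'd3;

  // The leftmost concatenation element maps to the highest index
  localparam mini_cfg_t [NumMiniConfigs-1:0] MiniConfigs = {
    gen_mini_ara_cfg(), // 2: Ara-enabled configuration
    gen_mini_rt_cfg(),  // 1: RT-enabled configuration
    MiniDefaultCfg      // 0: Default configuration
  };

endpackage

module cfg_order_sketch_tb;
  import cfg_order_sketch_pkg::*;

  initial begin
    // Walk the array by numeric index to show which entry each index selects
    for (int unsigned i = 0; i < NumMiniConfigs; i++) begin
      $display("config %0d: Rt=%0b Ara=%0b", i, MiniConfigs[i].Rt, MiniConfigs[i].Ara);
    end
    $finish;
  end
endmodule
```

With this ordering, prepending a new entry to the concatenation, as this diff does for the Ara configuration, leaves the indices of the existing entries unchanged.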