This repository has been archived by the owner on Mar 20, 2024. It is now read-only.
Andy Glew edited this page Mar 11, 2020 · 7 revisions

RISC-V V Extension Wiki

This page presents design rationale and notes on future extensions that are currently out of scope.

Features deferred to V++ 64-bit instruction encoding

  • Statically encoding SEW and LMUL

  • Predicates

    • Predicating instructions with the complement of v0
    • Predicating instructions with a register other than v0

    Note: in straightforward implementations, this feature adds another register-file read port (or a map-table read port in implementations with register renaming).

    • 2 input predicates? - useful in SIMT emulation (aggressive, interleaving diverged)
  • Memory addressing modes

    • Indexed memory accesses that implicitly scale the index by SEW/8
    • Indexed memory accesses that decouple index width from data width
    • BaseReg + scale * IndexReg + offset
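
The three addressing-mode ideas above can be combined into one address formula. The following is a minimal sketch rather than a proposed instruction definition: the function name `indexed_addresses`, its parameters, and the choice to zero-extend narrow indices are all assumptions made for illustration.

```python
def indexed_addresses(base, indices, sew_bits, index_bits=None, offset=0):
    """Byte address touched by each element of a scaled indexed access.

    Hypothetical model: address[i] = base + (SEW/8) * index[i] + offset.
    """
    if index_bits is None:
        index_bits = sew_bits          # default: index width equals data width
    scale = sew_bits // 8              # implicit scale: bytes per data element
    mask = (1 << index_bits) - 1       # assumption: indices are zero-extended
    return [base + scale * (idx & mask) + offset for idx in indices]


# 32-bit data, element indices 0, 1, 5
word_addrs = indexed_addresses(0x1000, [0, 1, 5], sew_bits=32)

# 64-bit data addressed through 8-bit indices, plus a constant byte offset
wide_addrs = indexed_addresses(0x2000, [3, 0x102], sew_bits=64,
                               index_bits=8, offset=16)
```

With implicit scaling the index vector holds element numbers rather than byte offsets, so narrow indices can reach further than they could as raw byte offsets.
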
  • Combinatorial explosion of operand types

    • This has historically been the biggest reason why I (Ag) want more than 32 bits of instruction encoding for vectors: each of the following is fairly simple and could fit in the 32-bit instruction format on its own, but there are just too many of them!

    • Mixed width, widening

      • e.g. vs1.8[i] * vs2.16[i] =+ vd.32[i] (an 8-bit by 16-bit product accumulated into a 32-bit destination)
        • signed X signed, signed X unsigned, unsigned X unsigned
    • DSP datatypes, with saturation

      • SS: saturate signed N bits --> signed M bits, M < N
      • UU: saturate unsigned N bits --> unsigned M bits, M < N
      • US: saturate unsigned N bits --> signed M bits, M < N
      • SU: saturate signed N bits --> unsigned M bits, M < N
        • this is effectively ReLU (negative values clamp to zero), a common function in deep learning
        • although this particular saturation would mainly be used at the end of a dot product
          • e.g. in a reduction, or in an actual dot product
    • New FP types, including instructions with mixed FP types

      • single X single =+ double
      • FP16, BFLOAT16
        • fp16 X fp16 =+ {single, fp16}
        • bfloat16 X bfloat16 =+ {single, bfloat16}
        • fp16 X single =+ single
        • bfloat16 X single =+ single
      • eight-bit floating-point types...
    • Mixed integer/fixed/floating point instructions
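
To make the operand-type explosion concrete, here is a small reference model of two of the integer cases listed above: the mixed-width widening multiply-accumulate with selectable signedness, and the SU saturation. It is a sketch under stated assumptions: the names `to_signed`, `widening_macc`, and `saturate_su` are hypothetical, and treating the 32-bit accumulator as signed and wrapping on overflow is an assumption, not something this page specifies.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def widening_macc(vd, vs1, vs2, signed1=True, signed2=True):
    """Model of vd.32[i] += vs1.8[i] * vs2.16[i] for each element i."""
    result = []
    for acc, a, b in zip(vd, vs1, vs2):
        a = to_signed(a, 8) if signed1 else a & 0xFF
        b = to_signed(b, 16) if signed2 else b & 0xFFFF
        # Assumption: the accumulator is signed and wraps at 32 bits.
        result.append(to_signed(acc + a * b, 32))
    return result


def saturate_su(value, m_bits):
    """SU saturation: signed input clamped to the unsigned M-bit range."""
    return max(0, min(value, (1 << m_bits) - 1))


ss = widening_macc([10], [0xFF], [2])                   # signed X signed
uu = widening_macc([10], [0xFF], [2], signed1=False)    # unsigned X signed
clamped = [saturate_su(x, 8) for x in (-5, 77, 300)]
```

The same bit pattern `0xFF` contributes -1 or 255 depending on the signedness flag, which is why each signedness combination needs its own encoding, and every width combination multiplies that count again.
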

  • unums ??

  • Complex numbers

    • chunky or interleaved (re,im) vs (im,re)
    • planar or SoA (structure of arrays)
      • most common on existing GPUs and/or vector ISAs without complex support
      • e.g. a planar vector-vector op such as add needs four inputs and two outputs
        • but doing it as one instruction rather than decomposing it improves the ratio of compute to data movement
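
The difference in operand counts between the two layouts can be seen in a plain-Python model of complex addition. This is only an illustration; the helper names `planar_add`, `to_interleaved`, and `interleaved_add` are hypothetical.

```python
def planar_add(a_re, a_im, b_re, b_im):
    """Planar (SoA) complex add: four input vectors, two output vectors."""
    sum_re = [x + y for x, y in zip(a_re, b_re)]
    sum_im = [x + y for x, y in zip(a_im, b_im)]
    return sum_re, sum_im


def to_interleaved(re, im):
    """Pack separate re/im vectors into one (re, im, re, im, ...) vector."""
    packed = []
    for r, i in zip(re, im):
        packed.extend([r, i])
    return packed


def interleaved_add(a, b):
    """Interleaved complex add: two input vectors, one output vector."""
    return [x + y for x, y in zip(a, b)]


a_re, a_im = [1, 2], [10, 20]
b_re, b_im = [3, 4], [30, 40]
sum_re, sum_im = planar_add(a_re, a_im, b_re, b_im)
packed_sum = interleaved_add(to_interleaved(a_re, a_im),
                             to_interleaved(b_re, b_im))
```

Both forms compute the same values; what changes is how many vector register operands a single instruction would have to name.
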
  • Improved "scalar" support in vector registers

    • e.g. instead of having reductions always write vd[0], and "wasting" the rest of vd, specify which vector element the reduction "scalar" should be written to
      • both statically, and dynamically as determined by another scalar
    • similarly for "large scalars" that occupy more than one vector element * LMUL max, as occurs in some crypto instruction proposals
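
As a rough model of the reduction idea above, the sketch below writes a sum reduction into a chosen element of the destination instead of always element 0. The function name `reduce_sum_into` and the choice to leave all other destination elements undisturbed are assumptions for illustration only.

```python
def reduce_sum_into(vd, vs, dest_index):
    """Sum-reduce vs and write the result to vd[dest_index].

    Assumption: every other element of vd is left undisturbed.
    """
    result = list(vd)
    result[dest_index] = sum(vs)
    return result


# Four independent reductions packed into one destination register,
# each landing in its own element rather than overwriting vd[0].
vd = [0, 0, 0, 0]
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
for i, row in enumerate(rows):
    vd = reduce_sum_into(vd, row, dest_index=i)
```

Here `dest_index` plays the role of either a static immediate or a value taken from another scalar, matching the two variants mentioned above.
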