Initial base for AltFP #2
@@ -0,0 +1,5 @@
*.swp
*.vim
.DS_Store
build/
@@ -0,0 +1,69 @@
[appendix]
[[BFloat16_appx_rationale]]
= Extension Rationale

== Format Rationale
Various choices were made in the RISC-V BFloat16 format and behavior.
Some of these choices are allowed by IEEE-754, while others are deviations
from the standard.

=== Rounding Modes

==== Round to odd
Round to odd is not a '754-supported rounding mode. However, it avoids the double
rounding that can occur when accumulating a result in a wider format and then
converting the result to a narrower format before subsequent use.
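As a non-normative illustration, FP32-to-BF16 conversion with round-to-odd can be sketched as a truncation that ORs the discarded bits into the result's least-significant bit (the function name is ours, not from the specification):

```python
import struct

def fp32_to_bf16_round_to_odd(x: float) -> int:
    """Return the 16 BF16 bits of x, rounded from FP32 using round-to-odd.

    Truncate the FP32 encoding to its top 16 bits, then OR any discarded
    (non-zero) low bits into the LSB so an inexact result is always odd.
    """
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bf16 = bits >> 16
    if bits & 0xFFFF:          # conversion was inexact: set the sticky LSB
        bf16 |= 1
    return bf16
```

Because an inexact truncation always forces the LSB to 1, a subsequent rounding of the wider intermediate cannot land exactly on a tie, which is what prevents the double-rounding error.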

==== Round to nearest, ties to even
Round to nearest, ties to even is the default '754 rounding mode. It is unbiased
and minimizes rounding error.

=== Subnormal Handling

=== NaN Handling

=== Zeros and Infinities

== Instruction Rationale

This section contains various rationale, design notes, and usage
recommendations for the instructions in the BFloat16 extension.
It also tries to record how the designs of the instructions were
derived, or where they were contributed from.

=== Conversion Instructions

The most common and important conversion instructions are between BFloat16 and FP32
(single precision).

We chose not to have direct conversion between BFloat16 and other formats, as such
conversions can typically be performed by a combination of instructions.

.Notes to software developers
[NOTE,caption="SH"]
====
In some cases, for example converting from FP64 to BFloat16, there can be double rounding.
It is up to software to eliminate such sources of error if this is important to the
application.
====

=== FMA

Fused multiply-add.
=== Dot Product

[Review comment]
Reviewer: Is the intention here to support dot product as a packed-SIMD style operation, or an application of FMA?
Reviewer: Upon reading the rest of the spec, it seems like this is intended to be used in packed-SIMD style ops. The spec should probably also define other operations on multiple packed BF16 operands (at least a note on how to load/store them - probably using standard FLW/FLD?) and how this is intended to work in general; I assume it's useful to think about this for all operations, since just using 16 bits of a 32- or 64-bit register seems a bit wasteful.
Author: Yes, the intention is for dot-product operations, as that is what BF16 is usually used for. It was requested that we start with the base and then we can move on to operations. I anticipate that these operations will be most useful in Vector.

Somewhat unaptly named, yet very useful instructions.

.Notes to software developers
[NOTE,caption="SH"]
====
Significant speedup

E Pluribus Unum
====
@@ -0,0 +1,47 @@
[[crypto_scalar_audience]]
=== Intended Audience
THIS IS VERY PRELIMINARY - TO BE UPDATED

Floating-point arithmetic is a specialized subject, requiring people with many different
backgrounds to cooperate in its correct and efficient implementation.
Where possible, we have written this specification to be understandable by
all, though we recognize that the motivations and references to
algorithms or other specifications and standards may be unfamiliar to those
who are not domain experts.

This specification anticipates being read and acted on by various people
with different backgrounds.
We have tried to capture these backgrounds
here, with a brief explanation of what we expect them to know, and how
it relates to the specification.
We hope this aids people's understanding of which aspects of the specification
are particularly relevant to them, and which they may (safely!) ignore or
pass to a colleague.

Software developers::
These are the people we expect to write code using the instructions
in this specification.
The motivations for the instructions we include should be fairly obvious to them,
and they should be familiar with most of the algorithms
and outside standards to which we refer.

Computer architects::
We do not expect architects to have a floating-point background.
We nonetheless expect architects to be able to examine our instructions
for implementation issues, understand how the instructions will be used
in context, and advise on how best to fit the functionality the
floating-point experts want to the ISA interface.

Digital design engineers & micro-architects::
These are the people who will implement the specification inside a
core. Floating-point expertise is assumed, as not all of the corner
cases are pointed out in the specification.

Verification engineers::
Responsible for ensuring the correct implementation of the extension
in hardware.

These are by no means the only people concerned with the specification,
but they are the ones we considered most while writing it.
@@ -0,0 +1,82 @@
[[bfloat16_format]]
== BFloat16 Operand Format

BFloat16 bits::
[wavedrom, , svg]
....
{reg:[
{bits: 7, name: 'frac'},
{bits: 8, name: 'expo'},
{bits: 1, name: 'S'},
]}
....

IEEE Compliance: While BFloat16 (BF16) is not an IEEE-754 _standard_ format, it is a valid floating-point format as defined by the standard. Three parameters specify a format: radix (b), number of digits in the significand (p), and maximum exponent (emax).
For BF16 these values are:

[%autowidth]
.BFloat16 parameters
|===
|radix (b)|2
|significand (p)|8
|emax|127
|===

.Obligatory Floating Point Format Table
[cols = "1,1,1,1,1,1,1,1"]
|===
|Format|Sign bits|Expo bits|Fraction bits|Padded 0s|Encoding bits|Expo max/bias|Expo min

|FP16 |1| 5| 10| 0| 16| 15| -14
|BFloat16|1| 8| 7| 0| 16| 127| -126
|TF32 |1| 8| 10| 13| 32| 127| -126
|FP32 |1| 8| 23| 0| 32| 127| -126
|FP64 |1| 11| 52| 0| 64| 1023| -1022
|FP128 |1| 15| 112| 0| 128| 16,383| -16,382
|===
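As a non-normative sketch of the BFloat16 row above (1 sign bit, 8 exponent bits, 7 fraction bits, bias 127), a finite BF16 encoding can be decoded as follows; the function name is ours and infinities/NaNs are deliberately not handled:

```python
def decode_bf16(bits: int) -> float:
    """Decode a finite BF16 encoding (sign, exponent, fraction) into a float.

    Illustrative only: the all-ones exponent (infinity/NaN) is not handled.
    """
    sign = -1.0 if bits >> 15 else 1.0
    exp = (bits >> 7) & 0xFF
    frac = bits & 0x7F
    if exp == 0:                                   # subnormal or zero
        return sign * (frac / 128.0) * 2.0 ** -126
    return sign * (1.0 + frac / 128.0) * 2.0 ** (exp - 127)
```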

== BFloat16 Behaviors

=== Subnormal Numbers
Subnormal numbers are floating-point values that are too small to be represented as normal
numbers, but that can still be represented using the format's smallest exponent with a zero
integer bit and one or more leading 0s, followed by at least one 1, in the trailing
fractional bits. The idea is to trade off precision in order to support _gradual underflow_.

In RISC-V instructions operating on BFloat16, it is generally intended that all subnormal BFloat16 inputs
are treated as zero and subnormal outputs are flushed to zero. The sign of the original value is retained.
However, it does not necessarily make sense for all BF16 instructions to follow this behavior. For
example, there is little value in such behavior when converting between FP32 and BF16. Therefore, individual
instructions can specify when they deviate from this behavior.

While '754 does not support treating subnormal inputs as zero or flushing subnormal outputs
to zero, many architectures have adopted such behavior as a reasonable simplification for
certain domains.
Furthermore, since BFloat16 has the same exponent range as FP32, supporting subnormals is expected to
add little value.
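A minimal sketch of the input-side flush behavior described above, assuming the BF16 field layout from the format section (the helper name is ours, not from the specification):

```python
def bf16_flush_subnormal(bits: int) -> int:
    """Flush a subnormal BF16 input to zero, preserving the sign.

    A BF16 encoding is subnormal when its exponent field is all zeros
    and its fraction field is non-zero.
    """
    exp = (bits >> 7) & 0xFF
    frac = bits & 0x7F
    if exp == 0 and frac != 0:
        return bits & 0x8000    # keep only the sign bit: a signed zero
    return bits
```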

=== Infinities
Infinities are used to represent values that are too large to be represented by the target format. They are usually produced as a result of overflow (depending on the rounding mode), but can also be provided as inputs. Infinities have a sign associated with them: there are positive infinities and negative infinities.

Infinities are important for keeping meaningless results from being operated upon.
=== NaNs

NaN stands for Not a Number. A NaN is provided as the result of an operation when the result cannot be represented
as a number or an infinity. For example, taking the square root of -1 results in a NaN because
there is no real number that can represent the result. NaNs can also be used as inputs.

There are two types of NaNs: signalling and quiet. Signalling NaNs are provided as input data, since no computational instruction will ever produce this kind of NaN. Operating on a signalling NaN produces an invalid-operation exception. Operating on a quiet NaN usually does not cause an exception.

NaNs include a sign bit, but the bit has no meaning.

NaNs are important for keeping meaningless results from being operated upon. It is best to retain them. As IEEE allows, operations should return the canonical NaN rather than be required to propagate the payload.
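These NaN rules can be sketched on the BF16 encoding: exponent all ones with a non-zero fraction is a NaN, and the fraction MSB distinguishes quiet from signalling. The canonical-NaN constant below is one common choice (positive sign, fraction MSB set, remaining bits clear) and is an assumption of this sketch, not a normative value:

```python
BF16_CANONICAL_NAN = 0x7FC0   # assumed canonical quiet NaN encoding

def bf16_is_nan(bits: int) -> bool:
    """NaN: exponent field all ones and a non-zero fraction field."""
    return ((bits >> 7) & 0xFF) == 0xFF and (bits & 0x7F) != 0

def bf16_is_signalling(bits: int) -> bool:
    """Signalling NaN: a NaN whose fraction MSB is clear."""
    return bf16_is_nan(bits) and (bits & 0x40) == 0
```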

=== Rounding Modes
In general, the default IEEE rounding mode (round to nearest, ties to even) works for arithmetic cases. There are some special cases where a particular instruction benefits from a different rounding mode (e.g., convert to integer, widening multiply-accumulate); we can address this on those specific instructions.

[Review comment]
Reviewer: Does this mean we intend to have a static rounding mode forced to RNE by default and only allow rounding-mode static (opcode) or dynamic (CSR) selection on a specific subset of instructions? This seems to be in contradiction with the F and D extensions and should be justified here IMHO.
Author: Yes, these instructions would have a static rounding mode that is not overridable. Yes, this is different from '754. However, it is a common simplification (just like flushing subnormals). If someone needs more control over the rounding mode they can run in SP (F). I agree that we will need to provide a detailed justification in the specification for this simplification.

=== Handling Exceptions
Default exception handling, as defined by IEEE, is a simple and effective approach to producing results in exceptional cases. For the coder to be able to see what has happened, and take further action if needed, the BFloat16 instructions need to set floating-point exception flags the same way as all other floating-point instructions in RISC-V.

[Review comment]
Reviewer: This formulation may not be future-proof; we may want to cite explicitly the basic floating-point extensions here.
Author: Which area are you concerned about: the rounding mode, default exception handling, or both? Should the need arise, an extension could be added that allows the rounding mode to be changed by the CSR. The handling of exceptions via the IEEE default is common across RISC-V. Is this what you mean about citing the basic floating-point extensions?
Reviewer: I am concerned that another floating-point extension may be introduced with a different way of managing FP exception flags, making "as all other floating-point instructions ..." misleading. So mentioning explicitly that we intend to manage them as extensions F and D (and Q) clarifies things if such an extension should appear at some point. I agree that the use of such a remark may be limited.
@@ -0,0 +1,45 @@
[[BFloat16_introduction]]
== Introduction

When FP16 (officially called binary16) was first introduced by the IEEE-754 standard,
it was just an interchange format. It was intended as a space- and bandwidth-efficient
encoding that would be used to transfer information. This is in line with the proposed
Zfhmin extension.

However, some applications (notably graphics) found that the smaller
precision and dynamic range were sufficient for their domain. So, FP16 started to see
widespread adoption as an arithmetic format. This is in line with the proposed Zfh
extension.

While it was not the intention of '754 to have FP16 be an arithmetic format, it is
supported by the standard. Even though the '754 committee recognized that FP16 was
gaining popularity, the committee decided to hold off on making it a basic format
in the 2019 release. This means that a '754-compliant implementation of binary
floating point, which needs to support at least one basic format, cannot support
only FP16 - it needs to support at least one of binary32, binary64, and binary128.
Experts working in machine learning noticed that FP16 was a much more compact way of
storing operands and often provided sufficient precision for them. However, they also
found that intermediate values were much more accurate when accumulated into a higher
precision.

[Review comment]
Reviewer: Does "better" mean "accurate" here?
Author: Yes, it does. I will clarify and elaborate (a little). Also, thanks for catching the typos.
The final computations were then typically converted back into the more compact FP16
encoding. This approach has become very common in inferencing, where the weights and
activations are stored in FP16 encodings. There was the added benefit that smaller
multipliers could be built for FP16's smaller number of significant bits. At this
point, widening multiply-accumulate instructions became much more common. Also, more
complicated dot-product instructions started to show up, including those that stored two
FP16 numbers in a 32-bit register, multiplied these by another pair of FP16 numbers in
another register, added the two products to an FP32 accumulate value in a third register,
and returned an FP32 result.
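The packed widening dot-product step described above can be sketched as follows. This is an illustrative model only, transcribed for BF16 lane pairs rather than FP16, and the function names are ours:

```python
import struct

def bf16_to_fp32(bits: int) -> float:
    """Widen 16 BF16 bits to FP32 by appending 16 zero bits (always exact)."""
    return struct.unpack('<f', struct.pack('<I', (bits & 0xFFFF) << 16))[0]

def dot2_acc(pair_a: int, pair_b: int, acc: float) -> float:
    """One widening dot-product step: unpack two 16-bit lanes from each
    32-bit register, multiply pairwise, and accumulate both products
    into the FP32 accumulator."""
    a0, a1 = pair_a & 0xFFFF, (pair_a >> 16) & 0xFFFF
    b0, b1 = pair_b & 0xFFFF, (pair_b >> 16) & 0xFFFF
    return (acc + bf16_to_fp32(a0) * bf16_to_fp32(b0)
                + bf16_to_fp32(a1) * bf16_to_fp32(b1))
```

For example, packing the lanes (1.0, 2.0) into each source register computes 1*1 + 2*2 on top of the accumulator.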

Experts working in machine learning at Google who continued to work with FP32 values
noted that the least-significant 16 bits of their significands were not always needed
for good results, even in training. They proposed a truncated version of FP32, consisting
of the 16 most-significant bits of the FP32 encoding. This format was named BFloat16
(or BF16); the B in BFloat16 stands for Brain. Not only did they find that the number of
significant bits in BF16 tended to be sufficient for their work (despite being fewer than
in FP16), but it was very easy for them to reuse their existing data; FP32 numbers could
be readily rounded to BF16 with a minimal amount of work. Furthermore, the even smaller
number of BF16 significant bits enabled even smaller multipliers to be built. Similar
to FP16, widening BF16 multiply-accumulate and dot-product instructions started to
proliferate.
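The "minimal amount of work" to round an FP32 number to BF16 with round-to-nearest, ties-to-even can be sketched as integer arithmetic on the FP32 encoding. This is a common software trick, shown here as an assumption-laden illustration (NaN inputs are not handled, and the name is ours):

```python
import struct

def fp32_to_bf16_rne(x: float) -> int:
    """Round an FP32 value to BF16 (round to nearest, ties to even) by
    adding a rounding bias before dropping the low 16 bits."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    lsb = (bits >> 16) & 1
    return (bits + 0x7FFF + lsb) >> 16   # ties round toward the even LSB
```

For example, 1.0 + 2^-8 sits exactly halfway between two BF16 values and rounds to the one with an even LSB.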
@@ -0,0 +1,15 @@
[[crypto_scalar_policies]]
=== Policies

In creating this proposal, we tried to adhere to the following
policies:

* Provide a RISC-V BFloat16 definition that makes sense for how we expect
these operands to be used in real applications.
* Provide the basic instructions that allow implementations to leverage the
benefits of the BFloat16 format:
** reduced storage space - a BFloat16 operand consumes half the space of an FP32 operand
** higher effective storage bandwidth - two BFloat16 operands can be transferred at the same rate as one FP32 operand
** higher computational throughput - two BFloat16 multiplies can be performed with less logic than one FP32 multiply
* Provide consistency with other approaches when this doesn't interfere with
the above.

[Review comment]
Reviewer: We could even add that one BFloat16 multiply can be done with less logic than one FP16 multiply (mostly due to multiplier area gains).
[Review comment]
Reviewer: Do we intend to list (and check) those combinations of instructions in the actual specification?
Author: We intend to check that it is possible to move between these other formats. This would likely be in an appendix rather than as a part of the specification proper.