Initial base for AltFP #2
@@ -0,0 +1,5 @@
*.swp
*.vim
.DS_Store
build/
@@ -0,0 +1,69 @@
[appendix]
[[BFloat16_appx_rationale]]
= Extension Rationale

== Format Rationale
Various choices were made in the RISC-V BFloat16 format and behavior.
Some of these choices are allowed by IEEE-754, while others are deviations
from the standard.

=== Rounding Modes

==== Round to odd
Round to odd is not a '754-supported rounding mode. However, it avoids the double
rounding that can occur when accumulating a result in a wider format and then
converting the result to a narrower format before subsequent use.
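As a non-normative illustration, FP32-to-BF16 conversion with round-to-odd can be sketched as a truncation that ORs the discarded bits into the result's least-significant bit (the function name is ours, not from the specification):

```python
import struct

def fp32_to_bf16_round_to_odd(x: float) -> int:
    """Return the 16 BF16 bits of x, rounded from FP32 using round-to-odd.

    Truncate the FP32 encoding to its top 16 bits, then OR any discarded
    (non-zero) low bits into the LSB so an inexact result is always odd.
    """
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bf16 = bits >> 16
    if bits & 0xFFFF:          # conversion was inexact: set the sticky LSB
        bf16 |= 1
    return bf16
```

Because an inexact truncation always forces the LSB to 1, a subsequent rounding of the wider intermediate cannot land exactly on a tie, which is what prevents the double-rounding error.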

==== Round to nearest, ties to even
Round to nearest, ties to even is the default '754 rounding mode. It is unbiased
and minimizes rounding error.

=== Subnormal Handling

=== NaN Handling

=== Zeros and Infinities

== Instruction Rationale

This section contains various rationale, design notes, and usage
recommendations for the instructions in the BFloat16 extension.
It also tries to record how the designs of the instructions were
derived, or where they were contributed from.

=== Conversion Instructions

The most common and important conversion instructions are between BFloat16 and FP32
(single precision).

We chose not to have direct conversion between BFloat16 and other formats, as such
conversions can typically be performed by a combination of instructions.

.Notes to software developers
[NOTE,caption="SH"]
====
In some cases, for example converting from FP64 to BFloat16, there can be double rounding.
It is up to software to eliminate such sources of error if this is important to the
application.
====

=== FMA

Fused multiply-add.
=== Dot Product

[Review comment]
Reviewer: Is the intention here to support dot product as a packed-SIMD style operation, or an application of FMA?
Reviewer: Upon reading the rest of the spec, it seems like this is intended to be used in packed-SIMD style ops. The spec should probably also define other operations on multiple packed BF16 operands (at least a note on how to load/store them - probably using standard FLW/FLD?) and how this is intended to work in general; I assume it's useful to think about this for all operations, since just using 16 bits of a 32- or 64-bit register seems a bit wasteful.
Author: Yes, the intention is for dot-product operations, as that is what BF16 is usually used for. It was requested that we start with the base and then we can move on to operations. I anticipate that these operations will be most useful in Vector.

Somewhat unaptly named, yet very useful instructions.

.Notes to software developers
[NOTE,caption="SH"]
====
Significant speedup

E Pluribus Unum
====
@@ -0,0 +1,47 @@
[[crypto_scalar_audience]]
=== Intended Audience
THIS IS VERY PRELIMINARY - TO BE UPDATED

Floating-point arithmetic is a specialized subject, requiring people with many different
backgrounds to cooperate in its correct and efficient implementation.
Where possible, we have written this specification to be understandable by
all, though we recognize that the motivations and references to
algorithms or other specifications and standards may be unfamiliar to those
who are not domain experts.

This specification anticipates being read and acted on by various people
with different backgrounds.
We have tried to capture these backgrounds
here, with a brief explanation of what we expect them to know, and how
it relates to the specification.
We hope this aids people's understanding of which aspects of the specification
are particularly relevant to them, and which they may (safely!) ignore or
pass to a colleague.

Software developers::
These are the people we expect to write code using the instructions
in this specification.
The motivations for the instructions we include should be fairly obvious to them,
and they should be familiar with most of the algorithms
and outside standards to which we refer.

Computer architects::
We do not expect architects to have a floating-point background.
We nonetheless expect architects to be able to examine our instructions
for implementation issues, understand how the instructions will be used
in context, and advise on how best to fit the functionality the
floating-point experts want to the ISA interface.

Digital design engineers & micro-architects::
These are the people who will implement the specification inside a
core. Floating-point expertise is assumed, as not all of the corner
cases are pointed out in the specification.

Verification engineers::
Responsible for ensuring the correct implementation of the extension
in hardware.

These are by no means the only people concerned with the specification,
but they are the ones we considered most while writing it.
@@ -0,0 +1,82 @@
[[bfloat16_format]]
== BFloat16 Operand Format

BFloat16 bits::
[wavedrom, , svg]
....
{reg:[
{bits: 7, name: 'frac'},
{bits: 8, name: 'expo'},
{bits: 1, name: 'S'},
]}
....

IEEE Compliance: While BFloat16 (BF16) is not an IEEE-754 _standard_ format, it is a valid floating-point format as defined by the standard. Three parameters specify a format: radix (b), number of digits in the significand (p), and maximum exponent (emax).
For BF16 these values are:

[%autowidth]
.BFloat16 parameters
|===
|radix (b)|2
|significand (p)|8
|emax|127
|===

.Obligatory Floating Point Format Table
[cols = "1,1,1,1,1,1,1,1"]
|===
|Format|Sign bits|Expo bits|Fraction bits|Padded 0s|Encoding bits|Expo max/bias|Expo min

|FP16 |1| 5| 10| 0| 16| 15| -14
|BFloat16|1| 8| 7| 0| 16| 127| -126
|TF32 |1| 8| 10| 13| 32| 127| -126
|FP32 |1| 8| 23| 0| 32| 127| -126
|FP64 |1| 11| 52| 0| 64| 1023| -1022
|FP128 |1| 15| 112| 0| 128| 16,383| -16,382
|===
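As a non-normative sketch of the BFloat16 row above (1 sign bit, 8 exponent bits, 7 fraction bits, bias 127), a finite BF16 encoding can be decoded as follows; the function name is ours and infinities/NaNs are deliberately not handled:

```python
def decode_bf16(bits: int) -> float:
    """Decode a finite BF16 encoding (sign, exponent, fraction) into a float.

    Illustrative only: the all-ones exponent (infinity/NaN) is not handled.
    """
    sign = -1.0 if bits >> 15 else 1.0
    exp = (bits >> 7) & 0xFF
    frac = bits & 0x7F
    if exp == 0:                                   # subnormal or zero
        return sign * (frac / 128.0) * 2.0 ** -126
    return sign * (1.0 + frac / 128.0) * 2.0 ** (exp - 127)
```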

== BFloat16 Behaviors

=== Subnormal Numbers
Subnormal numbers are floating-point values that are too small to be represented as normal
numbers, but that can still be represented using the format's smallest exponent with a zero
integer bit and one or more leading 0s, followed by at least one 1, in the trailing
fractional bits. The idea is to trade off precision in order to support _gradual underflow_.

In RISC-V instructions operating on BFloat16, it is generally intended that all subnormal BFloat16 inputs
are treated as zero and subnormal outputs are flushed to zero. The sign of the original value is retained.
However, it does not necessarily make sense for all BF16 instructions to follow this behavior. For
example, there is little value in such behavior when converting between FP32 and BF16. Therefore, individual
instructions can specify when they deviate from this behavior.

While '754 does not support treating subnormal inputs as zero or flushing subnormal outputs
to zero, many architectures have adopted such behavior as a reasonable simplification for
certain domains.
Furthermore, since BFloat16 has the same exponent range as FP32, supporting subnormals is expected to
add little value.
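A minimal sketch of the input-side flush behavior described above, assuming the BF16 field layout from the format section (the helper name is ours, not from the specification):

```python
def bf16_flush_subnormal(bits: int) -> int:
    """Flush a subnormal BF16 input to zero, preserving the sign.

    A BF16 encoding is subnormal when its exponent field is all zeros
    and its fraction field is non-zero.
    """
    exp = (bits >> 7) & 0xFF
    frac = bits & 0x7F
    if exp == 0 and frac != 0:
        return bits & 0x8000    # keep only the sign bit: a signed zero
    return bits
```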

=== Infinities
Infinities are used to represent values that are too large to be represented by the target format. They are usually produced as a result of overflow (depending on the rounding mode), but can also be provided as inputs. Infinities have a sign associated with them: there are positive infinities and negative infinities.

Infinities are important for keeping meaningless results from being operated upon.
=== NaNs

NaN stands for Not a Number. A NaN is provided as the result of an operation when the result cannot be represented
as a number or an infinity. For example, taking the square root of -1 results in a NaN because
there is no real number that can represent the result. NaNs can also be used as inputs.

There are two types of NaNs: signalling and quiet. Signalling NaNs are provided as input data, since no computational instruction will ever produce this kind of NaN. Operating on a signalling NaN produces an invalid-operation exception. Operating on a quiet NaN usually does not cause an exception.

NaNs include a sign bit, but the bit has no meaning.

NaNs are important for keeping meaningless results from being operated upon. It is best to retain them. As IEEE allows, operations should return the canonical NaN rather than be required to propagate the payload.
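These NaN rules can be sketched on the BF16 encoding: exponent all ones with a non-zero fraction is a NaN, and the fraction MSB distinguishes quiet from signalling. The canonical-NaN constant below is one common choice (positive sign, fraction MSB set, remaining bits clear) and is an assumption of this sketch, not a normative value:

```python
BF16_CANONICAL_NAN = 0x7FC0   # assumed canonical quiet NaN encoding

def bf16_is_nan(bits: int) -> bool:
    """NaN: exponent field all ones and a non-zero fraction field."""
    return ((bits >> 7) & 0xFF) == 0xFF and (bits & 0x7F) != 0

def bf16_is_signalling(bits: int) -> bool:
    """Signalling NaN: a NaN whose fraction MSB is clear."""
    return bf16_is_nan(bits) and (bits & 0x40) == 0
```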

=== Rounding Modes
In general, the default IEEE rounding mode (round to nearest, ties to even) works for arithmetic cases. There are some special cases where a particular instruction benefits from a different rounding mode (e.g., convert to integer, widening multiply-accumulate); we can address this on those specific instructions.

[Review comment]
Reviewer: Does this mean we intend to have a static rounding mode forced to RNE by default and only allow rounding-mode static (opcode) or dynamic (CSR) selection on a specific subset of instructions? This seems to be in contradiction with the F and D extensions and should be justified here IMHO.
Author: Yes, these instructions would have a static rounding mode that is not overridable. Yes, this is different from '754. However, it is a common simplification (just like flushing subnormals). If someone needs more control over the rounding mode they can run in SP (F). I agree that we will need to provide a detailed justification in the specification for this simplification.

=== Handling Exceptions
Default exception handling, as defined by IEEE, is a simple and effective approach to producing results in exceptional cases. For the coder to be able to see what has happened, and take further action if needed, the BFloat16 instructions need to set floating-point exception flags the same way as all other floating-point instructions in RISC-V.

[Review comment]
Reviewer: This formulation may not be future-proof; we may want to cite explicitly the basic floating-point extensions here.
Author: Which area are you concerned about: the rounding mode, default exception handling, or both? Should the need arise, an extension could be added that allows the rounding mode to be changed by the CSR. The handling of exceptions via the IEEE default is common across RISC-V. Is this what you mean about citing the basic floating-point extensions?
Reviewer: I am concerned that another floating-point extension may be introduced with a different way of managing FP exception flags, making "as all other floating-point instructions ..." misleading. So mentioning explicitly that we intend to manage them as extensions F and D (and Q) clarifies things if such an extension should appear at some point. I agree that the use of such a remark may be limited.
@@ -0,0 +1,45 @@
[[BFloat16_introduction]]
== Introduction

When FP16 (officially called binary16) was first introduced by the IEEE-754 standard,
it was just an interchange format. It was intended as a space- and bandwidth-efficient
encoding that would be used to transfer information. This is in line with the proposed
Zfhmin extension.

However, some applications (notably graphics) found that the smaller
precision and dynamic range were sufficient for their domain. So, FP16 started to see
widespread adoption as an arithmetic format. This is in line with the proposed Zfh
extension.

While it was not the intention of '754 to have FP16 be an arithmetic format, it is
supported by the standard. Even though the '754 committee recognized that FP16 was
gaining popularity, the committee decided to hold off on making it a basic format
in the 2019 release. This means that a '754-compliant implementation of binary
floating point, which needs to support at least one basic format, cannot support
only FP16 - it needs to support at least one of binary32, binary64, and binary128.
Experts working in machine learning noticed that FP16 was a much more compact way of
storing operands and often provided sufficient precision for them. However, they also
found that intermediate values were much more accurate when accumulated into a higher
precision.

[Review comment]
Reviewer: Does "better" mean "accurate" here?
Author: Yes, it does. I will clarify and elaborate (a little). Also, thanks for catching the typos.
The final computations were then typically converted back into the more compact FP16
encoding. This approach has become very common in inferencing, where the weights and
activations are stored in FP16 encodings. There was the added benefit that smaller
multipliers could be built for FP16's smaller number of significant bits. At this
point, widening multiply-accumulate instructions became much more common. Also, more
complicated dot-product instructions started to show up, including those that stored two
FP16 numbers in a 32-bit register, multiplied these by another pair of FP16 numbers in
another register, added the two products to an FP32 accumulate value in a third register,
and returned an FP32 result.
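The packed widening dot-product step described above can be sketched as follows. This is an illustrative model only, transcribed for BF16 lane pairs rather than FP16, and the function names are ours:

```python
import struct

def bf16_to_fp32(bits: int) -> float:
    """Widen 16 BF16 bits to FP32 by appending 16 zero bits (always exact)."""
    return struct.unpack('<f', struct.pack('<I', (bits & 0xFFFF) << 16))[0]

def dot2_acc(pair_a: int, pair_b: int, acc: float) -> float:
    """One widening dot-product step: unpack two 16-bit lanes from each
    32-bit register, multiply pairwise, and accumulate both products
    into the FP32 accumulator."""
    a0, a1 = pair_a & 0xFFFF, (pair_a >> 16) & 0xFFFF
    b0, b1 = pair_b & 0xFFFF, (pair_b >> 16) & 0xFFFF
    return (acc + bf16_to_fp32(a0) * bf16_to_fp32(b0)
                + bf16_to_fp32(a1) * bf16_to_fp32(b1))
```

For example, packing the lanes (1.0, 2.0) into each source register computes 1*1 + 2*2 on top of the accumulator.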

Experts working in machine learning at Google who continued to work with FP32 values
noted that the least-significant 16 bits of their significands were not always needed
for good results, even in training. They proposed a truncated version of FP32, consisting
of the 16 most-significant bits of the FP32 encoding. This format was named BFloat16
(or BF16); the B in BFloat16 stands for Brain. Not only did they find that the number of
significant bits in BF16 tended to be sufficient for their work (despite being fewer than
in FP16), but it was very easy for them to reuse their existing data; FP32 numbers could
be readily rounded to BF16 with a minimal amount of work. Furthermore, the even smaller
number of BF16 significant bits enabled even smaller multipliers to be built. Similar
to FP16, widening BF16 multiply-accumulate and dot-product instructions started to
proliferate.
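The "minimal amount of work" to round an FP32 number to BF16 with round-to-nearest, ties-to-even can be sketched as integer arithmetic on the FP32 encoding. This is a common software trick, shown here as an assumption-laden illustration (NaN inputs are not handled, and the name is ours):

```python
import struct

def fp32_to_bf16_rne(x: float) -> int:
    """Round an FP32 value to BF16 (round to nearest, ties to even) by
    adding a rounding bias before dropping the low 16 bits."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    lsb = (bits >> 16) & 1
    return (bits + 0x7FFF + lsb) >> 16   # ties round toward the even LSB
```

For example, 1.0 + 2^-8 sits exactly halfway between two BF16 values and rounds to the one with an even LSB.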
@@ -0,0 +1,15 @@
[[crypto_scalar_policies]]
=== Policies

In creating this proposal, we tried to adhere to the following
policies:

* Provide a RISC-V BFloat16 definition that makes sense for how we expect
these operands to be used in real applications.
* Provide the basic instructions that allow implementations to leverage the
benefits of the BFloat16 format:
** reduced storage space - a BFloat16 operand consumes half the space of an FP32 operand
** higher effective storage bandwidth - two BFloat16 operands can be transferred at the same rate as one FP32 operand
** higher computational throughput - two BFloat16 multiplies can be performed with less logic than one FP32 multiply
* Provide consistency with other approaches when this doesn't interfere with
the above.

[Review comment]
Reviewer: We could even add that one BFloat16 multiply can be done with less logic than one FP16 multiply (mostly due to multiplier area gains).
[Review comment]
Reviewer: Do we intend to list (and check) those combinations of instructions in the actual specification?
Author: We intend to check that it is possible to move between these other formats. This would likely be in an appendix rather than as a part of the specification proper.