From 44bf540faae63c151863d547102133d250df2d1f Mon Sep 17 00:00:00 2001 From: Ken Dockser <37552326+kdockser@users.noreply.github.com> Date: Fri, 25 Feb 2022 16:09:28 -0600 Subject: [PATCH 01/14] Add files via upload Inital draft --- doc/riscv-bfloat16-appx-rationale.adoc | 66 +++++++++++ doc/riscv-bfloat16-audience.adoc | 47 ++++++++ doc/riscv-bfloat16-format.adoc | 70 ++++++++++++ doc/riscv-bfloat16-introduction.adoc | 45 ++++++++ doc/riscv-bfloat16-policies.adoc | 15 +++ doc/riscv-bfloat16-spec.adoc | 152 +++++++++++++++++++++++++ 6 files changed, 395 insertions(+) create mode 100644 doc/riscv-bfloat16-appx-rationale.adoc create mode 100644 doc/riscv-bfloat16-audience.adoc create mode 100644 doc/riscv-bfloat16-format.adoc create mode 100644 doc/riscv-bfloat16-introduction.adoc create mode 100644 doc/riscv-bfloat16-policies.adoc create mode 100644 doc/riscv-bfloat16-spec.adoc diff --git a/doc/riscv-bfloat16-appx-rationale.adoc b/doc/riscv-bfloat16-appx-rationale.adoc new file mode 100644 index 0000000..78e27c7 --- /dev/null +++ b/doc/riscv-bfloat16-appx-rationale.adoc @@ -0,0 +1,66 @@ +[appendix] +[[BFloat16_appx_rationale]] += Extension Rationale + +== Format Rationale +Various choices were made in the RISC-V BFloat16 format and behavior. +Some of these choices are allowed by IEEE-754 and others are deviations +from the standard + +=== Rounding Modes + +==== Round to odd +Round to odd is not a '754 supported rounding mode. However, it avoids double +rounding can occur when accumulating a result in a wider format and then +converting the result to a narrower format before subsequent usage. + +==== Round to nearest - even +Round to nearest, ties to even is the default '754 rounding format. It is unbiased +and minimze rounding error. + +=== Subnormal Handling + +=== NaN handling + +=== Zeros and Infinities + +== Instruction Rationale + +This section contains various rationale, design notes and usage +recommendations for the instructions in the BFloat16 extension. 
+It also tries to record how the designs of instructions were +derived, or where they were contributed from. + +=== Conversions Instructions + +The most common and important conversion instructions are between BFloat16 and FP32 +(Single Precision). + +We chose not to have direct conversion between BFloa16 and other formats as they +can typcially be performed by a combination of instructions. + +.Notes to software developers +[NOTE,caption="SH"] +==== +In some cases, for example convert from FP64 to BFloat16 there can be double rounding. +It is up to software to elimiante such sources of error if this is important to the +application. +==== + +=== FMA + +Fused multiply add. + +=== Dot Product + +Somewhat unaptly named, yet very useful instructions. + + +.Notes to software developers +[NOTE,caption="SH"] +==== +Signifiant speedup + +E Plurbus Unum +==== + diff --git a/doc/riscv-bfloat16-audience.adoc b/doc/riscv-bfloat16-audience.adoc new file mode 100644 index 0000000..54a65c9 --- /dev/null +++ b/doc/riscv-bfloat16-audience.adoc @@ -0,0 +1,47 @@ +[[crypto_scalar_audience]] +=== Intended Audience + THIS IS VERY PRELIMINARY - TO BE UPDATED + +FLoating-point arithmetic is a specialised subject, requiring people with many different +backgrounds to cooperate in its correct and efficient implementation. +Where possible, we have written this specification to be understandable by +all, though we recognize that the motivations and references to +algorithms or other specifications and standards may be unfamiliar to those +who are not domain experts. + +This specification anticipates being read and acted on by various people +with different backgrounds. +We have tried to capture these backgrounds +here, with a brief explanation of what we expect them to know, and how +it relates to the specification. +We hope this aids people's understanding of which aspects of the specification +are particularly relevant to them, and which they may (safely!) ignore or +pass to a colleague. 
+ +Software developers:: +These are the people we expect to write code using the instructions +in this specification. +They should understand fairly obviously the motivations for the +instructions we include, and be familiar with most of the algorithms +and outside standards to which we refer. + +Computer architects:: +We do expect architects to have a floating-point background. +We nonetheless expect architects to be able to examine our instructions +for implementation issues, understand how the instructions will be used +in context, and advise on how best to fit the functionality the +cryptographers want to the ISA interface. + +Digital design engineers & micro-architects:: +These are the people who will implement the specification inside a +core. Floating-point expertise is assumed as not all of the corner +cases are pointed out in the specification. + +Verification engineers:: +Responsible for ensuring the correct implementation of the extension +in hardware. + + +These are by no means the only people concerned with the specification, +but they are the ones we considered most while writing it. + diff --git a/doc/riscv-bfloat16-format.adoc b/doc/riscv-bfloat16-format.adoc new file mode 100644 index 0000000..21e083c --- /dev/null +++ b/doc/riscv-bfloat16-format.adoc @@ -0,0 +1,70 @@ +[[bfloat16_format]] +== BFloat16 Operand Format + +BFloat16 bits:: +[wavedrom, , svg] +.... +{reg:[ +{bits: 1, name: 'S'}, +{bits: 5, name: 'expo'}, +{bits: 10, name: 'frac'}, +]} +.... + +IEEE Compliance: While BFloat16 (BF16) is not an IEEE-754 _standard_ format, it is a valid floating point format as defined by the standard. There are three parameters that specify a format: radix (b), number of digits in the significand (p), and maximum exponent (emax). 
+For BF16 these values are: + +[%autowidth] +.BFloat16 paramenters +|=== +|radix (b)|2 +|significand (p)|8 +|emax|127 +|=== + + +.Obligatory Floating Point Format Table +[cols = "1,1,1,1,1,1,1,1"] +|=== +|Format|Sign Bits|Expo Bits|fraction bits|padded 0s|encoding bits|expo max/bias|expo min + +|FP16 |1| 8| 7| 0|16| 127|-126 +|BFloat16|1| 5|10| 0|16| 15| -14 +|TF32 |1| 8|10|13|32| 127|-126 +|FP32 |1| 9|23| 0|32| 127|-126 +|FP64 |1|11|52| 0|64|1023|-1022 +|FP128 |1|15|112|0|128|16,383|-16,382 +|=== + +== BFloat16 behaviors + +=== Subnormal Numbers: +Floating-point values that are too small to represnted as normal numbers, but can still be represnted by using the format's smallest exponent with a zero integer bit and one or more leading 0s --- and one or +more 1s --- in the trailing fractional bits are called subnormal numbers. Basically, the idea is there is +a trade off of precision to support _gradual underflow_. + +In RISC-V BFloat16, all subnormal BFloat16 inputs as treated as zero and subnormal outputs are flushed to zero. The sign of the original value is retained. This is not consistent with '754' but has been found to be a suitable alternative in many workloads. Furthermore, with BFloat16's relatively large exponent range, subnormals add little value. + +=== Infinities: +Infinties are used to represnt values that are too large to be represnted by the target format. These are usually produced as a result of overflows (depending on the rounding mode), but can also be provided as inputs. Infities have a sign associated with them: there are positive infionioties and negative infinities. + +Infinities are important for keeping meaningless results from being operated upon. + +=== NaNs + +NaN stands for Not a Number. These are provided as the result of an operation when it cannot be represnted +as a number or infinity. For example, performaning the square root of -1 will result in a NaN because +there is no real number that can represent the result. 
NaNs can also be used as inputs. + +There are two types of NaNs: signalling and quiet. Signalling NaNs are provided as input data since no computational instruction will ever produce tis kind of a NaN. Operating on a Signalling NaN will produce an invalid operation exception. Operating on a Quiet NaN usually does not cause an exception. + +NaNs include a sign bit, but the bit has no meaning. + +NaNs are important for keeping meaningless results from being operated upon. It is best to retain them. As IEEE allows, operations should return the canonical NaN rather than be required to propagate the payload. + +=== Rounding Modes: +In general, the default IEEE rounding mode (round to nearest, ties to even) works for arithmetic cases. There are some special cases where a particular instruction benefits from a different rounding mode (e.g., convert to integer, widening multiply-accumulate) - we can address this on those specific instructions. + +=== Handling exceptions +Default exception handling, as defined by IEEE, is a simple and effective approach to producing results in exceptional cases. For the coder to be able to see what has happened, and take further action if needed, the BFloat16 instructions need to set floating-point exception flags the same way as all other floating-point instructions in RISC-V. + diff --git a/doc/riscv-bfloat16-introduction.adoc b/doc/riscv-bfloat16-introduction.adoc new file mode 100644 index 0000000..4d795bf --- /dev/null +++ b/doc/riscv-bfloat16-introduction.adoc @@ -0,0 +1,45 @@ +[[BFloat16_introduction]] +== Introduction + +When FP16 (officially called binary16) was first introduced by the IEEE-754 standard, +it was just an interchange format. It was intended as a space/bandwidth efficient +encoding that would be used to transfer information. This is in line with the Zfhmin +proposed extension. 
+ +However, there were some applications (notably graphics) that found that the smaller +precision and dynamic range was sufficient for their space. So, FP16 started to see +some widespread adoption as an arithmetic format. This is in line with the Zfh +proposed extension. + +While it was not the intention of '754 to have FP16 be an arithmetic format, it is +supported by the standard. Eventhough the '754 committee recognized that FP16 was +gaining popularity, the committee decided to hold off on making it a basic format +in the 2019 release. This means that a '754 compliant implementation of binary +floating point, which needs to support at least one basic format, cannot support +only FP16 - it needs to support at least one of binary32, binary64, and binary128. + +Experts working in machine learning noticed that FP16 was a much more compact way of +storing operands and often provided sufficient precision for them. However, they also +found that intermediate values were much better when accumulated into a higher precision. +The final computations were then typically converted back into the more compact FP16 +encoding. This approach has become very common in inferencing where the weights and +activations are stored in FP16 encodings. There was the added benefit that smaller +multipliers could be created for the FP16's smaller number of significant bits. At this +point, widening multiply-accumulate instructions became much more common. Also, more +complicated dot product instructions started to show up including those that stored two +FP16 numbers in a 32-bit register, multiplied these by another pair of FP16 numbers in +another register, added these two products to an FP32 accumulate value in a 3rd register +and returned an FP32 result. + +Experts working in machine learning at Google who continued to work with FP32 values +noted that the least significant 16 bits of their significands were not always needed +for good results, even in training. 
They proposed a truncated version of FP32, which was +the 16 most significant bits of the FP32 encoding. This format was named BFloat16 +(or BF16). The B in BFloat16, stands for Brain. Not only did they find that the number of +significant bits in BF16 tended to be sufficient for their work (despite being fewer than +in FP16), but it was very easy for them to reuse their existing data; FP32 numbers could +be readily rounded to BF16 with a minimal amount of work. Furthermore, the even smaller +number of the BF16 significant bits enabled even smaller multipliers to be built. Similar +to FP16, BF16 multiply-accumulate widening and dot-product instructions started to +proliferate. + diff --git a/doc/riscv-bfloat16-policies.adoc b/doc/riscv-bfloat16-policies.adoc new file mode 100644 index 0000000..8e832f4 --- /dev/null +++ b/doc/riscv-bfloat16-policies.adoc @@ -0,0 +1,15 @@ +[[crypto_scalar_policies]] +=== Policies + +In creating this proposal, we tried to adhere to the following +policies: + +* Provide a RISC-V BFloat16 definition that makes sense for how we expect +these operands to be used in real applications. +* Provide the basic instructions that allow implementations to leverge the +benefits of the BFloat16 format + +** reduced storage space - A BFloat16 operand consumes half the space of an FP32 operand + +** higher effective storage bandwidth - Two BFloat16 operands can be transfered at the same rate as one FP32 + +** higher computational throughput - Two BFloat16 multiplies can be performed with less logic than one FP32 + +* Provide consitency with other approaches when this doesn't interfere with +the above diff --git a/doc/riscv-bfloat16-spec.adoc b/doc/riscv-bfloat16-spec.adoc new file mode 100644 index 0000000..468d534 --- /dev/null +++ b/doc/riscv-bfloat16-spec.adoc @@ -0,0 +1,152 @@ +[[riscv-doc-template]] += RISC-V BFloat16 Extension +:description: The BFloat16 format and instruction extensions for the RISC-V ISA. 
+:company: RISC-V.org +:revdate: 8 February, 2022 +:revnumber: v0.0.1 +:revremark: Draft +:url-riscv: http://riscv.org +:doctype: book +//:doctype: report +:preface-title: Preamble +:colophon: +:appendix-caption: Appendix +:imagesdir: images +:title-logo-image: image:risc-v_logo.png[pdfwidth=3.25in,align=center] +//:page-background-image: image:draft.svg[opacity=20%] +//:title-page-background-image: none +//:back-cover-image: image:circuit.png[opacity=25%] +// Settings: +:experimental: +:reproducible: +// needs to be changed? bug discussion started +:WaveDromEditorApp: wavedrom-cli +:imagesoutdir: images +:icons: font +:lang: en +:listing-caption: Listing +:sectnums: +:toc: left +:toclevels: 4 +:source-highlighter: pygments +ifdef::backend-pdf[] +:source-highlighter: coderay +endif::[] +:data-uri: +:hide-uri-scheme: +:stem: latexmath +:footnote: +:xrefstyle: short +:bibtex-file: ../riscv-bfloat16-spec.bib +:bibtex-order: alphabetical +:bibtex-style: ieee + +//:This is the preamble. + +[colophon] += Colophon + +This document describes the BFloat16 format and instruction extensions to the +RISC-V Instruction Set Architecture. + +This document is in the link:http://riscv.org/spec-state[Discussion State]. +This specification is in the early stages. Comments and proposals +are encouraged. +For more information, see link:http://riscv.org/spec-state[here]. + +[NOTE] +.Copyright and licensure: +This work is licensed under a +link:http://creativecommons.org/licenses/by/4.0/[Creative Commons Attribution 4.0 International License] + +[NOTE] +.Document Version Information: +==== +include::git-commit.adoc[] + +See link:https://github.com/riscv/riscv-alt-fp[github.com/riscv/riscv-alt-fp] +for more information. 
+==== + +[acknowledgments] +== Acknowledgments + +Contributors to all versions of the specification (in alphabetical order) +include: +[square] +* GouYue + +* link:mailto:kad@rivosinc.com[Ken Dockser] (Editor) + +* Nick Knight + +* _Your name here_ + + +We will be very grateful to the huge number of other people who will +have helped to improve this specification through their comments, reviews, +feedback and questions. + +// ------------------------------------------------------------ + +include::riscv-bfloat16-introduction.adoc[] +include::riscv-bfloat16-audience.adoc[] +include::riscv-bfloat16-format.adoc[] +// include::riscv-bfloat16-sail-specifications.adoc[] +// include::riscv-bfloat16-policies.adoc[] + +// ------------------------------------------------------------ + +[[bfloat16_extensions]] +== Extensions Overview + +The group of extensions introduced by the BFloat16 Instruction Set +Extension is listed here. + +Detection of individual BFloat16 extensions uses the +unified software-based RISC-V discovery method. + +[NOTE] +==== +At the time of writing, these discovery mechanisms are still a work in +progress. +==== + +.A note on extension rationale +[NOTE, caption="SH"] +==== +BFloat16 instructions are separated into different +functional groups because not all implementations will require all +of the functionality. + +==== + +// include::riscv-bfloat16-zfbmin.adoc[] + + +=== `Zfbmin` - BFloat16 minimal + +This extension provides the minimal support required for the BFloat16 +format. It enables BFloat16 to be an interchange format whereby it +can be used to load, store, and convert BFloat16 values. +<>. 
+ +// ------------------------------------------------------------ + +[[bfloat16_insns, reftext="BFloat16 Instructions"]] +== Instructions +// include::insns/bf16convert.adoc[] +// <<< +// include::insns/bf16fma.adoc[] +// <<< + + +[[bibliography]] +== Bibliography + +bibliography:: + +// ../riscv-bfloat16-spec.bib[ieee] +https://https://ieeexplore.ieee.org/document/8766229[754-2019 - IEEE Standard for Floating-Point Arithmetic] + +https://ieeexplore.ieee.org/document/4610935[754-2008 - IEEE Standard for Floating-Point Arithmetic] + +// include::riscv-bfloat16-appx-rationale.adoc[] +// include::riscv-bfloat16-appx-materials.adoc[] +// include::riscv-bfloat16-appx-sail.adoc[] + From f94139b17a5ca68b09184c3551e40912c3c025d3 Mon Sep 17 00:00:00 2001 From: kdockser Date: Fri, 25 Feb 2022 16:27:12 -0600 Subject: [PATCH 02/14] Work --- .DS_Store | Bin 0 -> 6148 bytes doc/riscv-bfloat16-appx-rationale.adoc | 66 +++++++++++ doc/riscv-bfloat16-audience.adoc | 47 ++++++++ doc/riscv-bfloat16-format.adoc | 70 ++++++++++++ doc/riscv-bfloat16-introduction.adoc | 45 ++++++++ doc/riscv-bfloat16-policies.adoc | 15 +++ doc/riscv-bfloat16-spec.adoc | 152 +++++++++++++++++++++++++ 7 files changed, 395 insertions(+) create mode 100644 .DS_Store create mode 100755 doc/riscv-bfloat16-appx-rationale.adoc create mode 100755 doc/riscv-bfloat16-audience.adoc create mode 100755 doc/riscv-bfloat16-format.adoc create mode 100755 doc/riscv-bfloat16-introduction.adoc create mode 100755 doc/riscv-bfloat16-policies.adoc create mode 100755 doc/riscv-bfloat16-spec.adoc diff --git a/.DS_Store b/.DS_Store new file mode 100644 index 0000000000000000000000000000000000000000..19f071d593da5f53822bde1a1191426217781d50 GIT binary patch literal 6148 zcmeHKO>fgc5S>i|u~i}E01`bQS>hT(`au!m;-<-=Qp=%6Z~zqi5vYahjclhmM3H=l z|H7F+!ry7%>~2xhmL8FS9ckvx&d#j8Z>?P~5sASp*&yl=kppFHxG*;e_p@G;hKaO+ zg2#|?l*tG!9?r_&zyPmZlM<2??`JFPw~Lu71u$q;Fuz1I<*HUKDg&#(5(#d?>xxXj#Dopc9p$gJuf|U1f(>#&$ 
zft=+@scHk=;kb@F?yN5so7)@Dy57##^JUjtyx8n_z3r`?<mqg zyue4PB)D1*KF{OsDjOLdvKDexfOCOvi{6h)iI6@NZZt!PC9`4rp4j z3|Iy%1Gh3@wnt~})-2m;8L$lej|}ks;6WLEgSAGrb-<`A0I-Fy68P%pALwxa=o_px zA_5V*6sSvuxnc-&cM!T9{l3YsHR^H_>d6?#JXx3;1D1hR2AaCr z;q!m=`}_ZDk!@K9ECc@(1EMtyh67B=oUPX;$7ihv{Rm~@yjtV05;*E8MlK)451~pB Y_t*jY25XIoK>. + +// ------------------------------------------------------------ + +[[bfloat16_insns, reftext="BFloat16 Instructions"]] +== Instructions +// include::insns/bf16convert.adoc[] +// <<< +// include::insns/bf16fma.adoc[] +// <<< + + +[[bibliography]] +== Bibliography + +bibliography:: + +// ../riscv-bfloat16-spec.bib[ieee] +https://https://ieeexplore.ieee.org/document/8766229[754-2019 - IEEE Standard for Floating-Point Arithmetic] + +https://ieeexplore.ieee.org/document/4610935[754-2008 - IEEE Standard for Floating-Point Arithmetic] + +// include::riscv-bfloat16-appx-rationale.adoc[] +// include::riscv-bfloat16-appx-materials.adoc[] +// include::riscv-bfloat16-appx-sail.adoc[] + From ab80f72e24eb165efc7192dc6169e79f0074f1cb Mon Sep 17 00:00:00 2001 From: Ken Dockser <37552326+kdockser@users.noreply.github.com> Date: Sat, 26 Feb 2022 17:07:34 -0600 Subject: [PATCH 03/14] Update doc/riscv-bfloat16-format.adoc Co-authored-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com> --- doc/riscv-bfloat16-format.adoc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/riscv-bfloat16-format.adoc b/doc/riscv-bfloat16-format.adoc index 21e083c..51830d2 100755 --- a/doc/riscv-bfloat16-format.adoc +++ b/doc/riscv-bfloat16-format.adoc @@ -46,7 +46,8 @@ a trade off of precision to support _gradual underflow_. In RISC-V BFloat16, all subnormal BFloat16 inputs as treated as zero and subnormal outputs are flushed to zero. The sign of the original value is retained. This is not consistent with '754' but has been found to be a suitable alternative in many workloads. 
Furthermore, with BFloat16's relatively large exponent range, subnormals add little value. === Infinities: -Infinties are used to represnt values that are too large to be represnted by the target format. These are usually produced as a result of overflows (depending on the rounding mode), but can also be provided as inputs. Infities have a sign associated with them: there are positive infionioties and negative infinities. +Infinities are used to represent values that are too large to be represented by the target format. These are usually produced as a result of overflows (depending on the rounding mode), but can also be provided as inputs. Infinities have a sign associated with them: there are positive infinities and negative infinities. + Infinities are important for keeping meaningless results from being operated upon. From 3f10a10d0d9d59ce927e36d7df5b6faa5e8a3e66 Mon Sep 17 00:00:00 2001 From: Ken Dockser <37552326+kdockser@users.noreply.github.com> Date: Sat, 26 Feb 2022 17:07:52 -0600 Subject: [PATCH 04/14] Update doc/riscv-bfloat16-format.adoc Co-authored-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com> --- doc/riscv-bfloat16-format.adoc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/riscv-bfloat16-format.adoc b/doc/riscv-bfloat16-format.adoc index 51830d2..dcc4302 100755 --- a/doc/riscv-bfloat16-format.adoc +++ b/doc/riscv-bfloat16-format.adoc @@ -43,7 +43,8 @@ Floating-point values that are too small to represnted as normal numbers, but ca more 1s --- in the trailing fractional bits are called subnormal numbers. Basically, the idea is there is a trade off of precision to support _gradual underflow_. -In RISC-V BFloat16, all subnormal BFloat16 inputs as treated as zero and subnormal outputs are flushed to zero. The sign of the original value is retained. This is not consistent with '754' but has been found to be a suitable alternative in many workloads. 
Furthermore, with BFloat16's relatively large exponent range, subnormals add little value. +In RISC-V BFloat16, all subnormal BFloat16 inputs are treated as zero and subnormal outputs are flushed to zero. The sign of the original value is retained. This is not consistent with '754' but has been found to be a suitable alternative in many workloads. Furthermore, with BFloat16's relatively large exponent range, subnormals add little value. + === Infinities: Infinities are used to represent values that are too large to be represented by the target format. These are usually produced as a result of overflows (depending on the rounding mode), but can also be provided as inputs. Infinities have a sign associated with them: there are positive infinities and negative infinities. From 4b497d6e9dcd4a193b365a702b526dae073166e3 Mon Sep 17 00:00:00 2001 From: Ken Dockser <37552326+kdockser@users.noreply.github.com> Date: Sat, 26 Feb 2022 17:08:10 -0600 Subject: [PATCH 05/14] Update doc/riscv-bfloat16-format.adoc Co-authored-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com> --- doc/riscv-bfloat16-format.adoc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/riscv-bfloat16-format.adoc b/doc/riscv-bfloat16-format.adoc index dcc4302..cc6eaee 100755 --- a/doc/riscv-bfloat16-format.adoc +++ b/doc/riscv-bfloat16-format.adoc @@ -39,7 +39,8 @@ For BF16 these values are: == BFloat16 behaviors === Subnormal Numbers: -Floating-point values that are too small to represnted as normal numbers, but can still be represnted by using the format's smallest exponent with a zero integer bit and one or more leading 0s --- and one or +Floating-point values that are too small to be represented as normal numbers, but can still be represented by using the format's smallest exponent with a zero integer bit and one or more leading 0s --- and one or + more 1s --- in the trailing fractional bits are called subnormal numbers. 
Basically, the idea is there is a trade off of precision to support _gradual underflow_. From 12516a6d8a42496854f360a92c4e77813c4389d2 Mon Sep 17 00:00:00 2001 From: Ken Dockser <37552326+kdockser@users.noreply.github.com> Date: Sat, 26 Feb 2022 17:09:39 -0600 Subject: [PATCH 06/14] Update doc/riscv-bfloat16-appx-rationale.adoc Co-authored-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com> --- doc/riscv-bfloat16-appx-rationale.adoc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/riscv-bfloat16-appx-rationale.adoc b/doc/riscv-bfloat16-appx-rationale.adoc index 78e27c7..dfc73b0 100755 --- a/doc/riscv-bfloat16-appx-rationale.adoc +++ b/doc/riscv-bfloat16-appx-rationale.adoc @@ -31,7 +31,8 @@ recommendations for the instructions in the BFloat16 extension. It also tries to record how the designs of instructions were derived, or where they were contributed from. -=== Conversions Instructions +=== Conversion Instructions + The most common and important conversion instructions are between BFloat16 and FP32 (Single Precision). From d89b343254949d6bcbe88854a5072fa1e9201c7f Mon Sep 17 00:00:00 2001 From: kdockser Date: Sat, 26 Feb 2022 17:21:03 -0600 Subject: [PATCH 07/14] removed stowaway. 
--- .DS_Store | Bin 6148 -> 0 bytes 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 .DS_Store diff --git a/.DS_Store b/.DS_Store deleted file mode 100644 index 19f071d593da5f53822bde1a1191426217781d50..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 6148 zcmeHKO>fgc5S>i|u~i}E01`bQS>hT(`au!m;-<-=Qp=%6Z~zqi5vYahjclhmM3H=l z|H7F+!ry7%>~2xhmL8FS9ckvx&d#j8Z>?P~5sASp*&yl=kppFHxG*;e_p@G;hKaO+ zg2#|?l*tG!9?r_&zyPmZlM<2??`JFPw~Lu71u$q;Fuz1I<*HUKDg&#(5(#d?>xxXj#Dopc9p$gJuf|U1f(>#&$ zft=+@scHk=;kb@F?yN5so7)@Dy57##^JUjtyx8n_z3r`?<mqg zyue4PB)D1*KF{OsDjOLdvKDexfOCOvi{6h)iI6@NZZt!PC9`4rp4j z3|Iy%1Gh3@wnt~})-2m;8L$lej|}ks;6WLEgSAGrb-<`A0I-Fy68P%pALwxa=o_px zA_5V*6sSvuxnc-&cM!T9{l3YsHR^H_>d6?#JXx3;1D1hR2AaCr z;q!m=`}_ZDk!@K9ECc@(1EMtyh67B=oUPX;$7ihv{Rm~@yjtV05;*E8MlK)451~pB Y_t*jY25XIoK Date: Mon, 4 Apr 2022 15:39:50 -0500 Subject: [PATCH 08/14] Cleanup and clarification --- .gitignore | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 .gitignore diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..3277f8f --- /dev/null +++ b/.gitignore @@ -0,0 +1,5 @@ +*.swp +*.vim +.DS_Store + +build/ From 52777c47ccec260e76fc75652c8f714bbe07b44b Mon Sep 17 00:00:00 2001 From: kdockser Date: Mon, 4 Apr 2022 15:43:10 -0500 Subject: [PATCH 09/14] Cleanup and clarification of initial draft. --- doc/riscv-bfloat16-appx-rationale.adoc | 10 +++++----- doc/riscv-bfloat16-audience.adoc | 2 +- doc/riscv-bfloat16-format.adoc | 16 ++++++++++------ doc/riscv-bfloat16-introduction.adoc | 2 +- doc/riscv-bfloat16-policies.adoc | 6 +++--- 5 files changed, 20 insertions(+), 16 deletions(-) diff --git a/doc/riscv-bfloat16-appx-rationale.adoc b/doc/riscv-bfloat16-appx-rationale.adoc index dfc73b0..0fec872 100755 --- a/doc/riscv-bfloat16-appx-rationale.adoc +++ b/doc/riscv-bfloat16-appx-rationale.adoc @@ -11,12 +11,12 @@ from the standard ==== Round to odd Round to odd is not a '754 supported rounding mode. 
However, it avoids double -rounding can occur when accumulating a result in a wider format and then +rounding that can occur when accumulating a result in a wider format and then converting the result to a narrower format before subsequent usage. ==== Round to nearest - even Round to nearest, ties to even is the default '754 rounding format. It is unbiased -and minimze rounding error. +and minimize rounding error. === Subnormal Handling @@ -38,13 +38,13 @@ The most common and important conversion instructions are between BFloat16 and F (Single Precision). We chose not to have direct conversion between BFloa16 and other formats as they -can typcially be performed by a combination of instructions. +can typically be performed by a combination of instructions. .Notes to software developers [NOTE,caption="SH"] ==== In some cases, for example convert from FP64 to BFloat16 there can be double rounding. -It is up to software to elimiante such sources of error if this is important to the +It is up to software to eliminate such sources of error if this is important to the application. ==== @@ -60,7 +60,7 @@ Somewhat unaptly named, yet very useful instructions. .Notes to software developers [NOTE,caption="SH"] ==== -Signifiant speedup +Significant speedup E Plurbus Unum ==== diff --git a/doc/riscv-bfloat16-audience.adoc b/doc/riscv-bfloat16-audience.adoc index 54a65c9..51997dd 100755 --- a/doc/riscv-bfloat16-audience.adoc +++ b/doc/riscv-bfloat16-audience.adoc @@ -2,7 +2,7 @@ === Intended Audience THIS IS VERY PRELIMINARY - TO BE UPDATED -FLoating-point arithmetic is a specialised subject, requiring people with many different +Floating-point arithmetic is a specialized subject, requiring people with many different backgrounds to cooperate in its correct and efficient implementation. 
 Where possible, we have written this specification to be understandable by
 all, though we recognize that the motivations and references to
diff --git a/doc/riscv-bfloat16-format.adoc b/doc/riscv-bfloat16-format.adoc
index cc6eaee..af82274 100755
--- a/doc/riscv-bfloat16-format.adoc
+++ b/doc/riscv-bfloat16-format.adoc
@@ -15,7 +15,7 @@ IEEE Compliance: While BFloat16 (BF16) is not an IEEE-754 _standard_ format, it
 For BF16 these values are:

 [%autowidth]
-.BFloat16 paramenters
+.BFloat16 parameters
 |===
 |radix (b)|2
 |significand (p)|8
@@ -31,7 +31,7 @@ For BF16 these values are:
 |FP16 |1| 8| 7| 0|16| 127|-126
 |BFloat16|1| 5|10| 0|16| 15| -14
 |TF32 |1| 8|10|13|32| 127|-126
-|FP32 |1| 9|23| 0|32| 127|-126
+|FP32 |1| 8|23| 0|32| 127|-126
 |FP64 |1|11|52| 0|64|1023|-1022
 |FP128 |1|15|112|0|128|16,383|-16,382
 |===
@@ -44,7 +44,11 @@ Floating-point values that are too small to be represented as normal numbers, bu
 more 1s --- in the trailing fractional bits are called subnormal numbers.
 Basically, the idea is there is a trade off of precision to support _gradual underflow_.

-In RISC-V BFloat16, all subnormal BFloat16 inputs are treated as zero and subnormal outputs are flushed to zero. The sign of the original value is retained. This is not consistent with '754' but has been found to be a suitable alternative in many workloads. Furthermore, with BFloat16's relatively large exponent range, subnormals add little value.
+In RISC-V instructions operating on BFloat16, it is generally intended that all subnormal BFloat16 inputs are treated as zero and subnormal outputs are flushed to zero. The sign of the original value is retained. However, it
+is uop to the instruction to specify this behavior.
+vary based on the instruction as there are special cases where it may be undesirable to
+some special cases where it is not desirable to treat
+This is not consistent with '754' but has been found to be a suitable alternative in many workloads. Furthermore, with BFloat16's relatively large exponent range, subnormals add little value.

 === Infinities:

@@ -55,11 +59,11 @@ Infinities are important for keeping meaningless results from being operated upo

 === NaNs

-NaN stands for Not a Number. These are provided as the result of an operation when it cannot be represnted
-as a number or infinity. For example, performaning the square root of -1 will result in a NaN because
+NaN stands for Not a Number. These are provided as the result of an operation when it cannot be represented
+as a number or infinity. For example, performing the square root of -1 will result in a NaN because
 there is no real number that can represent the result. NaNs can also be used as inputs.

-There are two types of NaNs: signalling and quiet. Signalling NaNs are provided as input data since no computational instruction will ever produce tis kind of a NaN. Operating on a Signalling NaN will produce an invalid operation exception. Operating on a Quiet NaN usually does not cause an exception.
+There are two types of NaNs: signalling and quiet. Signalling NaNs are provided as input data since no computational instruction will ever produce this kind of a NaN. Operating on a Signalling NaN will produce an invalid operation exception. Operating on a Quiet NaN usually does not cause an exception.

 NaNs include a sign bit, but the bit has no meaning.

diff --git a/doc/riscv-bfloat16-introduction.adoc b/doc/riscv-bfloat16-introduction.adoc
index 4d795bf..99957ce 100755
--- a/doc/riscv-bfloat16-introduction.adoc
+++ b/doc/riscv-bfloat16-introduction.adoc
@@ -12,7 +12,7 @@ some widespread adoption as an arithmetic format.
 This is in line with the Zfh proposed extension.

 While it was not the intention of '754 to have FP16 be an arithmetic format, it is
-supported by the standard. Eventhough the '754 committee recognized that FP16 was
+supported by the standard. Even though the '754 committee recognized that FP16 was
 gaining popularity, the committee decided to hold off on making it a basic format
 in the 2019 release. This means that a '754 compliant implementation of
 binary floating point, which needs to support at least one basic format, cannot support
diff --git a/doc/riscv-bfloat16-policies.adoc b/doc/riscv-bfloat16-policies.adoc
index 8e832f4..437a71b 100755
--- a/doc/riscv-bfloat16-policies.adoc
+++ b/doc/riscv-bfloat16-policies.adoc
@@ -6,10 +6,10 @@ policies:
 * Provide a RISC-V BFloat16 definition that makes sense for how we expect
 these operands to be used in real applications.

-* Provide the basic instructions that allow implementations to leverge the
+* Provide the basic instructions that allow implementations to leverage the
 benefits of the BFloat16 format +
 ** reduced storage space - A BFloat16 operand consumes half the space of an FP32 operand +
-** higher effective storage bandwidth - Two BFloat16 operands can be transfered at the same rate as one FP32 +
+** higher effective storage bandwidth - Two BFloat16 operands can be transferred at the same rate as one FP32 +
 ** higher computational throughput - Two BFloat16 multiplies can be performed with less logic than one FP32 +
-* Provide consitency with other approaches when this doesn't interfere with
+* Provide consistency with other approaches when this doesn't interfere with
 the above

From 29cb303a1c6e8d99b0b42eb224455840826c858b Mon Sep 17 00:00:00 2001
From: kdockser
Date: Mon, 4 Apr 2022 16:57:35 -0500
Subject: [PATCH 10/14] A couple of cleanups didn't make it through...
---
 doc/riscv-bfloat16-format.adoc | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/doc/riscv-bfloat16-format.adoc b/doc/riscv-bfloat16-format.adoc
index af82274..e7835f0 100755
--- a/doc/riscv-bfloat16-format.adoc
+++ b/doc/riscv-bfloat16-format.adoc
@@ -39,16 +39,21 @@ For BF16 these values are:
 == BFloat16 behaviors

 === Subnormal Numbers:
-Floating-point values that are too small to be represented as normal numbers, but can still be represented by using the format's smallest exponent with a zero integer bit and one or more leading 0s --- and one or
-
+Floating-point values that are too small to be represented as normal numbers, but can still be represented by
+using the format's smallest exponent with a zero integer bit and one or more leading 0s --- and one or
 more 1s --- in the trailing fractional bits are called subnormal numbers.
 Basically, the idea is there is a trade off of precision to support _gradual underflow_.

-In RISC-V instructions operating on BFloat16, it is generally intended that all subnormal BFloat16 inputs are treated as zero and subnormal outputs are flushed to zero. The sign of the original value is retained. However, it
-is uop to the instruction to specify this behavior.
-vary based on the instruction as there are special cases where it may be undesirable to
-some special cases where it is not desirable to treat
-This is not consistent with '754' but has been found to be a suitable alternative in many workloads. Furthermore, with BFloat16's relatively large exponent range, subnormals add little value.
+In RISC-V instructions operating on BFloat16, it is generally intended that all subnormal BFloat16 inputs
+are treated as zero and subnormal outputs are flushed to zero. The sign of the original value is retained.
+However, it does not necessarily make sense for all BF16 instructions to follow this behavior. For
+example, there is little value in such behavior when converting between FP32 and BF16. Therefore, individual
+instructions can specify when they deviate from this behavior.
+
+While '754 doesn't support treating/flushing subnormals, many architectures have adopted such behavior
+as a reasonable simplification for certain domains.
+Furthermore, since BFloat16 has the same exponent range as FP32, supporting subnormals is expected to
+add little value.

 === Infinities:

From 1c0f9573adfa519887bca74794685eb82ee4f92c Mon Sep 17 00:00:00 2001
From: Ken Dockser <37552326+kdockser@users.noreply.github.com>
Date: Mon, 4 Apr 2022 17:01:36 -0500
Subject: [PATCH 11/14] Update doc/riscv-bfloat16-appx-rationale.adoc

Co-authored-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com>
---
 doc/riscv-bfloat16-appx-rationale.adoc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/riscv-bfloat16-appx-rationale.adoc b/doc/riscv-bfloat16-appx-rationale.adoc
index 0fec872..5e89766 100755
--- a/doc/riscv-bfloat16-appx-rationale.adoc
+++ b/doc/riscv-bfloat16-appx-rationale.adoc
@@ -37,7 +37,8 @@ derived, or where they were contributed from.
 The most common and important conversion instructions are between BFloat16 and FP32
 (Single Precision).

-We chose not to have direct conversion between BFloa16 and other formats as they
+We chose not to have direct conversion between BFloat16 and other formats as they
+
 can typically be performed by a combination of instructions.
 .Notes to software developers

From 1e65e8268ef8e4d2c88ad51844af2d31d6c1610d Mon Sep 17 00:00:00 2001
From: Ken Dockser <37552326+kdockser@users.noreply.github.com>
Date: Mon, 4 Apr 2022 17:03:10 -0500
Subject: [PATCH 12/14] Update doc/riscv-bfloat16-appx-rationale.adoc

Co-authored-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com>
---
 doc/riscv-bfloat16-appx-rationale.adoc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/riscv-bfloat16-appx-rationale.adoc b/doc/riscv-bfloat16-appx-rationale.adoc
index 5e89766..e635a88 100755
--- a/doc/riscv-bfloat16-appx-rationale.adoc
+++ b/doc/riscv-bfloat16-appx-rationale.adoc
@@ -63,6 +63,7 @@ Somewhat unaptly named, yet very useful instructions.
 ====
 Significant speedup

-E Plurbus Unum
+E Pluribus Unum
+
 ====

From cf2e1302a5ca573dda6f6cdee9f76163e2512ea6 Mon Sep 17 00:00:00 2001
From: Ken Dockser <37552326+kdockser@users.noreply.github.com>
Date: Mon, 4 Apr 2022 17:14:32 -0500
Subject: [PATCH 13/14] Update doc/riscv-bfloat16-spec.adoc

Co-authored-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com>
---
 doc/riscv-bfloat16-spec.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/riscv-bfloat16-spec.adoc b/doc/riscv-bfloat16-spec.adoc
index 468d534..5e564d7 100755
--- a/doc/riscv-bfloat16-spec.adoc
+++ b/doc/riscv-bfloat16-spec.adoc
@@ -123,7 +123,7 @@ of the functionality.
 === `Zfbmin` - BFloat16 minimal

 This extension provides the minimal support required for the BFloat16
-format. It enables BFloat16 to be an interchange format whereby it
+format. It enables BFloat16 as an interchange format whereby it
 can be used to load, store, and convert BFloat16 values.

 <>.
From 0269541e7ecdb450565e104f14a84339fe6c8300 Mon Sep 17 00:00:00 2001
From: Marcus Plutowski
Date: Mon, 11 Apr 2022 13:08:38 -0700
Subject: [PATCH 14/14] un-flip bfloat16 and fp16 rows in fp-table

---
 doc/riscv-bfloat16-format.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/riscv-bfloat16-format.adoc b/doc/riscv-bfloat16-format.adoc
index e7835f0..c0ee60c 100755
--- a/doc/riscv-bfloat16-format.adoc
+++ b/doc/riscv-bfloat16-format.adoc
@@ -28,8 +28,8 @@ For BF16 these values are:
 |===
 |Format|Sign Bits|Expo Bits|fraction bits|padded 0s|encoding bits|expo max/bias|expo min

-|FP16 |1| 5|10| 0|16| 15| -14
-|BFloat16|1| 8| 7| 0|16| 127|-126
+|FP16 |1| 5|10| 0|16| 15| -14
+|BFloat16|1| 8| 7| 0|16| 127|-126
 |TF32 |1| 8|10|13|32| 127|-126
 |FP32 |1| 8|23| 0|32| 127|-126
 |FP64 |1|11|52| 0|64|1023|-1022
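
Reviewer note (illustrative only, not part of the patch): the corrected BFloat16 row in this series (1 sign bit, 8 exponent bits, 7 fraction bits, bias 127, minimum exponent -126) can be sanity-checked with a small Python sketch. The helper names `fp32_bits`, `fp32_to_bf16_rne` and `bf16_decode` are hypothetical and do not come from the specification. The sketch assumes a BF16 value occupies the upper 16 bits of the FP32 encoding, uses round-to-nearest-even for the narrowing conversion, and models the generally intended treatment of subnormal inputs as sign-preserving zero. Individual instructions may specify different behavior, so this is a reading aid rather than a reference model.

```python
import struct

# Field widths taken from the BFloat16 row of the format table.
BF16_FRAC_BITS = 7
BF16_BIAS = 127


def fp32_bits(x: float) -> int:
    """Return the 32-bit IEEE-754 single-precision encoding of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def fp32_to_bf16_rne(x: float) -> int:
    """Narrow an FP32 value to a 16-bit BF16 encoding.

    Assumption: round to nearest, ties to even; NaNs are quieted.
    """
    bits = fp32_bits(x)
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        return ((bits >> 16) | 0x0040) & 0xFFFF  # set the quiet bit
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even result
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_decode(bits: int, flush_subnormals: bool = True) -> float:
    """Decode a 16-bit BF16 encoding into a Python float.

    Assumption: with flush_subnormals=True, subnormal inputs are
    treated as zero and the sign of the original value is retained.
    """
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exp = (bits >> BF16_FRAC_BITS) & 0xFF
    frac = bits & 0x7F
    if exp == 0xFF:  # infinities and NaNs
        return sign * float("inf") if frac == 0 else float("nan")
    if exp == 0:  # zeros and subnormals
        if frac == 0 or flush_subnormals:
            return sign * 0.0
        return sign * frac * 2.0 ** (1 - BF16_BIAS - BF16_FRAC_BITS)
    return sign * (1 + frac / 2 ** BF16_FRAC_BITS) * 2.0 ** (exp - BF16_BIAS)
```

For example, `bf16_decode(fp32_to_bf16_rne(1.0))` gives `1.0`, and `bf16_decode(0x8001)` gives negative zero because the subnormal input is flushed with its sign retained.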