This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

Initial base for AltFP #2

Open · kdockser wants to merge 16 commits into main

Conversation

kdockser

Define RISC-V version of BF16 format and behaviors

@nibrunieAtSi5

I am not sure the .DS_store file attached to this PR was intended.


Experts working in machine learning noticed that FP16 was a much more compact way of
storing operands and often provided sufficient precision for them. However, they also
found that intermediate values were much better when accumulated into a higher precision.


Does "better" mean "more accurate" here?

kdockser (Author)

Yes, it does. I will clarify and elaborate (a little). Also, thanks for catching the typos.
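A minimal sketch of the pattern the quoted text describes - compact 16-bit storage with a higher-precision accumulator - in plain Python with numpy (the use of FP16 and numpy here is purely illustrative, not part of the proposal):

```python
import numpy as np

# Operands held in a compact 16-bit format (FP16 here, as in the quoted text).
a = np.random.rand(1024).astype(np.float16)
b = np.random.rand(1024).astype(np.float16)

# Accumulating the products in FP32 keeps the running sum far more accurate
# than accumulating in 16 bits, which is the pattern the text describes.
acc = np.float32(0.0)
for x, y in zip(a, b):
    acc += np.float32(x) * np.float32(y)
print(acc)
```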


@tovine left a comment


Looks like a good start for a proposal, keep up the good work! 😄


Fused multiply add.

=== Dot Product

Is the intention here to support dot product as a packed-SIMD style operation, or an application of FMA?


Upon reading the rest of the spec, it seems like this is intended to be used in packed-SIMD style ops. The spec should probably also define other operations on multiple packed BF16 operands (at least a note on how to load/store them - probably using standard FLW/FLD?) and how this is intended to work in general; I assume it's useful to think about this for all operations, since just using 16 bits of a 32- or 64-bit register seems a bit wasteful.

kdockser (Author)

Yes, the intention is for dot product operations, as that is what BF16 is usually used for. It was requested that we start with the base and then we can move on to operations. I anticipate that these operations will be most useful in Vector.
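To make the packed-SIMD reading concrete, here is an illustrative Python sketch (hypothetical helper names, truncating conversion for brevity - none of this is proposed spec text) of two BF16 elements sharing one 32-bit word:

```python
import struct

def f32_to_bf16(x: float) -> int:
    """FP32 -> BF16 by truncation (rounding ignored for brevity)."""
    return struct.unpack('<I', struct.pack('<f', x))[0] >> 16

def pack_bf16_pair(hi: float, lo: float) -> int:
    """Two BF16 elements travel in one 32-bit word, packed-SIMD style."""
    return (f32_to_bf16(hi) << 16) | f32_to_bf16(lo)

def unpack_bf16_pair(word: int) -> tuple[float, float]:
    """Widening each BF16 half back to FP32 is exact."""
    hi = struct.unpack('<f', struct.pack('<I', word & 0xFFFF0000))[0]
    lo = struct.unpack('<f', struct.pack('<I', (word & 0xFFFF) << 16))[0]
    return hi, lo

assert unpack_bf16_pair(pack_bf16_pair(1.5, -2.0)) == (1.5, -2.0)
```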

Comment on lines 31 to 32
|FP16 |1| 8| 7| 0|16| 127|-126
|BFloat16|1| 5|10| 0|16| 15| -14

I think you got these switched around: BF16 is the same as FP32, but with 16 fewer fraction bits: https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus

kdockser (Author)

You are correct. Somehow I swapped them.
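For reference, the relationship tovine cites is easy to check: BF16 keeps FP32's sign and 8-bit exponent and simply drops the low 16 fraction bits. An illustrative Python check (not spec text):

```python
import struct

def fields_fp32(x: float) -> tuple[int, int, int]:
    b = struct.unpack('<I', struct.pack('<f', x))[0]
    return b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF   # sign, 8-bit exp, 23-bit frac

def fields_bf16(x: float) -> tuple[int, int, int]:
    b = struct.unpack('<I', struct.pack('<f', x))[0] >> 16   # truncate to BF16
    return b >> 15, (b >> 7) & 0xFF, b & 0x7F                # sign, 8-bit exp, 7-bit frac

# Same sign and exponent fields; BF16 just keeps the top 7 fraction bits.
s32, e32, f32 = fields_fp32(1.5)
s16, e16, f16 = fields_bf16(1.5)
assert (s32, e32) == (s16, e16) and f16 == f32 >> 16
```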

|FP16 |1| 8| 7| 0|16| 127|-126
|BFloat16|1| 5|10| 0|16| 15| -14
|TF32 |1| 8|10|13|32| 127|-126
|FP32 |1| 9|23| 0|32| 127|-126

FP32 only has 8 exponent bits, not 9 - as it's written now the sum of bits would be 33.
There is an implied 1 bit in there so technically you get the effect of 33 bits, but that goes into the fraction 🙂

kdockser (Author)

Thanks for catching the typo. I will fix.
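For readers following along, a small illustrative decoder (hypothetical helper, not spec text) showing the 1 + 8 + 23 = 32 encoded FP32 bits and the implied leading significand bit:

```python
import struct

def decode_fp32(x: float) -> tuple[int, int, int]:
    """Split an FP32 value into its 1 + 8 + 23 = 32 encoded bits."""
    b = struct.unpack('<I', struct.pack('<f', x))[0]
    sign = b >> 31
    exp  = (b >> 23) & 0xFF          # 8 exponent bits, bias 127
    frac = b & 0x7FFFFF              # 23 stored fraction bits
    # For normal numbers the significand has an implied leading 1, giving 24
    # effective significand bits without storing a 24th bit.
    significand = (1 << 23) | frac if 0 < exp < 0xFF else frac
    return sign, exp, significand

# 1.0 encodes as sign=0, exp=127, stored fraction=0 (the implicit 1 supplies the value).
assert decode_fp32(1.0) == (0, 127, 1 << 23)
```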

@tovine

tovine commented Mar 4, 2022

Have you considered any synergy effects between this extension and the P (Packed SIMD) proposal? BF16 is definitely a good candidate for that, so this should probably be discussed in the spec somewhere, especially with the mention of dot product operations.

@allenjbaum

I also noticed the statements: "Instruction design and definition. (This part has dependency on Zfh & RVV & EDIV.)" and "Additional vector extension / EDIV extension operations;"

  1. You left out Zhinx (or Zfinx, or both) as well as the P extension (in RV64, at least).
  2. In the interests of not "boiling the ocean", I would think it would be prudent to neither rely on nor consider defining anything that depends on RVV for now - and especially not EDIV, which is not official in any sense. That should be a separate TG. I am guessing that you just want to make sure that this might be extended to / supportable by a vector implementation - and that you're not trying to define those here - but that should be clear and explicit.

@allenjbaum

NaNs
You say you're supporting signaling NaNs, but no operation will produce them. Does that mean that if one is loaded into a register, despite not being produced by a BF16 op, and subsequently used by a BF16 op, it will be treated as a quiet NaN (i.e., no trap)? I actually don't know what IEEE says about that particular case.
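Under IEEE 754 default exception handling, an operation on a signaling NaN raises the invalid flag and delivers a quiet NaN; RISC-V floating-point never traps, so an sNaN input would just set the NV flag. An illustrative classifier for the two NaN kinds in a BF16 encoding (assuming the usual IEEE convention that the fraction MSB marks a quiet NaN; not spec text):

```python
def classify_bf16(bits: int) -> str:
    """A NaN has an all-ones exponent and a nonzero fraction; the fraction MSB
    distinguishes quiet (1) from signaling (0) under the common convention."""
    exp  = (bits >> 7) & 0xFF
    frac = bits & 0x7F
    if exp != 0xFF or frac == 0:
        return "not a NaN"           # normal/subnormal/zero/infinity
    return "quiet NaN" if frac & 0x40 else "signaling NaN"

assert classify_bf16(0x7FC0) == "quiet NaN"       # canonical BF16 qNaN
assert classify_bf16(0x7F81) == "signaling NaN"
```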

denorms
The charter mentions "Handling flush-to-zero for IEEE types and BF16 type." but the format doc simply states that all denormals are flushed to zero. So are you making flush-to-zero configurable? Or is this strictly how you handle conversions of IEEE denormals to bf16, or something else?

Also: you say "Furthermore, with BFloat16's relatively large exponent range, subnormals add little value." except there is only a 5-bit exponent field, smaller than any other format listed. How can you call this "large"?

Exceptions:
The statement is confusing in that the first sentence mentions exceptions, and the second status, with nothing that connects them. Specifically, it doesn't mention that RISC-V never takes an exception (trap) for an FP operation regardless of result or input operand, and so (insert second sentence here).

In the Policies doc, you say:
"higher effective storage bandwidth - Two BFloat16 operands can be transferred at the same rate as one FP32
higher computational throughput - Two BFloat16 multiplies can be performed with less logic than one FP32"
This implies that you're using packed SIMD. Is that being proposed at all? Otherwise, you're highly unlikely to get any better throughput; you can get better load/store bandwidth, but you need to add extra pack/unpack operations, which decreases throughput, at least.

@kdockser
Author

kdockser commented Mar 4, 2022 via email

@kdockser
Author

kdockser commented Mar 4, 2022 via email

@aswaterman

Flushing subnormals to zero should be a property of an operation, not a property of the format. The format can clearly define what the subnormal values mean, and specific operations can choose to interpret them as zero and flush outputs to zero. This approach better separates concerns.

Some other architectures’ BF16 instructions only flush subnormals in some instructions (e.g. dot product flushes to zero, but conversion to and from FP32 does not, since flushing in the latter case doesn’t buy you anything). We don’t need to wade into that debate today, but we do need to preserve the possibility of having that debate by removing the flush-to-zero mandate from the format definition.
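An illustrative sketch of that separation, with flushing as a per-operation parameter rather than a property of the format (function names are hypothetical, not spec text):

```python
import struct

def flush_bf16(bits: int, ftz: bool) -> int:
    """Flush a subnormal BF16 input to (signed) zero when the op requests it."""
    exp, frac = (bits >> 7) & 0xFF, bits & 0x7F
    if ftz and exp == 0 and frac != 0:       # subnormal: zero exponent, nonzero fraction
        return bits & 0x8000                 # keep only the sign bit
    return bits

def bf16_mul(a: int, b: int, ftz: bool = True) -> float:
    """Toy multiply where flushing is a property of the operation, not the format."""
    to_f32 = lambda h: struct.unpack('<f', struct.pack('<I', h << 16))[0]
    return to_f32(flush_bf16(a, ftz)) * to_f32(flush_bf16(b, ftz))

# Smallest BF16 subnormal is 2**-133 (~9.2e-41); with ftz=True it behaves as zero.
assert bf16_mul(0x0001, 0x3F80) == 0.0       # 0x3F80 encodes 1.0
```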

@kdockser
Author

kdockser commented Mar 4, 2022 via email

@allenjbaum

allenjbaum commented Mar 4, 2022 via email

(Single Precision).

We chose not to have direct conversion between BFloat16 and other formats as they
can typically be performed by a combination of instructions.


Do we intend to list (and check) those combinations of instructions in the actual specification?

kdockser (Author)

We intend to check that it is possible to move between these other formats. This would likely be in an appendix rather than as a part of the specification proper.
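One such combination, sketched under the assumption that a BF16 -> FP32 widening conversion exists: BF16 -> FP16 can go through FP32, since the widening hop is exact (numpy's float16 stands in for the FP32 -> FP16 instruction here; all names are illustrative):

```python
import struct
import numpy as np

def bf16_to_f32(bits: int) -> float:
    """Widening BF16 -> FP32 is exact: the 16 bits become the top half of FP32."""
    return struct.unpack('<f', struct.pack('<I', bits << 16))[0]

# Hypothetical two-hop route BF16 -> FP16: widen exactly to FP32, then apply
# an already-specified FP32 -> FP16 conversion.
x = bf16_to_f32(0x3FC0)       # BF16 encoding of 1.5
y = np.float16(x)             # second hop, with FP16 rounding rules
assert float(y) == 1.5
```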

Comment on lines 42 to 44
Floating-point values that are too small to be represented as normal numbers, but can still be represented by using the format's smallest exponent with a zero integer bit and one or more leading 0s --- and one or

more 1s --- in the trailing fractional bits are called subnormal numbers. Basically, the idea is there is

Suggested change (removing the stray blank line that splits the sentence):
Floating-point values that are too small to be represented as normal numbers, but can still be represented by using the format's smallest exponent with a zero integer bit and one or more leading 0s --- and one or
more 1s --- in the trailing fractional bits are called subnormal numbers. Basically, the idea is there is

kdockser (Author)

Thanks for catching. I fixed this but hadn't added it before committing. The latest pull request should look better (for this file anyway).


No problem, I am trying to do a full review and listing typos / questions along the way.

a trade off of precision to support _gradual underflow_.

In RISC-V instructions operating on BFloat16, it is generally intended that all subnormal BFloat16 inputs are treated as zero and subnormal outputs are flushed to zero. The sign of the original value is retained. However, it
is uop to the instruction to specify this behavior.


Suggested change
is uop to the instruction to specify this behavior.
is up to the instruction to specify this behavior.

Comment on lines 49 to 51
vary based on the instruction as there are special cases where it may be undesirable to
some special cases where it is not desirable to treat
This is not consistent with '754' but has been found to be a suitable alternative in many workloads. Furthermore, with BFloat16's relatively large exponent range, subnormals add little value.


Suggested change
vary based on the instruction as there are special cases where it may be undesirable to
some special cases where it is not desirable to treat
This is not consistent with '754' but has been found to be a suitable alternative in many workloads. Furthermore, with BFloat16's relatively large exponent range, subnormals add little value.
This behavior may vary based on the instruction as there are special cases where it may be undesirable to treat subnormals as zero.
This is not consistent with '754' but has been found to be a suitable alternative in many workloads. Furthermore, with BFloat16's relatively large exponent range, subnormals add little value.

Comment on lines +73 to +74
In general, the default IEEE rounding mode (round to nearest, ties to even) works for arithmetic cases. There are some special cases where a particular instruction benefits from a different rounding mode (e.g., convert to integer, widening multiply-accumulate) - we can address this on those specific instructions.


Does this mean we intend to have a static rounding mode forced to RNE by default, and only allow static (opcode) or dynamic (CSR) rounding-mode selection on a specific subset of instructions? This seems to be in contradiction with the F and D extensions and should be justified here IMHO.

kdockser (Author)

Yes, these instructions would have a static rounding mode that is not overridable. Yes, this is different from '754. However, it is a common simplification (just like flushing subnormals). If someone needs more control over the rounding mode they can run in SP (F).

I agree that we will need to provide a detailed justification in the specification for this simplification.
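For concreteness, an illustrative (non-normative) reference model of a static round-to-nearest-even FP32 -> BF16 narrowing:

```python
import struct

def f32_to_bf16_rne(x: float) -> int:
    """FP32 -> BF16 with round-to-nearest, ties-to-even (the static mode discussed)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    if (bits & 0x7F800000) == 0x7F800000:            # infinity or NaN
        quiet = 0x0040 if bits & 0x007FFFFF else 0   # keep NaNs quiet after narrowing
        return (bits >> 16) | quiet
    lsb = (bits >> 16) & 1                           # parity of the bit we keep
    return (bits + 0x7FFF + lsb) >> 16               # round half to even

assert f32_to_bf16_rne(1.0) == 0x3F80
assert f32_to_bf16_rne(float('inf')) == 0x7F80
```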

In general, the default IEEE rounding mode (round to nearest, ties to even) works for arithmetic cases. There are some special cases where a particular instruction benefits from a different rounding mode (e.g., convert to integer, widening multiply-accumulate) - we can address this on those specific instructions.

=== Handling exceptions
Default exception handling, as defined by IEEE, is a simple and effective approach to producing results in exceptional cases. For the coder to be able to see what has happened, and take further action if needed, the BFloat16 instructions need to set floating-point exception flags the same way as all other floating-point instructions in RISC-V.


This formulation may not be future-proof; we may want to explicitly cite the basic floating-point extensions here.

kdockser (Author)

Which area are you concerned about: the rounding mode, default exception handling, or both?

Should the need arise, an extension could be added that allows the rounding mode to be changed by the CSR.

The handling of exceptions via the IEEE default is common across RISC-V. Is this what you mean about citing the basic floating-point extensions?
At some point there might be a TG that creates trapped exceptions for FP instructions - but right now only default is supported.


I am concerned that another floating-point extension may be introduced with a different way of managing FP exception flags, making "as all other floating-point instructions ..." misleading. So mentioning explicitly that we intend to manage them as in extensions F and D (and Q) clarifies things if such an extension should appear at some point. I agree that the usefulness of such a remark may be limited.
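For reference, the accrued exception flags that F/D/Q already define in fflags (fcsr[4:0]); the intent above is that BFloat16 instructions set these same bits (the Python constants are just illustration):

```python
# RISC-V fflags bit positions, shared by F/D/Q.
NX, UF, OF, DZ, NV = 1 << 0, 1 << 1, 1 << 2, 1 << 3, 1 << 4
# inexact, underflow, overflow, divide-by-zero, invalid

def describe_fflags(fflags: int) -> list[str]:
    names = [(NV, "invalid"), (DZ, "divide-by-zero"), (OF, "overflow"),
             (UF, "underflow"), (NX, "inexact")]
    return [n for bit, n in names if fflags & bit]

assert describe_fflags(NV | NX) == ["invalid", "inexact"]
```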

benefits of the BFloat16 format +
** reduced storage space - A BFloat16 operand consumes half the space of an FP32 operand +
** higher effective storage bandwidth - Two BFloat16 operands can be transferred at the same rate as one FP32 +
** higher computational throughput - Two BFloat16 multiplies can be performed with less logic than one FP32 +


We could even add that one BFloat16 multiply can be done with less logic than one FP16 (mostly due to multiplier area gains).
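Rough numbers behind that claim, counting significand widths with the implied leading bit (BF16: 8, FP16: 11, FP32: 24) and taking multiplier area as very roughly proportional to the product of operand widths:

```python
# Significand widths including the implied leading 1; area scales very roughly
# with the number of partial-product bits (width squared for a square multiply).
for name, w in [("BF16", 8), ("FP16", 11), ("FP32", 24)]:
    print(f"{name}: ~{w * w} partial-product bits")
# BF16 ~64 vs FP16 ~121 vs FP32 ~576 - so two BF16 multipliers still use less
# logic than one FP32, and one BF16 multiplier less than one FP16.
```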

@tovine

tovine commented Apr 5, 2022

This might not be in scope for this extension, but have you considered unum/posit?

@allenjbaum

allenjbaum commented Apr 5, 2022 via email

@kdockser
Author

kdockser commented Apr 7, 2022 via email

@kdockser
Author

kdockser commented Apr 7, 2022 via email
