[AdditionalVectorCrypto] adding section on additional vector crypto extensions #1306
Is it a reserved encoding for the instructions (vghsh and vgmul) if the vd register group overlaps with the vs2 register group?
Good question @QJtaibai, this is not listed but should be, to align with the other
Hello! I have a question: in the instruction section of the document, vghsh.vs shares the same encoding as vghsh.vv, with funct6 101100, while in Appendix A vghsh.vs appears to have funct6 100011. Which one is correct?
Appendix A and the riscv-opcodes pull request (https://github.com/nibrunieAtSi5/riscv-opcodes/pull/1/files) are right; the instruction section of the document was incorrect. I have fixed it. Note that those opcodes are only suggestions at this point (but that does not mean the suggestions should not be consistent). Thank you for pointing that out.
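To make the funct6 question concrete, here is a minimal, hypothetical Python helper that packs the fields of a 32-bit vector instruction word and checks where funct6 lands. The field layout is the standard vector arithmetic format. The major opcode (0x77, OP-P, renamed OP-VE), funct3 = OPMVV and vm = 1 are assumptions for the .vs form; only the two funct6 values come from this thread, and they are still suggestions.

```python
# Hypothetical helper (not from the spec or riscv-opcodes): pack a vector
# arithmetic instruction word with the layout
#   funct6[31:26] vm[25] vs2[24:20] vs1[19:15] funct3[14:12] vd[11:7] opcode[6:0]

OPCODE_OP_VE = 0x77   # assumption: major opcode used by the vector crypto instructions
FUNCT3_OPMVV = 0b010  # assumption: OPMVV, as used by vghsh.vv; assumed unchanged for .vs


def encode_vector_insn(funct6: int, vm: int, vs2: int, vs1: int,
                       funct3: int, vd: int, opcode: int) -> int:
    """Pack the fields, rejecting any value that does not fit its bit width."""
    fields = [(funct6, 6), (vm, 1), (vs2, 5), (vs1, 5), (funct3, 3), (vd, 5), (opcode, 7)]
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
    return ((funct6 << 26) | (vm << 25) | (vs2 << 20) | (vs1 << 15)
            | (funct3 << 12) | (vd << 7) | opcode)


def funct6_of(insn: int) -> int:
    """Extract bits [31:26] of an instruction word."""
    return (insn >> 26) & 0x3F


VGHSH_VV_FUNCT6 = 0b101100  # vghsh.vv
VGHSH_VS_FUNCT6 = 0b100011  # suggested value for vghsh.vs (Appendix A)

# vd = v1, vs1 = v2, vs2 = v3, unmasked (vm = 1)
vghsh_vv = encode_vector_insn(VGHSH_VV_FUNCT6, 1, 3, 2, FUNCT3_OPMVV, 1, OPCODE_OP_VE)
vghsh_vs = encode_vector_insn(VGHSH_VS_FUNCT6, 1, 3, 2, FUNCT3_OPMVV, 1, OPCODE_OP_VE)
print(hex(vghsh_vv), hex(vghsh_vs))
```

With identical register fields, the two words differ only in bits [31:26], which is why a funct6 mismatch between the instruction section and Appendix A would silently alias vghsh.vs onto vghsh.vv.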
I have developed a small example to demonstrate how [...]. The current implementation provides "performance" of about 0.65 instructions per byte (on a 1 MB input buffer).
These two extensions add additional instructions for carryless multiplication with 32-bit elements and Vector-Scalar GCM instructions. Please see riscv/riscv-isa-manual#1306.
src/vector-crypto-additional.adoc
[wavedrom, , svg]
....
{reg:[
{bits: 7, name: 'OP-P'},
I think we have renamed it to OP_VE in #1311.
We don't have this constraint on
For LMUL>1, the operations likely take place over multiple cycles or uops. The restriction on .vs ensures that the scalar element group in the first register is not overwritten before all LMUL parts have been processed. This is the same reason we have overlap rules for widening and narrowing instructions. This restriction is not needed for .vv, since the input bits and output bits are at the same offset in the register group: overwriting an earlier part of the register group doesn't affect operations on later parts.
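The argument above can be checked with a toy model. This sketch assumes VLEN = EGW = 128 (one element group per register), an implementation that processes one register of the LMUL group per uop and writes vd immediately, and uses XOR as a hypothetical stand-in for the real GHASH step. It compares that sequential execution against the architectural intent (all sources read before any destination is written).

```python
# Toy model, not the real instruction semantics: each list entry is one vector
# register holding a single element group; XOR stands in for the GHASH step.


def toy_op(x: int, h: int) -> int:
    """Hypothetical stand-in for the element-group operation."""
    return x ^ h


def run_per_uop(regs, vd, vs1, vs2, lmul, scalar):
    """One uop per register: part j of vd is written before part j+1 is read."""
    regs = list(regs)  # work on a copy of the register file
    for j in range(lmul):
        x = regs[vs1 + j]
        # .vs re-reads the scalar element group (first register of vs2) every uop;
        # .vv reads the matching register of the vs2 group.
        h = regs[vs2] if scalar else regs[vs2 + j]
        regs[vd + j] = toy_op(x, h)
    return regs[vd:vd + lmul]


def run_reference(regs, vd, vs1, vs2, lmul, scalar):
    """Architectural intent: every source is read before vd is written."""
    h0 = regs[vs2]
    return [toy_op(regs[vs1 + j], h0 if scalar else regs[vs2 + j]) for j in range(lmul)]


regs = [0x1000 + i for i in range(32)]  # arbitrary contents for v0..v31
LMUL = 4
for scalar in (False, True):
    same = (run_per_uop(regs, 8, 16, 8, LMUL, scalar)
            == run_reference(regs, 8, 16, 8, LMUL, scalar))
    kind = ".vs" if scalar else ".vv"
    print(kind, "with vd == vs2:", "OK" if same else "scalar element group clobbered")
```

With vd == vs2, the .vv form still matches the reference because uop j only reads register j of vs2, which no earlier uop has touched; the .vs form diverges from the second uop onward because the first uop overwrote the shared scalar element group.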
Yeah, thanks! I get it! But it seems that
Maybe the assumption for reductions is that vd would only be updated after all of vs1 is read, since you need to see all elements before you know the result.
@topperc Right, for reductions, the amount of additional state that’s needed is nil for many implementations, but in the worst case, it’s bounded by ELEN. Here, I guess it’s bounded by EGW. That difference could be material for narrow vector machines. @nibrunieAtSi5 can you illuminate the situation?
What about the VPR alignment rules for
The non-overlap constraint for vector element groups is part of the vector crypto specification (riscv-isa-manual/src/vector-crypto.adoc, line 311 at commit 04179ad).
It was defined as no overlap between vector register groups, and was not necessarily repeated for each operation (riscv-isa-manual/src/vector-crypto.adoc, line 1530 at commit 04179ad).
I think this constraint makes sense here as well, since allowing overlap would constrain microarchitectures that cannot save/stage the scalar element group before finishing the operation (as stated by @topperc, it is highly likely that for LMUL > EGW / VLEN, implementations will iterate over the full vector register groups across a few cycles). Note that
but should maybe be generalized to
Is the second line supposed to say Zvkg rather than Zvkgs?
Yes, you are right @topperc, this is a typo: the dependency to
I have a question about the operation pseudocode of vghsh.vs and vgmul.vs in the v0.0.7 spec. Taking vghsh.vs as an example, it says H is common to all element groups, and the "let H = brev8(...)" code is placed outside the loop. I expect H to be the same for all groups and to come from the first 128 bits of vs2. But inside each loop iteration, H is modified.
@httoracle You are right. |
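One way to express the intended vghsh.vs behaviour is to compute H once and hand each element group its own working copy for the shift-and-reduce loop. The Python sketch below follows the bit loop of the Zvkg vghsh.vv pseudocode (brev8 on the operands, reduction constant 0x87); modelling each element group as a 128-bit integer and the helper names are assumptions for illustration, not the spec's code.

```python
MASK128 = (1 << 128) - 1


def brev8(x: int) -> int:
    """Reverse the bit order within each of the 16 bytes of a 128-bit value."""
    out = 0
    for i in range(16):
        byte = (x >> (8 * i)) & 0xFF
        out |= int(f"{byte:08b}"[::-1], 2) << (8 * i)
    return out


def gf128_mul(s: int, h: int) -> int:
    """Shift-and-add multiply in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1.

    `h` is a local working copy: shifting it here cannot change the
    caller's hash subkey, which is the point raised above."""
    z = 0
    for bit in range(128):
        if (s >> bit) & 1:
            z ^= h
        reduce = (h >> 127) & 1
        h = (h << 1) & MASK128  # multiply by x
        if reduce:
            h ^= 0x87           # fold x^128 back in
    return z


def vghsh_vv(vd, vs1, vs2):
    """One hash subkey per element group (vs2[i])."""
    return [brev8(gf128_mul(brev8(y ^ x), brev8(h))) for y, x, h in zip(vd, vs1, vs2)]


def vghsh_vs(vd, vs1, vs2_group0):
    """H is derived once from the first element group of vs2 and reused,
    unmodified, for every element group."""
    h_common = brev8(vs2_group0)
    return [brev8(gf128_mul(brev8(y ^ x), h_common)) for y, x in zip(vd, vs1)]
```

A quick consistency check under this model is that vghsh.vs must give the same result as vghsh.vv with the first element group of vs2 replicated into every group.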
Additional Vector Crypto extension v0.0.8 (September 9th 2024): Changelog from v0.0.7:
Maybe a silly question: do we need zvbc8e/zvbc16e? I feel they could be useful, but I have no data to show how common CRC8/CRC16 are.
That is definitely not a silly question, and it has come up a few times.
v0.0.9:
riscv-crypto-spec-vector-extra_v0.0.9.pdf
Note: part of the justification for Zvkgs relies on the fact that a scalar element group in a
I don't know the answer, but here are some investigations:
If the cost of supporting 8/16 bits is not large, I would personally like to have them added.
That is very interesting; it would also be interesting to check whether anyone has ever ported
Is there any concern that the name
No concerns have been voiced so far, but if you think this could be an issue, we can try to come up with another name.
Zvbc32e may not be suitable now if we extend SEW to 8/16/32 bits. Maybe Zvbcxw (Extended Width) or Zvbcxe (Extended EEW)?
I discussed this with @aswaterman and he suggested that the suffix 32* has already been used to mean any element width of 32 bits or lower, e.g. in
Agreed
I don't agree with this wording; I strongly recommend that we change to another name.
I understand your concern, @wangpc-pp. To summarize, the options are:
I would be in favour of keeping
I don't think the 32 in Zve32x/f is about VLEN. The focus of the Zve* extensions is on reducing ALU cost by limiting which element sizes are supported or which operations are supported on large elements. The removal of 64-bit integer elements is the important difference between Zve32x and Zve64x, not the minimum VLEN.
That was my initial interpretation: that the VLEN constraint relaxation was actually a consequence of the limitation on ELEN in general, or on the largest SEW supported for floating-point elements.
Thanks for these explanations! I am still not convinced, but I will let it be since it seems I'm in the minority. 😄
The 32 in Zve32 represents ELEN. The minimum VLEN is set to be the same as ELEN for these extensions (Table 62 in the current vector chapter).
New version: v0.0.10: Changelog:
riscv-crypto-spec-vector-extra_v0.0.10.pdf
(1): it seems the original vector crypto specification uses both "carryless" and "carry-less". Looking at Wikipedia, both seem valid (https://en.wikipedia.org/wiki/Carry-less_product, https://en.wikipedia.org/wiki/Finite_field_arithmetic#Carryless_multiply). I chose to unify to "carry-less" but I am open to suggestions.
New version of the spec draft (v0.0.11, Dec 5th 2024): riscv-crypto-spec-vector-extra_v0.0.11.pdf Changelog:
This pull request is the riscv-isa-manual version of a pull request started on riscv-crypto: riscv/riscv-crypto#362.
/!\ This pull request is a draft for the future fast track extensions.
This pull request drafts the changes associated with two fast track extensions for vector crypto.
During the specification process for vector crypto 1.0.0 a few items had to be discarded because they appeared too late in the process. This fast track extension tries to address some of them.
The official request that will be discussed in the Task Group and submitted to the Unpriv Committee is being drafted here: https://docs.google.com/document/d/1zpYhnZi2NxhjfcBGvPOy0oDhx6lTXchscG17Qcl6wv8/edit?usp=sharing
New features:
- Zvbc32e: extending the vclmul[h].v[vx] instructions to support SEW=32-bit values, either on its own (ELEN >= 32) or in addition to Zvbc (ELEN >= 64)
- Zvkgs: adding .vs variants to vghsh and vgmul (Zvkg)
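For reference, a minimal Python model of vclmul/vclmulh parameterized by SEW. It assumes the Zvbc definition (low and high SEW bits of the 2*SEW-bit carry-less product) carries over unchanged to SEW=32, and to SEW=8/16 if those are adopted; the function names mirror the instructions but are only illustrative.

```python
def clmul_full(a: int, b: int, sew: int) -> int:
    """Full 2*SEW-bit carry-less product of two SEW-bit elements."""
    mask = (1 << sew) - 1
    a, b = a & mask, b & mask
    result = 0
    for i in range(sew):
        if (b >> i) & 1:
            result ^= a << i  # XOR instead of add: no carries propagate
    return result


def vclmul(a: int, b: int, sew: int) -> int:
    """Low SEW bits of the carry-less product (vclmul semantics)."""
    return clmul_full(a, b, sew) & ((1 << sew) - 1)


def vclmulh(a: int, b: int, sew: int) -> int:
    """High SEW bits of the carry-less product (vclmulh semantics)."""
    return clmul_full(a, b, sew) >> sew


for sew in (8, 16, 32, 64):
    ones = (1 << sew) - 1
    print(sew, hex(vclmul(ones, ones, sew)), hex(vclmulh(ones, ones, sew)))
```

Squaring the all-ones element is a handy check at every width: over GF(2) the cross terms cancel, so both halves come out as the alternating pattern 0x55...55.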
Open questions:
- Should Zvbc32e be allowed when ELEN >= 32 without depending on Zvbc? (Answer: YES)
- Should Zvbc32e support SEW=16? (SEW=8?)
- How to name the two new extensions?
- Zvkt(bc/bc32e): should Zvkt be extended to the vclmul[h] instructions defined in Zvbc32e?

Related changes:
- spike-isa-sim modifications for Zvbc32e and Zvkgs: Vector crypto additional riscv-software-src/riscv-isa-sim#1748
- Zvbc32e: https://github.com/riscv/riscv-crypto/blob/main/doc/vector/code-samples/zvbc-test.c
- Zvkgs: https://github.com/riscv/riscv-crypto/blob/main/doc/vector/code-samples/zvkg-test.c
- Zvkgs, Zvbc32e
Draft versions:

Changelogs:
- vs2/vd overlap as reserved encoding for new vghsh.vs/vgmul.vs instructions (review feedback from @QJtaibai)
- Zvbc32e specification to incorporate 8- and 16-bit carry-less multiplication

Original Plan for the fast track schedule

References