EIP-2537 - BLS12-381 precompiles for the EVM #368

Merged: 27 commits, May 1, 2024

@mratsim (Owner) commented Apr 1, 2024

This PR provides pricing feedback for EIP-2537 and also implements it:

  • Metering of the low-level primitives. Note: metering does not use hardware performance counters and uses atomics for thread safety, so it adds a small overhead to small functions
  • Implementation
  • High-level benchmarks (including EVM EBI -> Constantine overhead)
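Metering is only described in words here. The sketch below shows the general idea of counting calls and time per primitive behind a thread-safe update. It is written in Python, not Constantine's Nim code. A lock stands in for the atomics, and the `metered` decorator and toy `fp_add` primitive are hypothetical names. For a primitive this cheap, the extra clock reads and lock acquisition per call make up much of the measured time, which is why metering inflates the cost of small functions.

```python
import threading
import time
from collections import defaultdict
from functools import wraps

# Hypothetical metering registry: call counts and cumulative wall-clock time
# per primitive. The lock plays the role of atomics for thread safety.
_lock = threading.Lock()
call_counts = defaultdict(int)
total_ns = defaultdict(int)


def metered(fn):
    """Wrap fn so every call updates the shared counters thread-safely."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter_ns()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter_ns() - start
            with _lock:  # each metered call pays for this synchronization
                call_counts[fn.__name__] += 1
                total_ns[fn.__name__] += elapsed
    return wrapper


@metered
def fp_add(a, b, p=2**61 - 1):
    """Toy stand-in for a cheap field addition (not Constantine's API)."""
    return (a + b) % p


def run_threads(n_threads=4, n_calls=1000):
    """Call fp_add concurrently from several threads."""
    def worker():
        for i in range(n_calls):
            fp_add(i, i + 1)
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run_threads()
print(call_counts["fp_add"], "calls,", total_ns["fp_add"], "ns total")
```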

Detailed benchmarks and metering for both the constant-time and variable-time implementations (constant-time being the worst-case scenario) are available at: https://github.com/mratsim/constantine/blob/eip2537/metering/eip2537.md

Low-level benchmark
Unless marked vartime, addition and scalar multiplication are constant-time, which is the worst-case scenario.
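Constant-time variants are the worst case because they cannot skip work based on the secret scalar. The sketch below illustrates this with a toy additive group (integers modulo a prime, not BLS12-381 points). It compares a variable-time double-and-add, which only adds on set bits, against a double-and-always-add over a fixed 255-bit length, the size of the BLS12-381 scalar field. This is a simplified illustration, not the algorithm Constantine uses. Production code relies on windowed methods and endomorphisms, and real constant-time code uses branchless selects.

```python
# Toy additive group: integers modulo N stand in for curve points, so
# "add" and "double" are cheap but we can count how many of each run.
N = 2**255 - 19  # arbitrary prime modulus for the toy group (assumption)


class OpCounter:
    def __init__(self):
        self.adds = 0
        self.doubles = 0


def scalar_mul_vartime(k, P, ctr):
    """Left-to-right double-and-add: adds only on set bits of k,
    so the number of operations (and the timing) depends on k."""
    R = 0  # group identity
    for bit in bin(k)[2:]:
        R = (2 * R) % N
        ctr.doubles += 1
        if bit == "1":
            R = (R + P) % N
            ctr.adds += 1
    return R


def scalar_mul_always_add(k, P, ctr, nbits=255):
    """Double-and-always-add over a fixed bit length: one doubling and one
    addition per bit whatever k is, so the operation count leaks nothing.
    A real constant-time version would also replace the Python conditional
    below with a branchless conditional move."""
    R = 0
    for i in reversed(range(nbits)):
        R = (2 * R) % N
        ctr.doubles += 1
        T = (R + P) % N
        ctr.adds += 1
        R = T if (k >> i) & 1 else R
    return R


for k in (1, 0b1011, 2**254 - 1):
    vt, ct = OpCounter(), OpCounter()
    assert scalar_mul_vartime(k, 7, vt) == scalar_mul_always_add(k, 7, ct)
    print(f"{k.bit_length()}-bit k: vartime {vt.adds} adds, "
          f"constant-time {ct.adds} adds")
```

The vartime cost varies with the scalar's bit length and Hamming weight. The always-add cost is fixed at its maximum, which is why constant-time numbers serve as the pricing worst case.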


vs Gnark (variable-time)

```shell
git clone https://github.com/Consensys/gnark-crypto
cd gnark-crypto/ecc/bls12-381
go test -bench="(Pairing|G[12]Jac(Add|Double|ScalarMultiplication))" --cpu 1 -run=none
```


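| Operation | Constantine (constant-time) speedup over Gnark | Constantine vartime speedup over Gnark |
|-----------|-----------------------------------------------|----------------------------------------|
| G1 add    | 1.20x                                         | 2.21x                                  |
| G1 mul    | 1.01x                                         | 1.24x                                  |
| G2 add    | 1.14x                                         | 2.66x                                  |
| G2 mul    | 1.38x                                         | 1.63x                                  |
| Pairing   | 1.16x                                         | N/A                                    |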

@mratsim commented Apr 10, 2024

For @asanso, on gas pricing:

Gas costs expressed as a ratio of the G1 scalar multiplication cost.

Original: https://gist.github.com/mratsim/6785a29e72865cfa94e1174fae1e1168

Reproduction

```shell
git clone https://github.com/mratsim/constantine
cd constantine
git checkout eip2537
CC=clang nimble bench_eip2537_subgroup_checks_impact
```
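Expressing costs as a ratio of G1 scalar multiplication makes it easy to spot operations priced out of line with their runtime. The minimal sketch below assumes gas values back-computed from the MGas/s and ns/op columns of the May 1 benchmark in this thread (gas ≈ MGas/s × ns/op / 1000), not copied from the EIP text. It also uses that run's ns/op figures. If an operation's gas ratio exceeds its time ratio, it is priced conservatively relative to G1MUL. If the gas ratio is lower, the operation is comparatively cheap to call.

```python
# Gas per operation, back-computed from the May 1 benchmark in this thread
# (assumption: MGas/s * ns/op / 1000, not taken from the EIP text).
GAS = {
    "G1ADD": 500,
    "G2ADD": 800,
    "G1MUL": 12000,
    "G2MUL": 45000,
    "MAP_FP_TO_G1": 5500,
    "MAP_FP2_TO_G2": 75000,
}

# ns/op from the same May 1 benchmark.
NS_PER_OP = {
    "G1ADD": 2694,
    "G2ADD": 3665,
    "G1MUL": 82871,
    "G2MUL": 129972,
    "MAP_FP_TO_G1": 34143,
    "MAP_FP2_TO_G2": 106811,
}


def ratios_vs_g1mul(gas, ns):
    """Return {op: (gas ratio, time ratio)}, both relative to G1MUL."""
    g0, t0 = gas["G1MUL"], ns["G1MUL"]
    return {op: (gas[op] / g0, ns[op] / t0) for op in gas}


for op, (g, t) in ratios_vs_g1mul(GAS, NS_PER_OP).items():
    print(f"{op:15s} gas x{g:6.3f}  time x{t:6.3f}")
```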

@mratsim commented May 1, 2024

All EIP-2537 precompiles are implemented with benchmarks.


--------------------------------------------------------------------------------------------------------------------
BLS12_G1ADD                  185.60 MGas/s      371195.249 ops/s         2694 ns/op         8878 CPU cycles (approx)
BLS12_G2ADD                  218.28 MGas/s      272851.296 ops/s         3665 ns/op        12074 CPU cycles (approx)
BLS12_G1MUL                  144.80 MGas/s       12066.947 ops/s        82871 ns/op       273021 CPU cycles (approx)
BLS12_G2MUL                  346.23 MGas/s        7693.965 ops/s       129972 ns/op       428197 CPU cycles (approx)
BLS12_MAP_FP_TO_G1           161.09 MGas/s       29288.580 ops/s        34143 ns/op       112485 CPU cycles (approx)
BLS12_MAP_FP2_TO_G2          702.17 MGas/s        9362.332 ops/s       106811 ns/op       351891 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_PAIRINGCHECK   1       230.31 MGas/s        2132.514 ops/s       468930 ns/op      1544885 CPU cycles (approx)
BLS12_PAIRINGCHECK   2       236.84 MGas/s        1568.453 ops/s       637571 ns/op      2100483 CPU cycles (approx)
BLS12_PAIRINGCHECK   3       237.00 MGas/s        1221.637 ops/s       818574 ns/op      2696796 CPU cycles (approx)
BLS12_PAIRINGCHECK   4       238.23 MGas/s        1005.195 ops/s       994832 ns/op      3277480 CPU cycles (approx)
BLS12_PAIRINGCHECK   5       237.45 MGas/s         848.035 ops/s      1179196 ns/op      3884872 CPU cycles (approx)
BLS12_PAIRINGCHECK   6       201.89 MGas/s         625.056 ops/s      1599857 ns/op      5270568 CPU cycles (approx)
BLS12_PAIRINGCHECK   7       223.64 MGas/s         611.036 ops/s      1636565 ns/op      5391678 CPU cycles (approx)
BLS12_PAIRINGCHECK   8       224.13 MGas/s         548.006 ops/s      1824797 ns/op      6011817 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G1MSM   2              121.74 MGas/s        5712.392 ops/s       175058 ns/op       576717 CPU cycles (approx)
BLS12_G1MSM   4              102.15 MGas/s        3320.152 ops/s       301191 ns/op       992263 CPU cycles (approx)
BLS12_G1MSM   8               81.67 MGas/s        1878.086 ops/s       532457 ns/op      1754181 CPU cycles (approx)
BLS12_G1MSM  16               67.23 MGas/s        1048.407 ops/s       953828 ns/op      3142392 CPU cycles (approx)
BLS12_G1MSM  32               59.01 MGas/s         571.284 ops/s      1750442 ns/op      5766852 CPU cycles (approx)
BLS12_G1MSM  64               51.40 MGas/s         301.480 ops/s      3316972 ns/op     10927814 CPU cycles (approx)
BLS12_G1MSM 128               42.37 MGas/s         158.535 ops/s      6307740 ns/op     20780816 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G2MSM   2              274.23 MGas/s        3431.309 ops/s       291434 ns/op       960108 CPU cycles (approx)
BLS12_G2MSM   4              168.31 MGas/s        1458.762 ops/s       685513 ns/op      2258411 CPU cycles (approx)
BLS12_G2MSM   8              177.98 MGas/s        1091.353 ops/s       916294 ns/op      3018711 CPU cycles (approx)
BLS12_G2MSM  16              150.39 MGas/s         625.367 ops/s      1599062 ns/op      5268087 CPU cycles (approx)
BLS12_G2MSM  32              136.80 MGas/s         353.166 ops/s      2831527 ns/op      9328393 CPU cycles (approx)
BLS12_G2MSM  64              119.28 MGas/s         186.555 ops/s      5360353 ns/op     17659639 CPU cycles (approx)
BLS12_G2MSM 128              101.59 MGas/s         101.367 ops/s      9865099 ns/op     32500502 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
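The MGas/s column ties throughput to the gas schedule: MGas/s = gas × 1000 / (ns/op). The table can therefore be inverted to recover the gas charged per call. The sketch below does this for a few rows copied from the table above. It assumes MSM gas has the form k × (single multiplication gas) × discount / 1000, with the G1MUL/G2MUL gas implied by the same table, and solves for the discount. The two pairing rows differ by roughly 43000 gas, consistent with a per-pair cost on top of a fixed base. All results are subject to the two-decimal rounding of MGas/s.

```python
# Selected rows from the May 1 benchmark above: (name, k, MGas/s, ns/op).
ROWS = [
    ("BLS12_G1ADD", 1, 185.60, 2694),
    ("BLS12_PAIRINGCHECK", 1, 230.31, 468930),
    ("BLS12_PAIRINGCHECK", 2, 236.84, 637571),
    ("BLS12_G1MSM", 2, 121.74, 175058),
    ("BLS12_G1MSM", 128, 42.37, 6307740),
    ("BLS12_G2MSM", 2, 274.23, 291434),
]

# Single-point multiplication gas used as the MSM base cost (assumption:
# the G1MUL/G2MUL values implied by the same table).
MUL_GAS = {"BLS12_G1MSM": 12000, "BLS12_G2MSM": 45000}


def implied_gas(mgas_per_s, ns_per_op):
    """MGas/s = gas / (ns/op * 1e-9) / 1e6, hence gas = MGas/s * ns/op / 1000."""
    return mgas_per_s * ns_per_op / 1000


def implied_msm_discount(gas, k, mul_gas):
    """Assuming MSM gas = k * mul_gas * discount / 1000, solve for discount."""
    return gas * 1000 / (k * mul_gas)


for name, k, mgas, ns in ROWS:
    gas = implied_gas(mgas, ns)
    line = f"{name:20s} k={k:<4d} gas ~ {gas:9.0f}"
    if name in MUL_GAS:
        line += f"  discount ~ {implied_msm_discount(gas, k, MUL_GAS[name]):.0f}"
    print(line)
```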

@mratsim marked this pull request as ready for review May 1, 2024 12:16
@mratsim merged commit d7871d7 into master May 1, 2024 (12 checks passed)
@mratsim deleted the eip2537 branch May 1, 2024 12:17
@mratsim commented May 1, 2024

x86 worst case: a 2015 MacBook Pro 13" with an i5-5257U (dual-core mobile Broadwell), not using ADCX/ADOX instructions and compiled without assembly.


--------------------------------------------------------------------------------------------------------------------
BLS12_G1ADD                   79.24 MGas/s      158478.605 ops/s         6310 ns/op        17019 CPU cycles (approx)
BLS12_G2ADD                   86.30 MGas/s      107874.865 ops/s         9270 ns/op        25029 CPU cycles (approx)
BLS12_G1MUL                   52.94 MGas/s        4411.875 ops/s       226661 ns/op       611983 CPU cycles (approx)
BLS12_G2MUL                  113.23 MGas/s        2516.312 ops/s       397407 ns/op      1072999 CPU cycles (approx)
BLS12_MAP_FP_TO_G1            59.91 MGas/s       10892.416 ops/s        91807 ns/op       247878 CPU cycles (approx)
BLS12_MAP_FP2_TO_G2          231.85 MGas/s        3091.372 ops/s       323481 ns/op       873399 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_PAIRINGCHECK   1        79.50 MGas/s         736.127 ops/s      1358461 ns/op      3667743 CPU cycles (approx)
BLS12_PAIRINGCHECK   2        70.75 MGas/s         468.521 ops/s      2134377 ns/op      5762346 CPU cycles (approx)
BLS12_PAIRINGCHECK   3        78.84 MGas/s         406.370 ops/s      2460812 ns/op      6644092 CPU cycles (approx)
BLS12_PAIRINGCHECK   4        77.35 MGas/s         326.389 ops/s      3063828 ns/op      8272202 CPU cycles (approx)
BLS12_PAIRINGCHECK   5        73.62 MGas/s         262.940 ops/s      3803153 ns/op     10268383 CPU cycles (approx)
BLS12_PAIRINGCHECK   6        74.92 MGas/s         231.954 ops/s      4311203 ns/op     11639867 CPU cycles (approx)
BLS12_PAIRINGCHECK   7        74.02 MGas/s         202.239 ops/s      4944633 ns/op     13350277 CPU cycles (approx)
BLS12_PAIRINGCHECK   8        73.98 MGas/s         180.873 ops/s      5528734 ns/op     14927373 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G1MSM   2               39.46 MGas/s        1851.773 ops/s       540023 ns/op      1457370 CPU cycles (approx)
BLS12_G1MSM   4               31.77 MGas/s        1032.714 ops/s       968322 ns/op      2614367 CPU cycles (approx)
BLS12_G1MSM   8               28.10 MGas/s         646.136 ops/s      1547662 ns/op      4178494 CPU cycles (approx)
BLS12_G1MSM  16               22.78 MGas/s         355.252 ops/s      2814907 ns/op      7599937 CPU cycles (approx)
BLS12_G1MSM  32               20.42 MGas/s         197.672 ops/s      5058876 ns/op     13658562 CPU cycles (approx)
BLS12_G1MSM  64               18.18 MGas/s         106.635 ops/s      9377765 ns/op     25319843 CPU cycles (approx)
BLS12_G1MSM 128               15.22 MGas/s          56.936 ops/s     17563428 ns/op     47421045 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G2MSM   2               85.47 MGas/s        1069.452 ops/s       935058 ns/op      2524473 CPU cycles (approx)
BLS12_G2MSM   4               69.02 MGas/s         598.208 ops/s      1671658 ns/op      4513355 CPU cycles (approx)
BLS12_G2MSM   8               55.77 MGas/s         341.953 ops/s      2924378 ns/op      7895732 CPU cycles (approx)
BLS12_G2MSM  16               48.60 MGas/s         202.089 ops/s      4948320 ns/op     13360281 CPU cycles (approx)
BLS12_G2MSM  32               44.21 MGas/s         114.143 ops/s      8760961 ns/op     23654460 CPU cycles (approx)
BLS12_G2MSM  64               39.68 MGas/s          62.057 ops/s     16114098 ns/op     43507952 CPU cycles (approx)
BLS12_G2MSM 128               33.94 MGas/s          33.866 ops/s     29527769 ns/op     79724878 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
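Dividing the "CPU cycles (approx)" column by ns/op gives the clock rate the cycle estimate corresponds to, about 3.3 GHz for the first x86 run and about 2.7 GHz here. The minimal sketch below uses the BLS12_G1ADD rows of both runs. It assumes the cycle column is a reasonable proxy for executed cycles, which the "(approx)" label does not guarantee. Comparing cycle counts rather than wall-clock time separates the lower clock speed from the cost of running without assembly.

```python
# BLS12_G1ADD (ns/op, approx CPU cycles) from the first x86 run (May 1)
# and from the i5-5257U run above.
FAST = (2694, 8878)
SLOW = (6310, 17019)


def implied_ghz(ns_per_op, cycles):
    """Cycles per nanosecond is the clock rate in GHz."""
    return cycles / ns_per_op


print(f"first machine ~ {implied_ghz(*FAST):.2f} GHz")
print(f"i5-5257U      ~ {implied_ghz(*SLOW):.2f} GHz")
print(f"wall-clock slowdown x{SLOW[0] / FAST[0]:.2f}, "
      f"cycle slowdown x{SLOW[1] / FAST[1]:.2f}")
```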

@mratsim commented May 1, 2024

ARM 64-bit worst case: Raspberry Pi 4, without assembly and also without any add-with-carry intrinsics, so each limb addition costs about 3 times more than necessary (main addition, comparison, carry addition). See also compiler woes #357, https://gcc.godbolt.org/z/jdecvffaP.
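Without access to the carry flag, portable code has to reconstruct each carry from a comparison. The minimal sketch below shows a single 64-bit limb addition done that way, then chains it over multi-limb integers. It is written in Python, with masking to emulate 64-bit wrap-around, and is not Constantine's Nim code. The helper names are hypothetical. On hardware with usable add-with-carry (ADC on x86, ADC/ADCS on ARM64), the same step is a single instruction.

```python
MASK64 = (1 << 64) - 1


def add_with_carry_portable(a, b, carry_in):
    """Emulate a 64-bit add-with-carry without a carry flag:
    the main addition, a comparison to recover its carry,
    then the carry addition (with a comparison for its own carry)."""
    s = (a + b) & MASK64          # main addition, wraps mod 2^64
    c1 = 1 if s < a else 0        # wrapped => carry out of a + b
    s2 = (s + carry_in) & MASK64  # carry addition
    c2 = 1 if s2 < s else 0       # wrapped => carry out of + carry_in
    return s2, c1 | c2            # at most one of c1, c2 can be set


def add_multi_limb(x, y):
    """Add two equal-length little-endian lists of 64-bit limbs."""
    out, carry = [], 0
    for a, b in zip(x, y):
        s, carry = add_with_carry_portable(a, b, carry)
        out.append(s)
    return out, carry


print(add_multi_limb([MASK64, 0], [1, 0]))
```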


--------------------------------------------------------------------------------------------------------------------
BLS12_G1ADD                   21.62 MGas/s       43239.504 ops/s        23127 ns/op
BLS12_G2ADD                   21.64 MGas/s       27044.569 ops/s        36976 ns/op
BLS12_G1MUL                   10.53 MGas/s         877.837 ops/s      1139164 ns/op
BLS12_G2MUL                   23.59 MGas/s         524.237 ops/s      1907533 ns/op
BLS12_MAP_FP_TO_G1            11.99 MGas/s        2180.573 ops/s       458595 ns/op
BLS12_MAP_FP2_TO_G2           47.34 MGas/s         631.257 ops/s      1584142 ns/op
--------------------------------------------------------------------------------------------------------------------
BLS12_PAIRINGCHECK   1        16.03 MGas/s         148.469 ops/s      6735393 ns/op
BLS12_PAIRINGCHECK   2        16.02 MGas/s         106.095 ops/s      9425528 ns/op
BLS12_PAIRINGCHECK   3        15.92 MGas/s          82.062 ops/s     12185853 ns/op
BLS12_PAIRINGCHECK   4        15.95 MGas/s          67.282 ops/s     14862870 ns/op
BLS12_PAIRINGCHECK   5        15.89 MGas/s          56.747 ops/s     17621982 ns/op
BLS12_PAIRINGCHECK   6        15.91 MGas/s          49.262 ops/s     20299681 ns/op
BLS12_PAIRINGCHECK   7        15.87 MGas/s          43.363 ops/s     23060922 ns/op
BLS12_PAIRINGCHECK   8        15.89 MGas/s          38.851 ops/s     25739113 ns/op
--------------------------------------------------------------------------------------------------------------------
BLS12_G1MSM   2                8.52 MGas/s         399.889 ops/s      2500694 ns/op
BLS12_G1MSM   4                7.17 MGas/s         233.071 ops/s      4290542 ns/op
BLS12_G1MSM   8                5.75 MGas/s         132.272 ops/s      7560162 ns/op
BLS12_G1MSM  16                4.71 MGas/s          73.425 ops/s     13619290 ns/op
BLS12_G1MSM  32                4.13 MGas/s          39.989 ops/s     25006822 ns/op
BLS12_G1MSM  64                3.60 MGas/s          21.100 ops/s     47394327 ns/op
BLS12_G1MSM 128                2.98 MGas/s          11.134 ops/s     89812356 ns/op
--------------------------------------------------------------------------------------------------------------------
BLS12_G2MSM   2               18.68 MGas/s         233.735 ops/s      4278343 ns/op
BLS12_G2MSM   4               15.81 MGas/s         137.067 ops/s      7295700 ns/op
BLS12_G2MSM   8               12.30 MGas/s          75.418 ops/s     13259367 ns/op
BLS12_G2MSM  16               10.43 MGas/s          43.388 ops/s     23047767 ns/op
BLS12_G2MSM  32                9.51 MGas/s          24.558 ops/s     40719321 ns/op
BLS12_G2MSM  64                8.22 MGas/s          12.855 ops/s     77791982 ns/op
BLS12_G2MSM 128                7.00 MGas/s           6.984 ops/s    143185138 ns/op
--------------------------------------------------------------------------------------------------------------------
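On the Raspberry Pi, each extra pair in BLS12_PAIRINGCHECK adds a nearly constant amount of time, so ns/op is close to affine in the number of pairs k. The minimal sketch below fits ns = fixed + per_pair × k by ordinary least squares over the eight rows above. It gives roughly 4.0 ms of fixed cost and 2.7 ms per pair. The fixed term plausibly covers work shared by all pairs, such as the single final exponentiation of a multi-pairing, but that attribution is an assumption. The affine shape matches the base-plus-per-pair structure of the pairing gas cost.

```python
# (number of pairs k, ns/op) for BLS12_PAIRINGCHECK on the Raspberry Pi 4.
PAIRING = [(1, 6735393), (2, 9425528), (3, 12185853), (4, 14862870),
           (5, 17621982), (6, 20299681), (7, 23060922), (8, 25739113)]


def least_squares(points):
    """Fit t = intercept + slope * k by ordinary least squares."""
    n = len(points)
    mean_k = sum(k for k, _ in points) / n
    mean_t = sum(t for _, t in points) / n
    cov = sum((k - mean_k) * (t - mean_t) for k, t in points)
    var = sum((k - mean_k) ** 2 for k, _ in points)
    slope = cov / var
    return mean_t - slope * mean_k, slope


intercept, slope = least_squares(PAIRING)
print(f"fixed ~ {intercept / 1e6:.2f} ms, per pair ~ {slope / 1e6:.2f} ms")
```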
