RandomX v2 virtual machine changes #274

Draft: wants to merge 2 commits into base: master

tevador
Owner

@tevador tevador commented Sep 8, 2023

  • CFROUND becomes conditional with a 1/16 chance of writing into fprc
  • F and E registers are mixed together with AES instead of XOR

This PR is incomplete. Currently, only the X86 and portable versions work, the JIT requires hardware AES, and the changes are hardcoded. But it's enough to run some benchmarks.
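A minimal sketch of how a conditional CFROUND with a 1/16 write chance could look. The rotate-then-take-two-low-bits part follows the existing CFROUND definition; which four bits gate the write (bits 2..5 here) is an assumption made for illustration, and `FpState` and `cfroundConditional` are hypothetical names, not identifiers from this PR.

```cpp
#include <cstdint>

// Rotate a 64-bit value right by `count` bits (count is taken modulo 64).
inline uint64_t rotr64(uint64_t value, unsigned count) {
    count &= 63;
    return count == 0 ? value : (value >> count) | (value << (64 - count));
}

// Hypothetical stand-in for the VM's fprc register (rounding mode 0..3).
struct FpState {
    unsigned fprc = 0;
};

// Conditional CFROUND sketch.
// ASSUMPTION: the write is gated on bits 2..5 of the rotated source being
// zero, which happens for 1 in 16 uniformly random inputs. The PR may test
// different bits.
// Returns true when fprc was written.
inline bool cfroundConditional(FpState& state, uint64_t src, unsigned imm) {
    const uint64_t rotated = rotr64(src, imm);
    if ((rotated & 0x3C) != 0) {
        return false;  // 15 of 16 bit patterns: leave fprc untouched
    }
    state.fprc = static_cast<unsigned>(rotated & 3);  // two lowest bits select the mode
    return true;
}
```

Skipping the write most of the time means the comparatively expensive rounding-mode change is executed far less often, which is consistent with the hashrate gains reported further down the thread.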

  • CFROUND changes in the interpreter
  • CFROUND changes in X86 JIT compiler
  • CFROUND changes in A64 JIT compiler
  • V1/V2 selectable at runtime
  • New portable intrinsics
  • New X86 intrinsics
  • New ARM64 intrinsics
  • New PowerPC intrinsics
  • Support for soft AES in X86 JIT compiler
  • Support for soft AES in A64 JIT compiler
  • Update documentation
  • Update tests

@SChernykh
Collaborator

SChernykh commented Sep 9, 2023

Tested on Ryzen 7 1700 (Zen 1) with 2 threads running on the same core:

| Algorithm | Hashrate |
| --- | --- |
| RandomX | 926.4 h/s |
| RandomX + CFROUND tweak | 1004.2 h/s |
| RandomX v2 (CFROUND and AES tweaks) | 1003.8 h/s |

Summary for those who didn't read discussions on IRC:

  • CFROUND tweak makes RandomX more efficient (8.4% hashrate increase on Zen 1, expected 5-10% hashrate increase on other AMD CPUs)
  • AES tweak doubles the amount of AES computations per hash without hurting the hashrate (it uses the gap in RandomX main loop where CPU was sitting idle, waiting for scratchpad data)
  • AES tweak also introduces AES in the main RandomX loop which makes it harder for specialized hardware to get away with just a dedicated circuit for scratchpad initialization - AES must be implemented as a part of RandomX VM and work with RandomX VM's registers
  • AES tweak also improves data entropy (makes it more random) before it's written to the scratchpad

@Gingeropolous

RandomX V2 tests
git clone https://github.com/tevador/RandomX.git
cd RandomX
mkdir build && cd build
cmake -DARCH=native ..
make

./randomx-benchmark --mine --jit --largePages --threads 2 --affinity 3 --init 16

cd ..
git pull origin pull/274/head
cd build
cmake -DARCH=native ..
make

./randomx-benchmark --mine --jit --largePages --threads 2 --affinity 3 --init 16

threadripper 3970x
Standard
Performance: 1191.67 hashes per second

New:
Performance: 1250.43 hashes per second

5900x
Old
Performance: 1525.68 hashes per second

New
Performance: 1645.73 hashes per second

3900x
Old
Performance: 1454.24 hashes per second

New
Performance: 1561.44 hashes per second

model name : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
Old
Performance: 375.845 hashes per second

New
Performance: 374.699 hashes per second

model name : Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Old
Performance: 1474.77 hashes per second

New
Performance: 1472.4 hashes per second

@SChernykh
Collaborator

Ryzen 7 1700 in single thread mode: old 664.3 h/s, new 736.2 h/s.

@Gingeropolous

Gingeropolous commented Sep 9, 2023

model name : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz

(this time Unthrottled)

Old
Performance: 1250.65 hashes per second

New:
Performance: 1225.8 hashes per second

--mine --jit --largePages --threads 1 --affinity 1 --init 16

Single thread:

Old:
Performance: 655.031 hashes per second

New:
Performance: 641.192 hashes per second

model name : Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Single thread:
Old
Performance: 743.708 hashes per second

New:
Performance: 739.415 hashes per second

Per @SChernykh suggestion, ran tests 5 times and picked highest:
(for i7-7700K)
Old
Performance: 747.852 hashes per second

New
746.367 hashes per second on v2
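The "run several times and keep the highest" approach used here can be sketched as a small helper. `bestOf` is a hypothetical name; it assumes each entry is the hashrate reported by one run of the same benchmark command.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Returns the highest hashrate among repeated runs of the same benchmark.
// Taking the maximum filters out runs slowed by throttling or background load.
inline double bestOf(const std::vector<double>& runs) {
    if (runs.empty()) {
        throw std::invalid_argument("no benchmark runs supplied");
    }
    return *std::max_element(runs.begin(), runs.end());
}
```

Comparing the best old run against the best new run reduces run-to-run noise when the difference between the two versions is well under one percent.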

@tevador
Owner Author

tevador commented Sep 11, 2023

I implemented software AES support in the JIT compiler. To test with software AES, the following line needs to be changed:

using JitCompiler = JitCompilerX86<RANDOMX_FLAG_V2 | RANDOMX_FLAG_HARD_AES>;

Measured with Ryzen 3700X: ./randomx-benchmark --jit --verify --softAes --largePages

Old: 15.2843 ms per hash
New: 17.117 ms per hash

(Ran 5x and took the lowest result.)

So it seems there is a 10-11% performance hit for soft AES systems when doing light verification.
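Because the benchmark reports milliseconds per hash in this mode, the size of the hit depends on whether it is expressed as lost throughput or as added time. A sketch of both calculations, with hypothetical helper names:

```cpp
// Converts a latency in milliseconds per hash to hashes per second.
inline double hashesPerSecond(double msPerHash) {
    return 1000.0 / msPerHash;
}

// Percentage of throughput lost when time per hash grows from oldMs to newMs.
inline double throughputDropPercent(double oldMs, double newMs) {
    return (1.0 - hashesPerSecond(newMs) / hashesPerSecond(oldMs)) * 100.0;
}

// Percentage by which the time per hash itself grows.
inline double timeIncreasePercent(double oldMs, double newMs) {
    return (newMs / oldMs - 1.0) * 100.0;
}
```

With the figures above, `throughputDropPercent(15.2843, 17.117)` is roughly 10.7, in line with the 10-11% estimate, while `timeIncreasePercent(15.2843, 17.117)` is roughly 12.0.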

@SChernykh
Collaborator

Ryzen 9 7950X: randomx-benchmark --mine --jit --largePages --threads 2 --affinity 3 --init 32

Old 1635 h/s
New 1763 h/s

And no measurable hashrate difference with and without AES tweak.

@SChernykh
Collaborator

@tevador Do you need help with aarch64? I can do it because I wrote that code originally, so I'm more familiar with it.

@tevador
Owner Author

tevador commented Sep 25, 2023

Yes, it would be great if you could do the changes in the ARM64 JIT. But please wait, I realized the JitCompiler interface needs to be changed because the class cannot be a template. I'm working on a solution that would not cause cascading changes to other classes and it's a bit tricky. But I think updating the ARM assembly code should be safe for you to do now.
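To illustrate the template problem in general terms: flags baked into a template parameter are fixed at compile time, whereas V1/V2 selection at runtime needs the flags to be ordinary data. The sketch below contrasts the two shapes. All names and flag values are hypothetical, and it is not the solution adopted in this PR.

```cpp
#include <cstdint>

// Hypothetical flag bits for illustration only; not RandomX's actual values.
enum SketchFlags : uint32_t {
    kSketchHardAes = 1u << 0,
    kSketchV2 = 1u << 1,
};

// Compile-time variant: the behaviour is fixed when the type is instantiated,
// so each flag combination is a distinct, unrelated type.
template <uint32_t Flags>
struct TemplatedJitSketch {
    static constexpr bool hardAes = (Flags & kSketchHardAes) != 0;
    static constexpr bool v2 = (Flags & kSketchV2) != 0;
};

// Runtime variant: one class, with flags stored as a member and checked
// when code is generated.
class RuntimeJitSketch {
public:
    explicit RuntimeJitSketch(uint32_t flags) : flags_(flags) {}
    bool hardAes() const { return (flags_ & kSketchHardAes) != 0; }
    bool v2() const { return (flags_ & kSketchV2) != 0; }

private:
    uint32_t flags_;
};
```

The runtime variant keeps a single type that other classes can refer to, which is one way to avoid the cascading changes mentioned above, at the cost of a branch where the flags are consulted.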

@SChernykh
Collaborator

Yes, I will only implement CFROUND and AES changes for A64 JIT compiler.

@SChernykh
Collaborator

SChernykh commented Sep 26, 2023

> Yes, it would be great if you could do the changes in the ARM64 JIT. But please wait, I realized the JitCompiler interface needs to be changed because the class cannot be a template. I'm working on a solution that would not cause cascading changes to other classes and it's a bit tricky. But I think updating the ARM assembly code should be safe for you to do now.

@tevador My WIP is here: https://github.com/SChernykh/RandomX/commits/v2
I think I found a solution for JitCompiler problem you mentioned. And I only have soft AES left to implement.

@selsta
Contributor

selsta commented Sep 26, 2023

macOS ARM

v2: 445.702 hashes per second
v1: 424.601 hashes per second

@SChernykh
Collaborator

@selsta can you run each test multiple times and take the highest number for v1 and v2? ARM CPUs never run at the same speed in most devices because of power saving.

@selsta
Contributor

selsta commented Sep 26, 2023

I did run it multiple times. While there was some variation, v2 was always faster by around 15-20 h/s.

@SChernykh
Collaborator

SChernykh commented Sep 26, 2023

Hmm, that's interesting. So Apple silicon also gets a boost (but only 5%). Is it Apple M1 or M2?

@selsta
Contributor

selsta commented Sep 26, 2023

M1 Pro (8 performance cores, 2 efficiency cores)

@SChernykh
Collaborator

@tevador aarch64 is ready to be added: https://github.com/SChernykh/RandomX/tree/v2

@SChernykh
Collaborator

@tevador I squashed my commits, you can just cherry-pick SChernykh@67d1340 into your PR.

@tevador tevador mentioned this pull request Oct 9, 2023
@blackmennewstyle

I can't wait for the RandomX V2 ❤️

@SChernykh
Collaborator

@tevador Do you plan to finish it soon? What is left to be done?

@mikevoronov

mikevoronov commented Dec 23, 2023

@tevador thank you for your work on the previous and this new version of RandomX!

We're working on a decentralized cloud and plan to use RandomX as a CPU capacity proof for every core of a capacity provider. RandomX looks like the only existing ASIC- and GPU-resistant solution for this task. We want to launch our network in the near future and are somewhat dependent on this PR. Are there any time estimates for it? How stable is it now, and would you recommend using it, at least on x86?
