Low performance for the AMD EPYC 7351P #91
Comments
Please post the full command line you use for testing. |
randomx-benchmark.exe --mine --init 16 --threads 32 --nonces 100000 --largePages --jit |
EPYC CPUs are 4 NUMA nodes per socket IIRC. You need to run 4 benchmark instances, each with 8 threads and assigned to corresponding NUMA node. |
Assigned where? |
numactl on Linux, I don't know how it's done on Windows. |
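As a rough sketch of the numactl approach, the following Python snippet builds one command line per NUMA node, binding both CPU and memory to that node. The binary path `./randomx-benchmark`, the choice of 8 init threads per instance, and the helper name `numactl_commands` are assumptions made for illustration; the benchmark flags are the ones from the command line posted above, and `--cpunodebind` / `--membind` are standard numactl options.

```python
import shlex


def numactl_commands(nodes=4, threads_per_node=8, nonces=100000,
                     binary="./randomx-benchmark"):
    """Build one numactl-wrapped benchmark command per NUMA node.

    The binary path and per-node thread counts are illustrative
    assumptions; adjust them for the actual hardware.
    """
    commands = []
    for node in range(nodes):
        argv = [
            "numactl",
            f"--cpunodebind={node}",  # run only on CPUs of this node
            f"--membind={node}",      # allocate memory only from this node
            binary,
            "--mine",
            "--init", str(threads_per_node),
            "--threads", str(threads_per_node),
            "--nonces", str(nonces),
            "--largePages",
            "--jit",
        ]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Print the commands; each one would be started in its own shell.
    for line in numactl_commands():
        print(line)
```

Each printed line is meant to be launched as a separate process so that the four instances run at the same time, one per node.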
4 cores and 8 threads per instance? |
Your CPU consists of 4 NUMA nodes, each node being 4 cores. Unfortunately, the benchmark doesn't support running in NUMA mode at the moment (see issue #22), but you can estimate the performance by running only 1 node and multiplying by 4:
This won't necessarily give optimal performance either; it depends on how Windows allocates the memory. |
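The estimate described above is simple arithmetic, sketched here in Python. The function names and the hash rates in the usage note are hypothetical placeholders, not measurements from this thread. The second helper sums separately measured per-node rates, which is the more direct figure when every node can be benchmarked.

```python
def estimate_total_hashrate(single_node_hs, numa_nodes=4):
    """Extrapolate a whole-CPU hash rate from one node's measured rate.

    Assumes every node performs like the measured one, which may not
    hold if memory is allocated on a remote node or other load is present.
    """
    if single_node_hs < 0 or numa_nodes < 1:
        raise ValueError("hash rate must be >= 0 and node count >= 1")
    return single_node_hs * numa_nodes


def summed_hashrate(per_node_hs):
    """Sum hash rates measured separately on each node."""
    return sum(per_node_hs)
```

For example, with a placeholder single-node rate of 2500.0 H/s, `estimate_total_hashrate(2500.0)` returns 10000.0, while `summed_hashrate([2500.0, 2500.0, 2500.0, 2400.0])` returns 9900.0 when one node is slower than the others.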
I get just 681 H/s |
Try |
Sorry, should be |
1000 h/s |
--seed maybe? And I have 32 threads and 64 MB, so why should I use just 4 threads per launcher? |
You can try |
1345 |
I think I have to wait for the adapted scripts |
To force an application to use NUMA on Windows Server / Windows 10 (I think; I have not double-checked, but NUMA has been a part of Windows from Windows 7 onwards):
The most important part is to read the example of how to use CoreInfo. This article also references SSAS, but don't worry about it; the Windows System Resource Manager (WSRM) and the hyper-x stuff in that article are not relevant.
CoreInfo - https://docs.microsoft.com/en-gb/sysinternals/downloads/coreinfo
This shows which cores are assigned to which nodes and roughly what it costs each CPU to access each bank. The source code for this might be available from before Microsoft acquired Sysinternals.
Once you have done this, either apply this hotfix (if it applies to you) or skip this step if it doesn't.
Now you can run benchmark instances per node with something like
Run one such instance for each NUMA node you have, and adjust --init and --threads for your hardware. This might be all you need, but you will probably need to set the affinity too. You can work out which cores are on which banks from CoreInfo. Then use a command like:
Again, run this once for each node you want, where [h] is the hex mask and [n] is the NUMA node. The affinity value is a hex bitmask, not a direct CPU number. So, to cover the full range of possible combinations on my 2c/4t processor, 1 to F are acceptable values for the parameter. This represents the binary range 0001 to 1111, where 0001 = one core, one thread and 1111 = two cores, four threads. Now you can put it all in a batch file with one line per node. Note: you might need to fiddle with your largePages setup too; I'm not 100% sure on that yet. |
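Working out the hex mask by hand is error-prone, so here is a minimal Python sketch of the bitmask arithmetic described above. It assumes that bit i of the mask stands for logical processor i, which is how Windows affinity masks are laid out. The helper name `affinity_mask` and the processor numbering in the usage note are hypothetical; the real node-to-processor mapping has to come from CoreInfo.

```python
def affinity_mask(cpus):
    """Return the hex affinity mask (no 0x prefix) for logical CPU indices.

    Bit i of the mask is set when logical processor i is in `cpus`.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("logical CPU indices must be non-negative")
        mask |= 1 << cpu  # set the bit for this logical processor
    if mask == 0:
        raise ValueError("at least one logical CPU is required")
    return format(mask, "X")
```

On the 2c/4t example, `affinity_mask([0])` gives `1` and `affinity_mask([0, 1, 2, 3])` gives `F`. If a node happened to own logical processors 8 to 15, `affinity_mask(range(8, 16))` would give `FF00`.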
Here are my benchmark results for a 7351P. OS: Proxmox 5.3, run directly on the hypervisor OS
My results were: Calculated result: d6660144e9a2e68bf47d7cc8afc206672e72f82dfff69fe0d974531e85f7504f So that's ~10,100 H/s if I sum the results of each NUMA job |
Any idea why the fourth node had worse performance? |
Could be because I had some VMs and Docker containers running while doing these benchmarks. It's my home server, so the load from those VMs and containers is low, but that could account for the difference |
Thanks for the reply. I assume getting performance for a node and then multiplying by the node count is accurate (enough), but want to make sure. |
Hi tevador, I have a question. I have an AMD EPYC 7351P: 16 cores, 32 threads, 8 channels.
If I launch the RandomX script, I get just 4000 H/s, 5000 at most.
I have 256 GB of DDR4, so I think that's not the problem. Do you have any idea? My configuration is probably not good.