
Low performance for the AMD EPYC 7351P #91

Closed
zeno39 opened this issue Jun 28, 2019 · 20 comments
Labels: invalid (This doesn't seem right)

Comments

zeno39 commented Jun 28, 2019

Hi tevador, I have a question. I have an AMD EPYC 7351P (16 cores / 32 threads, 8 memory channels).
When I launch the RandomX benchmark I get only about 4000 H/s, 5000 at most.
I have 256 GB of DDR4, so I don't think memory is the problem. Do you have any idea? My configuration is probably not right.

tevador (Owner) commented Jun 28, 2019

Please post the full command line you use for testing.

zeno39 (Author) commented Jun 28, 2019

randomx-benchmark.exe --mine --init 16 --threads 32 --nonces 100000 --largePages --jit

@SChernykh (Collaborator)

EPYC CPUs have 4 NUMA nodes per socket, IIRC. You need to run 4 benchmark instances, each with 8 threads and each assigned to its corresponding NUMA node.

zeno39 (Author) commented Jun 28, 2019

Assigned where?

@SChernykh (Collaborator)

numactl on Linux; I don't know how it's done on Windows.
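For example (a sketch on my part, not from this thread), binding one benchmark instance to NUMA node 0 on Linux would look something like this, assuming the Linux binary is named ./randomx-benchmark:

# bind both the threads and the memory of one instance to node 0; repeat with 1, 2, 3 for the other nodes
numactl --cpunodebind=0 --membind=0 ./randomx-benchmark --mine --init 8 --threads 8 --nonces 10000 --largePages --jit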

zeno39 (Author) commented Jun 28, 2019

4 cores and 8 threads per instance?

tevador (Owner) commented Jun 28, 2019

Your CPU consists of 4 NUMA nodes, each node being 4 cores.

Unfortunately, the benchmark doesn't support running in NUMA mode at the moment (see issue #22), but you can estimate the performance by running only 1 node and multiplying by 4:

randomx-benchmark.exe --mine --init 4 --threads 4 --affinity 170 --nonces 10000 --largePages --jit

This may not give optimal performance either; it depends on how Windows allocates the memory.
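(Side note, my reading rather than anything stated in the thread: --affinity appears to take a decimal CPU bitmask, so 170 = 128 + 32 + 8 + 2 = binary 10101010, which selects logical CPUs 1, 3, 5 and 7, i.e. one SMT thread per physical core of the first node, assuming Windows numbers SMT siblings adjacently.)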

zeno39 (Author) commented Jun 28, 2019

I get just 681 h/s.

@SChernykh (Collaborator)

Try --affinity 255 and everything else the same.

tevador (Owner) commented Jun 28, 2019

Sorry, should be --affinity 170.

zeno39 (Author) commented Jun 28, 2019

1000 h/s

zeno39 (Author) commented Jun 28, 2019

--seed maybe? And I have 32 threads and 64 MB, so why am I using just 4 threads per launcher?

tevador (Owner) commented Jun 28, 2019

You can try --threads 8 --affinity 255

zeno39 (Author) commented Jun 28, 2019

1345

zeno39 (Author) commented Jun 28, 2019

I think I have to wait for the adapted scripts.

mistfpga commented Jul 3, 2019

To force an application onto a specific NUMA node on Windows Server / Windows 10 (I think; I haven't double-checked, but NUMA support has been part of Windows since Windows 7):

The most important part is to read the example of how to use Coreinfo.

The article also references SSAS, but don't worry about it; the Windows System Resource Manager (WSRM) and the hyper-x stuff in that article are not relevant.

https://techcommunity.microsoft.com/t5/DataCAT/Forcing-NUMA-Node-affinity-for-Analysis-Services-Tabular/ba-p/305188

Coreinfo - https://docs.microsoft.com/en-gb/sysinternals/downloads/coreinfo

It shows which cores are assigned to which NUMA nodes and roughly what it costs each CPU to access each memory bank. The source code might still be available from before Microsoft acquired Sysinternals.

Once you have done that, apply this hotfix if it applies to you, or skip this step if it doesn't:
https://support.microsoft.com/en-gb/help/2028687/you-cannot-specify-a-numa-node-when-you-create-a-process-by-using-the

Now you can run one benchmark instance per node with something like:

start /NODE [n] "Numa" cmd /k benchmark.exe --mine --jit --largePages --init 8 --threads 8 --nonces 10000

for each NUMA node you have. Also adjust --init and --threads for your hardware.
The /k parameter means the cmd prompt will stay open even after the benchmark has finished.

That might be all you need, but you will probably have to set the affinity too. You can work out which cores are on which nodes from Coreinfo.

Then use a command like:

start /AFFINITY [h] /NODE [n] "Numa [n]" cmd /k benchmark.exe --mine --jit --largePages --init 8 --threads 8 --nonces 10000

again for each node you want, where [h] is the hex affinity mask and [n] is the NUMA node.

The affinity hex mask is not a direct CPU number. To cover the full range of possible combinations on my 2c/4t processor, 0 to F are the acceptable values for the parameter; that represents the range 0001 to 1111, where 0001 = 1 core / 1 thread and 1111 = 2 cores / 4 threads.

Now you can put it all in a batch file with one line per node (see the sketch below).

Note: you might need to fiddle with your large pages setup too; not 100% sure on that yet.
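A sketch of what that batch file might look like on a 4-node EPYC, following the examples above (the benchmark.exe name is carried over from them, and the FF mask assumes the affinity mask is node-relative when /NODE is given, as the start documentation describes; check your Coreinfo output before relying on it):

:: hypothetical batch file: one benchmark instance per NUMA node
:: FF selects all 8 logical CPUs of the given node
start "Node 0" /NODE 0 /AFFINITY FF cmd /k benchmark.exe --mine --jit --largePages --init 8 --threads 8 --nonces 10000
start "Node 1" /NODE 1 /AFFINITY FF cmd /k benchmark.exe --mine --jit --largePages --init 8 --threads 8 --nonces 10000
start "Node 2" /NODE 2 /AFFINITY FF cmd /k benchmark.exe --mine --jit --largePages --init 8 --threads 8 --nonces 10000
start "Node 3" /NODE 3 /AFFINITY FF cmd /k benchmark.exe --mine --jit --largePages --init 8 --threads 8 --nonces 10000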

Jabroni commented Jul 12, 2019

Here are my benchmark results for a 7351P.

OS: Proxmox 5.3
Memory: 8× 16 GB 2400 MHz (so all slots are used)

Ran directly on the hypervisor OS.

sudo sysctl -w vm.nr_hugepages=4800
seq 0 3 | xargs -P 0 -I node numactl -N node ./randomx-benchmark --mine --largePages --jit --nonces 100000 --init 8 --threads 8
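To unpack that one-liner (my reading of it): the sysctl line reserves 4800 × 2 MB = 9.6 GB of huge pages (assuming the default 2 MB huge page size), enough for four ~2 GB RandomX datasets plus working memory; seq 0 3 emits the node indices 0 through 3; xargs -P 0 launches all four instances in parallel; and numactl -N node binds each instance's threads to its node. Adding -m node, which was not part of the original command, would also pin each instance's memory to the same node:

seq 0 3 | xargs -P 0 -I node numactl -N node -m node ./randomx-benchmark --mine --largePages --jit --nonces 100000 --init 8 --threads 8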

My results were

Calculated result: d6660144e9a2e68bf47d7cc8afc206672e72f82dfff69fe0d974531e85f7504f
Performance: 2645.11 hashes per second
Calculated result: d6660144e9a2e68bf47d7cc8afc206672e72f82dfff69fe0d974531e85f7504f
Performance: 2646.72 hashes per second
Calculated result: d6660144e9a2e68bf47d7cc8afc206672e72f82dfff69fe0d974531e85f7504f
Performance: 2637.67 hashes per second
Calculated result: d6660144e9a2e68bf47d7cc8afc206672e72f82dfff69fe0d974531e85f7504f
Performance: 2173.49 hashes per second

So that's ~10,100 H/s if I sum the results of the NUMA jobs.

@russoj88

Any idea why the fourth node had worse performance?

Jabroni commented Jul 22, 2019

> Any idea why the fourth node had worse performance?

Could be because I had some VMs and Docker containers running while doing these benchmarks. It's my home server, so their load is low, but that could account for the difference.

@russoj88

Thanks for the reply. I assume measuring the performance of one node and then multiplying by the node count is accurate enough, but I want to make sure.

tevador added the invalid label on Aug 30, 2019
tevador closed this as completed on Sep 27, 2019