It's very subtle, but if you look at the lines: "Working Memory... 5.06 GiB (locked, spread: 50%/2)"
...
It's very well distributed in the 2nd and 3rd runs, but poorly distributed in the 1st and 4th runs.
I can't explain why the 1st and 4th runs are so poor. The program tries its best to evenly spread out the memory, but this isn't always possible if one or more of the nodes is out of memory.
The curious thing here is that it's either 100% or 50%. That corresponds to perfect distribution and 3-to-1 distribution across the nodes (3x more memory on one node than on the other).
This seems too "round" to be a coincidence. Running out of memory on one node wouldn't explain it.
I've never observed this on my dated quad-Opteron, and unfortunately I don't have access to a Threadripper system. So this might take a while to track down.
Had a discussion with Oliver Kruse. And while he wasn't able to reproduce it with a 1950X, he did bring up a point which seems to be the likely cause of this on the 2990WX. So huge thanks to him!
The screenshots on the forum post show that Windows (and thus y-cruncher) reads the hardware as 4 NUMA nodes despite there being only 2 memory domains.
Windows uses the CPU topology to define nodes. And since the 2990WX has 4 dies, it has 4 nodes. 2 of them have memory, the other 2 don't.
y-cruncher reads the hardware as having 4 NUMA nodes and attempts to allocate memory evenly across the 4 nodes. These allocations are done using VirtualAllocExNuma().
However, 2 of the nodes have no memory. Therefore VirtualAllocExNuma() cannot honor the nndPreferred parameter when it points at one of the memoryless nodes. Instead, it silently (and seemingly randomly) binds the memory to one of the two nodes that do have memory.
If the allocations intended for the two empty nodes get bound to different memory nodes, the distribution will be perfect (100%/2). If they both get bound to the same node, the distribution will be 3-to-1 - thus giving the 50%/2.
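As a rough illustration of the failure mode, here is a standalone sketch (not y-cruncher's actual allocator - the 64 MiB chunk size and the printout are made up). It allocates one chunk with each node as the preferred node, then asks the OS which node the pages actually landed on. Link against psapi.lib for QueryWorkingSetEx().

```cpp
#include <windows.h>
#include <psapi.h>
#include <cstdio>
#include <cstring>

int main() {
    ULONG highest = 0;
    GetNumaHighestNodeNumber(&highest);          // 3 on a 2990WX -> nodes 0..3

    const SIZE_T size = 64ull * 1024 * 1024;     // arbitrary test size
    for (ULONG node = 0; node <= highest; node++) {
        // Ask for memory "preferring" this node.
        void* p = VirtualAllocExNuma(GetCurrentProcess(), nullptr, size,
                                     MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE,
                                     node);
        if (p == nullptr) {
            std::printf("node %lu: allocation failed\n", node);
            continue;
        }

        std::memset(p, 1, size);                 // touch the pages so they get backed

        // Ask the OS which node the first page actually ended up on.
        PSAPI_WORKING_SET_EX_INFORMATION info = {};
        info.VirtualAddress = p;
        QueryWorkingSetEx(GetCurrentProcess(), &info, sizeof(info));
        std::printf("preferred node %lu -> actual node %u\n",
                    node, (unsigned)info.VirtualAttributes.Node);

        VirtualFree(p, 0, MEM_RELEASE);
    }
    return 0;
}
```

On the memoryless nodes the allocation still succeeds, but the reported actual node should come back as one of the two memory nodes - and which of the two appears to be up to the OS.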
The solution is to exclude NUMA nodes that don't have memory. This should take care of Threadripper and other similar cases. But it won't solve the more general case of heterogeneous systems.
This is easy to do on Windows. But Linux will take some more investigation.
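For what it's worth, a minimal sketch of the filtering idea on Windows (this is an assumption about how the fix could look, not the actual implementation):

```cpp
#include <windows.h>
#include <vector>

// Keep only the NUMA nodes that report memory. Note that
// GetNumaAvailableMemoryNodeEx() reports *available* memory, so a node that
// is merely full would also be skipped - a real fix may want to distinguish
// "no memory installed" from "out of memory".
std::vector<USHORT> nodes_with_memory() {
    std::vector<USHORT> nodes;
    ULONG highest = 0;
    GetNumaHighestNodeNumber(&highest);
    for (USHORT node = 0; node <= highest; node++) {
        ULONGLONG available = 0;
        if (GetNumaAvailableMemoryNodeEx(node, &available) && available > 0) {
            nodes.push_back(node);
        }
    }
    return nodes;   // should be exactly 2 nodes on a 2990WX
}
```

On the Linux side, libnuma exposes similar information, so something along these lines might be a starting point (again only a sketch; link with -lnuma):

```cpp
#include <numa.h>
#include <vector>

std::vector<int> linux_nodes_with_memory() {
    std::vector<int> nodes;
    if (numa_available() < 0)
        return nodes;                    // kernel/system without NUMA support
    for (int node = 0; node <= numa_max_node(); node++) {
        long long free_bytes = 0;
        // numa_node_size64() returns the total memory of the node (or -1),
        // so a memoryless node gets filtered out here.
        if (numa_node_size64(node, &free_bytes) > 0)
            nodes.push_back(node);
    }
    return nodes;
}
```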
A temporary work-around is to manually select the NUMA nodes in the memory allocator. You will need to experiment to see which 2 of the 4 nodes are the ones with memory.
For reference: https://www.overclockers.com/forums/showthread.php/792068-Marathon-Season-VII-October-y-cruncher-Pi-1b?p=8090577&viewfull=1#post8090577
And my response: https://www.overclockers.com/forums/showthread.php/792068-Marathon-Season-VII-October-y-cruncher-Pi-1b?p=8090579&viewfull=1#post8090579