Memory footprint of sorting medium-size lists with M >= 1 #54
Replies: 6 comments
-
What you are observing is indeed expected behavior. You can also see this for the demo mpyc/demos/sort.py when you run it with M >= 1 parties.
To lower the memory footprint, reducing the size of the secure integers should help a bit (default is L=32). There will also be a vectorized version using secure NumPy arrays for sorting, which will reduce the memory footprint considerably, achieving the same effect as already done for instance for ...
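
For instance, a minimal sketch of the first suggestion (assuming 16 bits suffice for the values being sorted):

from mpyc.runtime import mpc

secint = mpc.SecInt(16)   # 16-bit secure integers; the default mpc.SecInt() uses 32 bits

Every share the sort keeps in flight is then correspondingly smaller, while the sorting calls themselves stay the same.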
-
Thank you, I understand the phenomenon better now. I have a follow-up question: is there a way to control the expansion of the circuit? For example, I envision two possibilities: ...
These solutions would create a memory-footprint/round-complexity trade-off, which is necessary for large lists.
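
As an aside (an illustration of such a trade-off, not something from this thread): MPyC already lets you stop the scheduler from running too far ahead by awaiting mpc.barrier(), so comparisons that have been issued can finish and their shares can be released before more of the circuit is expanded. A rough sketch, assuming several independent lists are to be sorted one after another:

from mpyc.runtime import mpc

async def sort_all(lists):
    # Hypothetical helper: sort several secure lists in sequence, flushing
    # the pipeline between them so intermediate shares are not all held at once.
    results = []
    for xs in lists:
        ys = mpc.sorted(xs)   # schedules the comparison circuit for this list
        await mpc.barrier()   # let the pending work finish before expanding more
        results.append(ys)
    return results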
-
I've already done a quick experiment with secure NumPy arrays, and sorting 1000 numbers with 3 parties then takes about 10 seconds. The circuit size also remains much smaller, linear in the depth of the sorting network, so it is much less of an issue.
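
To put "linear in the depth of the sorting network" in perspective (a back-of-the-envelope count, not a figure from the thread): Batcher's merge-exchange sort on n elements has t(t+1)/2 comparator stages with t = ceil(log2 n), so the vectorized version schedules on the order of 55 array operations for n = 1000, instead of one coroutine per individual comparison:

n = 1000
t = (n - 1).bit_length()     # t = 10, matching the code further down
stages = t * (t + 1) // 2    # 55 comparator stages in Batcher's merge-exchange
print(t, stages)             # -> 10 55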
-
Thanks, these functions should do the job. Wow, this is a nice speed-up. Did you just parallelize the Batcher sort, or did you also switch to a quicksort or a radix sort?
-
It's a vectorized version of Batcher's merge-exchange sort algorithm, as already used by the existing sorting routine. The method below is part of the MPyC runtime (with numpy imported as np there):

def np_sort(self, a, axis=-1):
    if axis is None:
        a = self.np_flatten(a)
        axis = 0
    n = a.shape[axis]
    if a.size == 0 or n <= 1:
        return a
    a = self.np_swapaxes(a, axis, -1)  # switch to last axis
    t = (n-1).bit_length()  # n >= 2, so t >= 1
    p = 1 << t-1
    while p:
        d, q, r = p, 1 << t-1, 0
        while d:
            I = np.fromiter((i for i in range(n - d) if i & p == r), dtype=int)
            b0 = a[..., I]
            b1 = a[..., I + d]
            h = (b0 >= b1) * (b0 - b1)
            b0, b1 = b0 - h, b1 + h
            a = self.np_update(a, (..., I), b0)
            a = self.np_update(a, (..., I + d), b1)
            d, q, r = q - p, q >> 1, p
        p >>= 1
    a = self.np_swapaxes(a, axis, -1)  # restore original axis
    return a
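
A hedged usage sketch (assuming the method above ships as mpc.np_sort, and that mpc.np_fromlist is available to build a secure integer array from a list of secure integers):

from mpyc.runtime import mpc

secint = mpc.SecInt()

async def main():
    await mpc.start()
    # 1000 secure integers in worst-case (reverse) order
    xs = mpc.np_fromlist([secint(i) for i in range(1000, 0, -1)])
    ys = mpc.np_sort(xs)                # one vectorized comparison per stage
    print((await mpc.output(ys))[:5])   # reveal the result, check a few entries
    await mpc.shutdown()

mpc.run(main())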
-
Looks nice! Looking forward to the next release!
-
Hello,
I've run some experiments where I sort "medium-size" secure lists (i.e., around 1K elements). For example, see the following snippet:
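
A hypothetical stand-in for such a script (the exact snippet is not shown here; this sketch assumes mpc.sorted() over a list of default 32-bit secure integers):

from mpyc.runtime import mpc

secint = mpc.SecInt()   # default 32-bit secure integers

async def main():
    await mpc.start()
    x = [secint(i) for i in range(1000, 0, -1)]   # ~1K elements, reverse order
    x = mpc.sorted(x)                             # oblivious sort
    print((await mpc.output(x))[:5])              # sanity check on the result
    await mpc.shutdown()

# run with no -M flag (or -M0) for a single local run,
# or with e.g. -M3 to simulate 3 parties locally
mpc.run(main())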
If I execute my script with no argument or with -M0, the script finishes without any issue. If I use -M3, or even -M1, the Python process takes more and more memory; sometimes it even fills my memory entirely for larger lists or larger M. On the other hand, with -M0, the memory footprint is constant (and much smaller).
I suppose this behaviour comes from the secret-sharing costs, but I wonder whether I can optimize it to reduce the memory footprint. For example, I see several nested loops in the _sort() function and wonder whether the coroutine stores the "comparison shares" from all comparison rounds until the end instead of releasing some of them round by round.
To sum up, is there a simple trick I missed in the documentation or the code to optimize the memory footprint in these cases?