The previous parallel implementation of Argon2 was removed in #247. There were soundness concerns about how it handled mutable aliasing, namely that it's unsound for mutable aliases to exist even if they aren't accessed.
This is a discussion issue for how to build a sound abstraction for use in a new parallel implementation of Argon2. Please make sure to read through this issue before attempting a new parallel implementation.
Argon2 memory layout
The memory layout used by Argon2 is somewhat difficult to model in Rust. The arrangement itself follows from Argon2's goal of sequential memory hardness. RFC 9106 describes it as follows:
3.4. Indexing
To enable parallel block computation, we further partition the memory
matrix into SL = 4 vertical slices. The intersection of a slice and a
lane is called a segment, which has a length of q/SL. Segments of the
same slice can be computed in parallel and do not reference blocks
from each other. All other blocks can be referenced.
      slice 0    slice 1    slice 2    slice 3
      ___/\___   ___/\___   ___/\___   ___/\___
     /        \ /        \ /        \ /        \
    +----------+----------+----------+----------+
    |          |          |          |          | > lane 0
    +----------+----------+----------+----------+
    |          |          |          |          | > lane 1
    +----------+----------+----------+----------+
    |          |          |          |          | > lane 2
    +----------+----------+----------+----------+
    |         ...        ...        ...         | ...
    +----------+----------+----------+----------+
    |          |          |          |          | > lane p - 1
    +----------+----------+----------+----------+
Figure 9: Single-Pass Argon2 with p Lanes and 4 Slices
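To make the indexing concrete, here is a minimal sketch of the segment arithmetic. It assumes the blocks are stored lane-major in one flat buffer (all q blocks of lane 0, then lane 1, and so on); that layout, the name `segment_range`, and the `SYNC_POINTS` constant are illustrative assumptions, not something the RFC excerpt mandates.

```rust
use std::ops::Range;

/// Number of "slices" per pass (SL = 4 in the RFC excerpt above).
const SYNC_POINTS: usize = 4;

/// Block-index range of the segment at (`lane`, `slice`).
///
/// Assumption: lane-major storage, i.e. lane 0's q blocks come first,
/// then lane 1's, and so on. `lane_length` is q and is assumed to be a
/// multiple of SYNC_POINTS.
fn segment_range(lane: usize, slice: usize, lane_length: usize) -> Range<usize> {
    assert!(slice < SYNC_POINTS, "Argon2 uses exactly 4 slices");
    assert!(lane_length % SYNC_POINTS == 0, "q must be divisible by SL");
    let segment_length = lane_length / SYNC_POINTS; // q / SL
    let start = lane * lane_length + slice * segment_length;
    start..start + segment_length
}
```

With q = 16, the segment at lane 1, "slice" 2 covers block indices 24 through 27.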
(Note: since slices are an important concept in Rust, to disambiguate I'll throw scare quotes on Argon2 "slices")
The parallel implementation is described as follows in the Argon2 paper:
We suggest the following solution for p cores: the entire memory is split into p lanes of l equal slices each,
which can be viewed as elements of a (p×l)-matrix Q[i][j]. Consider the class of schemes given by Equation (1).
We modify it as follows:
• p invocations to H run in parallel on the first column Q[∗][0] of the memory matrix. Their indexing
functions refer to their own slices only;
• For each column j > 0, l invocations to H continue to run in parallel, but the indexing functions now may
refer not only to their own slice, but also to all jp slices of previous columns Q[∗][0], Q[∗][1], . . . , Q[∗][j−1].
• The last blocks produced in each slice of the last column are XORed.
This scheme prevents mutable aliasing because each of the p threads (one per lane) has exclusive mutable access to segment j in its own lane, and only a shared view of the segments in previous "slices" of every lane.
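To "rustify" the above diagram, for j = 2 it would allow the following access:

                 slice 0     slice 1     slice 2         slice 3
    lane 0       &[Block]    &[Block]    &mut [Block]    (no access)
    lane 1       &[Block]    &[Block]    &mut [Block]    (no access)
    lane 2       &[Block]    &[Block]    &mut [Block]    (no access)
    ...
    lane p - 1   &[Block]    &[Block]    &mut [Block]    (no access)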
Which is to say, each thread has exclusive &mut [Block] access to the jth segment in its own lane, and shared &[Block] access to all segments in all lanes for "slices" prior to j.
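As a rough illustration, this access pattern can be expressed in safe Rust for a single value of j. The sketch below assumes lane-major storage and uses a simplified `Block` alias. `split_for_slice` and `touch_slice_in_parallel` are hypothetical names, and the per-block write is a placeholder rather than the Argon2 compression function.

```rust
/// Stand-in for Argon2's 1 KiB block (simplified for this sketch).
type Block = [u64; 128];

const SLICES: usize = 4;

/// For "slice" `j`, split lane-major `memory` into per-lane views:
/// a shared view of everything before `j`, and an exclusive view of segment `j`.
fn split_for_slice(
    memory: &mut [Block],
    lanes: usize,
    j: usize,
) -> (Vec<&[Block]>, Vec<&mut [Block]>) {
    assert!(j < SLICES && lanes > 0 && !memory.is_empty());
    assert!(memory.len() % (lanes * SLICES) == 0);
    let lane_len = memory.len() / lanes;
    let seg_len = lane_len / SLICES;

    let mut shared = Vec::with_capacity(lanes);
    let mut exclusive = Vec::with_capacity(lanes);
    for lane in memory.chunks_exact_mut(lane_len) {
        // previous = "slices" 0..j, current = segment j, the rest is untouched.
        let (previous, rest) = lane.split_at_mut(j * seg_len);
        let (current, _future) = rest.split_at_mut(seg_len);
        shared.push(&*previous); // downgrade to a shared borrow
        exclusive.push(current);
    }
    (shared, exclusive)
}

/// One thread per lane: each may read every previous "slice" and write
/// only its own segment. The write below is a placeholder computation.
fn touch_slice_in_parallel(memory: &mut [Block], lanes: usize, j: usize) {
    let (shared, exclusive) = split_for_slice(memory, lanes, j);
    std::thread::scope(|scope| {
        for (lane, segment) in exclusive.into_iter().enumerate() {
            let shared = &shared;
            scope.spawn(move || {
                let readable: usize = shared.iter().map(|s| s.len()).sum();
                for block in segment.iter_mut() {
                    block[0] = (lane * 1000 + readable) as u64;
                }
            });
        }
    });
}
```

Because the disjoint pieces come from `chunks_exact_mut` and `split_at_mut`, no `unsafe` is needed here and no mutable alias is ever created. The cost is that the shared view is a list of per-lane sub-slices rather than one contiguous slice.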
To make this work, I would suggest having a type like struct Memory which defines iterators that can hand out slice-and-lane-specific "views" of memory while ensuring no mutable aliasing occurs.
This type can use dynamically checked borrow rules to avoid invalid accesses, e.g. if j were 2 and an immutable view of a segment in "slice" 2 were requested, this would cause an error. All of the borrows (i.e. the ones held by the "views") would have to be returned before j could be advanced, at which point a new "view" can be constructed per-lane for the next iteration of j.
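A minimal sketch of what the dynamic check on shared views might look like, covering only the read side of the single-pass picture above. The `Memory` fields, `ViewError`, `shared_segment`, and `advance` are all hypothetical names chosen for illustration, and lane-major storage is assumed.

```rust
/// Stand-in for Argon2's 1 KiB block (simplified for this sketch).
type Block = [u64; 128];

const SLICES: usize = 4;

#[derive(Debug, PartialEq)]
enum ViewError {
    /// The requested "slice" is the one being written (j) or a later one.
    NotYetReadable { requested: usize, current: usize },
    /// Lane or "slice" index outside the memory matrix.
    OutOfRange,
}

/// Hypothetical `Memory` type; `current` plays the role of j.
struct Memory {
    blocks: Vec<Block>,
    lanes: usize,
    current: usize,
}

impl Memory {
    fn new(lanes: usize, lane_length: usize) -> Self {
        assert!(lanes > 0 && lane_length > 0 && lane_length % SLICES == 0);
        Memory { blocks: vec![[0u64; 128]; lanes * lane_length], lanes, current: 0 }
    }

    /// Shared view of one finished segment; rejected unless `slice < j`.
    fn shared_segment(&self, lane: usize, slice: usize) -> Result<&[Block], ViewError> {
        if lane >= self.lanes || slice >= SLICES {
            return Err(ViewError::OutOfRange);
        }
        if slice >= self.current {
            return Err(ViewError::NotYetReadable { requested: slice, current: self.current });
        }
        let lane_len = self.blocks.len() / self.lanes;
        let seg_len = lane_len / SLICES;
        let start = lane * lane_len + slice * seg_len;
        Ok(&self.blocks[start..start + seg_len])
    }

    /// Advancing j needs `&mut self`, so outstanding views must be gone first.
    fn advance(&mut self) {
        self.current += 1;
    }
}
```

With j = 2, `shared_segment(0, 1)` succeeds, while `shared_segment(0, 2)` returns `NotYetReadable`, matching the example in the paragraph above.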
More concretely, I think a prospective Memory type needs a sort of "lending iterator" which returns a set of p memory views and then increments j. Those views would be valid until j is next incremented.
The reason for this is the &mut [Block] column above: those mutable borrows need to all be returned before j can be incremented, since after j is incremented they can be aliased as &[Block] borrows. So at some point a &mut -> & handoff needs to occur. For this reason I'm suggesting those borrows only be valid for a single iteration of j.
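One possible shape for such a lending iterator is sketched below, again assuming lane-major storage. `SliceViews`, `next_views`, and the field names are hypothetical. Since the method takes `&mut self`, the borrow checker itself forces the previous set of views to be returned before j advances, which is where the `&mut` -> `&` handoff happens.

```rust
/// Stand-in for Argon2's 1 KiB block (simplified for this sketch).
type Block = [u64; 128];

const SLICES: usize = 4;

/// Views for one value of j; they keep `Memory` mutably borrowed while alive.
struct SliceViews<'a> {
    j: usize,
    /// "Slices" 0..j, one entry per lane.
    shared: Vec<&'a [Block]>,
    /// Segment j, one entry per lane.
    exclusive: Vec<&'a mut [Block]>,
}

/// Hypothetical `Memory` type; `next` is the j that will be lent out next.
struct Memory {
    blocks: Vec<Block>,
    lanes: usize,
    next: usize,
}

impl Memory {
    fn new(lanes: usize, lane_length: usize) -> Self {
        assert!(lanes > 0 && lane_length > 0 && lane_length % SLICES == 0);
        Memory { blocks: vec![[0u64; 128]; lanes * lane_length], lanes, next: 0 }
    }

    /// Lends the p views for the next j, or `None` after the last "slice".
    /// Last iteration's `&mut` segments reappear here as part of `shared`.
    fn next_views(&mut self) -> Option<SliceViews<'_>> {
        if self.next == SLICES {
            return None;
        }
        let j = self.next;
        self.next += 1;
        let lane_len = self.blocks.len() / self.lanes;
        let seg_len = lane_len / SLICES;

        let mut shared = Vec::with_capacity(self.lanes);
        let mut exclusive = Vec::with_capacity(self.lanes);
        for lane in self.blocks.chunks_exact_mut(lane_len) {
            let (previous, rest) = lane.split_at_mut(j * seg_len);
            let (current, _future) = rest.split_at_mut(seg_len);
            shared.push(&*previous);
            exclusive.push(current);
        }
        Some(SliceViews { j, shared, exclusive })
    }
}
```

A driver loop would then look like `while let Some(views) = memory.next_views() { /* hand views.exclusive to p threads */ }`. Trying to call `next_views` again while an earlier `SliceViews` is still in use is a compile error rather than a runtime one.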
Sources for the excerpts quoted above:
RFC 9106, section 3.4: https://datatracker.ietf.org/doc/html/rfc9106#section-3.4
Argon2 paper: https://www.password-hashing.net/argon2-specs.pdf