argon2: parallel memory view abstraction #380

tarcieri · 2023-03-04T17:54:17Z

The previous parallel implementation of Argon2 was removed in #247. There were soundness concerns about how it handled mutable aliasing, namely that it's unsound for mutable aliases to exist even if they aren't accessed.

This is a discussion issue for how to build a sound abstraction for use in a new parallel implementation of Argon2. Please make sure to read through this issue before attempting a new parallel implementation.

Argon2 memory layout

The memory layout used by Argon2 is somewhat difficult to model in Rust, though the arrangement follows directly from its goal of sequential memory hardness.

https://datatracker.ietf.org/doc/html/rfc9106#section-3.4

3.4. Indexing

To enable parallel block computation, we further partition the memory
matrix into SL = 4 vertical slices. The intersection of a slice and a
lane is called a segment, which has a length of q/SL. Segments of the
same slice can be computed in parallel and do not reference blocks
from each other. All other blocks can be referenced.

    slice 0    slice 1    slice 2    slice 3
    ___/\___   ___/\___   ___/\___   ___/\___
   /        \ /        \ /        \ /        \
  +----------+----------+----------+----------+
  |          |          |          |          | > lane 0
  +----------+----------+----------+----------+
  |          |          |          |          | > lane 1
  +----------+----------+----------+----------+
  |          |          |          |          | > lane 2
  +----------+----------+----------+----------+
  |         ...        ...        ...         | ...
  +----------+----------+----------+----------+
  |          |          |          |          | > lane p - 1
  +----------+----------+----------+----------+
Figure 9: Single-Pass Argon2 with p Lanes and 4 Slices

(Note: since slices are an important concept in Rust, to disambiguate I'll throw scare quotes on Argon2 "slices")
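As a rough illustration of the indexing above, here is a minimal sketch of how a segment maps to a range of block indices. It assumes lane-major storage (each lane is one contiguous run of q blocks), which the RFC text does not mandate; `segment_range` and `SYNC_POINTS` are hypothetical names chosen for this example.

```rust
use std::ops::Range;

/// Number of vertical "slices" fixed by RFC 9106 (SL = 4).
const SYNC_POINTS: usize = 4;

/// Hypothetical helper: block indices covered by the segment at
/// (`lane`, `slice`), ASSUMING lane-major storage where lane `i`
/// occupies `i * lane_len .. (i + 1) * lane_len`.
fn segment_range(lane: usize, slice: usize, lane_len: usize) -> Range<usize> {
    assert!(slice < SYNC_POINTS, "Argon2 only has 4 \"slices\"");
    assert_eq!(lane_len % SYNC_POINTS, 0, "lane length must be a multiple of SL");
    let segment_len = lane_len / SYNC_POINTS; // q / SL
    let start = lane * lane_len + slice * segment_len;
    start..start + segment_len
}
```

For example, with lanes of 16 blocks, `segment_range(1, 2, 16)` covers blocks 24 through 27: lane 1 starts at block 16 and "slice" 2 starts 8 blocks into the lane.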

The parallel implementation is described as follows in the Argon2 paper:

https://www.password-hashing.net/argon2-specs.pdf

We suggest the following solution for p cores: the entire memory is split into p lanes of l equal slices each,
which can be viewed as elements of a (p×l)-matrix Q[i][j]. Consider the class of schemes given by Equation (1).

We modify it as follows:
• p invocations to H run in parallel on the first column Q[∗][0] of the memory matrix. Their indexing
functions refer to their own slices only;
• For each column j > 0, l invocations to H continue to run in parallel, but the indexing functions now may
refer not only to their own slice, but also to all jp slices of previous columns Q[∗][0], Q[∗][1], . . . , Q[∗][j−1].
• The last blocks produced in each slice of the last column are XORed.

This scheme prevents mutable aliasing: each of the p threads (one per lane) has exclusive mutable access to segment j in its own lane, and only a shared view of the segments in previous "slices" (columns).

To "rustify" the above diagram, for j = 2 it would allow the following access:

    slice 0    slice 1      slice 2      slice 3
    ___/\___   ___/\___   _____/\_____   ___/\___
   /        \ /        \ /            \ /        \
  +----------+----------+--------------+----------+
  | &[Block] | &[Block] | &mut [Block] |          | > lane 0
  +----------+----------+--------------+----------+
  | &[Block] | &[Block] | &mut [Block] |          | > lane 1
  +----------+----------+--------------+----------+
  | &[Block] | &[Block] | &mut [Block] |          | > lane 2
  +----------+----------+--------------+----------+

Which is to say, each thread has exclusive &mut [Block] access to the jth segment in its lane, and shared &[Block] access to all segments in all lanes for the "slices" prior to j.
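The access pattern in the diagram can be expressed in safe Rust by splitting each lane with `split_at_mut`. The sketch below is only an illustration of that picture, not a proposed implementation: it assumes lane-major storage, uses a one-word `Block` stand-in, and replaces the compression function with a placeholder. `split_column` and `fill_column` are hypothetical names.

```rust
use std::thread;

/// Stand-in for Argon2's 1 KiB block; one word keeps the sketch small.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Block(u64);

/// Split lane-major memory into shared views of every segment before
/// column `j` (one per lane) and one exclusive view per lane of column `j`.
fn split_column(
    memory: &mut [Block],
    lane_len: usize,
    segment_len: usize,
    j: usize,
) -> (Vec<&[Block]>, Vec<&mut [Block]>) {
    let mut previous = Vec::new();
    let mut current = Vec::new();
    for lane in memory.chunks_exact_mut(lane_len) {
        // Everything left of column j is finished; the rest starts at column j.
        let (prev, rest) = lane.split_at_mut(j * segment_len);
        let (cur, _later) = rest.split_at_mut(segment_len);
        previous.push(&*prev); // downgrade &mut -> &
        current.push(cur);
    }
    (previous, current)
}

/// Fill column `j` with one scoped thread per lane.
fn fill_column(memory: &mut [Block], lane_len: usize, segment_len: usize, j: usize) {
    let (previous, current) = split_column(memory, lane_len, segment_len, j);
    let previous = &previous; // shared by every thread
    thread::scope(|s| {
        for (lane, segment) in current.into_iter().enumerate() {
            s.spawn(move || {
                // Placeholder for the real compression function: sum all
                // previously written blocks, then write lane-specific values.
                let seen: u64 = previous
                    .iter()
                    .flat_map(|seg| seg.iter())
                    .map(|b| b.0)
                    .sum();
                for (i, block) in segment.iter_mut().enumerate() {
                    *block = Block(seen + (lane * 100 + i) as u64);
                }
            });
        }
    });
}
```

Because the views come from disjoint `split_at_mut` halves, the borrow checker can verify that no `&mut [Block]` overlaps any `&[Block]`, and `thread::scope` guarantees all views are released before `fill_column` returns.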

To make this work, I would suggest having a type like struct Memory which defines iterators that can hand out slice-and-lane-specific "views" of memory while ensuring no mutable aliasing occurs.

This type can use dynamically checked borrow rules to avoid invalid accesses, e.g. if j were 2 and an immutable view of a segment in "slice" 2 were requested, this would cause an error. All of the borrows (i.e. the ones held by the "views") would have to be returned before j could be advanced, at which point a new "view" can be constructed per-lane for the next iteration of j.
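A minimal sketch of what such a dynamic check might look like follows. Everything here is hypothetical (the `Memory` fields, `ViewError`, `shared_segment`, `advance`), and it again assumes lane-major storage. Note that in this sketch only the "slice" index is checked at runtime; the requirement that all views be returned before j advances falls out of `advance` taking `&mut self`.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Block(u64);

#[derive(Debug, PartialEq)]
enum ViewError {
    /// The requested "slice" is still being written or not yet reached.
    SliceNotFinished { requested: usize, current: usize },
    /// Lane or "slice" index does not exist.
    OutOfRange,
}

struct Memory {
    blocks: Vec<Block>,
    lanes: usize,
    segment_len: usize,
    /// The "slice" currently being written (j in the text above).
    current: usize,
}

impl Memory {
    const SLICES: usize = 4;

    fn new(lanes: usize, segment_len: usize) -> Self {
        let blocks = vec![Block::default(); lanes * Self::SLICES * segment_len];
        Memory { blocks, lanes, segment_len, current: 0 }
    }

    /// Shared view of one finished segment; errors if `slice >= j`.
    fn shared_segment(&self, lane: usize, slice: usize) -> Result<&[Block], ViewError> {
        if lane >= self.lanes || slice >= Self::SLICES {
            return Err(ViewError::OutOfRange);
        }
        if slice >= self.current {
            return Err(ViewError::SliceNotFinished {
                requested: slice,
                current: self.current,
            });
        }
        let start = (lane * Self::SLICES + slice) * self.segment_len;
        Ok(&self.blocks[start..start + self.segment_len])
    }

    /// Move on to the next "slice". Taking `&mut self` means no view
    /// handed out by `shared_segment` can still be alive at this point.
    fn advance(&mut self) {
        if self.current < Self::SLICES {
            self.current += 1;
        }
    }
}
```

With j = 2, `shared_segment(0, 1)` succeeds while `shared_segment(0, 2)` returns `ViewError::SliceNotFinished`, matching the example in the paragraph above.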


tarcieri · 2023-03-08T00:19:08Z

More concretely, I think a prospective Memory type needs a sort of "lending iterator" which returns a set of p memory views and then increments j. Those views would be valid until j is next incremented.

The reason for this is the &mut [Block] column above: those mutable borrows need to all be returned before j can be incremented, since after j is incremented they can be aliased as &[Block] borrows. So at some point a &mut -> & handoff needs to occur. For this reason I'm suggesting those borrows only be valid for a single iteration of j.
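One way to get this handoff without runtime bookkeeping is a "lending" style method whose return value borrows the whole `Memory` mutably. The sketch below is an assumption-laden illustration rather than a design decision: `Memory`, `Views`, and `next_column` are hypothetical names, storage is assumed lane-major, and `Block` is a one-word stand-in.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Block(u64);

const SLICES: usize = 4;

struct Memory {
    blocks: Vec<Block>,
    lane_len: usize,
    /// The value of j that the next call to `next_column` will hand out.
    next: usize,
}

/// The set of p views for one value of j.
struct Views<'a> {
    /// Per lane: all blocks in "slices" before j (shared).
    previous: Vec<&'a [Block]>,
    /// Per lane: the segment in "slice" j (exclusive).
    current: Vec<&'a mut [Block]>,
}

impl Memory {
    fn new(lanes: usize, segment_len: usize) -> Self {
        assert!(lanes > 0 && segment_len > 0);
        let lane_len = SLICES * segment_len;
        Memory { blocks: vec![Block::default(); lanes * lane_len], lane_len, next: 0 }
    }

    /// Hand out the views for the current j, then increment j.
    /// The result borrows `self` mutably, so a second call is rejected
    /// by the borrow checker until every view from this call is dropped.
    fn next_column(&mut self) -> Option<Views<'_>> {
        if self.next == SLICES {
            return None;
        }
        let j = self.next;
        self.next += 1;
        let segment_len = self.lane_len / SLICES;
        let mut views = Views { previous: Vec::new(), current: Vec::new() };
        for lane in self.blocks.chunks_exact_mut(self.lane_len) {
            let (prev, rest) = lane.split_at_mut(j * segment_len);
            let (cur, _later) = rest.split_at_mut(segment_len);
            views.previous.push(&*prev);
            views.current.push(cur);
        }
        Some(views)
    }
}
```

A caller would drive this with `while let Some(views) = mem.next_column() { ... }`. The segments written through `views.current` in one iteration reappear inside `views.previous` in the next, which is the &mut -> & handoff described above.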

This was referenced Mar 5, 2023

argon2: make parallel implementation safe #154

Closed

argon2 v0.5.0 #391

Merged

tarcieri mentioned this issue Feb 3, 2024

Reduce rust-version #487

Closed
