Miden VM consumes programs are structured in a form of an execution graph. The purpose of this graph is twofold:
- To describe program execution semantics.
- To define structure of program hash.
These are related but distinct because two programs which are semantically the same can be reduced to two distinct hashes, depending on how the execution graph is structured.
Program execution graph is constructed from a set of nodes called program blocks. Each program block has its own structure and execution semantics. In the following sections we describe these programs blocks, give a few examples of how program graphs are constructed from these blocks, define methodology for computing program hashes from program graphs, and finally, describe how program hash is computed within the VM.
At the high level, there are two types of program blocks: instruction blocks and control blocks. Both are explained below.
An instruction block is just a sequence of instructions, where each instruction is a tuple (op_code, op_value). For vast majority of instructions op_value = 0
, but there are some instructions where it is not. For example, for a PUSH
instruction, op_value
is set to the value which is to be pushed onto the stack.
Instruction blocks impose the following restrictions on their content:
- Number of instructions in a block must be one less than a multiple of 16 (e.g. 15, 31, 47 etc.).
- An instruction block cannot contain any of the flow control instructions:
BEGIN
,TEND
,FEND
,LOOP
,WRAP
,BREAK
,HACC
, andVOID
. - An instruction which carries non-zero
op_value
must be located at a step which is a multiple of 8 (e.g. 8, 16, 32 etc.).
Due to alignment rules within the VM, the first instruction of an instruction block is guaranteed to be executed on a step which is a multiple of 16.
Control blocks are used to specify flow control logic of a program. Currently, there are 3 types of control blocks: (1) group blocks, (2) switch blocks, and (3) loop blocks. Specifics of each type of these are described below.
A group block is used to group several blocks together, and has the following structure:
Group {
body : Vector<ProgramBlock>,
}
where, ProgramBlock
can be an instruction block or a control block.
Execution semantics of a group block are as follows:
- Each block in the
body
vector is executed one after the other.
Group blocks impose the following restrictions on their content:
- There must be at least one block in the
body
vector. - The first block in the
body
vector must be an instruction block. - An instruction block cannot be followed by another instructions block.
A switch block is used to describe conditional branching (i.e. if/else statements), and has the following structure:
Switch {
true_branch : Vector<ProgramBlock>,
false_branch : Vector<ProgramBlock>,
}
Execution semantics of a switch block are as follows:
- If the top of the stack is
1
, blocks in thetrue_branch
is executed one after the other; - If the top of the stack is
0
, blocks in thefalse_branch
is executed one after the other. - If the top of the stack is neither
0
nor1
, program execution fails.
Switch blocks impose the following restrictions on their content:
- The first block in the
true_branch
vector must be an instruction block which hasASSERT
as its first operation. This guarantees that this branch can be executed only if the top of the stack is1
. - The first block in the
false_branch
vector must be an instruction block which hasNOT ASSERT
as its first two instructions. This guarantees that this branch can be executed only if the top of the stack is0
. - Within
true_branch
andfalse_branch
vectors, an instruction block cannot be followed by another instructions block.
A loop block is used to describe a sequence of instructions which is to be repeated zero or more times based on some condition (i.e. while statement). Structure of a loop block looks like so:
Loop {
body : Vector<ProgramBlock>,
skip : InstructionBlock,
}
where, skip
is an instruction block containing the following sequence of instructions: NOT ASSERT
followed by 14 NOOP
's.
Execution semantics of a loop block are as follows:
- If the top of the stack is
1
, blocks in thebody
vector are executed one after the other.- If after executing all blocks in the
body
vector, the top of the stack is1
, thebody
blocks are executed again. This process is repeated until the top of the stack is0
. - Once the top of the stack becomes
0
,skip
block is executed.
- If after executing all blocks in the
- If the top of the stack is
0
,skip
block is executed. - If the top of the stack is neither
0
nor1
, program execution fails.
Loop blocks impose the following restrictions on their content:
- The first block in the
body
vector must be an instruction block which hasASSERT
as its first operation. This guarantees that a loop iteration can be entered only if the top of the stack is1
. - Within the
body
vector, an instruction block cannot be followed by another instructions block.
It is expected that at the end of executing all body
blocks, the top of the stack will contain a binary value (i.e. 1
or 0
). However, this is not enforced at program construction time, and if the top of the stack is not binary, the program will fail at execution time.
The simplest program is a linear sequence of instructions with no branches or loops:
a0, a1, ..., a_i
where, a0, a1 etc. are instructions executed one after the other. Such a program can be described by a single group block like so:
To briefly explain the diagram:- The outer rectangle with rounded corners represents a control block. In this case, it is a group block B0.
- This group block contains a single instruction block, which is represented by the inner rectangle with square corners.
Let's add some conditional logic to our program. The program below does the following:
- First, instructions a0 . . . ai are executed.
- Then, if the top of the stack is
1
, instructions b0 . . . bj are executed. But if the top of the stack is0
, instructions c0 . . . ck are executed. - Finally, instructions d0 . . . dn are executed.
a0, a1, ..., a_i
if.true
b0, b1, ..., b_j
else
c0, c1, ..., c_k
end
d0, d1, ..., d_n
A diagram for this program could look like so:
Here we have block B0 which groups 3 other blocks together. The first one is an instruction block, the second one is a switch block describing the if/else statement, and the last one is another group block which contains a single instruction block with instructions d0 . . . dn.
Let's add nested control logic to our program. The program below is the same as the program from the previous example, except the else clause of the if/else statement now also contains a loop. This loop will keep executing instructions d0 . . . dn as long as, right after dn is executed, the top of the stack is 1
. Once, the top of the stack becomes 0
, instructions e0 . . . em are executed, and then execution moves on to instructions f0 . . . fl.
a0, a1, ..., a_i
if.true
b0, b1, ..., b_j
else
c0, c1, ..., c_k
while.true
d0, d1, ..., d_n
end
e0, e1, ..., e_m
end
f0, f1, ..., f_l
A diagram for this program would look like so:
Here, we have 4 control blocks, where loop blocks B2 is nested within the else branch of block B1.
All Miden programs can be reduced to a 32-byte hash represented by a pair of elements in a 128-bit field. The hash is designed to target 128-bit preimage and second preimage resistance, and 64-bit collision resistance.
Program hash is computed from hashes of individual program blocks in a manner similar to computing roots of Merkle trees.
For example, let's say our program consists of 3 control blocks and looks like so:
Hash of this program is computed like so:
- First, we compute hash of instruction block a0 . . . ai.
- Then, we compute hashes of b0 . . . bj and c0 . . . ck, and combine them into the hash of block B1.
- Then, we merge hash of block B1 with hash of a0 . . . ai, and call the resulting value h1.
- Then, we compute hash of d0 . . . dn and transform it into hash of block B2.
- Then, we merge hash of block B2 with h1 to get the hash of block B0.
- Finally, we merge hash of block B0 with value
0
to obtain the hash of the entire program hp.
Graphically, this process looks like so:
As mentioned above, hashes of Miden programs are computed in a manner similar to roots of Merkle trees. This is by design. Using this property of program hashes, we can selectively reveal any of the program blocks while keeping the rest of the program private (e.g. secret programs with public pre/post conditions).
For example, if we wanted to reveal instruction sequence d0 . . . dn, we could do so by making intermediate hash h1 public. Then, anyone would be able to reconstruct the root hash from these two pieces of data and be sure that d0 . . . dn is, in fact, the last sequence of instructions executed in the program with hash hp.
To compute program hash, we employ several rules and procedures:
- Rules for constructing hashes of control blocks from hashes of their contents.
- Procedure hash_ops for computing a hash of a sequence of instructions.
- Procedure hash_acc for merging hashes of control blocks with 128-bit values.
- Procedure hash_seq for hashing of sequences of program blocks into 128-bit values.
All of these are described below.
Hashes of control blocks are defined as tuples of two 128-bit elements (v0, v1). These tuples are constructed as follows:
For group blocks, it is just a hash of body
blocks contained within the group:
- v0 = hash_seq(blocks)
- v1 = 0
For switch blocks, it is hashes of both branches:
- v0 = hash_seq(true_branch)
- v1 = hash_seq(false_branch)
For loop blocks, it is a hash of the loop's body
and the skip
block:
- v0 = hash_seq(body, skip)
- v1 = hash_seq(skip)
For loop blocks we also define a value called loop image which is equal to hash_seq(body). This value binds each iteration of the loop to a specific hash (see here).
The purpose of hash_ops procedure is to hash a sequence of instructions. The procedure takes a state of four 128-bit elements as an input, merges each instruction into the state, and returns the updated state as the output.
The pseudo-code for this procedure looks like so:
with inputs: state[], instructions[]:
for each instruction in instructions do:
state = add_round_constants(state);
state = apply_sbox(state);
state = apply_mds(state);
state[0] = state[0] + instruction.op_code;
state[1] = state[1] + instruction.op_value;
state = add_round_constants(state);
state = apply_inverse_sbox(state);
state = apply_mds(state);
return state;
where:
state
is an array of four 128-bit elements.instructions
is a vector of instruction tuples.
The body of the above loop is similar to the round of Rescue hash function. But there are significant differences. Specifically, injecting values in the middle of every round invalidates security proof of sponge construction. Thus, it is possible that this approach is insecure, and will need to be modified in the future.
The purpose of hash_acc procedure is to merge hash of a control block, which is a tuple of two 128-bit values, with a single 128-bit value. The output of the hash_acc procedure an array of four 128-bit elements.
Denoting (v0, v1) to be the hash of the control block, and h to be the 128-bit value, high-level pseudo-code for hash_acc procedure looks like so:
with inputs: h, v0, v1:
let state = [h, v0, v1, 0];
for 14 rounds do:
state = add_round_constants(state);
state = apply_sbox(state);
state = apply_mds(state);
state = add_round_constants(state);
state = apply_inverse_sbox(state);
state = apply_mds(state);
return state;
The above is a modified version of Rescue hash function. This modification adds half-rounds to the beginning and to the end of the standard Rescue hash function to make the arithmetization of the function fully foldable. This should not impact security properties of the function, but it is worth noting that it has not been studied to the same extent as the standard Rescue hash function.
The purpose of hash_seq procedure is to hash a sequence of program blocks into a single 128-bit value. The procedure works as follows:
- First, we initialize a
state
of four 128-bit elements to0
's. - Then, we consume blocks from the sequence one by one, hash them, and merge their hashes into the
state
. Hashing and merging is performed as follows: - Finally, we return the first element of the state as the hash value of the sequence.
High-level pseudo-code for the hash_seq procedure is as follows:
with input blocks[]:
let state = [0, 0, 0, 0];
for each block in blocks
if block is instruction block
state = hash_ops(state, block.instructions);
else
if block is group block
v0 = hash_seq(block.blocks);
v1 = 0;
else if block is switch block
v0 = hash_seq(block.true_branch);
v1 = hash_seq(block.false_branch);
else if
v0 = hash_seq(block.body, block.skip);
v1 = hash_seq(block.skip);
end if
state = hash_acc(state[0], v0, v1);
end if
end loop
return state[0];
For example, let's say we want to hash a sequence of blocks [b0, b1, b2]
, where b0
and b2
are instruction blocks, and b1
is a control block. Hashing of this sequence will work as follows:
- First we initialize the state as
state = [0, 0, 0, 0]
. - Then, we merge
b0
into the state asstate = hash_ops(state, b0)
; - Then, we compute the hash of
b1
as(v0, v1) = hash(b1)
, and merge it into the state asstate = hash_acc(state[0], v0, v1)
. - Then, we merge
b0
into the state asstate = hash_ops(state, b2)
. - Finally, we return
state[0]
as the hash of the sequence.
Miden VM computes program hash as the program is executed in the VM. Hash computations are structured so that even if a single instruction is added, removed, or replaced with a different instruction, the computed hash will not match the original hash of the program.
There are several components in the VM which facilitate hash computations:
- sponge state which holds running hash of the currently executing program block; sponge state takes up 4 registers.
- context stack which holds hashes of parent blocks to the currently executing control block; context stack takes up between 1 and 16 registers (depending on the level of nesting in the program).
General intuition for the hashing process is as follows:
- At the start of every control block, we push current hash of its parent block onto the
context stack
, and set hash of the new block to0
(this is done be resettingsponge state
). - Then, as we read instructions contained in the block, we merge each instruction into the block's hash.
- If we encounter a new block, we process it recursively starting at step 1.
- Once the end of the block is reached, we pop hash of the parent block from the
context stack
and merge our block hash into it.
Each of these steps is explained in detail below.
A new control block is started by the BEGIN
operation. The operation does the following:
- Pushes hash of the current block onto the
context stack
; - Sets all registers of
sponge state
to0
.
A diagram of BEGIN
operation is shown below. By convention, result of hashing is in the first register of sponge state
. So, s0
is the hash of the current block right before BEGIN
operation is executed.
╒═══ sponge ═══╕ ╒══ context ═══╕
[s0, s1, s2, s3], [ ]
🡣
[ 0, 0, 0, 0], [s0 ]
As instructions are executed in the VM, each instruction is merged into the sponge state
using a modified version of Rescue round as described in the hash_ops procedure above.
If we encounter a control block, we start a new block with the BEGIN
operation as described above.
A block can be terminate by one of two operations: TEND
or FEND
. These operations behave similarly. Specifically:
- Both operations carry a value which they move into the
sponge state
. - Both operations pop hash of the parent block from the
context stack
and move it into thesponge state
. - Both operations propagate hash of the current block into the
sponge state
at the next step.
However, these operations arrange sponge state
slightly differently, and are intended to terminate different branches of execution.
For example, recall that hash of a switch block is defined as tuple (v0, v1), where:
- v0 = hash(true_branch)
- v1 = hash(false_branch)
When the VM executes true_branch of a switch block, the block must be terminated with TEND(v1)
operation. A diagram of this operation looks like so:
╒═══ sponge ═══╕ ╒══ context ═══╕
[s0, s1, s2, s3], [c0 ]
🡣
[c0, s0, v1, 0], [ ]
Note that by the time TEND
instruction is reached, the first register of the sponge will contain hash of the true_branch. Thus, s0 = v0
, and the result of executing TEND(v1)
will be sponge state set to [c0, v0, v1, 0]
, where c0
is the hash of the parent block.
On the other hand, when the VM executes false_branch of a switch block, the block must terminate with FEND(v0)
operation. A diagram of this operation looks like so:
╒═══ sponge ═══╕ ╒══ context ═══╕
[s0, s1, s2, s3], [c0 ]
🡣
[c0, v0, s0, 0], [ ]
Note again that by the time FEND
instruction is reached, the first register of the sponge will contain hash of the false_branch. Thus, s0 = v1
, and the result of executing FEND(v0)
will be sponge state set to [c0, v0, v1, 0]
.
The crucially important thing here is that by the time we exit the block, sponge state
is the same, regardless of which branch was taken.
Similar methodology applies to other blocks as well:
- To exit a group block, we execute
TEND(0)
operation. - To exit a loop block, we execute
TEND(v1)
if the body of the loop was executed at least once, andFEND(v0)
, if the body of the loop was never entered (see next section for more details).
After sponge state
has been arranged as described above, HACC
operation is executed 14 times to perform hash_acc procedure. After this, program execution can resume with the following instruction or the next code block in the sequence.
TEND
and FEND
operations can be executed only on steps which are multiples of 16 (e.g. 16, 32, 48 etc.), while BEGIN
instruction can be executed only on steps which are one less than a multiple of 16 (e.g. 15, 31, 47 etc.). Trying to execute them on other steps will result in program failure.
Since TEND
/FEND
instruction is followed by 14 HACC
instruction, we have a cycle of 16 instructions with the last slot empty. This is convenient because we can fill it with a BEGIN
instruction in cases when one control block is immediately followed by another. Specifically, the inter-block sequence of instructions could look like so:
... TEND(v1) HACC HACC HACC HACC HACC HACC HACC
HACC HACC HACC HACC HACC HACC HACC BEGIN ...
In cases when a control block is followed by an instruction block, the last operation in the inter-block sequence is set to NOOP
.
Ability to execute unbounded loops requires additional structures. Specifically, we need a loop stack
to holds images of loop bodies for currently active loops. Loop stack takes up between 0 and 8 registers to support nested loops up to 8 levels deep.
Loop execution works as follows:
First, we check if the top of the stack is 1
or 0
. If it is 0
, we don't need to enter the loop, and instead we execute the following sequence of operations (padding with NOOP
s is skipped for brevity):
BEGIN NOT ASSERT FEND(v0)
Recall that hash of a loop block is defined as a tuple (v0, v1), where:
- v0 = hash(body, skip)
- v1 = hash(skip)
Where skip is a sequence of instructions starting with NOT ASSERT
and followed with 14 NOOP
's.
So, executing FEND(v0)
operation puts sponge state
to the following: [c0, v0, v1, 0]
, where c0
is the hash of the parent block. Then, we executed 14 HACC
operations and we are done.
If, however, the top of the stack is 1
, we do need to enter the loop. We do this by executing a LOOP
operation. This operation is similar to the BEGIN
operation but it also carries a value. This value is set to the loop's image (hash of loop body).
Executing LOOP(i0)
operation (where i0
is the loop's image) does the following:
- Pushes the operation value
i0
onto theloop stack
; - Pushes hash of the current block onto the
context stack
; - Sets all registers of
sponge state
to0
.
A diagram of LOOP(i0)
operation is shown below:
╒═══ sponge ═══╕ ╒══ context ═══╕ ╒═ loop stack ═╕
[s0, s1, s2, s3], [ ] [ ]
🡣
[ 0, 0, 0, 0], [s0 ] [i0 ]
After the LOOP
operation, the body of the loop is executed similar to any other block.
If after executing the loop's body, the top of the stack is 1
, we execute a WRAP
operation. WRAP
operation does the following:
- Checks whether the first register of the
sponge
is equal to the value at the top of theloop stack
(i.e.s0 = i0
). If it is not, the program fails. - Check whether the value at the top of the user stack is
1
. If it is not, the program fails. - Resets
sponge state
A diagram of WRAP
operation is as follows:
╒═══ sponge ═══╕ ╒══ context ═══╕ ╒═ loop stack ═╕
[s0, s1, s2, s3], [c0 ] [i0 ]
🡣
[ 0, 0, 0, 0], [c0 ] [i0 ]
The above ensures that the sequence of instructions executed in the last iteration of the loop was indeed the loop's body, and prepares the sponge for the next iteration of the loop. At this point, loop body can be executed again.
If after executing the loop's body, the top of the stack is 0, we execute BREAK operation. BREAK
operation does the following:
- Pops the value from the loop stack, and checks whether it is equal to the first register of the sponge state (i.e.
s0 = i0
). If it is not, the program fails. - Checks whether the value at the top of the user stack is
0
. If it is not, the program fails.
A diagram of BREAK
operation, when top of user stack is 0
, is as follows:
╒═══ sponge ═══╕ ╒══ context ═══╕ ╒═ loop stack ═╕
[s0, s1, s2, s3], [c0 ] [i0 ]
🡣
[s0, s1, s2, s3], [c0 ] [ ]
The above ensures that the sequence of instructions executed in the last iteration of the loop was indeed the loop's body, and clears top values from the loop stack
to indicate that the loop is complete. The state of the sponge is preserved.
After the BREAK
operation, we execute instructions of the skip
block, and then execute TEND(v1)
operation. This sets the sponge state
to [c0, v0, v1, 0]
. We, then, execute 14 HACC
operations.
Again, it is important to note that regardless of whether we enter the loop or not, sponge state
ends up set to [c0, v0, v1, 0]
before we start executing HACC
operations.
To illustrate execution of a loop on a concrete example, let's say we have the following loop:
while.true
a0, a1, ... a14
end
Hash of this loop will be defined as:
- v0 = hash(a0 . . . a14, skip)
- v1 = hash(skip)
Also, remember that body of the loop must start with ASSERT
(i.e. a0 = ASSERT
), and skip block must start with NOT ASSERT
followed by 14 NOOP
's.
Let's also say that for a given input, the top of the stack is 1
, and it changes to 0
after we execute a0, a1, . . ., a14 instructions twice. Then, the sequence of instructions executed on the VM to complete this loop will be:
LOOP(v0)
ASSERT a1 a2 a3 a4 a5 a6 a7
a8 a9 a10 a11 a12 a13 a14 WRAP
ASSERT a1 a2 a3 a4 a5 a6 a7
a8 a9 a10 a11 a12 a13 a14 BREAK
NOT ASSERT NOOP NOOP NOOP NOOP NOOP NOOP
NOOP NOOP NOOP NOOP NOOP NOOP NOOP NOOP
TEND(v1)
If, however, the top of the stack is 0
before we enter the loop, the sequence of instructions will be:
BEGIN
NOT ASSERT NOOP NOOP NOOP NOOP NOOP NOOP
NOOP NOOP NOOP NOOP NOOP NOOP NOOP NOOP
FEND(v0)