Skip to content

Commit

Permalink
feat(sib-isolation): add Parallel Snapshot Isolation (#6)
Browse files Browse the repository at this point in the history
  • Loading branch information
s12f authored Dec 18, 2024
1 parent bb4c933 commit ba0e45c
Show file tree
Hide file tree
Showing 18 changed files with 86 additions and 48 deletions.
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,4 @@
.vim
.vscode
*.out
states/
2 changes: 1 addition & 1 deletion isolation-models/ALL_TESTS.tla
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,4 @@

EXTENDS SIB_TESTS

======================================================
======================================================
78 changes: 57 additions & 21 deletions isolation-models/SIB_ISOLATION.tla
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,10 @@ SetToSeqs(S) ≜
LET D 1 Cardinality(S)
IN { f [D S]: i,j D: i j f[i] f[j] }

SeqToSet(S) { S[x] : x S}

Max(S) CHOOSE x S: y S: x y

----------------------------------------------------
\* States

Expand Down Expand Up @@ -53,7 +57,8 @@ ComputeStates(txs, states, idx) ≜
from. These two conditions suffice to guarantee that T will observe the effects
of all transactions that committed before it.
*)
SER_CT(e, tx, states) k DOMAIN tx.read: tx.read[k] = states[Len(states)][k]
SER_CT(t, txs, states)
k DOMAIN txs[t].read: txs[t].read[k] = states[t][k]

(* Snapshot Isolation Commit Test:
Like serializability, SI prevents transaction T from seeing the effects of
Expand All @@ -64,52 +69,82 @@ SER_CT(e, tx, states) ≜ ∀k ∈ DOMAIN tx.read: tx.read[k] = states[Len(state
operations T will not observe, may commit in between s and sp. The commit
test only forbids T from modifying any of the keys that changed value as the
system’s state progressed from s to sp.
*)
SI_CT(e, tx, states)
idx DOMAIN states:
k DOMAIN tx.read: tx.read[k] = states[idx][k]
*)
SI_CT(t, txs, states)
\* x: previous state index
x 1 t:
k DOMAIN txs[t].read: txs[t].read[k] = states[x][k]
\* NO-CONF
y idx + 1 Len(states) - 1:
wk DOMAIN tx.write: states[idx][wk] = states[y][wk]
y x + 1 t:
wk DOMAIN txs[t].write: states[x][wk] = states[y][wk]

(* Read Committed Isolation Commit Test:
Read committed allows T to see the effects of concurrent transactions, as long
as they are committed. The commit test therefore no longer constrains all
operations in T to read from the same state; instead, it only requires each of
them to read from a state that precedes T in the execution e.
*)
RC_CT(e, tx, states)
k DOMAIN tx.read:
idx DOMAIN states:
tx.read[k] = states[idx][k]
*)
RC_CT(t, txs, states)
k DOMAIN txs[t].read:
x 1 t:
txs[t].read[k] = states[x][k]

(* Read Uncommitted Isolation Commit Test
(* Read Uncommitted Isolation Commit Test:
Read uncommitted allows T to see the effects of concurrent transactions,
whether they have committed or not. The commit test reflects this
permissiveness, to the point of allowing transactions to read arbitrary
values.
*)
RU_CT(e, tx, states) TRUE
*)
RU_CT(t, txs, states) TRUE

(* Strict Serializable Isolation Commit Test:
Strict serializability guarantees that the real-time order of transactions
will be reflected in the final history or execution.
*)
SSER_CT(t, txs, states)
LET TxLess(tx1, tx2) tx1.commitTs < tx2.startTs
IN SER_CT(t, txs, states)
t1 DOMAIN txs: TxLess(txs[t1], txs[t]) t1 < t

\* Timestamps
TxLess(tx1, tx2) tx1.commitTs < tx2.startTs
(* Parallel Snapshot Isolation Commit Test:
*)

SSER_CT(e, tx, states)
SER_CT(e, tx, states)
t1, t2 DOMAIN e.txs: TxLess(e.txs[t1], e.txs[t2]) t1 < t2
TxPrecede(tx1, tx2)
DOMAIN tx1.write (DOMAIN tx2.read DOMAIN tx2.write) {}
\* ts: transaction set
ComputeDirectPrecedeSet(txs, tx, pending) { t1 pending : TxPrecede(txs[t1], txs[tx]) }
\* compute a precede set of some transactions
RECURSIVE ComputePrecedeSet(_, _, _)
ComputePrecedeSet(txs, pending, result)
LET new UNION { ComputeDirectPrecedeSet(txs, tx, pending) : tx result } IN
IF new = {}
THEN result
ELSE ComputePrecedeSet(txs, pending \ new, result new )

PSI_CT(t, txs, states)
RC_CT(t, txs, states)
LET rs DOMAIN txs[t].read
ws DOMAIN txs[t].write
ps ComputePrecedeSet(txs, 1 t - 1, {t})
\* operation set
ops rs ws
slo(ok) Max({ x 1 t: ok rs txs[t].read[ok] = states[x][ok] })
IN t1 ps, ok ops:
ok DOMAIN txs[t1].write t1 + 1 slo(ok)

----------------------------------------------------
\* Isolation Executions

IsolationExecution(e, CommitTest(_, _, _))
LET txs e.txs
states ComputeStates(txs, e.init, 1)
IN idx DOMAIN txs: CommitTest(e, txs[idx], SubSeq(states, 1, idx))
IN idx DOMAIN txs: CommitTest(idx, txs, states)

SerializableExecution(e) IsolationExecution(e, SER_CT)
SnapshotExecution(e) IsolationExecution(e, SI_CT)
ReadCommittedExecution(e) IsolationExecution(e, RC_CT)
ReadUncommittedExecution(e) IsolationExecution(e, RU_CT)
ParallelSnapshotExecution(e) IsolationExecution(e, PSI_CT)
StrictSerializableExecution(e) IsolationExecution(e, SSER_CT)

----------------------------------------------------
Expand All @@ -123,6 +158,7 @@ SerializableIsolation(init, transactions) ≜ Isolation(init, transactions, Seri
SnapshotIsolation(init, transactions) Isolation(init, transactions, SnapshotExecution)
ReadCommittedIsolation(init, transactions) Isolation(init, transactions, ReadCommittedExecution)
ReadUncommittedIsolation(init, transactions) Isolation(init, transactions, ReadUncommittedExecution)
ParallelSnapshotIsolation(init, transactions) Isolation(init, transactions, ParallelSnapshotExecution)
StrictSerializableIsolation(init, transactions) Isolation(init, transactions, StrictSerializableExecution)

================================================
1 change: 1 addition & 0 deletions isolation-models/SIB_TESTS.tla
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ SI ≜ INSTANCE SIB_ISOLATION

ASSUME SI!SnapshotIsolation(init, txs)
ASSUME ¬ SI!SerializableIsolation(init, txs)
ASSUME ¬ SI!ParallelSnapshotIsolation(init, txs)

\* StrictSerializableExecution Tests1
\* failed_sser_init == [ k1 |-> 0, k2 |-> 0 ]
Expand Down
4 changes: 2 additions & 2 deletions multi-paxos-eb0/MultiPaxos.tla
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ LastCommittedSeq ≜ Max({ c.seq: c ∈ committed })
LearnLastCommitted(p)
\* ∧ p.role = CANDIDATE
committed {}
LastCommittedSeq >= p.nextSeq
LastCommittedSeq p.nextSeq
proposerStates' = [ proposerStates EXCEPT ![p.id] =
[ id p.id, role CANDIDATE, lss p.lss, nextSeq LastCommittedSeq + 1 ] ]
\* ∧ proposerStates' = [ proposerStates EXCEPT ![p.id].role = CANDIDATE ]
Expand Down Expand Up @@ -188,4 +188,4 @@ Spec ≜ Init ∧ □[Next]_vars
Inv TypeOK
c1, c2 committed:
c1.seq = c2.seq c1.val = c2.val
============================================================================
============================================================================
6 changes: 3 additions & 3 deletions multi-paxos-eb0/readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,10 @@ you should read [paxos-eb0](../paxos-eb0) first.

## Motivation

multi-paxos-eb0 is a Multi-Paxos implement composed by paxos-eb0 instances,
multi-paxos-eb0 is a Multi-Paxos implementation composed by paxos-eb0 instances,
for every paxos-eb0 instance,
we always need an excluded proposer(leader) to commit a value in the fast path,
multi-paxos-eb0 is not a stable/strong leader-based implement,
multi-paxos-eb0 is not a stable/strong leader-based implementation,
the Leader role only means it can vote directly in the next sequence.

So a typical multi-paxos-eb0 history is:
Expand All @@ -28,7 +28,7 @@ Compare with [raft](https://raft.github.io/raft.pdf),
multi-paxos-eb0 has higher availability,
but without a steady/strong leader,
so you can't guarantee the linearizable read from a single leader node,
typical raft implement use lease read(read on leader is linearizable, but requires time synchronization).
typical raft implementation use lease read(read on leader is linearizable, but requires time synchronization).

So multi-paxos-eb0 is probably useful in distributed log/message storage systems(higher availability and read is typically sequential)
or other systems that require higher availability but not frequent linearizable read.
Expand Down
2 changes: 1 addition & 1 deletion paxos-eb0/PaxosEb0.tla
Original file line number Diff line number Diff line change
Expand Up @@ -108,4 +108,4 @@ Spec ≜ Init ∧ □[Next]_vars

Inv TypeOK
v1 committed, v2 committed: v1 = v2
============================================================================
============================================================================
4 changes: 2 additions & 2 deletions paxos-eb0/readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ to commit a value:
* Phase2: recover the voted value or issue a new value if not found any voted value.

But in practice, two-round trip to commit a value is quite expensive,
for an example, [raft](https://raft.github.io/raft.pdf) or some other multi-paxos implements,
for an example, [raft](https://raft.github.io/raft.pdf) or some other multi-paxos implementations,
in normal case, committing a value only need one-round trip.

## Paxos with excluded ballot 0 optimization
Expand All @@ -37,5 +37,5 @@ is a solution for that).
So we can set the ballot 0 as the fast path(one-round trip),
other ballots as recover path(two-round trip).
Then based on this property,
you can compose the basic paxos instances to build you efficient Multi-Paxos implements,
you can compose the basic paxos instances to build you efficient Multi-Paxos implementations,
[multi-paxos-eb0](../multi-paxos-eb0) is an example.
2 changes: 1 addition & 1 deletion percolator/MC_SIB_SNAPSHOT.tla
Original file line number Diff line number Diff line change
Expand Up @@ -43,4 +43,4 @@ SnapshotInv ≜ AllTxsDone ⇒
\* SerializableIsolation will fail, Because exists write skew executions
\* ∧ SI!SerializableIsolation(InitState, MappedTxs)

============================================================================
============================================================================
2 changes: 1 addition & 1 deletion percolator/Percolator.tla
Original file line number Diff line number Diff line change
Expand Up @@ -215,4 +215,4 @@ TypeOK ≜

Inv TypeOK

============================================================================
============================================================================
2 changes: 1 addition & 1 deletion percolator/SnapshotRead.tla
Original file line number Diff line number Diff line change
Expand Up @@ -40,4 +40,4 @@ SnapshotReadInv ≜
LET r txs["t2"].read
IN 20 = r["k1"] + r["k2"]

============================================================================
============================================================================
2 changes: 1 addition & 1 deletion percolator/Transfer.tla
Original file line number Diff line number Diff line change
Expand Up @@ -37,4 +37,4 @@ AtomicRead ≜

TransferInv Inv AtomicRead

============================================================================
============================================================================
16 changes: 8 additions & 8 deletions readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,14 +4,14 @@ Verify some old or new concepts/models/algorithms/ideas of distributed/discrete

## Projects

| Project | Description |
| --- | --- |
| [paxos-eb0](paxos-eb0/) | A basic Paxos with excluded ballot 0 optimization |
| [multi-paxos-eb0](multi-paxos-eb0/) | A Multi-Paxos implement composed by [paxos-eb0](paxos-eb0/) instances |
| [percolator](percolator/) | The Google's distributed transaction algorithm built on Bigtable |
| [sequencer](sequencer/) | A distributed sequence generator |
| [simplified-fast-paxos](simplified-fast-paxos/) | A simplified Fast Paxos implement |
| [isolation-models](isolation-models/) | Transaction Isolation Models |
| Project | Description |
| --- | --- |
| [paxos-eb0](paxos-eb0/) | A basic Paxos with excluded ballot 0 optimization |
| [multi-paxos-eb0](multi-paxos-eb0/) | A Multi-Paxos implementation composed by [paxos-eb0](paxos-eb0/) instances |
| [percolator](percolator/) | The Google's distributed transaction algorithm built on Bigtable |
| [sequencer](sequencer/) | A distributed sequence generator |
| [simplified-fast-paxos](simplified-fast-paxos/) | A simplified Fast Paxos implementation |
| [isolation-models](isolation-models/) | Transaction Isolation Models |

## TODO

Expand Down
2 changes: 1 addition & 1 deletion sequencer/MCSequencer.tla
Original file line number Diff line number Diff line change
Expand Up @@ -15,4 +15,4 @@ Constraint ≜
nextRid < 7
storage < 7

====
====
2 changes: 1 addition & 1 deletion sequencer/Sequencer.tla
Original file line number Diff line number Diff line change
Expand Up @@ -128,4 +128,4 @@ LinearizableHistory ≜

Inv LinearizableHistory

============================================================================
============================================================================
3 changes: 1 addition & 2 deletions sequencer/readme.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
# Sequencer

This is a distributed sequence generator implement
by the assumption that all servers have a bounded clock skew.
This is a distributed sequence generator on the assumption that all servers have a bounded clock skew.

Files:
+ [Sequencer.tla](./Sequencer.tla): the specification of the Sequencer
Expand Down
2 changes: 1 addition & 1 deletion simplified-fast-paxos/SimplifiedFastPaxos.tla
Original file line number Diff line number Diff line change
Expand Up @@ -123,4 +123,4 @@ Spec ≜ Init ∧ □[Next]_vars

Inv TypeOK
v1 committed, v2 committed: v1 = v2
============================================================================
============================================================================
2 changes: 1 addition & 1 deletion simplified-fast-paxos/readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ You should read the [Fast Paxos](https://www.microsoft.com/en-us/research/wp-con
## Motivation

Fast Paxos(and [Generalized Paxos](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2005-33.pdf))
is a very important paper(s) that inspires many future leaderless(multi-leader) paxos implements.
is a very important paper(s) that inspires many future leaderless(multi-leader) paxos implementations.

This project simplifies the origin Fast Paxos to make it more easily and practically to implement:

Expand Down

0 comments on commit ba0e45c

Please sign in to comment.