
Add Quorum Key Resharding Service #446

Open · wants to merge 31 commits into base: main
Conversation

@emostov (Contributor) commented Apr 18, 2024

TODO:

Summary & Motivation (Problem vs. Solution)

QOS users need a way to reshard quorum keys.

This PR solves the issue by adding a new boot mode specifically for resharding keys. The process is similar to a standard boot in that members of the old share set post their shares to reconstruct the quorum keys. Once the quorum keys are reconstructed, they are resharded to the new share set. New share set members can then pull their shares and verify that they can decrypt them.
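The reconstruct-then-reshard flow described above can be sketched end to end. This is a toy n-of-n XOR sharing, not the threshold Shamir scheme qos_crypto actually uses; the function names and the deterministic "pads" are purely illustrative:

```rust
// Toy illustration of the reshard flow: reconstruct a secret from an old
// n-of-n XOR sharing, then re-split it for a new share set. Real QOS uses
// threshold Shamir sharing; XOR n-of-n is only a stand-in to show the
// reconstruct-then-reshard shape.

fn xor_split(secret: &[u8], n: usize) -> Vec<Vec<u8>> {
    assert!(n >= 2);
    // Deterministic pads for illustration only; real code would draw from a CSPRNG.
    let mut shares: Vec<Vec<u8>> = (1..n)
        .map(|i| {
            secret
                .iter()
                .enumerate()
                .map(|(j, _)| (i as u8).wrapping_mul(31).wrapping_add(j as u8))
                .collect()
        })
        .collect();
    // Final share XORs against all pads so the full set reconstructs the secret.
    let last: Vec<u8> = secret
        .iter()
        .enumerate()
        .map(|(j, &b)| shares.iter().fold(b, |acc, s| acc ^ s[j]))
        .collect();
    shares.push(last);
    shares
}

fn xor_reconstruct(shares: &[Vec<u8>]) -> Vec<u8> {
    let len = shares[0].len();
    shares.iter().fold(vec![0u8; len], |acc, s| {
        acc.iter().zip(s.iter()).map(|(a, b)| a ^ b).collect()
    })
}

fn main() {
    let quorum_key = b"example-quorum-key-bytes".to_vec();
    // Old share set: 3 members. Reconstruct, then reshard to 5 new members.
    let old_shares = xor_split(&quorum_key, 3);
    let reconstructed = xor_reconstruct(&old_shares);
    assert_eq!(reconstructed, quorum_key);
    let new_shares = xor_split(&reconstructed, 5);
    assert_eq!(xor_reconstruct(&new_shares), quorum_key);
    println!("reshard round-trip ok");
}
```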

Before reviewing the code in detail, take a look at src/qos_client/RESHARD_GUIDE.md, which describes how to execute quorum key resharding.

How I Tested These Changes

  • E2E test for the positive case
  • Unit tests on the new service logic

related: https://github.com/tkhq/gitops/pull/1133

@emostov emostov mentioned this pull request Apr 18, 2024
@emostov emostov changed the base branch from main to lance/stagex-refactor April 18, 2024 01:55
@oliviathet (Contributor) left a comment

phew this was a huge chunk of work - nice job dude! all my comments are minor/nits.

Review threads on:
- src/integration/tests/reshard.rs
- src/qos_crypto/src/n_choose_k.rs
- src/qos_core/src/protocol/error.rs
- src/qos_client/src/cli/services.rs
- src/qos_core/src/protocol/msg.rs
@emostov emostov requested a review from oliviathet April 23, 2024 18:46
src/qos_client/RESHARD_GUIDE.md (resolved review thread)
1) Generate the configuration for how to reshard the given quorum keys. This configuration is called the `ReshardInput`.
2) Boot the enclave in reshard mode using the `ReshardInput`.
3) A threshold of the _old_ share holders query the enclave for an attestation document.
4) A threshold of the _old_ share holders re-encrypt their shares to the enclave's ephemeral key and post those shares in a single message. The data structure used to group a user's shares and their corresponding quorum keys is called the `ReshardProvisionInput`.
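A plausible shape for `ReshardProvisionInput` (field names here are assumptions, not the PR's exact definitions): each share holder makes a single submission carrying one re-encrypted share per quorum key being resharded.

```rust
// Sketch of the provision-input shape: one submission per share holder,
// with N entries when N quorum keys are being resharded. Field names are
// illustrative assumptions, not the PR's actual definitions.

/// One (quorum public key, encrypted share) pairing.
struct QuorumShareEntry {
    quorum_pub_key: Vec<u8>,
    /// Share re-encrypted to the enclave's ephemeral key.
    encrypted_share: Vec<u8>,
}

/// A single share holder's bulk submission: N entries for N quorum keys.
struct ReshardProvisionInput {
    entries: Vec<QuorumShareEntry>,
}

fn main() {
    let input = ReshardProvisionInput {
        entries: (0..4) // e.g. four quorum keys being resharded
            .map(|i| QuorumShareEntry {
                quorum_pub_key: vec![i as u8; 32],
                encrypted_share: vec![0xAA; 64],
            })
            .collect(),
    };
    assert_eq!(input.entries.len(), 4);
    println!("one submission covers {} quorum keys", input.entries.len());
}
```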
Contributor:

"to group a user's shares and their corresponding quorum keys" (N encrypted user shares + N quorum keys in each ReshardProvisionInput right?)

Contributor Author:

If N = quorum key count, then a single share holder will submit N shares (one for each quorum key)

src/qos_client/RESHARD_GUIDE.md Outdated Show resolved Hide resolved
Comment on lines +68 to +78
qos_client reshard-re-encrypt-share \
--yubikey \
--quorum-share-dir-multiple <read: path to directory to specify> \
--attestation-doc-path <read: path to attestation doc> \
--provision-input-path <write: path to the file to write this user's provision input> \
--reshard-input-path <read: path to reshard input> \
--qos-release-dir <read: path to a dir with pcrs file and release.env> \
--pcr3-preimage-path <read: path to the IAM role for the enclave> \
--new-share-set-dir <read: path to dir new share set> \
--old-share-set-dir <read: path to dir with old_share_set> \
--alias <alias for share holder>
Contributor:

I don't really understand why this part has to be different from a normal share posting step where quorum key shares are posted to the enclave 🤔

I guess the nature of resharding is compatible, but because we've chosen to implement "bulk" resharding, it's not compatible with the existing interface. That's a tradeoff: we could reuse the normal share posting logic at the expense of doing resharding 4 times (once per quorum key)

What I'm thinking is we could move most of the resharding logic out of QOS and into a normal enclave app (reshard):

  • boot the reshard app with an existing quorum key to reshard + pivot args which specify the new share set
  • the app's only job (once qos pivots) is to reshard the quorum key according to the pivot args
  • there's then an interface for new share set members to download attestations and new encrypted shares.

Rinse and repeat for each quorum key to reshard (you'd need to boot 4 reshard enclaves with 4 different pivot args to reshard the 4 quorum keys we currently have).

Obviously that's a big departure from what you have here so I don't think it's worth implementing; just thinking out loud.

Contributor Author:

We talked about building an app, and in fact we could design such an app that lets you bulk post shares. The real determining factor was just that it was a bit less code to put this directly in qos (we didn't have to make a separate host and then also a client for talking to that host). Other justifications are that:

  • this is super sensitive, and we want the enclave to shut down if anything goes wrong during the flow
  • having the configuration as pivot args is hard to read and kind of non-ergonomic
  • we expect anyone using qos will want this base operation, so worth batching it in
  • the sharding logic is native to qos, so it's clean to do the resharding in the same place

At the end of the day it's just tradeoffs - I don't feel strongly that there is a right or wrong decision here.

src/qos_client/RESHARD_GUIDE.md (resolved review thread)
src/qos_crypto/src/n_choose_k.rs (review thread)
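The `n_choose_k` module referenced in these threads suggests k-subset enumeration, which is useful in reshard tests for checking that every threshold-sized subset of shares reconstructs the same quorum key. An illustrative reimplementation (not the PR's actual code):

```rust
// Enumerate all k-element subsets of a slice, preserving order. Used here
// only to illustrate exhaustive threshold-subset testing; qos_crypto's
// real n_choose_k module may differ.

fn combinations<T: Clone>(items: &[T], k: usize) -> Vec<Vec<T>> {
    if k == 0 {
        return vec![vec![]]; // one empty subset
    }
    if items.len() < k {
        return vec![]; // not enough items
    }
    let mut out = Vec::new();
    let head = &items[0];
    // Subsets that include the first item...
    for mut rest in combinations(&items[1..], k - 1) {
        rest.insert(0, head.clone());
        out.push(rest);
    }
    // ...and subsets that skip it.
    out.extend(combinations(&items[1..], k));
    out
}

fn main() {
    let shares = ["s1", "s2", "s3", "s4"];
    let subsets = combinations(&shares, 2);
    // C(4, 2) = 6 subsets of size 2.
    assert_eq!(subsets.len(), 6);
    println!("{subsets:?}");
}
```

In a reshard integration test, each subset would be fed to the reconstruction routine and the results compared for equality.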
Comment on lines +47 to +53
/// Waiting for quorum key shards to be posted so the reshard service can
/// be executed.
ReshardWaitingForQuorumShards,
/// Reshard service has completed
ReshardBooted,
/// Reshard failed to reconstruct the quorum key
UnrecoverableReshardFailedBadShares,
Contributor:

It seems a bit odd to make these new phases part of the "protocol". They represent an entirely separate state machine IMO: we don't expect a resharding enclave to reach "QuorumKeyProvisioned" for example.

(at the same time, I understand that it's easiest to put this here instead of having a separate ReshardingProtocolPhase enum)

Contributor Author:

I wanted to keep it as one enum to cohesively show every possible state, even if the state transition map is more like a DAG.

I wanted the states downstream of a boot reshard to be distinct, such that the state machine would only allow reshard functionality, in order to limit surface area.
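Keeping the reshard states distinct allows a transition guard to limit the surface area. This sketch uses the phase names from the snippet above, but the transition map and the `WaitingForBootInstruction` variant are assumptions about the surrounding code, not the PR's actual implementation:

```rust
// Sketch: restrict reshard phases to their own transition set so a
// resharding enclave can never wander into normal-boot phases. The
// specific transitions below are illustrative assumptions.

#[derive(Debug, Clone, Copy, PartialEq)]
enum ProtocolPhase {
    WaitingForBootInstruction,
    ReshardWaitingForQuorumShards,
    ReshardBooted,
    UnrecoverableReshardFailedBadShares,
}

fn valid_transition(from: ProtocolPhase, to: ProtocolPhase) -> bool {
    use ProtocolPhase::*;
    matches!(
        (from, to),
        (WaitingForBootInstruction, ReshardWaitingForQuorumShards)
            | (ReshardWaitingForQuorumShards, ReshardBooted)
            | (ReshardWaitingForQuorumShards, UnrecoverableReshardFailedBadShares)
    )
}

fn main() {
    use ProtocolPhase::*;
    assert!(valid_transition(ReshardWaitingForQuorumShards, ReshardBooted));
    // A failed reshard is terminal: no transition leaves it.
    assert!(!valid_transition(UnrecoverableReshardFailedBadShares, ReshardBooted));
    println!("transition guards ok");
}
```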

pub struct ReshardInput {
/// List of quorum public keys
pub quorum_keys: Vec<Vec<u8>>,
/// The set and threshold to shard the key.
Contributor:

since we have multiple quorum keys: "to shard the quorum keys"

src/qos_core/src/protocol/services/reshard.rs (resolved review threads)
Base automatically changed from lance/stagex-refactor to main June 7, 2024 17:44
emostov added 21 commits June 18, 2024 12:43
- wip
- wip
- get to compile
- Initial reshard provision and unit test
- code clean up
- fix n choose k
- refactor n choose k
- Finish tests for reshard
- add test for boot reshard
- Add generate reshard input
- wip
- wip
- wip
- wip
- Get reshard-renencrypt working
- Get post share working
- lint
- wip
- Build get reshard output
- Add new secrets for file key
- get full thing working e2e
- refactor wip
- get qos core to compile
- Get all of qos core working
- refactor to not use quorumpubkey wrapper
- wip
- Get e2e tests working with new input
- Add logic for checking that e2e share recombination works
- finish integration test
- lint stuff
- Improve human verifiactions
- clean up
@emostov emostov requested a review from a team as a code owner June 18, 2024 16:44

### Added

- BREAKING CHANGE: qos_core: quorum key resharding service, new state machine transitions, and new `ProtocolMsg` variants (#428)
Contributor Author:

Wanted to highlight this breaking change - @r-n-o lmk if you want to merge your qos_net pr first so you can update qos in mono without worrying about breaking changes

Contributor:

Sounds good to me! If you don't mind having my PR go first I'll take you on this offer!

@cr-tk (Collaborator) commented Nov 26, 2024

I think this PR needs a rebase on top of the current main, particularly due to the cleanup PRs and the switch to vsss-rs summarized here: #457 (comment).

@emostov @r-n-o have any of our base goals with this PR changed since it was last worked on in June?
