Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use vsss-rs for share generation and reconstruction #457

Closed
wants to merge 62 commits into from

Conversation

emostov
Copy link
Contributor

@emostov emostov commented Jun 18, 2024

Summary & Motivation (Problem vs. Solution)

Closes ENG-1687

Note, there is some extra diff in here from running a newer version of clippy. I can work on reverting if this causes issues

How I Tested These Changes

Pre merge check list

  • Update CHANGELOG.MD

emostov added 30 commits June 18, 2024 12:43
wip

wip

get to compile

Initial reshard provision and unit test

code clean up

fix n choose k

refactor n choose k

Finish tests for reshard

add test for boot reshard

Add generate reshard input

wip

wip

wip

wip

Get reshard-renencrypt working

Get post share working

lint

wip

Build get reshard output

Add new secrets for file key

get full thing working e2e

refactor wip

get qos core to compile

Get all of qos core working

refactor to not use quorumpubkey wrapper

wip

Get e2e tests working with new input

Add logic for checking that e2e share recombination works

finish integration test

lint stuff

Improve human verifiactions

clean up
@lthibault
Copy link
Contributor

lthibault commented Sep 13, 2024

Upgraded to vsss-rs v4.3.8 64ce106

@lthibault lthibault self-assigned this Sep 16, 2024
@lthibault
Copy link
Contributor

Per discussion in Google Meet, I am removing the legacy (i.e. unfairly-named "insecure") code. This removes backwards compatibility for 1-of-1 shards, and will require us to reshard dev/preprod keys. Prod keys have already been resharded.

@lthibault lthibault force-pushed the zeke-reshard-6-vsss-rs branch from 50ca13f to 605447b Compare September 17, 2024 20:48
@cr-tk
Copy link
Collaborator

cr-tk commented Sep 24, 2024

@lthibault some comments and questions.

The removal of the legacy code paths has the effect that this PR cannot land in actively-used contexts of dev/preprod until the resharding there has been done, right?

The reason we've labeled the legacy code functions with insecure in their name is that they have several implementation problems, including likely security side channel problems during generation, which can be a problem depending on their usage. Given that we now have a better alternative that avoids those problems, we wanted to make it clear to developers that the use of the old functions is discouraged. More context: ENG-1720, ENG-1384.

In c264d9c , you've changed the unit tests by directly replacing the use of the legacy calls with the modern alternative. Unfortunately, that doesn't really make sense here. If you look at the unit test structure, the modern functions are already called directly one line above - the old_* variants which called the legacy functions were included in combination with assert_eq!() to ensure the results between old and new are identical and therefore consistent.
Instead, I suggest removing the variable definitions and asserts with the old_* lines since they're redundant now.

@lthibault
Copy link
Contributor

lthibault commented Sep 24, 2024

The removal of the legacy code paths has the effect that this PR cannot land in actively-used contexts of dev/preprod until the resharding there has been done, right?

Correct. I currently have a resharded dev key (2-of-2), which is now supported as of d87496a.
Resharding remains to be performed in preprod and prod environments.

[...] Unfortunately, that doesn't really make sense here.

Oh nice catch. I must have been on autopilot.

[...] Instead, I suggest removing the variable definitions and asserts with the old_* lines since they're redundant now.

Done: d6db3e7

If there's nothing else (and CI passes), I intend to proceed with resharding preprod.

@cr-tk
Copy link
Collaborator

cr-tk commented Oct 30, 2024

@lthibault what's the status on this PR?

From what I can see, https://github.com/tkhq/mono/pull/3323 exists as a springboard to do testing on preprod, but isn't actually meant to be merged - instead, this PR (#457) would need to land in QOS main, and then mono could bump the QOS submodule.

https://github.com/tkhq/namespaces/pull/65 paves the road for mono to actually work with this new PR state that no longer handles 1-of-1 SSS situations, and that's out of the way now.

@Ulexus 04b0c11 was a necessary Cargo configuration fix to make this work, right? What about 5ba801e , can/should this stay in as-is, given the DANGER commit warning?

https://github.com/tkhq/mono/pull/3397 seems to be related since it aims to un-break the qos handling, but there's probably a followup there that needs some attention.
Given what I've found on background information, this flag directly propagates to the aws cli crate, where this code is responsible for it.

/// Connects to the enclave console and prints it continuously.

Can you outline where the danger in this comes from? The host learning sensitive information via printed output on the console that is normally inaccessible to the host, for example, if the debug_mode is left on accidentally or maliciously in production?

PR 3397 in mono seems to related to un-break QOS via a stagex/Linux kernel relevant revert, but there this followup probably still needs some attention.

Copy link
Contributor

@r-n-o r-n-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting my partial review from months ago! I'll be taking over this PR; my strategy is going to be: let's break it up into multiple chunks. Some updates made here are very minor cleanup-style items (README updates, stale files); others are very critical (actual vsss-rs libarary).

Comment on lines +24 to +25
- BREAKING CHANGE: qos_core: quorum key resharding service, new state machine transitions, and new `ProtocolMsg` variants (#428)
- qos_client: commands to run quorum key resharding and high level documentation (#428)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

428 is a PR that seems closed. Why this mentioned in the context of this PR which is about VSSS?

@@ -50,4 +50,4 @@ COPY --from=file . /
COPY --from=gcc . /
COPY --from=linux-nitro /bzImage .
COPY --from=linux-nitro /nsm.ko .
COPY --from=linux-nitro /linux.config .
COPY --from=linux-nitro /linux.config .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's a good change? Typically we want a newline at the end of a file, that's how POSIX works: https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline

@@ -0,0 +1 @@
3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

counter-example here I suppose: this file + the pub keys are fine as-is without a newline because this is more like a binary blob and less like a text file? Depends on what loads this file exactly.

(but if it's all the same, I'd rather end with a newline character)

@@ -0,0 +1,338 @@
use std::{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we picking this up here? This test is related to a new reshard functionality which isn't related to VSSS afaict?

@r-n-o
Copy link
Contributor

r-n-o commented Nov 11, 2024

This change is now split into the following PRs:

@cr-tk
Copy link
Collaborator

cr-tk commented Nov 14, 2024

@r-n-o good work on splitting up this PR. Can we close #457 now that the others have replaced it, or do you want to leave it open for now for visibility / discussion followup?

@r-n-o
Copy link
Contributor

r-n-o commented Nov 14, 2024

@cr-tk thank you! I think we can close this PR for clarity. I have no issue with further discussion if needed but given the intent isn't to merge this PR, better to be explicit and close it. Thanks for flagging!

@r-n-o r-n-o closed this Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants