-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow multiple private keys but use the latest one #1748
base: main
Are you sure you want to change the base?
Conversation
…new private key(s) if new file(s) are detected
…ks that fix all problems happened in the unit tests
… behavior; linter problems fix
…ive private key var; Remove the redundant "isRegistered" logic
One other quick comment, now that I think we've settled on an answer -- wherever there's some JSON that gets passed back and forth via API calls, we should stick to camel case: For example, the new |
19ce7bb
to
85bc504
Compare
registry/client_commands_test.go
Outdated
func() { assert.NoError(t, egrp.Wait()) }() | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is clearly the deadlock and looks like some bad copy/pasting without understanding what the original code does.
The original was:
defer func() { require.NoError(t, egrp.Wait()) }()
defer cancel()
Since defer
is executed LIFO, this will invoke cancel()
followed by the Wait()
(to wait on the canceled functions to clean up).
The revised code is effectively this:
defer func() {
func() { require.NoError(t, egrp.Wait()) }()
cancel()
}
Which waits on the group to stop, then shuts it down, a clear deadlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out and fixing this lingering problem! Having said that, this is just an intermediate problem in the tests - the root problem is topology mock server doesn't response in TestRegistryKeyChainingOSDF
and test timed out in TestRegistryKeyChaining
(see the error logs in the tests result). These two errors were suppressed if the other two tests in client_commands_test.go are commented out. Justin figured out there is a test state leak because commenting out the other tests allowed both TestRegistryKeyChaining and TestRegistryKeyChainingOSDF to pass. I'm still looking for the leak and hoping fixing the problems mentioned in your new PR comments could potentially fix the leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test result | Test result | |
---|---|---|
Comment Out ⬇️ | TestRegistryKeyChainingOSDF | TestRegistryKeyChaining |
TestServeNamespaceRegistry | ❌ | ❌ |
TestMultiPubKeysRegisteredOnNamespace | ❌ | ❌ |
Both test funcs above | ✔️ | ✔️ |
The current code had the logic reversed -- first it waited for completion, then it cancelled the running goroutines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still a lot to do here. Some minor stylistic / convention comments in the review -- plus a few major requests:
- Redo the handshake between origin and registry to allow arbitrary replacement of the public keys.
- Move the key refresh logic into the config module and add unit tests. Make the updates atomic and have things in lexicographical order.
- Do not deprecate the existing issuer key (at least not in this version!) to allow external folks to adapt to the new setup. Do not delete a key that an admin has provided. Instead, behave as if the issuer key was found within the key directory.
@@ -39,7 +39,7 @@ func keygenMain(cmd *cobra.Command, args []string) error { | |||
return errors.Wrap(err, "failed to get the current working directory") | |||
} | |||
if privateKeyPath == "" { | |||
privateKeyPath = filepath.Join(wd, "issuer.jwk") | |||
privateKeyPath = filepath.Join(wd, "issuer-keys") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work? Before, the variable was a file and now it's a directory. All the code below operates on this as if it was a directory.
On first glance, appears that the tool is broken, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the private/public generation logic lies in config.GetIssuerPublicJWKS()
, which was already updated to incorporate the new keys in IssuerKeysDirectory instead of only one key file at IssuerKey. The pelican generate keygen
command works as expected (creating a key in ./issuer-keys).
config/init_server_creds.go
Outdated
} | ||
|
||
// Rename the existing private key file and set destination path | ||
fileName := fmt.Sprintf("migrated_%d_%s.pem", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the appropriate mkstemp to create a unique file. This is not sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I also use the mkstemp logic in another similar function GeneratePEM
config/init_server_creds.go
Outdated
newKey, err := loadIssuerPrivateJWK(issuerKeyFile) | ||
issuerKeysDir := param.IssuerKeysDirectory.GetString() | ||
currentIssuerKeysDir := getCurrentIssuerKeysDir() | ||
// Handles runtime changes to the issuer keys directory (configured via "IssuerKeysDirectory" parameter). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the tests should take a different approach: reset the config state between the calls. Otherwise, it clutters up the runtime code.
// Check the origin is authorized to update (possessing the public key used for prefix initial registration) | ||
// Parse all public keys of the sender into a JWKS | ||
var clientKeySet jwk.Set | ||
if data.AllPubkeys == nil { // backward compatibility - AllPubkeys only exists in the payload in Pelican 7.12 or later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by this.
data
is from the client -- why is there backward compatibility code in an API call introduced in 7.12?
continue | ||
} | ||
|
||
if registryDbKid == clientKid { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unnecessary.
The client should prove it possess one of the existing keys.
However, this additionally forces the client to keep the key it used for the challenge in the updated key set. That seems unnecessary. We should simply see if the key used to sign is one of the existing known keys.
} | ||
|
||
// Check if any key in `clientKeySet` matches a key in `registryDbKeySet` | ||
registryDbKeysIter := registryDbKeySet.Keys(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These loops could be replaced by the LookupKeyID
method.
|
||
// Update the public key of registered prefix(es) if the http request passed client and server verification for nonce. | ||
// It returns the response data, and an error if any | ||
func updateNsKeySignChallengeCommit(ctx *gin.Context, data *RegisteredPrefixUpdate) (map[string]interface{}, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the design here is still a bit confused. For example, it removes the previous key -- not clear that is matching the intent here.
Let me suggest something cleaner:
- At the start of handshake, the registry sends all known public keys for the namespace.
- The client receives the list of public keys and selects a corresponding private key from that list (if one is available) to sign the challenge.
- The client sends the server the updated set of public keys plus the proof of possession from one of the known keys.
Thus, at the end of the handshake:
- The client has demonstrated it owns a private key from the existing known public keys.
- The client's desired set of keys is synchronized with the registry; the client can remove or add any arbitrary keys in the call.
Oh! One final item from the review -- please rebase and squash down the commits to a clean list. Each commit should be self-contained (compiles, passes tests), covering a single piece of functionality (such as adding the key directory logic; or adding the API to update the keys), and have a detailed commit message. Please refer to https://osg-htc.org/technology/software/git-software-development/#making-good-pull-requests-the-art-of-good-commits for some best practices. Given the scope of the changes, I'd expect 4-5 commits to come out on this branch -- not 60. I typically like keeping the git history in response to reviewer comments to help the reader understand the evolution of ideas. However, at this point, the history of the PR is so messy that it'd be impossible to glean that knowledge -- more value is in a clean branch. |
… condition in updating previous active private key; a few comments
…te illusion for IssuerKey if it exists, minor test impr
Overview
In issue #561, we want to rotate out old private key when the new one is provided. This PR mainly changed the functions chain starting from GetIssuerPrivateJWK(), in order to support multiple private key files in a new directory specified by a new config param
IssuerKeysDirectory
, along with a new goroutine to monitor the changes there every 5 mins.New Key
In this PR, admin can create new private key through either of the following two ways:
IssuerKeysDirectory
points to, which by default is/etc/pelican/issuer-keys
), or modify any .pem there. Pelican program will pick up the last modified .pem file through a goroutine running every 5 mins, set the private key it contains as the active one, add it to an in-memory map, and update the public key in registry sqlite db.Security
How do we know the origin who updates the namespace's public key is the same origin registered this namespace? We authorize the origin on the registry side: require the public keys carried by the key update HTTP request (origin-->registry) must contain the public this origin previously registered, and check the proof of possession for both origin's current active private key and the previous private key, whose can prove the origin owns this namespace(prefix). The latter proof of possession verification uses a "previous private key's signature" sent by origin and verified by the public key stored in db on registry side.
Design thinking
Existing file hierarchy:
To ensure backward compatibility, the new directory that stores the private keys (saved as .pem files) is mounted to a new config param
IssuerKeysDirectory
. User doesn’t need to change the value ofIssuerKey
, though all functions are not longer using it. The private keyIssuerKey
refers to will be migrated to the new directory specified byIssuerKeysDirectory
. The new file hierarchy looks like this:Algorithm efficiency tradeoff
For a map using atomic.pointer, it requires to copy the entire map before update any key-value pair. In the
loadPEMFiles
func, because it runs every 5 minutes to update the in-memory key map from the file, which is not frequent, we think the memory use by a map using atomic.pointer is still acceptable.Test
No private key file
--> new key created: in-memory active key == public key of this origin's namespaces in registry db
One existing private key file, adding a new private key
a) an existing private key file at config param
IssuerKey
(usually.../issuer.jwk
), adding a new private key manuallyb) an existing private key file at config param
IssuerKey
(usually.../issuer.jwk
), adding a new private key via APIc) an existing private key file at config param
IssuerKeysDirectory
(usually.../issuer-keys
), adding a new private key manuallyd) an existing private key file at config param
IssuerKeysDirectory
(usually.../issuer-keys
), adding a new private key via API--> result of any scenario in this section: in-memory active key: second; public key of this origin's namespaces in registry db: second
Two existing private key files
a) never run pelican before (empty registry db)
--> in-memory active key: second; public key of this origin's namespaces in registry db: second
b) first key is registered in registry db
--> in-memory active key: second; public key of this origin's namespaces in registry db: second
c) second key is registered in registry db
--> in-memory active key: second; public key of this origin's namespaces in registry db: second
d) second key is registered in registry db, adding the third key through API
--> in-memory active key: third; public key of this origin's namespaces in registry db: third
Note: "first", "second" key are keys in chronological order of their private key file modification time.
In-memory active key can be copied at the top-right key icon in origin webUI; db public key of this namespace can be found at the
registry.sqlite
file, whose directory is specified by config param Registry.DbLocation, or inetc/pelican
by defaultFuture work
At this moment, we use the most recent modified .pem file as the active private key. In the next step, we want to allow multiple active private keys in use. #1818
Eventually, when multiple origins have the same namespace, the admin of these origins can append a common private key to all. Then these origin servers will register the corresponding shared public key in the registry. Currently this operation needs to manually done by OSDF admin.