Allow multiple private keys but use the latest one #1748

h2zh · 2024-11-15T21:50:52Z

Overview

Issue #561 asks for a way to rotate out an old private key when a new one is provided. This PR reworks the call chain that starts at GetIssuerPrivateJWK() so that it supports multiple private key files in a new directory, set by the new config param IssuerKeysDirectory, and adds a goroutine that checks that directory for changes every 5 minutes.

New Key

With this PR, an admin can create a new private key in either of two ways:

Drop the private key into the new directory as a .pem file (the location that config param IssuerKeysDirectory points to, /etc/pelican/issuer-keys by default), or modify any .pem file already there. A goroutine that runs every 5 minutes picks up the most recently modified .pem file, sets the private key it contains as the active one, adds it to an in-memory map, and updates the public key in the registry SQLite db.
Hit the API endpoint “/api/v1.0/origin_ui/newIssuerKey”, which creates a new .pem file. The key is then loaded by the same goroutine on its next run (note: the web UI for this API is not included).

Security

How do we know that the origin updating a namespace's public key is the same origin that registered the namespace? We authorize the origin on the registry side: the public keys carried by the key update HTTP request (origin-->registry) must include the public key this origin previously registered, and the registry checks proof of possession for both the origin's current active private key and its previous private key, which proves that the origin owns this namespace (prefix). The latter check uses a "previous private key's signature" sent by the origin and verified with the public key stored in the db on the registry side.

Design thinking

Existing file hierarchy:

├── whatever-parent-dir/ (by default /etc/pelican)                             
│   ├── issuer.jwk            # Existing key location, specified by config param “IssuerKey”

To ensure backward compatibility, the new directory that stores the private keys (saved as .pem files) is set through a new config param, IssuerKeysDirectory. Users don't need to change the value of IssuerKey, though no functions use it any longer. The private key that IssuerKey refers to is migrated to the new directory specified by IssuerKeysDirectory. The new file hierarchy looks like this:

├── whatever-parent-dir/ (by default /etc/pelican)                             
│   └── issuer-keys/          # The new directory storing multiple private keys where “IssuerKeysDirectory” refers to
│       ├── pelican_generated_<timestamp_1>_<randomChars>.pem    # If no private key is provided, new .pem file will be created
│       ├── pelican_generated_<timestamp_2>_<randomChars>.pem         
│       ├── ...                 
│       ├── <system admin's fav name>.pem
│       └── migrated_<timestamp_2>_<randomChars>.pem    # previous key if it exists, migrated from the location where “IssuerKey” points to

Algorithm efficiency tradeoff

A map held behind an atomic.Pointer has to be copied in full before any key-value pair is updated. Because loadPEMFiles only runs every 5 minutes to refresh the in-memory key map from the files, which is infrequent, we think the memory cost of the copy is acceptable.
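
To make the tradeoff concrete, here is a minimal copy-on-write sketch built on sync/atomic's generic atomic.Pointer (Go 1.19+). The keyStore type and its methods are hypothetical, and the values are plain strings rather than real keys. Readers never lock, while every write copies the whole map.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// keyStore (hypothetical) publishes an immutable snapshot of key ID -> key.
type keyStore struct {
	snapshot atomic.Pointer[map[string]string]
}

func newKeyStore() *keyStore {
	ks := &keyStore{}
	empty := map[string]string{}
	ks.snapshot.Store(&empty)
	return ks
}

// get reads from the current snapshot without taking a lock.
func (ks *keyStore) get(id string) (string, bool) {
	v, ok := (*ks.snapshot.Load())[id]
	return v, ok
}

// put copies the entire map, adds one entry, and swaps the pointer.
// The full copy is the cost discussed above; it is paid once per update.
func (ks *keyStore) put(id, key string) {
	for {
		old := ks.snapshot.Load()
		next := make(map[string]string, len(*old)+1)
		for k, v := range *old {
			next[k] = v
		}
		next[id] = key
		// Retry if another writer replaced the snapshot in the meantime.
		if ks.snapshot.CompareAndSwap(old, &next) {
			return
		}
	}
}

func main() {
	ks := newKeyStore()
	ks.put("kid-1", "first key")
	ks.put("kid-2", "second key")
	v, ok := ks.get("kid-2")
	fmt.Println(v, ok)
}
```

With a handful of keys and one refresh every few minutes, the copy is negligible; the pattern would be a poor fit for a large map under frequent writes.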

Test

1. No private key file
   --> new key created: in-memory active key == public key of this origin's namespaces in registry db
2. One existing private key file, adding a new private key
   a) an existing private key file at config param IssuerKey (usually .../issuer.jwk), adding a new private key manually
   b) an existing private key file at config param IssuerKey (usually .../issuer.jwk), adding a new private key via API
   c) an existing private key file at config param IssuerKeysDirectory (usually .../issuer-keys), adding a new private key manually
   d) an existing private key file at config param IssuerKeysDirectory (usually .../issuer-keys), adding a new private key via API
   --> result of any scenario in this section: in-memory active key: second; public key of this origin's namespaces in registry db: second
3. Two existing private key files
   a) never run pelican before (empty registry db)
      --> in-memory active key: second; public key of this origin's namespaces in registry db: second
   b) first key is registered in registry db
      --> in-memory active key: second; public key of this origin's namespaces in registry db: second
   c) second key is registered in registry db
      --> in-memory active key: second; public key of this origin's namespaces in registry db: second
   d) second key is registered in registry db, adding the third key through API
      --> in-memory active key: third; public key of this origin's namespaces in registry db: third

Note: "first" and "second" refer to keys in chronological order of their private key files' modification times.
The in-memory active key can be copied via the key icon at the top right of the origin web UI; the db public key of this namespace can be found in the registry.sqlite file, whose location is specified by config param Registry.DbLocation (/etc/pelican by default).

Future work

At the moment, we use the most recently modified .pem file as the active private key. As a next step, we want to allow multiple active private keys to be in use at once (#1818).

Eventually, when multiple origins serve the same namespace, the admin of these origins can add a common private key to all of them. These origin servers will then register the corresponding shared public key in the registry. Currently this operation has to be done manually by an OSDF admin.

jhiemstrawisc · 2024-12-19T23:00:03Z

One other quick comment, now that I think we've settled on an answer -- wherever there's some JSON that gets passed back and forth via API calls, we should stick to camel case:
https://github.com/orgs/PelicanPlatform/discussions/1734#discussioncomment-11621001

For example, the new RegisteredPrefixUpdate struct from registry/registry_pubkey_update.go uses snake case when it defines JSON tags like json:"client_nonce". Unless we're worried about a specific backwards compatibility issue here, these should be defined like json:"clientNonce"
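
As a small illustration of the convention, the sketch below uses a trimmed-down, hypothetical stand-in for RegisteredPrefixUpdate. Only the clientNonce tag comes from the comment above; the second field is an assumption added to show a multi-field payload.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// prefixUpdate is a hypothetical stand-in for RegisteredPrefixUpdate.
type prefixUpdate struct {
	ClientNonce string `json:"clientNonce"` // camel case, as requested above
	ServerNonce string `json:"serverNonce"` // assumed field, for illustration
}

// encode marshals the payload the way an API handler would before sending it.
func encode(u prefixUpdate) (string, error) {
	b, err := json.Marshal(u)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encode(prefixUpdate{ClientNonce: "abc", ServerNonce: "def"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"clientNonce":"abc","serverNonce":"def"}
}
```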

bbockelm · 2024-12-21T17:14:31Z

registry/client_commands_test.go

+		func() { assert.NoError(t, egrp.Wait()) }()
+		cancel()


This is clearly the deadlock and looks like some bad copy/pasting without understanding what the original code does.

The original was:

defer func() { require.NoError(t, egrp.Wait()) }()
defer cancel()

Since defer is executed LIFO, this will invoke cancel() followed by the Wait() (to wait on the canceled functions to clean up).

The revised code is effectively this:

defer func() {
	func() { require.NoError(t, egrp.Wait()) }()
	cancel()
}()

Which waits on the group to stop, then shuts it down, a clear deadlock.
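
The ordering described above can be reproduced with the standard library alone. This sketch swaps errgroup for a plain sync.WaitGroup (an assumption made to avoid the external dependency) and records the order in which the two deferred steps run.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// teardownOrder starts a worker that exits only when ctx is cancelled and
// returns the order in which the deferred teardown steps ran.
func teardownOrder() (order []string) {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		<-ctx.Done() // blocks until cancel() is called
	}()

	// Deferred calls run last-in, first-out: the defer registered second
	// (cancel) runs first, so the Wait below can complete.
	defer func() {
		wg.Wait()
		order = append(order, "wait")
	}()
	defer func() {
		cancel()
		order = append(order, "cancel")
	}()
	return
}

func main() {
	fmt.Println(teardownOrder()) // [cancel wait]
}
```

Calling wg.Wait() before cancel() inside a single function body, as in the revised test code, would block forever because the worker only exits on cancellation.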

Thanks for pointing out and fixing this lingering problem! That said, this is only an intermediate problem in the tests. The root problem is that the topology mock server doesn't respond in TestRegistryKeyChainingOSDF and the test times out in TestRegistryKeyChaining (see the error logs in the test results). Both errors go away when the other two tests in client_commands_test.go are commented out. Justin figured out that there is a test state leak, because commenting out the other tests allowed both TestRegistryKeyChaining and TestRegistryKeyChainingOSDF to pass. I'm still looking for the leak, and I hope that fixing the problems mentioned in your new PR comments will also fix it.

| Commented out | TestRegistryKeyChainingOSDF result | TestRegistryKeyChaining result |
|---|---|---|
| TestServeNamespaceRegistry | ❌ | ❌ |
| TestMultiPubKeysRegisteredOnNamespace | ❌ | ❌ |
| Both test funcs above | ✔️ | ✔️ |

bbockelm

There's still a lot to do here. Some minor stylistic / convention comments in the review -- plus a few major requests:

Redo the handshake between origin and registry to allow arbitrary replacement of the public keys.
Move the key refresh logic into the config module and add unit tests. Make the updates atomic and have things in lexicographical order.
Do not deprecate the existing issuer key (at least not in this version!) to allow external folks to adapt to the new setup. Do not delete a key that an admin has provided. Instead, behave as if the issuer key was found within the key directory.

bbockelm · 2024-12-21T17:21:01Z

cmd/generate_keygen.go

@@ -39,7 +39,7 @@ func keygenMain(cmd *cobra.Command, args []string) error {
 		return errors.Wrap(err, "failed to get the current working directory")
 	}
 	if privateKeyPath == "" {
-		privateKeyPath = filepath.Join(wd, "issuer.jwk")
+		privateKeyPath = filepath.Join(wd, "issuer-keys")


How does this work? Before, the variable was a file and now it's a directory. All the code below operates on this as if it was a directory.

At first glance, it appears that the tool is broken, no?

Actually, the private/public generation logic lies in config.GetIssuerPublicJWKS(), which was already updated to incorporate the new keys in IssuerKeysDirectory instead of only one key file at IssuerKey. The pelican generate keygen command works as expected (creating a key in ./issuer-keys).

bbockelm · 2024-12-21T17:33:43Z

config/init_server_creds.go

+	}
+
+	// Rename the existing private key file and set destination path
+	fileName := fmt.Sprintf("migrated_%d_%s.pem",


Use the appropriate mkstemp to create a unique file. This is not sufficient.

Done. I also use the mkstemp logic in another similar function, GeneratePEM.

bbockelm · 2024-12-21T17:38:29Z

config/init_server_creds.go

-		newKey, err := loadIssuerPrivateJWK(issuerKeyFile)
+	issuerKeysDir := param.IssuerKeysDirectory.GetString()
+	currentIssuerKeysDir := getCurrentIssuerKeysDir()
+	// Handles runtime changes to the issuer keys directory (configured via "IssuerKeysDirectory" parameter).


I think the tests should take a different approach: reset the config state between the calls. Otherwise, it clutters up the runtime code.

bbockelm · 2024-12-21T19:02:59Z

registry/registry_pubkey_update.go

+				// Check the origin is authorized to update (possessing the public key used for prefix initial registration)
+				// Parse all public keys of the sender into a JWKS
+				var clientKeySet jwk.Set
+				if data.AllPubkeys == nil { // backward compatibility - AllPubkeys only exists in the payload in Pelican 7.12 or later


I'm confused by this.

data is from the client -- why is there backward compatibility code in an API call introduced in 7.12?

bbockelm · 2024-12-21T19:05:03Z

registry/registry_pubkey_update.go

+							continue
+						}
+
+						if registryDbKid == clientKid {


This seems unnecessary.

The client should prove it possesses one of the existing keys.

However, this additionally forces the client to keep the key it used for the challenge in the updated key set. That seems unnecessary. We should simply see if the key used to sign is one of the existing known keys.

bbockelm · 2024-12-21T19:05:22Z

registry/registry_pubkey_update.go

+				}
+
+				// Check if any key in `clientKeySet` matches a key in `registryDbKeySet`
+				registryDbKeysIter := registryDbKeySet.Keys(ctx)


These loops could be replaced by the LookupKeyID method.

bbockelm · 2024-12-21T19:12:32Z

registry/registry_pubkey_update.go

+
+// Update the public key of registered prefix(es) if the http request passed client and server verification for nonce.
+// It returns the response data, and an error if any
+func updateNsKeySignChallengeCommit(ctx *gin.Context, data *RegisteredPrefixUpdate) (map[string]interface{}, error) {


I think the design here is still a bit confused. For example, it removes the previous key -- not clear that is matching the intent here.

Let me suggest something cleaner:

At the start of handshake, the registry sends all known public keys for the namespace.

The client receives the list of public keys and selects a corresponding private key from that list (if one is available) to sign the challenge.

The client sends the server the updated set of public keys plus the proof of possession from one of the known keys.

Thus, at the end of the handshake:

The client has demonstrated it owns a private key from the existing known public keys.

The client's desired set of keys is synchronized with the registry; the client can remove or add any arbitrary keys in the call.

bbockelm · 2024-12-21T19:19:50Z

Oh! One final item from the review -- please rebase and squash down the commits to a clean list.

Each commit should be self-contained (compiles, passes tests), covering a single piece of functionality (such as adding the key directory logic; or adding the API to update the keys), and have a detailed commit message. Please refer to https://osg-htc.org/technology/software/git-software-development/#making-good-pull-requests-the-art-of-good-commits for some best practices. Given the scope of the changes, I'd expect 4-5 commits to come out on this branch -- not 60.

I typically like keeping the git history in response to reviewer comments to help the reader understand the evolution of ideas. However, at this point, the history of the PR is so messy that it'd be impossible to glean that knowledge -- more value is in a clean branch.

h2zh added 13 commits November 15, 2024 21:25

key manager

43b1440

check private keys dir every 10 mins

5f4532f

Use the latest private key

da110ab

enbale concurrent access to the issuer private keys in memory, by adopting sync.Map

3fc8d52

Checks the directory containing .pem files every 5 minutes and loads new private key(s) if new file(s) are detected

46380ef

backward compatibility: migrate existing issuer key, patches and tweaks that fix all problems happened in the unit tests

05eb5f6

use atomic pointer for the in-memory private keys map

ab4da0c

newIssuerKey API endpoint on origin, and unit tests

e6a72f2

get the most recent modified private key from file dir, simply the code

5ba6057

move the LaunchIssuerKeysDirRefresh func to origin service

04342d0

fix linting problem

e94c311

use new config param IssuerKeysDirectory to replace IssuerKey

54f48f5

fix linting problems

532cd05

h2zh requested a review from matyasselmeci November 19, 2024 14:40

h2zh linked an issue Nov 19, 2024 that may be closed by this pull request

Allow origin issuer key to be rotated #561

Open

h2zh added 6 commits November 19, 2024 16:03

in docs, correct the scope of components affected by IssuerKeysDirectory

7b738d2

deprecate IssuerKey

f4fdbd8

improve the naming of migrated key

09d3586

patch for the algorithm of new key regristration

a80f272

register namespace with new key (with TODO left)

a518afb

update namespace pubKey in registry db

dceede7

h2zh added enhancement New feature or request origin Issue relating to the origin component registry Issue relating to the registry component go Pull requests that update Go code security labels Nov 26, 2024

h2zh added 4 commits November 27, 2024 16:51

update pubkey of all origin exports; align new key API and manual add behavior; linter problems fix

8cb8090

improve how the registry authorize origin to address the security concerns

473c5f3

Enhanced PoP using a "previous private key's signature"; previous active private key var; Remove the redundant "isRegistered" logic

007b7f1

Avoid key file naming collision when running new and old codebase back and forth

00b24c8

h2zh added 12 commits December 20, 2024 01:57

3rd attempt to fix timeout

f1123dd

4th attempt to fix timeout

08481cd

5th attempt to fix timeout

f3c7c99

6th attempt to fix timeout

7cc6591

7th attempt to fix timeout

0955e18

8th attempt to fix timeout

6162778

9th attempt to fix timeout

e9a0fb2

10th attempt to fix timeout

752ab32

10.5th attempt to fix timeout

0b1b7c1

11th attempt to fix timeout

d6dc0e1

12th attempt to fix timeout

5eb651f

only run ./registry/registry_db_test.go and client_commands_test.go

85bc504

h2zh force-pushed the multiple-private-keys branch from 19ce7bb to 85bc504 Compare December 20, 2024 19:30

h2zh added 6 commits December 20, 2024 19:57

comment out TestServeNamespaceRegistry

1fec0ce

comment out TestServeNamespaceRegistry

9d03bd4

comment out both TestServeNamespaceRegistry and TestMultiPubKeysRegisteredOnNamespace

08d3044

test improvement

6809e1d

fix ResetCurrentIssuerKeysDir

2c1ebd3

revert target tests in test-template.yml

ba42296

bbockelm reviewed Dec 21, 2024

Avoid deadlock when waiting on exit

7280e9d

The current code had the logic reversed -- first it waited for completion, then it cancelled the running goroutines.

bbockelm requested changes Dec 21, 2024

h2zh added 6 commits December 23, 2024 15:53

bring back TestServeNamespaceRegistry and TestMultiPubKeysRegisteredOnNamespace

b6593ce

comment out TestMultiPubKeysRegisteredOnNamespace

896aa4b

comment out TestServeNamespaceRegistry

67b62f0

generate a unique filename using a POSIX mkstemp-like logic; fix race condition in updating previous active private key; a few comments

7933e3c

fix and refine private key i/o relevant funcs, remove risky api, create illusion for IssuerKey if it exists, minor test impr

e089e62

attempt to solve timeout

a878208
