Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference secrets by name, do not inject #9345

Merged
merged 10 commits into from
Sep 9, 2024

Conversation

snazy
Copy link
Member

@snazy snazy commented Aug 14, 2024

Warehouses/object-stores require secrets (access-keys, tokens, etc). These secrets are currently injected into the object-store specific configuration holders, when the effective bucket configuration is calculated.

The configuration of warehouses/object-stores should be distinct from the definition of secrets. This requires to not have the secrets in a bucket configuration at any time, but instead reference secrets by name. These names are then resolved to secrets.

This change replaces the Secret types in all bucket configuration types with a String as the reference to the secret.

@snazy snazy force-pushed the configurable-warehouses branch 7 times, most recently from 75d2f47 to fc7ee21 Compare August 15, 2024 14:31
@snazy snazy marked this pull request as ready for review August 15, 2024 14:31
@snazy snazy requested a review from dimas-b August 15, 2024 14:39
@snazy snazy force-pushed the configurable-warehouses branch 2 times, most recently from 029b36e to f40a16c Compare August 15, 2024 16:58
Copy link
Member

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java code LGTM... helm is tricky 🤔

Comment on lines 52 to 54
@SuppressWarnings("unchecked")
S secret = (S) secretType.fromValueMap(secretData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@SuppressWarnings("unchecked")
S secret = (S) secretType.fromValueMap(secretData);
S secret = secretJavaType.cast(secretType.fromValueMap(secretData));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or perhaps secretType.fromValueMap(secretData, secretJavaType), then secretType could do better validation on the expected return type than ClassCastException. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Each SecretType has only one Java type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the second comment is totally optional, but the first one makes sense, IMHO, it utilizes the actual runtime type parameter for casting (as opposed to compiled validation at the call site).

@@ -247,8 +243,7 @@ void icebergConfigDefaults(StorageUri warehouse, BiConsumer<String, String> conf
S3BucketOptions bucketOptions =
s3clientSupplier
.s3options()
.effectiveOptionsForBucket(
Optional.ofNullable(warehouse.authority()), s3clientSupplier.secretsProvider());
.effectiveOptionsForBucket(Optional.ofNullable(warehouse.authority()));
bucketOptions.region().ifPresent(x -> config.accept(S3_CLIENT_REGION, x));
config.accept(ICEBERG_FILE_IO_IMPL, "org.apache.iceberg.aws.s3.S3FileIO");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

off topic: I'm still wondering about using "resolving" FileIO here (and everythere). The ObjectIO code appears to try and figure out whether we have some matching locations for each storage type, so it looks that we could potentially inject credentials both for S3 and GCS (for example), but the FileIO is fixed and will support only one storage type 🤔

{{- include "nessie.secretToEnv" (list .Values.catalog.storage.s3.defaultOptions.accessKeySecret "awsAccessKeyId" "nessie.catalog.service.s3.default-options.access-key.name" . ) }}
{{- include "nessie.secretToEnv" (list .Values.catalog.storage.s3.defaultOptions.accessKeySecret "awsSecretAccessKey" "nessie.catalog.service.s3.default-options.access-key.secret" . ) }}
{{- include "nessie.secretToEnv" (list .Values.catalog.storage.s3.defaultOptions.accessKeySecret "awsAccessKeyId" "nessie-catalog-secrets.s3.default-options.access-key.name" "nessie.catalog.service.s3.default-options.access-key" "nessie-catalog-secrets.s3.default-options.access-key" . ) }}
{{- include "nessie.secretToEnv" (list .Values.catalog.storage.s3.defaultOptions.accessKeySecret "awsSecretAccessKey" "nessie-catalog-secrets.s3.default-options.access-key.secret" "" "" . ) }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to produce incorrect k8s yaml... for example:

            - name: "nessie-catalog-secrets.s3.default-options.access-key.name"
              valueFrom:
                secretKeyRef:
                  name: "k8sSecABC"
                  key: "key111"
            - name: "nessie.catalog.service.s3.default-options.access-key"
              value: "nessie-catalog-secrets.s3.default-options.access-key"- name: "nessie-catalog-secrets.s3.default-options.access-key.secret"
              valueFrom:
                secretKeyRef:
                  name: "k8sSecABC"
                  key: "access111"

The second one (secret) seems to be broken syntax-wise.

Also, nessie-catalog-secrets.s3.default-options.access-key is not referenced from the config map.

Test values (diff):

diff --git a/helm/nessie/values.yaml b/helm/nessie/values.yaml
index 3e066c45a..ace6e7f7f 100644
--- a/helm/nessie/values.yaml
+++ b/helm/nessie/values.yaml
@@ -138,14 +138,14 @@ bigtable:
 catalog:
 
   # -- Whether to enable the REST catalog service.
-  enabled: false
+  enabled: true
 
   # -- Iceberg catalog settings.
   iceberg:
 
     # -- The default warehouse name. Required. This is just a symbolic name; it must refer to a
     # declared warehouse below.
-    defaultWarehouse: ~  # warehouse1
+    defaultWarehouse: warehouse1
 
     # -- Iceberg config defaults applicable to all clients and warehouses. Any properties that are
     # common to all iceberg clients should be included here. They will be passed to all clients on
@@ -166,7 +166,7 @@ catalog:
     # warehouse must be defined.
     warehouses:
       # -- Symbolic name of the warehouse. Required.
-    - name: ~  # warehouse1
+    - name: warehouse1
       # -- Location of the warehouse. Required. Used to determine the base location of a table.
       # Scheme must be either s3 (Amazon S3), gs (Google GCS) or abfs / abfss (Azure ADLS). Storage
       # properties for each location can be defined below.
@@ -249,7 +249,7 @@ catalog:
         # -- Settings only relevant when clientAuthenticationMode is ASSUME_ROLE.
         serverIam:
           # -- Whether to enable server assume-role functionality.
-          enabled: ~  # false
+          enabled: true  # false
           # -- The ARN of the role to assume for accessing S3 data. This parameter is required for
           # Amazon S3, but may not be required for other storage providers (e.g. Minio does not use it
           # at all).
@@ -271,11 +271,11 @@ catalog:
         # -- AWS credentials. Required when serverAuthenticationMode is STATIC.
         accessKeySecret:
           # -- The secret name to pull AWS credentials from.
-          name: ~
+          name: k8sSecABC
           # -- The secret key storing the AWS secret key id.
-          awsAccessKeyId: ~
+          awsAccessKeyId: key111
           # -- The secret key storing the AWS secret access key.
-          awsSecretAccessKey: ~
+          awsSecretAccessKey: access111
 
       # -- Per-bucket S3 settings. Override the general settings above.
       buckets: []

CLI: helm template --debug --namespace nessie-ns .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm - that stuff won't become my friend.
Anyway, it should work now. Added a new ci/catalog-secrets-values.yaml and a new step to CI to run this helm template against each of those.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generated Deployment yaml still looks wrong to me:

            - name: "nessie-catalog-secrets.s3.default-options.access-key.name"
              valueFrom:
                secretKeyRef:
                  name: "k8sSecABC"
                  key: "key111"- name: "nessie-catalog-secrets.s3.default-options.access-key.secret"

Also, I think we should reference nessie-catalog-secrets.s3.default-options.access-key from the config map as nessie.catalog.service.s3.default-options.access-key=nessie-catalog-secrets.s3.default-options.access-key, but it's not there.

nit: WDYT about renaming the env. var. to nessie-catalog-secrets.s3.access-key for simplicity. In values.yaml there's only one parameter for defining this secret.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should reference nessie-catalog-secrets.s3.default-options.access-key from the config map as nessie.catalog.service.s3.default-options.access-key=nessie-catalog-secrets.s3.default-options.access-key

it is there:

            #
            # s3.default-options.access-key
            #
            - name: "nessie.catalog.service.s3.default-options.access-key"
              value: "nessie-catalog-secrets.s3.default-options.access-key"
            - name: "nessie-catalog-secrets.s3.default-options.access-key.name"
              valueFrom:
                secretKeyRef:
                  name: "defaultCreds"
                  key: "defaultAccessKeyId"
            - name: "nessie-catalog-secrets.s3.default-options.access-key.secret"
              valueFrom:
                secretKeyRef:
                  name: "defaultCreds"
                  key: "defaultSecretAccessKey"#

via helm template --debug --namespace nessie-ns --values ci/catalog-secrets-values.yaml .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally!!!
Got something into ci.yml that validates all this using "real" "secrets".

@snazy snazy force-pushed the configurable-warehouses branch from f40a16c to d21ab26 Compare August 16, 2024 07:59
@snazy snazy requested a review from dimas-b August 16, 2024 08:00
cd helm/nessie
for f in values.yaml ci/*.yaml ; do
echo "::group::helm template $f"
helm template --debug --namespace nessie-ns $f .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it auto-detects syntax errors... just dumps the generated YAML, so manual validation of the output is still required.

{{- include "nessie.secretToEnv" (list .Values.catalog.storage.s3.defaultOptions.accessKeySecret "awsAccessKeyId" "nessie.catalog.service.s3.default-options.access-key.name" . ) }}
{{- include "nessie.secretToEnv" (list .Values.catalog.storage.s3.defaultOptions.accessKeySecret "awsSecretAccessKey" "nessie.catalog.service.s3.default-options.access-key.secret" . ) }}
{{- include "nessie.secretToEnv" (list .Values.catalog.storage.s3.defaultOptions.accessKeySecret "awsAccessKeyId" "nessie-catalog-secrets.s3.default-options.access-key.name" "nessie.catalog.service.s3.default-options.access-key" "nessie-catalog-secrets.s3.default-options.access-key" . ) }}
{{- include "nessie.secretToEnv" (list .Values.catalog.storage.s3.defaultOptions.accessKeySecret "awsSecretAccessKey" "nessie-catalog-secrets.s3.default-options.access-key.secret" "" "" . ) }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generated Deployment yaml still looks wrong to me:

            - name: "nessie-catalog-secrets.s3.default-options.access-key.name"
              valueFrom:
                secretKeyRef:
                  name: "k8sSecABC"
                  key: "key111"- name: "nessie-catalog-secrets.s3.default-options.access-key.secret"

Also, I think we should reference nessie-catalog-secrets.s3.default-options.access-key from the config map as nessie.catalog.service.s3.default-options.access-key=nessie-catalog-secrets.s3.default-options.access-key, but it's not there.

nit: WDYT about renaming the env. var. to nessie-catalog-secrets.s3.access-key for simplicity. In values.yaml there's only one parameter for defining this secret.

@snazy snazy force-pushed the configurable-warehouses branch 15 times, most recently from 105e820 to fd7c650 Compare August 21, 2024 11:32
@snazy snazy requested a review from dimas-b August 22, 2024 16:02
@snazy snazy force-pushed the configurable-warehouses branch 2 times, most recently from 9b79997 to 711bf1d Compare August 28, 2024 16:11
@snazy snazy force-pushed the configurable-warehouses branch from 711bf1d to 32970ee Compare September 4, 2024 08:04
@snazy
Copy link
Member Author

snazy commented Sep 4, 2024

Ping @dimas-b

@snazy snazy force-pushed the configurable-warehouses branch 3 times, most recently from a5e9f32 to c1dc43a Compare September 6, 2024 10:15
Copy link
Member

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URN idea SGTM, but I have a comment about the specific impl. (below)

checkArgument(
!next.isBlank() && next.charAt(0) != ':',
"Invalid secret URI, must be in the form 'urn:nessie-secret:<provider>:<secret-name>'");
name = URI.create(next);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an input of urn:nessie-secret:one:basic, the URI becomes just basic here and feeds back to the same SecretsProvider.getSecret() API... This looks odd to me. The new URI does not even have a scheme (it has only path).

If the underlying implementations are supposed to interpret it as an opaque string, I think it would be work making a separate java API method for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From another angle, if the SecretsProvider were to change from ResolvingSecretsProvider to SmallryeConfigSecretsProvider, the caller using the same input would get nothing, but should get the same output as before (assuming the correct provider is substituted).

I think it would be clearer if each leaf implementation dealt with its own URN sub-scheme, while taking the original URN on input (in whole or by component). WDYT?

The parsing could be done in a default or common base method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SecretsProvider implementations don't know their name, and they don't need to know their name - that's the whole idea here.
ResolvingSecretsProvider is mandatory - and in #8708 hidden behind a short-time cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But shouldn't ResolvingSecretsProvider be the only implementation of SecretsProvider? Components like SmallryeConfigSecretsProvider and UnsafePlainTextSecretsProvider look more "low-level" and imho should only be accessed by ResolvingSecretsProvider and thus implement a different interface (probably an SPI since we would later accept more secret sources like Vault etc.). That second interface would likely take a String as input, not an URI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separated into two interfaces.

import java.util.Map;

public class UnsafePlainTextSecretsProvider extends AbstractMapBasedSecretsProvider {
private final Map<URI, Map<String, String>> unsafeSecrets;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the Map necessary with URNs? How about urn:nessie-secret:plain:key1=value1,key2=value2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you're asking for. This class is rather just for testing purposes.
Also, there's by design no need to ask for multiple secrets.

Copy link
Member

@dimas-b dimas-b Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean putting plain text secrets directly in the URN and avoiding an extra config/map layer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of optimizing for plain text "secrets".

snazy added 10 commits September 9, 2024 11:52
Warehouses/object-stores require secrets (access-keys, tokens, etc). These secrets are currently injected into the object-store specific configuration holders, when the effective bucket configuration is calculated.

The configuration of warehouses/object-stores should be distinct from the definition of secrets. This requires to not have the secrets in a bucket configuration at any time, but instead _reference_ secrets by name. These names are then resolved to secrets.

This change replaces the `Secret` types in all bucket configuration types with a `String` as the _reference_ to the secret.

Also implicitly refactors most of `:nessie-catalog-secrets-api` module.
@snazy snazy force-pushed the configurable-warehouses branch from 5a475ad to 9936274 Compare September 9, 2024 11:43
@snazy snazy merged commit 904c2ae into projectnessie:main Sep 9, 2024
16 checks passed
@snazy snazy deleted the configurable-warehouses branch September 9, 2024 14:05
@snazy snazy added this to the 0.96.0 milestone Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants