Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

THREESCALE-11395 Improved secret watched-by logic #1030

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

carlkyrillos
Copy link
Contributor

@carlkyrillos carlkyrillos commented Nov 5, 2024

Issue Link

JIRA: THREESCALE-11395

What

This PR improves on the existing secret watched-by logic in a few ways:

  1. Previously the secret labels on the APIManager CR followed this pattern: secret.apimanager.apps.3scale.net/{secret-UID: 'true', where the value was set to true regardless of whether the secret had the watched-by label. Now the value is set to false if the Secret doesn't have the watched-by label and true if the secret does have the watched-by label.

  2. Previously the operator would annotate the apicast pod(s) with apimanager.apps.3scale.net/{secret-name}: '{secret-resourceVersion}' whenever a secret was referenced in the APIManager CR. Now the apicast pod(s) are only annotated if the referenced secret also has the watched-by label on it.

  3. Previously, any and all changes to a watched secret would trigger the apicast deployment(s) to rollout a new pod(s) whose annotations contained the latest resourceVersion of the watched secret. Now the operator will only rollout new pods when there is a change to the secret's .data, i.e. changes to the watched secrets labels, annotations, etc. won't trigger a rollout even if though the secret's resourceVersion changed. This is accomplished through a new secret called hashed-secret-data that stores a hash of each watched secret's data using SHA256 encryption. Whenever the operator detects that a watched secret's resourceVersion has changed, it first takes a hash of the secret's current .data and if that matches the secret's entry in the hashed-secret-data secret, then the operator will ignore the change and prevent a rollout. If the .data has changed, the operator will update the apicast pod's annotations with the watched secret's latest resourceVersion which will trigger a rollout.

NOTE: This PR doesn't implement support to watch TLS or ACL secrets. Support to watch these secrets will be added in a followup PR once #1025 and #1026 are merged.

Verification Steps

  1. Checkout this PR
  2. Prepare the cluster for a local install:
make download
make install

export NAMESPACE=3scale-test
oc new-project $NAMESPACE

cat << EOF | oc create -f -
kind: Secret
apiVersion: v1
metadata:
  name: s3-credentials
  namespace: $NAMESPACE
data:
  AWS_ACCESS_KEY_ID: c29tZXRoaW5nCg==
  AWS_BUCKET: c29tZXRoaW5nCg==
  AWS_REGION: dXMtd2VzdC0xCg==
  AWS_SECRET_ACCESS_KEY: c29tZXRoaW5nCg==
type: Opaque
EOF
  1. Create a custom policy secret (without the watched-by label):
cat << EOF | oc create -f -
apiVersion: v1
kind: Secret
metadata:
  name: custom-policy-1
  namespace: $NAMESPACE
type: Opaque
stringData:
  apicast-policy.json: |
    {
      "services": [
        {
          "proxy": {
            "policy_chain": [
              { "name": "apicast.policy.upstream",
                "configuration": {
                  "rules": [{
                    "regex": "/",
                    "url": "http://echo-api.3scale.net"
                  }]
                }
              }
            ]
          }
        }
      ]
    }
  example.lua: |
    local setmetatable = setmetatable

    local _M = require('apicast.policy').new('Example', '0.1')
    local mt = { __index = _M }
    
    function _M.new()
      return setmetatable({}, mt)
    end
    
    function _M:init()
      -- do work when nginx master process starts
    end
    
    function _M:init_worker()
      -- do work when nginx worker process is forked from master
    end
    
    function _M:rewrite()
      -- change the request before it reaches upstream
        ngx.req.set_header('X-CustomPolicy', 'customValue')
    end
    
    function _M:access()
      -- ability to deny the request before it is sent upstream
    end
    
    function _M:content()
      -- can create content instead of connecting to upstream
    end
    
    function _M:post_action()
      -- do something after the response was sent to the client
    end
    
    function _M:header_filter()
      -- can change response headers
    end
    
    function _M:body_filter()
      -- can read and change response body
      -- https://github.com/openresty/lua-nginx-module/blob/master/README.markdown#body_filter_by_lua
    end
    
    function _M:log()
      -- can do extra logging
    end
    
    function _M:balancer()
      -- use for example require('resty.balancer.round_robin').call to do load balancing
    end
    
    return _M
  init.lua: |
    return require('example')
EOF
  1. Create a custom environment secret (with the watched-by label):
cat << EOF | oc create -f -
apiVersion: v1
kind: Secret
metadata:
  name: custom-env-1
  namespace: $NAMESPACE
  labels:
    apimanager.apps.3scale.net/watched-by: apimanager
type: Opaque
stringData:
  custom_env.lua: |
    local cjson = require('cjson')
    local PolicyChain = require('apicast.policy_chain')
    local policy_chain = context.policy_chain
    
    local logging_policy_config = cjson.decode([[
    {
      "enable_access_logs": false,
      "custom_logging": "\"{{request}}\" to service {{service.id}} and {{service.name}}"
    }
    ]])
    
    policy_chain:insert( PolicyChain.load_policy('logging', 'builtin', logging_policy_config), 1)
    
    return {
      policy_chain = policy_chain,
      port = { metrics = 9421 },
    }
EOF
  1. Create an APIManager CR that references the custom-policy-1 and custom-env-1 secrets:
DOMAIN=$(oc get routes console -n openshift-console -o json | jq -r '.status.ingress[0].routerCanonicalHostname' | sed 's/router-default.//')
cat << EOF | oc create -f -
kind: APIManager
apiVersion: apps.3scale.net/v1alpha1
metadata:
  name: 3scale
  namespace: $NAMESPACE
spec:
  wildcardDomain: $DOMAIN
  system:
    fileStorage:
      simpleStorageService:
        configurationSecretRef:
          name: s3-credentials
  apicast:
    productionSpec:
      customPolicies:
        - name: custom-policy1
          version: "0.1"
          secretRef:
            name: custom-policy-1
      customEnvironments:
        - secretRef:
            name: custom-env-1
    stagingSpec:
      customPolicies:
        - name: custom-policy1
          version: "0.1"
          secretRef:
            name: custom-policy-1
      customEnvironments:
        - secretRef:
            name: custom-env-1
EOF
  1. Run the operator:
make run
  1. Wait for the install to complete:
oc get apimanager 3scale -oyaml -w
  1. Verify that the APIManager's labels reference the two secrets' UIDs with the custom-env-1 label value set to true and the custom-policy-1 label value set to false:
oc get apimanager 3scale -oyaml | yq '.metadata.labels'

The labels should look like this:

secret.apimanager.apps.3scale.net/6fd57b4c-2811-43a7-bcb2-8c84a8c70e43: "true"
secret.apimanager.apps.3scale.net/5691d37c-3707-4073-ab92-581302a78a2a: "false"
  1. Verify that the apicast-staging and apicast-productions pods' annotations have a references to the custom-env-1 secret but not the custom-policy-1 secret:
oc get pods -l deployment=apicast-staging -oyaml | yq '.items[0].metadata.annotations' | grep apimanager.apps.3scale.net

oc get pods -l deployment=apicast-production -oyaml | yq '.items[0].metadata.annotations' | grep apimanager.apps.3scale.net

The annotations for both pods should look like this:

apimanager.apps.3scale.net/customenv-secret-resource-version-custom-env-1: "944031"
apimanager.apps.3scale.net/env-configmap-hash: "1046001186"

NOTE: The env-configmap-hash annotation is supposed to be there and is not related to the watched-by secrets.

  1. Verify that the hashed-secret-data secret exists but only has an entry for the custom-env-1 secret:
oc get secret hashed-secret-data -oyaml
  1. Add the watched-by label to the custom-policy-1 secret:
oc label secret custom-policy-1 apimanager.apps.3scale.net/watched-by=apimanager
  1. Verify that the APIManager labels were updated and that both secret labels have a value of true:
oc get apimanager 3scale -oyaml | yq '.metadata.labels'

The labels should now look like this:

secret.apimanager.apps.3scale.net/6fd57b4c-2811-43a7-bcb2-8c84a8c70e43: "true"
secret.apimanager.apps.3scale.net/5691d37c-3707-4073-ab92-581302a78a2a: "true"
  1. Once the new apicast-staging and apicast-production pods are ready, verify that they have annotations with references to both the custom-env-1 secret and the custom-policy-1 secret:
oc get pods -l deployment=apicast-staging -oyaml | yq '.items[0].metadata.annotations' | grep apimanager.apps.3scale.net

oc get pods -l deployment=apicast-production -oyaml | yq '.items[0].metadata.annotations' | grep apimanager.apps.3scale.net

The annotations should look like this:

apimanager.apps.3scale.net/customenv-secret-resource-version-custom-env-1: "944031"
apimanager.apps.3scale.net/custompolicy-secret-resource-version-custom-policy-1: "954907"
apimanager.apps.3scale.net/env-configmap-hash: "1046001186"

NOTE: The env-configmap-hash annotation is supposed to be there and is not related to the watched-by secrets.

  1. Verify that the hashed-secret-data secret has an entry for both the custom-env-1 and custom-policy-1 secrets:
oc get secret hashed-secret-data -oyaml
  1. Edit the data in the either secret and verify that new apicast-staging and apicast-production pods are created:
oc get pods
  1. Once the pods have stabilized, add a label to the config secret and verify that no new pods are created even though the resourceVersion has changed:
oc label secret apicast-config-secret dummy=label

@carlkyrillos carlkyrillos requested a review from a team as a code owner November 5, 2024 18:42
@carlkyrillos carlkyrillos force-pushed the THREESCALE-11395 branch 3 times, most recently from 1b639f9 to fad7348 Compare November 7, 2024 21:25
@carlkyrillos carlkyrillos changed the title [WIP] THREESCALE-11395 Improved secret watched-by logic THREESCALE-11395 Improved secret watched-by logic Nov 7, 2024

* [**recommended way**] Create another secret with a different name and update the APIManager custom resource field `customEnvironments[].secretRef.name`. The operator will trigger a rolling update loading the new custom environment content.
* Update the existing secret content and redeploy apicast turning `spec.apicast.productionSpec.replicas` or `spec.apicast.stagingSpec.replicas` to 0 and then back to the previous value.
**NOTE**: If the referenced secret does not exist, the operator will mark the APIManager CustomResource as failed. The apicast Deployment object will also fail if the referenced secret does not exist.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @carlkyrillos . I did test as in Description and all looks fine. Code review is not completed, but meanwhile looks very good. Just this behavior - I'm not sure. I tried delete secret, but apimanager CR was not updated, althouh reconciler reporting error.

  • Pods status - all UP and running ok
  • Delete secret: oc delete secret custom-env-1
  • Error in reconciler:
2024-11-26T15:54:30+02:00	ERROR	Reconciler error	{"controller": "apimanager", "controllerGroup": "apps.3scale.net", "controllerKind": "APIManager", "APIManager": {"name":"3scale","namespace":"3scale-test"}, "namespace": "3scale-test", "name": "3scale", "reconcileID": "151a6cea-64bd-40cd-8f1b-cf8ad6be102c", "error": "spec.apicast.productionSpec.customEnvironments[0]: Invalid value: v1alpha1.CustomEnvironmentSpec{SecretRef:(*v1.LocalObjectReference)(0x14000f8f7e0)}: Secret \"custom-env-1\" not found", "errorCauses": [{"error": "spec.apicast.productionSpec.customEnvironments[0]: Invalid value: v1alpha1.CustomEnvironmentSpec{SecretRef:(*v1.LocalObjectReference)(0x14000f8f7e0)}: Secret \"custom-env-1\" not found"}]}
  • No error in CR:
~/go/3scale-operator oc describe  apimanager  3scale |grep -i error
~/go/3scale-operator oc describe  apimanager  3scale |grep -i warn 
~/go/3scale-operator oc describe  apimanager  3scale |grep Status 
Status:
    Status:                True
    Status:                True
~/go/3scale-operator oc describe  apimanager  3scale |grep Message
    Message:               All requirements for the current version are met

Thank you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@valerymo You are right this is a bug, thank you for catching this. I'll take a look at the code and will push a fix.

Copy link
Contributor Author

@carlkyrillos carlkyrillos Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MStokluska After thinking about this some more I'm second guessing my original logic. Are we sure we want to fail the APIManager CR if a secret referenced in the .spec is missing? Would it be better to just fail the deployment that is using the watched secret? That way we wouldn't be holding up the entire installation, just the one component relying on the watched secret. CC: @valerymo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand correctly, if a secret referenced missing in APIM we are setting APIM as not ready?
If so, I see no objection in setting apim via not ready deployment to not ready as well. Especially that APIM doesn't log "lastError". But I have not looked at the PR so I might be missing some context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MStokluska @valerymo I pushed a commit that checks to see if any of the watched secrets can't be found. If any watched secrets are missing, then the Available Condition in the APIManager CR's .status will switch to false and I added a message to the condition that lists which secrets couldn't be found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants