From 506c32b0adab25d250bf722c3cb9f092af981c1e Mon Sep 17 00:00:00 2001
From: Eguzki Astiz Lezaun
Date: Tue, 8 Nov 2022 07:23:41 +0100
Subject: [PATCH 1/5] initial draft

---
 ...000-wire-policies-with-backend-services.md | 151 ++++++++++++++++++
 1 file changed, 151 insertions(+)
 create mode 100644 rfcs/0000-wire-policies-with-backend-services.md

diff --git a/rfcs/0000-wire-policies-with-backend-services.md b/rfcs/0000-wire-policies-with-backend-services.md
new file mode 100644
index 00000000..b5a130e4
--- /dev/null
+++ b/rfcs/0000-wire-policies-with-backend-services.md
@@ -0,0 +1,151 @@
+# RFC 0000
+
+- Feature Name: `wire_policies_with_backend`
+- Start Date: 2022-11-04
+- RFC PR: [Kuadrant/architecture#0000](https://github.com/Kuadrant/architecture/pull/0000)
+- Issue tracking: [Kuadrant/architecture#0000](https://github.com/Kuadrant/architecture/issues/0000)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes a mechanism to wire Kuadrant policies with rate limiting and authN/authZ services.
+
+# Motivation
+[motivation]: #motivation
+
+After the PR [#48](https://github.com/Kuadrant/kuadrant-operator/pull/48) was merged,
+Kuadrant suffered an unwanted side effect: Kuadrant's policies only worked when kuadrant was installed in the `kuadrant-system` namespace.
+This issue comes from the fact that the policy controllers are no longer deployed as components of a particular Kuadrant instance.
+Instead, the policy controllers live in the Kuadrant operator's pod and are up and running even if there is no Kuadrant instance in the cluster.
+The root cause of this "side effect" is the design of how backend services were wired with the RateLimit/Auth policies.
+The design allowed one kuadrant instance to be installed in any namespace; however, it only allowed one kuadrant instance to be running in the cluster.
+
+### How it worked before the merge of #48
+When an instance of Kuadrant, represented by a Kuadrant Custom Resource (CR), was created, the following workflow was run by the Kuadrant operator:
+* Read the Kuadrant custom resource, paying attention to the namespace. Let's call `K` the namespace where the Kuadrant CR is created.
+* Deploy one Limitador instance in the `K` namespace
+* Deploy one Authorino instance in the `K` namespace
+* Register the Authorino instance living in `K` with the Istio system as an external authorization service.
+* Deploy the RateLimitPolicy controller passing as env var the address of the limitador instance in the `K` namespace.
+* Deploy the AuthPolicy controller
+
+When the user created a rate limit policy, the controller already knew Limitador's location (name and namespace) and could configure it according to the spec of the policy.
+
+Authorino is a k8s controller and the Kuadrant operator deploys it in cluster-wide mode
+without any [sharding](https://github.com/Kuadrant/authorino/blob/main/docs/architecture.md#sharding) defined.
+When the user created an auth policy, the controller did not need to know where Authorino lives because a) it assumes that there is only one Authorino instance (which might be wrong as well) and b) it assumes that Authorino is watching the entire cluster without filtering.
+Thus, the controller manages an AuthConfig object in the hard-coded `kuadrant-system` namespace (which, incidentally, is also wrong).
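+
+For reference, the registration step above is what Istio's mesh config [`extensionProviders`](https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#MeshConfig-ExtensionProvider) API provides. A minimal sketch of what that registration looks like, assuming Authorino was deployed in the `K` namespace (the provider name and service host below are illustrative, not the operator's literal output):
+
+```yaml
+apiVersion: install.istio.io/v1alpha1
+kind: IstioOperator
+metadata:
+  name: istiocontrolplane
+  namespace: istio-system
+spec:
+  meshConfig:
+    extensionProviders:
+      # Illustrative provider name, wired to the Authorino deployed in namespace K
+      - name: kuadrant-authorization
+        envoyExtAuthzGrpc:
+          # gRPC endpoint of the Authorino authorization service in namespace K
+          service: authorino-authorino-authorization.K.svc.cluster.local
+          port: 50051
+```
+
+Gateways then opt into that provider through an Istio `AuthorizationPolicy` with `action: CUSTOM` referencing the provider by name.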
+
+### How it works after the merge of #48
+When the policy controllers were moved into the operator's pod, one of the design's requirements was no longer met: the controllers must know Limitador's location at deploy time. This is what caused the issue.
+The design assumed one policy controller instance per Limitador instance. The policy controller got Limitador's location at boot time via an environment variable.
+After the policies merged into the operator's pod, the policy controllers became a singleton instance (one pod, one container) at the cluster level.
+Regardless of whether kuadrant supports multiple or single limitador/authorino instances, the policy controllers will be running in a single pod in the entire cluster.
+Even if kuadrant only supports a single limitador/authorino instance,
+the policy controllers still need to know the location of the limitador/authorino instances.
+
+Therefore, a new design is needed that wires the user-created policies with an existing limitador/authorino instance. Even though this wiring currently works for the AuthPolicy, it relies on the assumption of a single Authorino watching the whole cluster for AuthConfig objects, which is a sub-optimal design.
+
+### Potential Scenarios
+
+* **Kuadrant supports only one instance deployed in a hard-coded namespace**
+
+No need to wire, as the Authorino and Limitador locations are well known. All policies will be linked to the same instance of Limitador/Authorino.
+
+* **Kuadrant supports only one instance deployed in a configurable namespace**
+
+Wiring is needed. The policy controllers do not know where Limitador/Authorino live. All policies will be linked to the same instance of Limitador/Authorino.
+
+* **Kuadrant supports multiple instances in a cluster**
+
+Wiring is needed. There should be a way to know the Limitador/Authorino instance only by reading the policy.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+The proposal is to make Kuadrant support multiple instances in a cluster.
+
+This proposal is based on a simple design decision: one gateway runs one and only one kuadrant external auth service (Authorino) and one and only one kuadrant external rate limit service (Limitador).
+
+Multiple rate limit or auth stages involving multiple instances of Authorino and Limitador could be implemented.
+Until there is a real use case and it is strictly necessary, this scenario is discarded for implementing kuadrant's protection.
+The main reason is complexity. It is already complex enough to reason about rate limiting and auth services with a single stage. Adding multiple rate limiting stages, or hitting multiple Limitador instances in a single stage (doable with the WASM module), makes the observed behavior too complex to reason about. Currently there is no use case requiring such a complex scenario.
+
+A kuadrant instance includes:
+
+* One Limitador deployment instance
+* One Authorino deployment instance
+* A list of dedicated gateways. Those gateways cannot be shared between multiple kuadrant instances.
+
+![](https://i.imgur.com/QdeCYs6.png)
+
+Highlights:
+* The Kuadrant instance is not enclosed by k8s namespaces.
+* One gateway can belong to (be managed by) one and only one kuadrant instance.
+* One kuadrant instance does not own rate limit policies or auth policies.
+* The traffic routed by HTTPRoute 1 through the gateway A and gateway B will be protected by RLP 1 and KAP 1, using Limitador and Authorino instances located at the namespace K1.
+* The traffic routed by HTTPRoute 2 through the gateway B will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K1
+* The traffic routed by HTTPRoute 2 through the gateway C will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K2
+* The traffic
+
+Explain the proposal as if it was implemented and you were teaching it to a Kuadrant user. That generally means:
+
+- Introducing new named concepts.
+- Explaining the feature largely in terms of examples.
+- Explaining how a user should *think* about the feature, and how it would impact the way they already use Kuadrant. It should explain the impact as concretely as possible.
+- If applicable, provide sample error messages, deprecation warnings, or migration guidance.
+- If applicable, describe the differences between teaching this to existing and new Kuadrant users.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+This is the technical portion of the RFC. Explain the design in sufficient detail that:
+
+- Its interaction with other features is clear.
+- It is reasonably clear how the feature would be implemented.
+- How errors would be reported to the users.
+- Corner cases are dissected by example.
+
+The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+Why should we *not* do this?
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+- Why is this design the best in the space of possible designs?
+- What other designs have been considered and what is the rationale for not choosing them?
+- What is the impact of not doing this?
+
+# Prior art
+[prior-art]: #prior-art
+
+Discuss prior art, both the good and the bad, in relation to this proposal.
+A few examples of what this can include are:
+
+- Does another project have a similar feature?
+- What can be learned from it? What's good? What's less optimal?
+- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background.
+
+This section is intended to encourage you as an author to think about the lessons learned from other attempts, successful or not, and provide readers of your RFC with a fuller picture.
+
+Note that while precedent set by other projects is some motivation, it does not on its own motivate an RFC.
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+- What parts of the design do you expect to resolve through the RFC process before this gets merged?
+- What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
+- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+Think about what the natural extension and evolution of your proposal would be and how it would affect the platform and project as a whole. Try to use this section as a tool to further consider all possible interactions with the project and its components in your proposal. Also consider how this all fits into the roadmap for the project and the relevant sub-team.
+
+This is also a good place to "dump ideas", if they are out of scope for the RFC you are writing but otherwise related.
+
+Note that having something written down in the future-possibilities section is not a reason to accept the current or a future RFC; such notes should be in the section on motivation or rationale in this or subsequent RFCs. The section merely provides additional information.

From 8bf1836ee23853c5d49afa3852141eb18e024eca Mon Sep 17 00:00:00 2001
From: Eguzki Astiz Lezaun
Date: Mon, 14 Nov 2022 15:06:02 +0100
Subject: [PATCH 2/5] comments copied from source doc in hackmd

---
 rfcs/0000-wire-policies-with-backend-services.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/rfcs/0000-wire-policies-with-backend-services.md b/rfcs/0000-wire-policies-with-backend-services.md
index b5a130e4..66cfdf01 100644
--- a/rfcs/0000-wire-policies-with-backend-services.md
+++ b/rfcs/0000-wire-policies-with-backend-services.md
@@ -137,6 +137,13 @@ Note that while precedent set by other projects is some motivation, it does not
 # Unresolved questions
 [unresolved-questions]: #unresolved-questions
 
+- `Deploy the RateLimitPolicy controller passing as env var the address of the limitador instance in the K namespace.`
+  * "This is meant for the WASM plugin and the istio `envoy_filter` I reckon, no?"
+- `Deploy the AuthPolicy controller`
+  * "Same for this one ^^"
+- `The traffic routed by HTTPRoute 2 through the gateway B will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K1`
+  * "It's a bit tricky to think of a use case of an underlying service, rate limited by 2 instances of limitador (not sure if applies to authorino too) when it comes to the same HTTPRoute with 2 parentRefs of GW managed by different Kuadrant instances"
+
 - What parts of the design do you expect to resolve through the RFC process before this gets merged?
 - What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
 - What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?

From b558f092f7af86cf13b2dfbaba8816ce3ea8969e Mon Sep 17 00:00:00 2001
From: Eguzki Astiz Lezaun
Date: Mon, 14 Nov 2022 16:40:32 +0100
Subject: [PATCH 3/5] the kuadrant CRD proposal

---
 ...000-wire-policies-with-backend-services.md | 63 +++++++++++++------
 1 file changed, 43 insertions(+), 20 deletions(-)

diff --git a/rfcs/0000-wire-policies-with-backend-services.md b/rfcs/0000-wire-policies-with-backend-services.md
index 66cfdf01..dbe1bded 100644
--- a/rfcs/0000-wire-policies-with-backend-services.md
+++ b/rfcs/0000-wire-policies-with-backend-services.md
@@ -18,30 +18,30 @@ Kuadrant suffered an unwanted side effect: Kuadrant's policies only worked when
 This issue comes from the fact that the policy controllers are no longer deployed as components of a particular Kuadrant instance.
 Instead, the policy controllers live in the Kuadrant operator's pod and are up and running even if there is no Kuadrant instance in the cluster.
 The root cause of this "side effect" is the design of how backend services were wired with the RateLimit/Auth policies.
-The design allowed one kuadrant instance to be installed in any namespace; however, it only allowed one kuadrant instance to be running in the cluster. 
+The design allowed one kuadrant instance to be installed in any namespace; however, it only allowed one kuadrant instance to be running in the cluster.
 
 ### How it worked before the merge of #48
 When an instance of Kuadrant, represented by a Kuadrant Custom Resource (CR), was created, the following workflow was run by the Kuadrant operator:
-* Read the Kuadrant custom resource, paying attention to the namespace. Let's call `K` the namespace where the Kuadrant CR is created. 
+* Read the Kuadrant custom resource, paying attention to the namespace. Let's call `K` the namespace where the Kuadrant CR is created.
 * Deploy one Limitador instance in the `K` namespace
 * Deploy one Authorino instance in the `K` namespace
 * Register the Authorino instance living in `K` with the Istio system as an external authorization service.
-* Deploy the RateLimitPolicy controller passing as env var the address of the limitador instance in the `K` namespace. 
+* Deploy the RateLimitPolicy controller passing as env var the address of the limitador instance in the `K` namespace.
 * Deploy the AuthPolicy controller
 
-When the user created a rate limit policy, the controller already knew Limitador's location (name and namespace) and could configure it according to the spec of the policy. 
+When the user created a rate limit policy, the controller already knew Limitador's location (name and namespace) and could configure it according to the spec of the policy.
 
 Authorino is a k8s controller and the Kuadrant operator deploys it in cluster-wide mode
 without any [sharding](https://github.com/Kuadrant/authorino/blob/main/docs/architecture.md#sharding) defined.
-When the user created an auth policy, the controller did not need to know where Authorino lives because a) it assumes that there is only one Authorino instance (which might be wrong as well) and b) it assumes that Authorino is watching the entire cluster without filtering. 
+When the user created an auth policy, the controller did not need to know where Authorino lives because a) it assumes that there is only one Authorino instance (which might be wrong as well) and b) it assumes that Authorino is watching the entire cluster without filtering.
 Thus, the controller manages an AuthConfig object in the hard-coded `kuadrant-system` namespace (which, incidentally, is also wrong).
 
 ### How it works after the merge of #48
 When the policy controllers were moved into the operator's pod, one of the design's requirements was no longer met: the controllers must know Limitador's location at deploy time. This is what caused the issue.
-The design assumed one policy controller instance per Limitador instance. The policy controller got Limitador's location at boot time via an environment variable. 
+The design assumed one policy controller instance per Limitador instance. The policy controller got Limitador's location at boot time via an environment variable.
-After the policies merged into the operator's pod, the policy controllers became a singleton instance (one pod, one container) at the cluster level. 
+After the policies merged into the operator's pod, the policy controllers became a singleton instance (one pod, one container) at the cluster level.
 Regardless of whether kuadrant supports multiple or single limitador/authorino instances, the policy controllers will be running in a single pod in the entire cluster.
-Even if kuadrant only supports a single limitador/authorino instance, 
+Even if kuadrant only supports a single limitador/authorino instance,
 the policy controllers still need to know the location of the limitador/authorino instances.
 
 Therefore, a new design is needed that wires the user-created policies with an existing limitador/authorino instance. Even though this wiring currently works for the AuthPolicy, it relies on the assumption of a single Authorino watching the whole cluster for AuthConfig objects, which is a sub-optimal design.
@@ -75,26 +75,49 @@ A kuadrant instance includes:
 
 * One Limitador deployment instance
 * One Authorino deployment instance
-* A list of dedicated gateways. Those gateways cannot be shared between multiple kuadrant instances. 
+* A list of dedicated gateways. Those gateways cannot be shared between multiple kuadrant instances.
 
 ![](https://i.imgur.com/QdeCYs6.png)
 
 Highlights:
-* The Kuadrant instance is not enclosed by k8s namespaces. 
+* The Kuadrant instance is not enclosed by k8s namespaces.
 * One gateway can belong to (be managed by) one and only one kuadrant instance.
 * One kuadrant instance does not own rate limit policies or auth policies.
-* The traffic routed by HTTPRoute 1 through the gateway A and gateway B will be protected by RLP 1 and KAP 1, using Limitador and Authorino instances located at the namespace K1. 
+* The traffic routed by HTTPRoute 1 through the gateway A and gateway B will be protected by RLP 1 and KAP 1, using Limitador and Authorino instances located at the namespace K1.
 * The traffic routed by HTTPRoute 2 through the gateway B will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K1
 * The traffic routed by HTTPRoute 2 through the gateway C will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K2
-* The traffic
-
-Explain the proposal as if it was implemented and you were teaching it to a Kuadrant user. That generally means:
-
-- Introducing new named concepts.
-- Explaining the feature largely in terms of examples.
-- Explaining how a user should *think* about the feature, and how it would impact the way they already use Kuadrant. It should explain the impact as concretely as possible.
-- If applicable, provide sample error messages, deprecation warnings, or migration guidance.
-- If applicable, describe the differences between teaching this to existing and new Kuadrant users.
+* The HTTPRoute 2 example shows that when the traffic for the same service is routed through multiple gateways, at least for rate limiting, Kuadrant cannot keep consistent counters. The user would expect X rps overall but would actually get X rps per gateway.
+
+### The Kuadrant CRD
+
+Currently, the Kuadrant CRD has an empty spec.
+
+```yaml
+apiVersion: kuadrant.io/v1beta1
+kind: Kuadrant
+metadata:
+  name: kuadrant-sample
+spec: {}
+```
+
+According to the definition above of a kuadrant instance, a Kuadrant instance, the proposed new Kuadrant CRD would add a label __selector__ to specify which gateways that instance would manage. Additionally, for dev testing purposes, the Kuadrant CRD would have image fields for the kuadrant controller, Limitador and Authorino.
+A Kuadrant CR example
+
+```yaml
+apiVersion: kuadrant.io/v1beta1
+kind: Kuadrant
+metadata:
+  name: kuadrant-sample
+spec:
+  controlPlane:
+    image: quay.io/kuadrant/kuadrant-operator:mytag
+  limitador:
+    image: quay.io/kuadrant/limitador:mytag
+  authorino:
+    image: quay.io/kuadrant/authorino:mytag
+  gatewaysSelector:
+    matchLabels:
+      app: kuadrant
+```
 
 # Reference-level explanation
 [reference-level-explanation]: #reference-level-explanation

From 9545f760b66ff9aaddeeaab126d5276a086e228c Mon Sep 17 00:00:00 2001
From: Eguzki Astiz Lezaun
Date: Mon, 14 Nov 2022 17:22:24 +0100
Subject: [PATCH 4/5] updated kuadrant diagram

---
 .../0000-wire-policies-with-backend-services.md | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/rfcs/0000-wire-policies-with-backend-services.md b/rfcs/0000-wire-policies-with-backend-services.md
index dbe1bded..2b901e6a 100644
--- a/rfcs/0000-wire-policies-with-backend-services.md
+++ b/rfcs/0000-wire-policies-with-backend-services.md
@@ -77,16 +77,21 @@ A kuadrant instance includes:
 * One Authorino deployment instance
 * A list of dedicated gateways. Those gateways cannot be shared between multiple kuadrant instances.
 
-![](https://i.imgur.com/QdeCYs6.png)
+A diagram to illustrate some concepts:
+
+![](https://i.imgur.com/y7gQfRa.png)
 
 Highlights:
 * The Kuadrant instance is not enclosed by k8s namespaces.
 * One gateway can belong to (be managed by) one and only one kuadrant instance.
 * One kuadrant instance does not own rate limit policies or auth policies.
-* The traffic routed by HTTPRoute 1 through the gateway A and gateway B will be protected by RLP 1 and KAP 1, using Limitador and Authorino instances located at the namespace K1.
-* The traffic routed by HTTPRoute 2 through the gateway B will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K1
-* The traffic routed by HTTPRoute 2 through the gateway C will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K2
+* Each kuadrant instance owns one instance (possibly with multiple replicas) of Limitador and one instance of Authorino. Those instances are shared among all gateways managed by the kuadrant instance.
+* The traffic routed by HTTPRoute 1 through the gateway A will be protected by RLP 1 and KAP 1, using Limitador and Authorino instances located at the namespace K1.
+* The traffic routed by HTTPRoute 2 through the gateway B will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K1.
+* The traffic routed by HTTPRoute 2 through the gateway C will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K2.
 * The HTTPRoute 2 example shows that when the traffic for the same service is routed through multiple gateways, at least for rate limiting, Kuadrant cannot keep consistent counters. The user would expect X rps overall but would actually get X rps per gateway.
+* The traffic matching HTTPRoute 3 will be protected by RLP 3 and KAP 3. Both of these policies target the gateway directly. Gateway-targeted policies will be applied only to traffic matching at least one HTTPRoute.
+* The traffic hitting Gateway E will __not__ be protected by any policies even though RLP 4 and KAP 4 target Gateway E. Gateway-targeted policies will be applied only to traffic matching at least one HTTPRoute, and since no HTTPRoute is attached to Gateway E, the policies have no effect.
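+
+As an example of a gateway-targeted policy, a RateLimitPolicy can reference a Gateway directly through the Gateway API policy attachment `targetRef`. A minimal sketch (the names are illustrative, and the limit stanza is simplified rather than the exact RateLimitPolicy schema):
+
+```yaml
+apiVersion: kuadrant.io/v1beta1
+kind: RateLimitPolicy
+metadata:
+  name: rlp-4
+spec:
+  # Gateway API policy attachment: this policy targets Gateway E directly
+  targetRef:
+    group: gateway.networking.k8s.io
+    kind: Gateway
+    name: gateway-e
+  # Simplified, illustrative limit definition: 100 requests per 60 seconds
+  rateLimits:
+    - limits:
+        - maxValue: 100
+          seconds: 60
+```
+
+Even with this policy in place, traffic reaching `gateway-e` is only rate limited once at least one HTTPRoute attaches to the gateway, as the highlights above describe.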
 
 ### The Kuadrant CRD
 
 Currently, the Kuadrant CRD has an empty spec.
 
 ```yaml
 apiVersion: kuadrant.io/v1beta1
 kind: Kuadrant
 metadata:
   name: kuadrant-sample
 spec: {}
 ```
 
-According to the definition above of a kuadrant instance, a Kuadrant instance, the proposed new Kuadrant CRD would add a label __selector__ to specify which gateways that instance would manage. Additionally, for dev testing purposes, the Kuadrant CRD would have image fields for the kuadrant controller, Limitador and Authorino. A Kuadrant CR example
+According to the definition of the kuadrant instance above,
+the proposed new Kuadrant CRD would add a label __selector__ to specify which gateways that instance would manage.
+Additionally, for dev testing purposes, the Kuadrant CRD would have image URL fields for the kuadrant controller, Limitador and Authorino components. A Kuadrant CR example:
 
 ```yaml
 apiVersion: kuadrant.io/v1beta1
 kind: Kuadrant
 metadata:
   name: kuadrant-sample
 spec:
   controlPlane:
     image: quay.io/kuadrant/kuadrant-operator:mytag
   limitador:
     image: quay.io/kuadrant/limitador:mytag
   authorino:
     image: quay.io/kuadrant/authorino:mytag
   gatewaysSelector:
     matchLabels:
       app: kuadrant
 ```

From 16356b46a24412b0cfb5531b9905083598057aaa Mon Sep 17 00:00:00 2001
From: Eguzki Astiz Lezaun
Date: Fri, 20 Jan 2023 11:51:02 +0100
Subject: [PATCH 5/5] focus proposal in multiple kuadrant instances

---
 rfcs/0000-multiple-kuadrant-instances.md      | 152 ++++++++++++++
 ...000-wire-policies-with-backend-services.md | 188 ------------------
 2 files changed, 152 insertions(+), 188 deletions(-)
 create mode 100644 rfcs/0000-multiple-kuadrant-instances.md
 delete mode 100644 rfcs/0000-wire-policies-with-backend-services.md

diff --git a/rfcs/0000-multiple-kuadrant-instances.md b/rfcs/0000-multiple-kuadrant-instances.md
new file mode 100644
index 00000000..ed38fb58
--- /dev/null
+++ b/rfcs/0000-multiple-kuadrant-instances.md
@@ -0,0 +1,152 @@
+# RFC 0000
+
+- Feature Name: `multiple kuadrant instances`
+- Start Date: 2023-01-12
+- RFC PR: [Kuadrant/architecture#0000](https://github.com/Kuadrant/architecture/pull/0000)
+- Issue tracking: [Kuadrant/architecture#0000](https://github.com/Kuadrant/architecture/issues/0000)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes a new kuadrant architecture design to enable **multiple kuadrant instances** to run in a single cluster.
+
+![](https://i.imgur.com/ZsPibfO.png)
+
+# Motivation
+[motivation]: #motivation
+
+The main benefit of multiple Kuadrant instances in a single cluster is that it allows dedicated Kuadrant services per tenant.
+
+A dedicated Kuadrant deployment brings many benefits. To name a few:
+* Protection against external traffic load spikes. Other tenants' traffic spikes do not affect Authorino/Limitador throughput and latency as they would when the services are shared.
+* No need for the cluster administrator role to deploy a kuadrant instance. A tenant administrator can manage gateways and Limitador and Authorino instances (including deployment modes).
+* The cluster administrator gets control and visibility across all the Kuadrant instances, while the tenant administrator only gets control over their specific gateway(s), Limitador and Authorino instances.
+* (looking for ideas for more benefits)...
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+### Kuadrant instance definition
+![](https://i.imgur.com/BfOXfnB.png)
+
+A kuadrant instance is composed of:
+* One Limitador deployment instance
+* One Authorino deployment instance
+* A list of dedicated gateways.
+
+Some properties to highlight:
+
+* The policies are not included as part of the kuadrant instances.
+* The Kuadrant instance is not enclosed by k8s namespaces.
+* Gateways are not shared between kuadrant instances. Each gateway is managed by a single kuadrant instance.
+* The control plane has cluster scope and will be shared between instances.
+In other words, it is only in the data plane that each Kuadrant instance has dedicated services and resources.
+* Each kuadrant instance owns one instance (possibly with multiple replicas) of Limitador and one instance of Authorino. Those instances are shared among all gateways included in the kuadrant instance.
+
+In the following diagram, policies RLP 1 and KAP 1 are applied in instance *A*, and policies RLP 2 and KAP 2 are applied in instance *B*.
+
+![](https://i.imgur.com/yChVsT6.png)
+
+### All the gateways referenced by a single policy must belong to the same kuadrant instance
+
+The Gateway API allows, in its latest version [v1beta1](https://gateway-api.sigs.k8s.io/v1alpha2/references/spec/#gateway.networking.k8s.io/v1beta1.CommonRouteSpec), an HTTPRoute to have multiple gateway parents. Thus, a kuadrant policy might technically target multiple gateways managed by multiple kuadrant instances. Kuadrant does **not** support this use case.
+
+![](https://i.imgur.com/ZpsBf4i.png)
+
+The main reason is related to the rate limiting capability. The limits specified in the RateLimitPolicy would be enforced on a per-kuadrant-instance basis (by each instance's Limitador). Thus, traffic hitting one gateway would see different rate limiting counters than traffic hitting the other gateway. The user would expect X rps overall but would actually get X rps per gateway. For consistency reasons, when this configuration happens, the control plane will reject the policy.
+
+### The Kuadrant CRD
+
+Currently, the Kuadrant CRD has an empty spec.
+
+```yaml
+apiVersion: kuadrant.io/v1beta1
+kind: Kuadrant
+metadata:
+  name: kuadrant-sample
+spec: {}
+```
+
+According to the definition of the kuadrant instance above,
+the proposed new Kuadrant CRD would add a label __selector__ to specify which gateways that instance would manage.
+
+```yaml
+apiVersion: kuadrant.io/v1beta1
+kind: Kuadrant
+metadata:
+  name: kuadrant-a
+spec:
+  gatewaysSelector:
+    matchLabels:
+      app: kuadrant-a
+```
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+### Wiring Kuadrant policies with Kuadrant instances
+Technically, the Kuadrant policies do not belong to any Kuadrant instance. At any moment, a policy can switch the targeted network resource specified in its `spec` from one gateway to another, directly or indirectly via the HTTPRoute. The target references are dynamic by nature, and so is the list of gateways to which kuadrant policies should apply.
+Thus, the Kuadrant control plane needs a procedure to associate a policy with **one** kuadrant instance at any time. When the control plane knows which kuadrant instance is affected, the policy rules can be used to configure the Limitador and Authorino instances belonging to that kuadrant instance. Since the associated kuadrant instance of a policy is dynamic by nature, this procedure must be executed on every event related to the policy.
+
+When the policy's `targetRef` targets a Gateway, there is a direct reference to the gateway.
+
+When the policy's `targetRef` targets an HTTPRoute, Kuadrant will follow the [`parentRefs`](https://gateway-api.sigs.k8s.io/v1alpha2/references/spec/#gateway.networking.k8s.io%2fv1beta1.CommonRouteSpec) attribute, which should be a direct reference to the gateway or gateways.
+
+Given a gateway, Kuadrant needs to find out which Kuadrant instance is managing that specific gateway. By design, Kuadrant knows there is only one.
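+
+For illustration, the resolution chain for a route-targeted policy looks like this (all names are illustrative, and the policy rules are omitted):
+
+```yaml
+apiVersion: kuadrant.io/v1beta1
+kind: AuthPolicy
+metadata:
+  name: kap-2
+spec:
+  # Policy -> route: Gateway API policy attachment
+  targetRef:
+    group: gateway.networking.k8s.io
+    kind: HTTPRoute
+    name: httproute-2
+  # ... auth rules omitted
+---
+apiVersion: gateway.networking.k8s.io/v1beta1
+kind: HTTPRoute
+metadata:
+  name: httproute-2
+spec:
+  # Route -> gateway(s): each parent gateway maps to exactly one kuadrant instance
+  parentRefs:
+    - name: gateway-b
+```
+
+Resolving `kap-2` walks the `targetRef` to `httproute-2`, then the `parentRefs` to `gateway-b`, and finally maps `gateway-b` to the kuadrant instance that manages it.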
+There are at least two options to implement that mapping:
+* Read all Kuadrant CR objects and pick the first one whose label selector matches the gateway.
+  * This approach works as long as the control plane ensures that each gateway is matched by only one kuadrant gateway selector. The control plane must reject any new kuadrant instance matching a gateway already "taken" by another kuadrant instance.
+* Add an annotation to the gateway whose value is the name/namespace of the Kuadrant CR.
+  * This approach is commonly used, but it requires annotation management.
+
+
+### Just one external auth stage and one rate limiting stage
+Kuadrant configures the gateway with a single external authorization stage (backed by Authorino) and a single external rate limiting stage (backed by Limitador).
+Multiple rate limit or authN/authZ stages involving multiple instances of Authorino and Limitador could, technically speaking, be implemented.
+Until there is a real use case and it is strictly necessary, this scenario is discarded. The main reason is complexity. It is already complex enough to reason about rate limiting and auth services with a single stage. Adding multiple rate limiting stages, or hitting multiple Limitador instances in a single stage (doable with the WASM module), makes the observed behavior too complex to reason about. Currently there is no use case requiring such a complex scenario.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+Multitenancy is not a capability users have requested. Usually ingress gateways are shared resources managed by cluster administrators, and a cluster may have only a few of them. It is also a cluster admin task to route traffic to the ingress gateway. Cluster users usually do not control the life cycle of the ingress gateways, so they cannot have their own Kuadrant instance.
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+- Why is this design the best in the space of possible designs?
+
+The gateway is the first-class entity in the design, not the policy. The API protection happens at the gateway and the configuration needs to be done at the gateway. This kuadrant instance design protects gateways by isolating them from other instances' (mis)configurations and traffic spikes.
+
+- What other designs have been considered and what is the rationale for not choosing them?
+
+TODO
+
+- What is the impact of not doing this?
+
+This design is a step forward towards a consistent API for protecting other services' APIs. It makes it easier to protect any API, regardless of the nature of the traffic, whether north-south or east-west. It also makes it easier to support the scenario where cluster users deploy their own (non-ingress) gateways and enable API protection declaratively.
+
+# Prior art
+[prior-art]: #prior-art
+
+TODO
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+- What parts of the design do you expect to resolve through the RFC process before this gets merged?
+
+Validate the main points of the design:
+
+a) A single auth/ratelimit stage in the processing pipeline of the gateway
+
+b) Gateways are not shared among kuadrant instances
+
+- What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
+
+The wiring mechanism.
+
+- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
+
+Supporting multiple gateway providers #7
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+TODO

diff --git a/rfcs/0000-wire-policies-with-backend-services.md b/rfcs/0000-wire-policies-with-backend-services.md
deleted file mode 100644
index 2b901e6a..00000000
--- a/rfcs/0000-wire-policies-with-backend-services.md
+++ /dev/null
@@ -1,188 +0,0 @@
-# RFC 0000
-
-- Feature Name: `wire_policies_with_backend`
-- Start Date: 2022-11-04
-- RFC PR: [Kuadrant/architecture#0000](https://github.com/Kuadrant/architecture/pull/0000)
-- Issue tracking: [Kuadrant/architecture#0000](https://github.com/Kuadrant/architecture/issues/0000)
-
-# Summary
-[summary]: #summary
-
-This RFC proposes a mechanism to wire Kuadrant policies with rate limiting and authN/authZ services.
-
-# Motivation
-[motivation]: #motivation
-
-After the PR [#48](https://github.com/Kuadrant/kuadrant-operator/pull/48) was merged,
-Kuadrant suffered an unwanted side effect: Kuadrant's policies only worked when kuadrant was installed in the `kuadrant-system` namespace.
-This issue comes from the fact that the policy controllers are no longer deployed as components of a particular Kuadrant instance.
-Instead, the policy controllers live in the Kuadrant operator's pod and are up and running even if there is no Kuadrant instance in the cluster.
-The root cause of this "side effect" is the design of how backend services were wired with the RateLimit/Auth policies.
-The design allowed one kuadrant instance to be installed in any namespace; however, it only allowed one kuadrant instance to be running in the cluster.
-
-### How it worked before the merge of #48
-When an instance of Kuadrant, represented by a Kuadrant Custom Resource (CR), was created, the following workflow was run by the Kuadrant operator:
-* Read the Kuadrant custom resource, paying attention to the namespace. Let's call `K` the namespace where the Kuadrant CR is created.
-* Deploy one Limitador instance in the `K` namespace
-* Deploy one Authorino instance in the `K` namespace
-* Register the Authorino instance living in `K` with the Istio system as an external authorization service.
-* Deploy the RateLimitPolicy controller passing as env var the address of the limitador instance in the `K` namespace.
-* Deploy the AuthPolicy controller
-
-When the user created a rate limit policy, the controller already knew Limitador's location (name and namespace) and could configure it according to the spec of the policy.
-
-Authorino is a k8s controller and the Kuadrant operator deploys it in cluster-wide mode
-without any [sharding](https://github.com/Kuadrant/authorino/blob/main/docs/architecture.md#sharding) defined.
-When the user created an auth policy, the controller did not need to know where Authorino lives because a) it assumes that there is only one Authorino instance (which might be wrong as well) and b) it assumes that Authorino is watching the entire cluster without filtering.
-Thus, the controller manages an AuthConfig object in the hard-coded `kuadrant-system` namespace (which, incidentally, is also wrong).
-
-### How it works after the merge of #48
-When the policy controllers were moved into the operator's pod, one of the design's requirements was no longer met: the controllers must know Limitador's location at deploy time. This is what caused the issue.
-The design assumed one policy controller instance per Limitador instance. The policy controller got Limitador's location at boot time via an environment variable.
-After the policies merged into the operator's pod, the policy controllers became a singleton instance (one pod, one container) at the cluster level.
-Regardless of whether kuadrant supports multiple or single limitador/authorino instances, the policy controllers will be running in a single pod in the entire cluster.
-Even if kuadrant only supports a single limitador/authorino instance,
-the policy controllers still need to know the location of the limitador/authorino instances.
-
-Therefore, a new design is needed that wires the user-created policies with an existing limitador/authorino instance. Even though this wiring currently works for the AuthPolicy, it relies on the assumption of a single Authorino watching the whole cluster for AuthConfig objects, which is a sub-optimal design.
-
-### Potential Scenarios
-
-* **Kuadrant supports only one instance deployed in a hard-coded namespace**
-
-No need to wire, as the Authorino and Limitador locations are well known. All policies will be linked to the same instance of Limitador/Authorino.
-
-* **Kuadrant supports only one instance deployed in a configurable namespace**
-
-Wiring is needed. The policy controllers do not know where Limitador/Authorino live. All policies will be linked to the same instance of Limitador/Authorino.
-
-* **Kuadrant supports multiple instances in a cluster**
-
-Wiring is needed. There should be a way to know the Limitador/Authorino instance only by reading the policy.
-
-# Guide-level explanation
-[guide-level-explanation]: #guide-level-explanation
-
-The proposal is to make Kuadrant support multiple instances in a cluster.
-
-This proposal is based on a simple design decision: one gateway runs one and only one kuadrant external auth service (Authorino) and one and only one kuadrant external rate limit service (Limitador).
-
-Multiple rate limit or auth stages involving multiple instances of Authorino and Limitador could be implemented.
-Until there is a real use case and it is strictly necessary, this scenario is discarded for implementing kuadrant's protection.
-The main reason is complexity. It is already complex enough to reason about rate limiting and auth services with a single stage. Adding multiple rate limiting stages, or hitting multiple Limitador instances in a single stage (doable with the WASM module), makes the observed behavior too complex to reason about. Currently there is no use case requiring such a complex scenario.
-
-A kuadrant instance includes:
-
-* One Limitador deployment instance
-* One Authorino deployment instance
-* A list of dedicated gateways. Those gateways cannot be shared between multiple kuadrant instances.
-
-A diagram to illustrate some concepts:
-
-![](https://i.imgur.com/y7gQfRa.png)
-
-Highlights:
-* The Kuadrant instance is not enclosed by k8s namespaces.
-* One gateway can belong to (be managed by) one and only one kuadrant instance.
-* One kuadrant instance does not own rate limit policies or auth policies.
-* Each kuadrant instance owns one instance (possibly with multiple replicas) of Limitador and one instance of Authorino. Those instances are shared among all gateways managed by the kuadrant instance.
-* The traffic routed by HTTPRoute 1 through the gateway A will be protected by RLP 1 and KAP 1, using Limitador and Authorino instances located at the namespace K1.
-* The traffic routed by HTTPRoute 2 through the gateway B will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K1.
-* The traffic routed by HTTPRoute 2 through the gateway C will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K2.
-* The HTTPRoute 2 example shows that when the traffic for the same service is routed through multiple gateways, at least for rate limiting, Kuadrant cannot keep consistent counters. The user would expect X rps overall but would actually get X rps per gateway.
-* The traffic matching HTTPRoute 3 will be protected by RLP 3 and KAP 3. Both of these policies target the gateway directly. Gateway-targeted policies will be applied only to traffic matching at least one HTTPRoute.
-* The traffic hitting Gateway E will __not__ be protected by any policies even though RLP 4 and KAP 4 target Gateway E. Gateway-targeted policies will be applied only to traffic matching at least one HTTPRoute, and since no HTTPRoute is attached to Gateway E, the policies have no effect.
-
-### The Kuadrant CRD
-
-Currently, the Kuadrant CRD has an empty spec.
-
-```yaml
-apiVersion: kuadrant.io/v1beta1
-kind: Kuadrant
-metadata:
-  name: kuadrant-sample
-spec: {}
-```
-
-According to the definition of the kuadrant instance above,
-the proposed new Kuadrant CRD would add a label __selector__ to specify which gateways that instance would manage.
-Additionally, for dev testing purposes, the Kuadrant CRD would have image URL fields for the kuadrant controller, Limitador and Authorino components. A Kuadrant CR example:
-
-```yaml
-apiVersion: kuadrant.io/v1beta1
-kind: Kuadrant
-metadata:
-  name: kuadrant-sample
-spec:
-  controlPlane:
-    image: quay.io/kuadrant/kuadrant-operator:mytag
-  limitador:
-    image: quay.io/kuadrant/limitador:mytag
-  authorino:
-    image: quay.io/kuadrant/authorino:mytag
-  gatewaysSelector:
-    matchLabels:
-      app: kuadrant
-```
-
-# Reference-level explanation
-[reference-level-explanation]: #reference-level-explanation
-
-This is the technical portion of the RFC. Explain the design in sufficient detail that:
-
-- Its interaction with other features is clear.
-- It is reasonably clear how the feature would be implemented.
-- How errors would be reported to the users.
-- Corner cases are dissected by example.
-
-The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.
-
-# Drawbacks
-[drawbacks]: #drawbacks
-
-Why should we *not* do this?
-
-# Rationale and alternatives
-[rationale-and-alternatives]: #rationale-and-alternatives
-
-- Why is this design the best in the space of possible designs?
-- What other designs have been considered and what is the rationale for not choosing them?
-- What is the impact of not doing this?
-
-# Prior art
-[prior-art]: #prior-art
-
-Discuss prior art, both the good and the bad, in relation to this proposal.
-A few examples of what this can include are:
-
-- Does another project have a similar feature?
-- What can be learned from it? What's good? What's less optimal?
-- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background.
-
-This section is intended to encourage you as an author to think about the lessons learned from other attempts, successful or not, and provide readers of your RFC with a fuller picture.
-
-Note that while precedent set by other projects is some motivation, it does not on its own motivate an RFC.
-
-# Unresolved questions
-[unresolved-questions]: #unresolved-questions
-
-- `Deploy the RateLimitPolicy controller passing as env var the address of the limitador instance in the K namespace.`
-  * "This is meant for the WASM plugin and the istio `envoy_filter` I reckon, no?"
-- `Deploy the AuthPolicy controller`
-  * "Same for this one ^^"
-- `The traffic routed by HTTPRoute 2 through the gateway B will be protected by RLP 2 and KAP 2, using Limitador and Authorino instances located at the namespace K1`
-  * "It's a bit tricky to think of a use case of an underlying service, rate limited by 2 instances of limitador (not sure if applies to authorino too) when it comes to the same HTTPRoute with 2 parentRefs of GW managed by different Kuadrant instances"
-
-- What parts of the design do you expect to resolve through the RFC process before this gets merged?
-- What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
-- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
-
-# Future possibilities
-[future-possibilities]: #future-possibilities
-
-Think about what the natural extension and evolution of your proposal would be and how it would affect the platform and project as a whole. Try to use this section as a tool to further consider all possible interactions with the project and its components in your proposal. Also consider how this all fits into the roadmap for the project and the relevant sub-team.
-
-This is also a good place to "dump ideas", if they are out of scope for the RFC you are writing but otherwise related.
-
-Note that having something written down in the future-possibilities section is not a reason to accept the current or a future RFC; such notes should be in the section on motivation or rationale in this or subsequent RFCs. The section merely provides additional information.