Complete Policy sync RFC
sergioifg94 committed Nov 29, 2023
1 parent c95987a commit 5fc107c
Showing 1 changed file with 121 additions and 36 deletions.
157 changes: 121 additions & 36 deletions rfcs/0004-policy-sync-v1.md
@@ -8,69 +8,136 @@
# Summary
[summary]: #summary

When a gateway in the hub is targeted by a policy in the hub, enable the gateway controller to be able to sync both the gateway and policy together to spoke clusters.
This RFC proposes the ability for the Multicluster Gateway Controller to sync
policies defined in the hub cluster downstream to the spoke clusters, so that all
policies can be defined in the same place. The synced policies are reconciled by
the downstream Gateway controller.

# Nomenclature

* Policy: In this document, a Policy is a Gateway API policy as defined in the
Policy Attachment Model. The Multicluster Gateway Controller
relies on [OCM]() as a multicluster solution, which defines its own unrelated
set of Policies and Policy Framework. Unless explicitly mentioned otherwise, this
document refers to Policies as Gateway API Policies.

# Motivation
[motivation]: #motivation

Currently, any policies targeting gateways in the spokes need to be defined in the spokes, and it can be cumbersome, time-consuming and error prone to require these to be duplicated across multiple spoke clusters.
Currently, Kuadrant's support for the Policy Attachment Model can be divided into
two categories:
* Policies targeting the Multicluster Gateway, defined in the hub cluster and
reconciled by the Multicluster Gateway Controller
* Policies targeting the downstream Gateway, defined in the spoke clusters and
reconciled by the downstream Gateway controllers.

Gateway-admin has a set of homogeneous clusters and needs to apply per cluster rate limits across the entire set.
In a realistic multicluster scenario where multiple spoke clusters are present, the management of these policies can become tedious and error-prone, as policies have
to be defined in the hub cluster, as well as replicated in the multiple spoke clusters.

Platform-admin with a set of clusters with rate limits applied needs to change rate limit for one particular cluster.
As Kuadrant users:
* Gateway-admin has a set of homogeneous clusters and needs to apply per cluster rate limits across the entire set.
* Platform-admin with a set of clusters with rate limits applied needs to change rate limit for one particular cluster.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Make explicit that OCM policy is different to a gateway API policy, and that this work is not related at all to the OCM Policy framework.
The policy sync feature will allow a gateway-admin to configure, via GatewayClass
parameters, a set of Policy GVRs to be synced by the Multicluster Gateway Controller.

The `policiesToSync` field in the parameters defines those GVRs. For example, in
order to configure the controller to sync AuthPolicies:

```json
"policiesToSync": [
{
"group": "kuadrant.io",
"version": "v1beta1",
"resource": "authpolicies"
}
]
```

The resources that the controller can sync are limited by the following:
* The controller ServiceAccount must have permission to watch, list, and get the
resource to be synced
* The resource must implement the Policy schema:
  * Have a `.spec.targetRef` field

When a Policy is configured to be synced in a GatewayClass, the Multicluster
Gateway Controller starts watching events on the resources, and propagates changes
by placing the policy in the spoke clusters, with the following mutations:
* The `TargetRef` of the policy is changed to reference the downstream Gateway
* The `kuadrant.io/policy-synced` annotation is set
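
The two mutations above can be sketched as follows. This is an illustrative
Python sketch over plain dictionaries, not the controller's implementation: the
function name `mutate_for_spoke`, the annotation value `"true"`, and the idea of
passing the downstream Gateway's name and namespace as arguments are all
assumptions made for the example.

```python
import copy

POLICY_SYNCED_ANNOTATION = "kuadrant.io/policy-synced"


def mutate_for_spoke(policy, downstream_name, downstream_namespace):
    """Return a copy of a hub Policy prepared for placement in a spoke.

    The hub object is left untouched so that it can be mutated again
    for other spoke clusters.
    """
    synced = copy.deepcopy(policy)

    # Point the targetRef at the downstream Gateway instead of the hub Gateway.
    target_ref = synced["spec"]["targetRef"]
    target_ref["name"] = downstream_name
    target_ref["namespace"] = downstream_namespace

    # Mark the policy as synced from the hub. The value "true" is an
    # assumption; the RFC only states that the annotation is set.
    annotations = synced["metadata"].setdefault("annotations", {})
    annotations[POLICY_SYNCED_ANNOTATION] = "true"
    return synced
```

Working on a deep copy keeps the hub resource as the single source of truth,
which matches the intent of defining all policies in one place.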

The policy sync system will allow a gateway-admin to create or modify a gateway class at the hub level and specify a series of GVKs for that gatewayClass (for example the AuthPolicy GVK).
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

When a gateway is created in the hub that uses this gatewayClass, any AuthPolicies that target that gateway will be watched by the MGC.
### Process overview

When these gateways are placed on a spoke, any AuthPolicies targeting that gateway will also be placed on that same spoke.
#### Dynamic Policy watches

When the AuthPolicies are placed on the relevant spokes, they will be manipulated to target the new gateway in the spoke.
The Multicluster Gateway Controller reconciles parameters referenced by the
GatewayClass of a Gateway. A new field is added to the parameters that allows
the configuration of a set of GVRs of Policies to be synced.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation
The GatewayClass reconciler validates that:
* The GVRs reference existing resource definitions
* The GVRs reference resources that implement the Policy schema.
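
As a rough illustration of these two checks, the sketch below validates a
`policiesToSync` list against a lookup table of known resources. It is written
in Python over plain dictionaries; the function name `validate_policies_to_sync`
and the `known_resources` mapping (GVR to an OpenAPI-style schema) are
hypothetical stand-ins for whatever discovery mechanism the controller uses.

```python
def validate_policies_to_sync(policies_to_sync, known_resources):
    """Return a list of validation error messages (empty when all GVRs are valid).

    known_resources maps (group, version, resource) tuples to an
    OpenAPI-style schema dictionary. This mapping is an assumption made
    for the example.
    """
    errors = []
    for gvr in policies_to_sync:
        key = (gvr["group"], gvr["version"], gvr["resource"])
        label = "/".join(key)

        # Check 1: the GVR must reference an existing resource definition.
        schema = known_resources.get(key)
        if schema is None:
            errors.append(f"{label}: resource definition not found")
            continue

        # Check 2: the resource must implement the Policy schema,
        # that is, declare a .spec.targetRef field.
        spec_properties = (
            schema.get("properties", {}).get("spec", {}).get("properties", {})
        )
        if "targetRef" not in spec_properties:
            errors.append(f"{label}: missing .spec.targetRef, not a Policy")
    return errors
```

Collecting every failure instead of stopping at the first one lets all problems
be surfaced together when they are reported.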

### Process overview
- gateway controller updated to monitor params in gateway class
- Set up dynamic watches against each listed GVK
- error when GVK not present - report in gatewayclass
- confirm GVK is a policy - report in gatewayclass if not, don't create watch
- find all matching GVKs that target reconciling gateway
- copy GVK CR into manifestwork for same clusters as gateway
- mutate GVK CR to target the spoke instance of the gateway
- mutate GVK CR to annotate that it came from the hub
- add common policy status fields to manifestwork
- handle error if status fields are not present?
- read status from manifestwork and update status in hub GVK CR
- read errors on gateway if conflicting policies are present in spoke?

### Policy Hierarchy Details

- kuadrant operator to encode policy hierarchy
- hub policy overridden by spoke policy overridden by route policy.
- annotate gateway when a hub policy is overridden in this manner.
Validation failures are reported as part of the status of the GatewayClass.

The Gateway reconciler sets up dynamic watches to react to events on the configured
Policies, calling the PolicySyncer component with the updated Policy as well
as the associated Gateway.

#### PolicySyncer component

The PolicySyncer component is in charge of reconciling Policy watch events to
apply the necessary changes and place the Policies in the spoke clusters.

This component is injected in the event source and called when a change is made
to a hub Policy that has been configured to be synced.

The PolicySyncer implementation uses OCM ManifestWorks to place the policies in
the spoke clusters. Through the ManifestWorks, OCM allows the controller to:
* Place the Policy in each spoke cluster
* Report the desired status back to the hub using JSON feedback rules
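
A minimal sketch of the kind of ManifestWork the PolicySyncer might build is
shown below, in Python over plain dictionaries. The function name
`build_manifest_work`, the ManifestWork naming scheme, and the feedback path
passed in by the caller are assumptions for illustration; the RFC does not
specify them.

```python
def build_manifest_work(policy, gvr, cluster_name, status_paths):
    """Wrap a (already mutated) Policy in a ManifestWork for one spoke cluster.

    status_paths maps a feedback value name to a JSON path in the placed
    resource, for example {"ready": ".status.ready"} (hypothetical path).
    """
    meta = policy["metadata"]
    return {
        "apiVersion": "work.open-cluster-management.io/v1",
        "kind": "ManifestWork",
        # On the hub, a ManifestWork lives in the namespace named after
        # the target cluster. The "policy-" name prefix is an assumption.
        "metadata": {"name": f"policy-{meta['name']}", "namespace": cluster_name},
        "spec": {
            # Place the Policy in the spoke cluster.
            "workload": {"manifests": [policy]},
            # Report selected status fields back to the hub.
            "manifestConfigs": [
                {
                    "resourceIdentifier": {
                        "group": gvr["group"],
                        "resource": gvr["resource"],
                        "name": meta["name"],
                        "namespace": meta["namespace"],
                    },
                    "feedbackRules": [
                        {
                            "type": "JSONPaths",
                            "jsonPaths": [
                                {"name": name, "path": path}
                                for name, path in status_paths.items()
                            ],
                        }
                    ],
                }
            ],
        },
    }
```

One ManifestWork per Policy and cluster keeps the status feedback for each
placement separate, at the cost of creating more objects on the hub.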

### Policy Hierarchy

In order to avoid conflict with Policies created directly in the spoke clusters,
a hierarchy must be defined to prioritise those Policies.

The controller will set the `kuadrant.io/policy-synced` annotation on the policy
when placing it in the spoke cluster.

The Kuadrant operator will be aware of the presence of this annotation, and, in case
of conflicts, override Policies that contain this annotation.
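
The precedence rule can be sketched as follows, in Python over plain
dictionaries. This is not the Kuadrant operator's logic: the function names and
the definition of a conflict (two Policies of the same kind with the same
target kind and name) are assumptions made for the example.

```python
POLICY_SYNCED_ANNOTATION = "kuadrant.io/policy-synced"


def is_synced(policy):
    """True when the Policy carries the annotation set by the hub controller."""
    annotations = policy["metadata"].get("annotations", {})
    return POLICY_SYNCED_ANNOTATION in annotations


def resolve_conflicts(policies):
    """Return the Policies that remain effective in a spoke cluster.

    Policies are grouped by (policy kind, target kind, target name).
    Within a group, Policies created directly in the spoke take priority
    and override the synced ones; synced Policies stay effective only
    when nothing local targets the same resource.
    """
    groups = {}
    for policy in policies:
        ref = policy["spec"]["targetRef"]
        key = (policy["kind"], ref["kind"], ref["name"])
        groups.setdefault(key, []).append(policy)

    effective = []
    for group in groups.values():
        local = [p for p in group if not is_synced(p)]
        effective.extend(local if local else group)
    return effective
```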

# Drawbacks
[drawbacks]: #drawbacks

cluster-admins can already create policies in spoke clusters that affect spoke level gateways, without this solution.
## Third party Policy support

In order for a Policy to be supported for syncing, the MGC must have permissions
to watch/list/get the resource, and the implementation of the downstream Gateway
controller must be aware of the `policy-synced` annotation.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

## Consequences of not implementing

Gateway-admins will have no centralized system for handling spoke-level policies targeting a gateway created there from the hub.

#### We will not be using the policy framework to complete this objective:
The policy framework is a system designed to make assertions about the state of a spoke, and potentially take actions based on that state. As such, it is not a suitable replacement for manifestworks in the case of syncing resources to a spoke.
#### OCM's Policy Framework will not be used to complete this objective:

OCM's Policy Framework is a system designed to make assertions about the state of a spoke, and potentially take actions based on that state. As such, it is not a suitable replacement for ManifestWorks in the case of syncing resources to a spoke.

### Potential migration from ManifestWorks to ManifestWorkReplicaSets

### We could eventually migrate from manifestworks to manifestworkreplicasets
Manifestworkreplicasets may be a future improvement that we could change the MGC to use, but not as part of this RFC.
ManifestWorkReplicaSets may be a future improvement that the MGC could support
to simplify the placement of related resources, but this is beyond the scope of this RFC.

# Prior art
[prior-art]: #prior-art
@@ -80,9 +147,27 @@ No applicable prior art.
# Unresolved questions
[unresolved-questions]: #unresolved-questions

## Status reporting

While the controller can assume common status fields among the Policies that it
syncs, there might be a scenario where certain policies use custom status fields
that are not handled by the controller. In order to support this, two alternatives
are identified:

1. Configurable rules.

An extra field is added in the GatewayClass params that configures the policies
to sync, to specify custom fields that the controller must propagate back from
the spokes to the hub.

2. Hard-coded support.

The PolicySyncer component can identify the Policy type and select which extra
status fields are propagated.

# Future possibilities
[future-possibilities]: #future-possibilities

If the policy-framework is updated to enable syncing of resources status back to the hub, that could be a good time to refactor the MGC to use the policy framework in place of the current approach of creating manifestworks directly.
If OCM's Policy Framework is updated to enable syncing of resource status back to the hub, it could be an opportunity to refactor the MGC to use this framework in place of the current approach of creating ManifestWorks directly.

This system could mutate over time to dynamically sync more CRDs than policies to spoke clusters.
