Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add activation-id label for kubernetes resources #641

Closed
wants to merge 4 commits into from

Conversation

kurokobo
Copy link
Contributor

@kurokobo kurokobo commented Feb 1, 2024

Closes #616

In the current implementation, the name and label for Job/Pod/Service changes with each restart of the Rulebook Activation. This is a behavior that compromises stability when exposing the Service to the outside of the cluster.

This PR adds the Activation ID as an immutable label that does not change upon restart. This allows users to create arbitrarily named Service with this label as the selector, and arbitrarily named Ingress that use it.

There are no side effects because this do not remove nor replace existing labels, but simply add to them.

Changes:

  • Add new label activation-id to Job, Pod, Service, Secret that created by enabling Rulebook Activation

Tests:

$ task test -- tests/integration/services/activation/engine/test_kubernetes.py 
task: [test] poetry run python -m pytest tests/integration/services/activation/engine/test_kubernetes.py
====================================== test session starts =======================================
platform linux -- Python 3.9.7, pytest-7.3.1, pluggy-1.0.0
django: settings: aap_eda.settings.development (from env)
rootdir: /home/kuro/work/kurokobo/eda-server
configfile: pytest.ini
plugins: django-4.5.2, asyncio-0.21.0, cov-4.1.0, lazy-fixture-0.6.3, redis-3.0.2
asyncio: mode=auto
collected 27 items                                                                               

tests/integration/services/activation/engine/test_kubernetes.py .......................... [ 96%]
.                                                                                          [100%]

======================================= 27 passed in 2.29s =======================================

By enabling Rulebook Activation (ID: 5) that listens inbound connection on 5000, following resources are created:

$ kubectl -n eda get job,pod,svc,secret
NAME                            COMPLETIONS   DURATION   AGE
job.batch/activation-job-5-39   0/1           96s        96s

NAME                                                          READY   STATUS    RESTARTS       AGE
...
pod/activation-job-5-39-5fktt                                 1/1     Running   0              96s

NAME                                                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
...
service/activation-job-5-39-5000                                 ClusterIP   10.43.34.31     <none>        5000/TCP   94s

NAME                                     TYPE                             DATA   AGE
...
secret/activation-secret-5               kubernetes.io/dockerconfigjson   1      96s


$ kubectl -n eda get job.batch/activation-job-5-39 pod/activation-job-5-39-5fktt service/activation-job-5-39-5000 secret/activation-secret-5 -o yaml
...
- apiVersion: batch/v1
  kind: Job
  metadata:
    ...
    labels:
      activation-id: "5" ✅
      app: eda
      job-name: activation-job-5-39
    name: activation-job-5-39
    namespace: eda
    ...
- apiVersion: v1
  kind: Pod
  metadata:
    ...
    labels:
      activation-id: "5" ✅
      app: eda
      batch.kubernetes.io/controller-uid: 2aaa6f9f-95b8-4aca-a782-cc7bdf33c86c
      batch.kubernetes.io/job-name: activation-job-5-39
      controller-uid: 2aaa6f9f-95b8-4aca-a782-cc7bdf33c86c
      job-name: activation-job-5-39
    name: activation-job-5-39-5fktt
    namespace: eda
    ...
- apiVersion: v1
  kind: Service
  metadata:
    ...
    labels:
      activation-id: "5" ✅
      app: eda
      job-name: activation-job-5-39
    name: activation-job-5-39-5000
    namespace: eda
    ...
- apiVersion: v1
  ...
  kind: Secret
  metadata:
    ...
    labels:
      aap: eda
      activation-id: "5" ✅
    name: activation-secret-5
    namespace: eda
    ...

By creating the following Service and Inrgess that refer to this new label, we will not need to modify them after the Rulebook Activation is restarted.

---
apiVersion: v1
kind: Service
metadata:
  namespace: eda
  name: webhook-demo-svc ✅
spec:
  ports:
    - port: 5000
      protocol: TCP
      targetPort: 5000
  selector: 
    activation-id: "5" 
  type: ClusterIP

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: eda
  name: webhook-demo-ingress
spec:
  rules:
    - host: webhook-demo.example.com
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: webhook-demo-svc ✅
                port:
                  number: 5000

@kurokobo kurokobo requested a review from a team as a code owner February 1, 2024 17:02
@jon-nfc
Copy link

jon-nfc commented Feb 2, 2024

@kurokobo, your a legend this PR will enable moving forward with implementing!!!

@kurokobo
Copy link
Contributor Author

kurokobo commented Feb 5, 2024

rebased

@jon-nfc
Copy link

jon-nfc commented Feb 8, 2024

@Alex-Izquierdo any chance of having this reviewed and merged?

@Alex-Izquierdo
Copy link
Collaborator

Hi. @kurokobo Thank you very much for your contribution.

I am not against it, but I need to discuss it with the team first and how we integrate this with our current plans. We are really aware of this problem. This change would also have a very short life as it would be changed soon due to the new features we are working on, the container engine needs to be agnostic of the rulebook type (there will be a new job type in addition to activation).

@jon-nfc
Copy link

jon-nfc commented Feb 9, 2024

Hi. @kurokobo Thank you very much for your contribution.

I am not against it, but I need to discuss it with the team first and how we integrate this with our current plans. We are really aware of this problem. This change would also have a very short life as it would be changed soon due to the new features we are working on, the container engine needs to be agnostic of the rulebook type (there will be a new job type in addition to activation).

@Alex-Izquierdo, thanks for your time for progressing this PR. Same same for @kurokobo for your work too. May you legends have a cold pillow when you try to sleep tonight!!!

Although the changes in this PR would make EDA a viable solution within the kubernetes environment, are the new features "coming soon (timeframe?)" going to take into account that a service name for instance must remain the same for the life-cycle of the pod? this includes having permanent labels that don't change? as without connectivity will be lost as it is now.

@Alex-Izquierdo
Copy link
Collaborator

Alex-Izquierdo commented Feb 9, 2024

Hi @jon-nfc thanks for your nice words
I can not promise a timeframe; it is a work in progress. We hope to have it completed in the next few weeks. As I mentioned, we are fully aware of this issue, and it is our intention to resolve it. We are evaluating options, one of which is to implement a new field that allows users to set a custom name for the resources. This approach of adding a new label is still valid even without that new field, although there will be more types in addition to activation and the current logic needs to be modified.

I understand that the current behavior is problematic, but there is no other option than to wait a while for a final solution that fits in with the upcoming features.

@mkanoor @hsong-rh I think adding a new label like "<parent_type>: <parent_id>" is something feasible, thoughts?

@kurokobo
Copy link
Contributor Author

kurokobo commented Feb 9, 2024

@Alex-Izquierdo
Thanks for the technical details, I eagerly await the release of the new features.

That said, other high-priority tasks (troubleshooting, incident response, etc.) often make the 'ready soon' schedule not ready at all, so if it's going to take months against your plan, please use this PR as a short-term solution.

I wish you the best of luck with the development of the new feature 😃

@Alex-Izquierdo
Copy link
Collaborator

Closed in favor of #704

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rulebook restart within kubernetes causes rulebooks to become unavailable
3 participants