CronJob for medusa purge not in the correct namespace #1299

Closed
smutel opened this issue Apr 26, 2024 · 7 comments · Fixed by #1300
Labels
bug Something isn't working done Issues in the state 'done'

Comments

@smutel
Contributor

smutel commented Apr 26, 2024

What happened?

  • The K8ssandra operator is installed in the k8ssandra-operator namespace
  • My cluster runs, and its backups are taken, in the cluster namespace
  • A CronJob is created automatically in the k8ssandra-operator namespace. This CronJob starts the purge process in the operator namespace instead of the cluster namespace
  • As a result, the CronJob creates a MedusaTask in the wrong namespace

Did you expect to see something different?

  • The CronJob needs to start the purge in the cluster namespace

How to reproduce it (as minimally and precisely as possible):

  • Install K8ssandra in the k8ssandra-operator namespace
  • Create a cluster in another namespace
  • The purge CronJob is created automatically
  • A MedusaTask is created in the wrong namespace

Environment

  • K8ssandra Operator version:

    1.15.0

  • Kubernetes version information:

Client Version: v1.29.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
  • Kubernetes cluster kind:

AKS

  • K8ssandra Operator Logs:
2024-04-26T09:20:20.161Z        ERROR   Reconciler error        {"controller": "medusatask", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaTask", "MedusaTask": {"name":"purge-backups-20240426092008","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "purge-backups-20240426092008", "reconcileID": "2b82f272-8107-4378-a34b-81a0465d9d4e", "error": "CassandraDatacenter.cassandra.datastax.com \"dc1\" not found"}

The datacenter dc1 is in the cluster namespace, not in k8ssandra-operator.
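
To list every affected task, the operator log can be filtered for MedusaTask objects whose reconcile failed on a missing CassandraDatacenter. This is a minimal sketch that assumes the log format shown above; the function name `failed_purge_tasks` and the `deploy/k8ssandra-operator` deployment name in the usage comment are hypothetical and may differ in your installation.

```shell
# Read operator log lines on stdin and print "<namespace>/<name>" for each
# MedusaTask whose reconcile failed because its CassandraDatacenter was not found.
# Assumes the JSON-ish log layout from the error line quoted above.
failed_purge_tasks() {
  grep 'CassandraDatacenter.*not found' \
    | sed -n 's/.*"MedusaTask": {"name":"\([^"]*\)","namespace":"\([^"]*\)"}.*/\2\/\1/p'
}

# Usage (deployment name is an assumption):
#   kubectl -n k8ssandra-operator logs deploy/k8ssandra-operator | failed_purge_tasks
```

If the printed namespace is the operator namespace rather than the one holding the datacenter, the task was created in the wrong place.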

@smutel smutel added the bug Something isn't working label Apr 26, 2024
@adziura-tcloud

In addition, I would propose making this CronJob optional, or at least configurable.
This is what we use in our current setup:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: cassandra-backup-purge
  namespace: cassandra
spec:
  schedule: "30 2 * * *"
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 1
  startingDeadlineSeconds: 10
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            app: cassandra-backup-purge
        spec:
          restartPolicy: Never
          serviceAccountName: cassandra-backup-purge
          imagePullSecrets:
            - name: container-registries
          containers:
            - name: cassandra-backup-purge
              image: bitnami/kubectl:1.29
              imagePullPolicy: Always
              resources:
                limits:
                  ephemeral-storage: 20Mi
                  memory: 300Mi
                requests:
                  cpu: 100m
                  ephemeral-storage: 1Mi
                  memory: 100Mi
              command:
                - sh
                - -c
                - |
                   # Purge obsolete backups
                   kubectl apply -f - <<EOF
                   apiVersion: medusa.k8ssandra.io/v1alpha1
                   kind: MedusaTask
                   metadata:
                     name: purge-backups-$(date +%Y%m%d%H%M%S)
                     namespace: cassandra
                   spec:
                     cassandraDatacenter: dc1
                     operation: purge
                   EOF
                   # Purge obsolete backup jobs
                   for i in $(kubectl -n cassandra get medusabackupjobs.medusa.k8ssandra.io -o go-template --template '{{range .items}}{{.metadata.name}} {{.metadata.creationTimestamp}}{{"\n"}}{{end}}' | awk '$2 <= "'$(date -d'now-30 days' -u +"%Y-%m-%dT%H:%M:%SZ")'" { print $1 }'); do kubectl -n cassandra delete medusabackupjobs.medusa.k8ssandra.io ${i}; done
                   # Purge obsolete Medusa tasks
                   for i in $(kubectl -n cassandra get medusatasks.medusa.k8ssandra.io -o go-template --template '{{range .items}}{{.metadata.name}} {{.metadata.creationTimestamp}}{{"\n"}}{{end}}' | awk '$2 <= "'$(date -d'now-30 days' -u +"%Y-%m-%dT%H:%M:%SZ")'" { print $1 }'); do kubectl -n cassandra delete medusatasks.medusa.k8ssandra.io ${i}; done

Key points:

  • Specifying the CronJob schedule
  • successfulJobsHistoryLimit: 0
  • Purging obsolete (older than 30 days) backup jobs, since we use MedusaBackupSchedule for backups
  • Purging obsolete (older than 30 days) Medusa tasks
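
The 30-day filter in the two loops above relies on RFC 3339 UTC timestamps sorting lexicographically in chronological order, so a plain string comparison in awk is enough. A minimal sketch of that filter in isolation, with a hypothetical helper name `names_older_than`:

```shell
# Read "<name> <creationTimestamp>" lines on stdin and print the names whose
# timestamp is at or before the cutoff passed as $1 (RFC 3339, UTC).
# The comparison is a string comparison, which matches chronological order
# for fixed-width UTC timestamps such as 2024-04-26T09:20:08Z.
names_older_than() {
  awk -v cutoff="$1" '$2 <= cutoff { print $1 }'
}

# Usage, mirroring the CronJob above (GNU date syntax for the cutoff):
#   cutoff=$(date -d 'now-30 days' -u +"%Y-%m-%dT%H:%M:%SZ")
#   kubectl -n cassandra get medusatasks.medusa.k8ssandra.io -o go-template \
#     --template '{{range .items}}{{.metadata.name}} {{.metadata.creationTimestamp}}{{"\n"}}{{end}}' \
#     | names_older_than "$cutoff"
```

Printing the names first, before piping them into a delete, makes it easy to check what would be removed.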

@adejanovski
Contributor

Actually, the CronJob is in the right namespace, because it needs to reference the service account, which exists in the operator namespace.
But the job creates a MedusaTask in the wrong namespace, which prevents it from running properly. It also doesn't take datacenter name overrides into account and won't reference the DC correctly if an override is used (.dc.datacenterName).
This CronJob was obviously a bad idea and we'll shortly create a new API to handle purge schedules.
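
Until the fix lands, a purge can be triggered by hand by creating the MedusaTask in the namespace where the datacenter actually lives. This is a minimal sketch, not the operator's own logic: `render_purge_task` is a hypothetical helper, and the manifest fields are the ones from the MedusaTask shown earlier in this thread. The datacenter argument is assumed to be the CassandraDatacenter resource name, since the error in the log is a lookup of that resource by name.

```shell
# Print a MedusaTask purge manifest for the given namespace and datacenter.
#   $1: namespace that contains the CassandraDatacenter
#   $2: name of the CassandraDatacenter resource (assumption: the resource
#       name, not an overridden datacenterName)
render_purge_task() {
  ns="$1"
  dc="$2"
  cat <<EOF
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaTask
metadata:
  name: purge-backups-$(date +%Y%m%d%H%M%S)
  namespace: ${ns}
spec:
  cassandraDatacenter: ${dc}
  operation: purge
EOF
}

# Usage:
#   render_purge_task cluster dc1 | kubectl apply -f -
```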

@adejanovski adejanovski added the done Issues in the state 'done' label Jun 11, 2024
@zehweh

zehweh commented Jun 28, 2024

Hi @adejanovski,
I'm running v1.17.0 and the MedusaTask still gets created in the operator namespace.
If I understand this correctly, this means that dcConfig.Meta.Namespace is not set?

thanks!

@adejanovski
Contributor

Hi @zehweh, yeah, sadly we still have a bug there. You can indeed specify the DC namespace explicitly as a workaround. Note that in v1.18.0 we will replace the CronJobs with MedusaBackupSchedule, which can now schedule purges as well.
Sorry for all the trouble with these CronJobs, they've been buggy all along 😕

@zehweh

zehweh commented Jun 28, 2024

Thanks for the quick reply.
I guess I can wait for v1.18.0 :)

@zehweh

zehweh commented Nov 29, 2024

Hi @adejanovski,
I've seen the new API additions you made but I'm not sure how to apply them.
I'm running v1.20.2 and the CronJobs are still there (and the MedusaTasks still get created in the operator namespace).

Do I have to remove

  medusa:
    purgeBackups: true

in the K8ssandraCluster manifest and manually create a MedusaBackupSchedule like this?

apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaBackupSchedule
metadata:
  name: purge-schedule
  namespace: k8ssandra-operator
spec:
  backupSpec:
    cassandraDatacenter: dc1
  cronSchedule: '* * * * *'
  operationType: purge

thanks again!

@adejanovski
Contributor

Indeed, try setting .spec.medusa.purgeBackups to false and creating a MedusaBackupSchedule.
I think it's time we modify the logic behind purgeBackups to generate a MedusaBackupSchedule instead 🤔.

@rzvoncek, how do you think we should proceed here?
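
For anyone trying this workaround, here is a minimal sketch that renders a purge MedusaBackupSchedule from the fields in the manifest quoted above. `render_purge_schedule` is a hypothetical helper. It places the schedule in the datacenter's namespace rather than the operator namespace, on the assumption that the schedule resolves the datacenter in its own namespace the same way the failing MedusaTask did. The daily schedule in the usage comment is only an example.

```shell
# Print a MedusaBackupSchedule manifest that runs purges on a cron schedule.
#   $1: namespace that contains the CassandraDatacenter (assumption: the
#       schedule must live next to the datacenter)
#   $2: datacenter name
#   $3: cron expression (quote it so the shell does not expand the asterisks)
render_purge_schedule() {
  ns="$1"
  dc="$2"
  cron="$3"
  cat <<EOF
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaBackupSchedule
metadata:
  name: purge-schedule
  namespace: ${ns}
spec:
  backupSpec:
    cassandraDatacenter: ${dc}
  cronSchedule: '${cron}'
  operationType: purge
EOF
}

# Usage (after setting .spec.medusa.purgeBackups to false):
#   render_purge_schedule cluster dc1 '30 2 * * *' | kubectl apply -f -
```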
