Removing Datacenter Blocked in Failing to Alter system_auth Keyspace #1470

Open
loshsu opened this issue Dec 18, 2024 · 0 comments
Labels
bug Something isn't working

What happened?
Removing a datacenter appears to be stuck in a loop because changing the system_auth keyspace is not allowed while a datacenter still has active nodes.
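
For context, the replication change that the operator appears to be attempting can be sketched as the equivalent CQL statement. This is only an illustration: `alter_keyspace_cql` is a hypothetical helper, the operator itself goes through the management API (see the `AlterKeyspace` frame in the logs further down), and the replication factor of 3 per datacenter is an assumption, not a value taken from the cluster.

```python
def alter_keyspace_cql(keyspace, replication, removed_dcs=()):
    """Build an ALTER KEYSPACE statement with the removed datacenters dropped.

    `replication` maps datacenter name -> replication factor (assumed values).
    """
    remaining = {dc: rf for dc, rf in replication.items() if dc not in removed_dcs}
    if not remaining:
        raise ValueError("at least one datacenter must remain in the replication map")
    opts = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(remaining.items()))
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {opts}}};"
    )


# Hypothetical starting point: system_auth replicated to both datacenters.
stmt = alter_keyspace_cql("system_auth", {"dc1": 3, "dc2": 3}, removed_dcs=["dc2"])
print(stmt)
```

If Cassandra rejects a statement of this shape while dc2 still has live nodes, the operator would retry it on every reconcile, which would match the loop described above.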

Did you expect to see something different?
On Cassandra versions prior to 4.1, removing a datacenter worked fine.
After upgrading Cassandra to 5.0.2, however, the process is blocked.

How to reproduce it (as minimally and precisely as possible):
Simply remove a datacenter from a K8ssandraCluster resource that has two datacenters.

From this:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
  namespace: k8ssandra-operator
  annotations:
    k8ssandra.io/dc-replication: '{"dc2": {"ks1": 2, "ks2": 2}}'
spec:
  cassandra:
    serverVersion: "4.0.3"
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
    config:
      jvmOptions:
        heapSize: 512Mi
    datacenters:
      - metadata:
          name: dc1
        k8sContext: east
        size: 3
      - metadata:
          name: dc2
        k8sContext: west
        size: 3  
  stargate:
    size: 1
    heapSize: 512Mi
  reaper:
    autoScheduling:
      enabled: true

To this:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
  namespace: k8ssandra-operator
spec:
  cassandra:
    serverVersion: "4.0.3"
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
    config:
      jvmOptions:
        heapSize: 512Mi
    datacenters:
      - metadata:
          name: dc1
        k8sContext: east
        size: 3
  stargate:
    size: 1
    heapSize: 512Mi
  reaper:
    autoScheduling:
      enabled: true

Environment

  • K8ssandra Operator version:

    v1.20.2

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4", GitCommit:"95ee5ab382d64cfe6c28967f36b53970b8374491", GitTreeState:"clean", BuildDate:"2022-08-17T18:54:23Z", GoVersion:"go1.18.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.9", GitCommit:"c1de2d70269039fe55efb98e737d9a29f9155246", GitTreeState:"clean", BuildDate:"2022-07-13T14:19:57Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

    kubeadmin

  • Manifests:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
  namespace: k8ssandra-operator
  annotations:
    k8ssandra.io/dc-replication: '{"dc2": {"ks1": 2, "ks2": 2}}'
spec:
  cassandra:
    serverVersion: "4.0.3"
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
    config:
      jvmOptions:
        heapSize: 512Mi
    datacenters:
      - metadata:
          name: dc1
        k8sContext: east
        size: 3
      - metadata:
          name: dc2
        k8sContext: west
        size: 3  
  stargate:
    size: 1
    heapSize: 512Mi
  reaper:
    autoScheduling:
      enabled: true

  • K8ssandra Operator Logs:
2024-12-18T06:49:53.307Z        ERROR   Failed to CALL alter keyspace system_auth on pod sdp-k8ssandra-dc1-default-sts-0        {"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"sdp-k8ssandra","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "sdp-k8ssandra", "reconcileID": "f58ca45a-c51d-4115-9a68-80dd6f681ddc", "K8ssandraCluster": {"name":"sdp-k8ssandra","namespace":"k8ssandra-operator"}, "CassandraDatacenter": {"name":"dc1","namespace":"k8ssandra-operator"}, "K8SContext": "", "error": "incorrect status code of 500 when calling endpoint"}
github.com/k8ssandra/k8ssandra-operator/pkg/cassandra.(*defaultManagementApiFacade).AlterKeyspace
        /workspace/pkg/cassandra/management.go:222
github.com/k8ssandra/k8ssandra-operator/pkg/cassandra.(*defaultManagementApiFacade).EnsureKeyspaceReplication
        /workspace/pkg/cassandra/management.go:310
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).updateReplicationOfSystemKeyspaces
        /workspace/controllers/k8ssandra/schemas.go:165
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).checkSchemas
        /workspace/controllers/k8ssandra/schemas.go:43
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).reconcileDatacenters
        /workspace/controllers/k8ssandra/datacenters.go:224
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).reconcile
        /workspace/controllers/k8ssandra/k8ssandracluster_controller.go:152
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).Reconcile
        /workspace/controllers/k8ssandra/k8ssandracluster_controller.go:96
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227
2024-12-18T06:49:53.307Z        ERROR   Failed to update replication    {"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"sdp-k8ssandra","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "sdp-k8ssandra", "reconcileID": "f58ca45a-c51d-4115-9a68-80dd6f681ddc", "K8ssandraCluster": {"name":"sdp-k8ssandra","namespace":"k8ssandra-operator"}, "CassandraDatacenter": {"name":"dc1","namespace":"k8ssandra-operator"}, "K8SContext": "", "keyspace": "system_auth", "error": "CALL alter keyspaces system_auth failed on all datacenter dc1 pods"}
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).updateReplicationOfSystemKeyspaces
        /workspace/controllers/k8ssandra/schemas.go:169
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).checkSchemas
        /workspace/controllers/k8ssandra/schemas.go:43
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).reconcileDatacenters
        /workspace/controllers/k8ssandra/datacenters.go:224
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).reconcile
        /workspace/controllers/k8ssandra/k8ssandracluster_controller.go:152
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).Reconcile
        /workspace/controllers/k8ssandra/k8ssandracluster_controller.go:96
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227
2024-12-18T06:49:53.307Z        DEBUG   events  CALL alter keyspaces system_auth failed on all datacenter dc1 pods      {"type": "Warning", "object": {"kind":"K8ssandraCluster","namespace":"k8ssandra-operator","name":"sdp-k8ssandra","uid":"cf3ec20b-be14-45c3-b574-14accc499730","apiVersion":"k8ssandra.io/v1alpha1","resourceVersion":"338386607"}, "reason": "Reconcile Error"}
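
When triaging logs like the ones above, it can help to pull the structured fields out of each entry. The sketch below assumes each entry has the shape `timestamp LEVEL message {json}` on a single line, as in the excerpt. `parse_operator_log_line` is a hypothetical helper, and the sample line is an abbreviated copy of the second ERROR entry with most JSON fields omitted.

```python
import json


def parse_operator_log_line(line):
    """Split a log entry into timestamp, level, message and JSON fields.

    Returns None for lines without a JSON payload (e.g. stack-trace frames).
    Assumes the message text itself contains no '{' character.
    """
    head, brace, tail = line.partition("{")
    if not brace:
        return None
    parts = head.split(None, 2)
    if len(parts) < 3:
        return None
    timestamp, level, message = parts[0], parts[1], parts[2].strip()
    return {
        "timestamp": timestamp,
        "level": level,
        "message": message,
        "fields": json.loads(brace + tail),
    }


# Abbreviated sample based on the "Failed to update replication" entry.
sample = (
    '2024-12-18T06:49:53.307Z        ERROR   Failed to update replication    '
    '{"keyspace": "system_auth", '
    '"error": "CALL alter keyspaces system_auth failed on all datacenter dc1 pods"}'
)
rec = parse_operator_log_line(sample)
print(rec["level"], rec["fields"]["keyspace"], rec["fields"]["error"])
```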

Anything else we need to know?:
This issue might be related to CASSANDRA-17478 (https://issues.apache.org/jira/browse/CASSANDRA-17478),
according to this discussion thread: https://lists.apache.org/thread/tlm4yl4oytlhd70wdfb78p976j9h1gg5

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: K8OP-302
