MongoDB replicaset with PS or PSS setup could not have arbiter added due to crashloop for goal state #1615

Open
KarooolisZi opened this issue Sep 9, 2024 · 10 comments

KarooolisZi commented Sep 9, 2024

@nammn Hello, I am not able to add an arbiter to a 2-member replica set (the same happens with a 3-member replica set). After the change, the arbiter pod is created, but the operator logs an error about pods not reaching the goal state. After some time, the replica set members go down because their readiness probes fail.
What did you do to encounter the bug?
Steps to reproduce the behavior:

  1. Changed spec.arbiters: 0 to spec.arbiters: 1 in my MongoDBCommunity CR manifest database.yaml (see the sketch after this list).
  2. Ran kubectl apply -f database.yaml.
  3. Checked the MongoDB community operator logs and observed debug messages stating that none of the pods had reached the goal state.
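
For reference, the same change can be applied as a one-line patch instead of editing the manifest; the resource name and namespace below match the dumps later in this issue, so adjust them to your environment:

# equivalent to changing spec.arbiters in database.yaml and re-applying it
kubectl patch mdbc mongodb -n mongodb-surplus --type merge -p '{"spec":{"arbiters":1}}'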

What did you expect?
The arbiter to be added to the 2-member replica set, with both members and the arbiter reaching the goal state.

What happened instead?
Neither the members nor the arbiter could reach the goal state, and the replica set got stuck in a crash loop. The previously working members also fail after not reaching the goal state for some time.

Operator Information

  • Operator version: 0.10.0
  • MongoDB image: docker.io/mongo:6.0.17

Kubernetes Cluster Information

  • Distribution: AWS EKS
  • Kubernetes version: 1.28

If possible, please include:

  • The operator logs
2024-09-09T11:35:35.491Z	DEBUG	scram/scram.go:102	Credentials have not changed, using credentials stored in: secret/dms-user-scram-scram-credentials
2024-09-09T11:35:35.492Z	DEBUG	agent/agent_readiness.go:111	The Agent in the Pod 'mongodb-0' hasn't reached the goal state yet (goal: 30, agent: 29)	{"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z	DEBUG	agent/agent_readiness.go:111	The Agent in the Pod 'mongodb-1' hasn't reached the goal state yet (goal: 30, agent: 29)	{"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z	DEBUG	agent/agent_readiness.go:111	The Agent in the Pod 'mongodb-2' hasn't reached the goal state yet (goal: 30, agent: 29)	{"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z	DEBUG	agent/agent_readiness.go:111	The Agent in the Pod 'mongodb-arb-0' hasn't reached the goal state yet (goal: 30, agent: -1)	{"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z	DEBUG	agent/replica_set_port_manager.go:122	No port change required	{"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z	DEBUG	agent/replica_set_port_manager.go:40	Calculated process port map: map[mongodb-0:27017 mongodb-1:27017 mongodb-2:27017 mongodb-arb-0:27017]	{"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z	DEBUG	controllers/replica_set_controller.go:505	AutomationConfigMembersThisReconciliation	{"mdb.AutomationConfigMembersThisReconciliation()": 3}
2024-09-09T11:35:35.492Z	DEBUG	controllers/replica_set_controller.go:358	Waiting for agents to reach version 30	{"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z	DEBUG	agent/agent_readiness.go:111	The Agent in the Pod 'mongodb-0' hasn't reached the goal state yet (goal: 30, agent: 29)	{"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z	DEBUG	agent/agent_readiness.go:111	The Agent in the Pod 'mongodb-1' hasn't reached the goal state yet (goal: 30, agent: 29)	{"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z	DEBUG	agent/agent_readiness.go:111	The Agent in the Pod 'mongodb-2' hasn't reached the goal state yet (goal: 30, agent: 29)	{"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z	INFO	controllers/mongodb_status_options.go:110	ReplicaSet is not yet ready, retrying in 10 seconds
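The pattern worth noting in the log above: the three data-bearing agents sit one version behind the goal (agent: 29 vs goal: 30), while the arbiter has never achieved any version (agent: -1). The per-pod value comes from the agent health file that the readiness probe reads, so it can be checked directly; a sketch, assuming grep is available in the agent image:

# lastGoalVersionAchieved should match the goal version (30) from the operator log
kubectl exec mongodb-arb-0 -c mongodb-agent -n mongodb-surplus -- \
  grep -o '"lastGoalVersionAchieved":[^,]*' /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json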
  • Below we assume that your replicaset database pods are named mongo-<>. For instance:
❯ k get pods
NAME      READY   STATUS    RESTARTS   AGE
mongo-0   2/2     Running   0          19h
mongo-1   2/2     Running   0          19h
                                                                                     
❯ k get mdbc
NAME    PHASE     VERSION
mongo   Running   4.4.0
  • yaml definitions of your MongoDB Deployment(s):
    • kubectl get mdbc -oyaml
apiVersion: v1
items:
- apiVersion: mongodbcommunity.mongodb.com/v1
  kind: MongoDBCommunity
  metadata:
    annotations:
      mongodb.com/v1.lastAppliedMongoDBVersion: 6.0.17
    creationTimestamp: "2024-01-03T07:47:03Z"
    generation: 48
    labels:
      k8slens-edit-resource-version: v1
    name: mongodb
    namespace: mongodb-<SENSITIVE>
    resourceVersion: "391080428"
    uid: 8dbc92a1-061b-4ebb-a2be-d1b5dd6d696b
  spec:
    additionalMongodConfig:
      storage.wiredTiger.engineConfig.journalCompressor: zlib
    arbiters: 1
    members: 3
    security:
      authentication:
        ignoreUnknownUsers: true
        modes:
        - SCRAM
    statefulSet:
      spec:
        template:
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: NodeGroup
                      operator: In
                      values:
                      - <SENSITIVE>
              podAntiAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                      - key: app
                        operator: In
                        values:
                        - mongodb-<SENSITIVE>
                    topologyKey: kubernetes.io/hostname
                  weight: 100
            containers:
            - name: mongod
              resources:
                limits:
                  cpu: "1"
                  memory: 2Gi
                requests:
                  cpu: 500m
                  memory: 1Gi
        volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 70G
            storageClassName: ebs-sc
        - metadata:
            name: logs-volume
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 10G
            storageClassName: ebs-sc
    type: ReplicaSet
    version: 6.0.17
  status:
    currentMongoDBMembers: 3
    currentStatefulSetReplicas: 3
    message: ReplicaSet is not yet ready, retrying in 10 seconds
    mongoUri: mongodb://mongodb-0.mongodb-svc.mongodb-surplus.svc.cluster.local:27017,mongodb-1.mongodb-svc.mongodb-surplus.svc.cluster.local:27017,mongodb-2.mongodb-svc.mongodb-surplus.svc.cluster.local:27017/?replicaSet=mongodb
    phase: Pending
    version: 6.0.17
kind: List
metadata:
  resourceVersion: ""
  • yaml definitions of your kubernetes objects like the statefulset(s), pods (we need to see the state of the containers):
    • kubectl get sts -oyaml
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    creationTimestamp: "2024-09-06T13:47:18Z"
    generation: 4
    name: mongodb
    namespace: mongodb-XXX
    ownerReferences:
    - apiVersion: mongodbcommunity.mongodb.com/v1
      blockOwnerDeletion: true
      controller: true
      kind: MongoDBCommunity
      name: mongodb
      uid: 8dbc92a1-061b-4ebb-a2be-d1b5dd6d696b
    resourceVersion: "391084063"
    uid: 25fa6a25-5016-4a16-af39-8c6907338a49
  spec:
    persistentVolumeClaimRetentionPolicy:
      whenDeleted: Retain
      whenScaled: Retain
    podManagementPolicy: OrderedReady
    replicas: 3
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app: mongodb-XXX
    serviceName: mongodb-XXX
    template:
      metadata:
        annotations:
          kubectl.kubernetes.io/restartedAt: "2024-09-09T07:49:13Z"
        creationTimestamp: null
        labels:
          app: mongodb-XXX
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: NodeGroup
                  operator: In
                  values:
                  - XXX
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - mongodb-XXX
                topologyKey: kubernetes.io/hostname
              weight: 100
        containers:
        - args:
          - ""
          command:
          - /bin/sh
          - -c
          - "\nif [ -e \"/hooks/version-upgrade\" ]; then\n\t#run post-start hook
            to handle version changes (if exists)\n    /hooks/version-upgrade\nfi\n\n#
            wait for config and keyfile to be created by the agent\nwhile ! [ -f /data/automation-mongod.conf
            -a -f /var/lib/mongodb-mms-automation/authentication/keyfile ]; do sleep
            3 ; done ; sleep 2 ;\n\n# start mongod with this configuration\nexec mongod
            -f /data/automation-mongod.conf;\n\n"
          env:
          - name: AGENT_STATUS_FILEPATH
            value: /healthstatus/agent-health-status.json
          image: docker.io/mongo:6.0.17
          imagePullPolicy: IfNotPresent
          name: mongod
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: 500m
              memory: 1Gi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /data
            name: data-volume
          - mountPath: /healthstatus
            name: healthstatus
          - mountPath: /hooks
            name: hooks
          - mountPath: /var/log/mongodb-mms-automation
            name: logs-volume
          - mountPath: /var/lib/mongodb-mms-automation/authentication
            name: mongodb-keyfile
          - mountPath: /tmp
            name: tmp
        - command:
          - /bin/bash
          - -c
          - |-
            current_uid=$(id -u)
            declare -r current_uid
            if ! grep -q "${current_uid}" /etc/passwd ; then
            sed -e "s/^mongodb:/builder:/" /etc/passwd > /tmp/passwd
            echo "mongodb:x:$(id -u):$(id -g):,,,:/:/bin/bash" >> /tmp/passwd
            export NSS_WRAPPER_PASSWD=/tmp/passwd
            export LD_PRELOAD=libnss_wrapper.so
            export NSS_WRAPPER_GROUP=/etc/group
            fi
            agent/mongodb-agent -healthCheckFilePath=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json -serveStatusPort=5000 -cluster=/var/lib/automation/config/cluster-config.json -skipMongoStart -noDaemonize -useLocalMongoDbTools -logFile /var/log/mongodb-mms-automation/automation-agent.log -logLevel INFO -maxLogFileDurationHrs 24
          env:
          - name: AGENT_STATUS_FILEPATH
            value: /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
          - name: AUTOMATION_CONFIG_MAP
            value: mongodb-config
          - name: HEADLESS_AGENT
            value: "true"
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          image: quay.io/mongodb/mongodb-agent:12.0.15.7646-1
          imagePullPolicy: Always
          name: mongodb-agent
          readinessProbe:
            exec:
              command:
              - /opt/scripts/readinessprobe
            failureThreshold: 40
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /opt/scripts
            name: agent-scripts
          - mountPath: /var/lib/automation/config
            name: automation-config
            readOnly: true
          - mountPath: /data
            name: data-volume
          - mountPath: /var/log/mongodb-mms-automation/healthstatus
            name: healthstatus
          - mountPath: /var/log/mongodb-mms-automation
            name: logs-volume
          - mountPath: /var/lib/mongodb-mms-automation/authentication
            name: mongodb-keyfile
          - mountPath: /tmp
            name: tmp
        dnsPolicy: ClusterFirst
        initContainers:
        - command:
          - cp
          - version-upgrade-hook
          - /hooks/version-upgrade
          image: quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.6
          imagePullPolicy: Always
          name: mongod-posthook
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /hooks
            name: hooks
        - command:
          - cp
          - /probes/readinessprobe
          - /opt/scripts/readinessprobe
          image: quay.io/mongodb/mongodb-kubernetes-readinessprobe:1.0.12
          imagePullPolicy: Always
          name: mongodb-agent-readinessprobe
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /opt/scripts
            name: agent-scripts
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext:
          fsGroup: 2000
          runAsNonRoot: true
          runAsUser: 2000
        serviceAccount: mongodb-database
        serviceAccountName: mongodb-database
        terminationGracePeriodSeconds: 30
        volumes:
        - emptyDir: {}
          name: agent-scripts
        - name: automation-config
          secret:
            defaultMode: 416
            secretName: mongodb-config
        - emptyDir: {}
          name: healthstatus
        - emptyDir: {}
          name: hooks
        - emptyDir: {}
          name: mongodb-keyfile
        - emptyDir: {}
          name: tmp
    updateStrategy:
      type: RollingUpdate
    volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        name: data-volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 70G
        storageClassName: ebs-sc
        volumeMode: Filesystem
      status:
        phase: Pending
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        name: logs-volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10G
        storageClassName: ebs-sc
        volumeMode: Filesystem
      status:
        phase: Pending
  status:
    availableReplicas: 0
    collisionCount: 0
    currentReplicas: 3
    currentRevision: mongodb-6847cc6f7
    observedGeneration: 4
    replicas: 3
    updateRevision: mongodb-6847cc6f7
    updatedReplicas: 3
- apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    creationTimestamp: "2024-09-06T13:47:10Z"
    generation: 8
    name: mongodb-arb
    namespace: mongodb-XXX
    ownerReferences:
    - apiVersion: mongodbcommunity.mongodb.com/v1
      blockOwnerDeletion: true
      controller: true
      kind: MongoDBCommunity
      name: mongodb
      uid: 8dbc92a1-061b-4ebb-a2be-d1b5dd6d696b
    resourceVersion: "391081887"
    uid: 1267641d-8cfa-4d13-ae06-a79c5255facc
  spec:
    persistentVolumeClaimRetentionPolicy:
      whenDeleted: Retain
      whenScaled: Retain
    podManagementPolicy: OrderedReady
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app: mongodb-XXX
    serviceName: mongodb-XXX
    template:
      metadata:
        annotations:
          kubectl.kubernetes.io/restartedAt: "2024-09-09T07:45:41Z"
        creationTimestamp: null
        labels:
          app: mongodb-XXX
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: NodeGroup
                  operator: In
                  values:
                  - XXX
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - mongodb-XXX
                topologyKey: kubernetes.io/hostname
              weight: 100
        containers:
        - args:
          - ""
          command:
          - /bin/sh
          - -c
          - "\nif [ -e \"/hooks/version-upgrade\" ]; then\n\t#run post-start hook
            to handle version changes (if exists)\n    /hooks/version-upgrade\nfi\n\n#
            wait for config and keyfile to be created by the agent\nwhile ! [ -f /data/automation-mongod.conf
            -a -f /var/lib/mongodb-mms-automation/authentication/keyfile ]; do sleep
            3 ; done ; sleep 2 ;\n\n# start mongod with this configuration\nexec mongod
            -f /data/automation-mongod.conf;\n\n"
          env:
          - name: AGENT_STATUS_FILEPATH
            value: /healthstatus/agent-health-status.json
          image: docker.io/mongo:6.0.17
          imagePullPolicy: IfNotPresent
          name: mongod
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: 500m
              memory: 1Gi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /data
            name: data-volume
          - mountPath: /healthstatus
            name: healthstatus
          - mountPath: /hooks
            name: hooks
          - mountPath: /var/log/mongodb-mms-automation
            name: logs-volume
          - mountPath: /var/lib/mongodb-mms-automation/authentication
            name: mongodb-keyfile
          - mountPath: /tmp
            name: tmp
        - command:
          - /bin/bash
          - -c
          - |-
            current_uid=$(id -u)
            declare -r current_uid
            if ! grep -q "${current_uid}" /etc/passwd ; then
            sed -e "s/^mongodb:/builder:/" /etc/passwd > /tmp/passwd
            echo "mongodb:x:$(id -u):$(id -g):,,,:/:/bin/bash" >> /tmp/passwd
            export NSS_WRAPPER_PASSWD=/tmp/passwd
            export LD_PRELOAD=libnss_wrapper.so
            export NSS_WRAPPER_GROUP=/etc/group
            fi
            agent/mongodb-agent -healthCheckFilePath=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json -serveStatusPort=5000 -cluster=/var/lib/automation/config/cluster-config.json -skipMongoStart -noDaemonize -useLocalMongoDbTools -logFile /var/log/mongodb-mms-automation/automation-agent.log -logLevel INFO -maxLogFileDurationHrs 24
          env:
          - name: AGENT_STATUS_FILEPATH
            value: /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
          - name: AUTOMATION_CONFIG_MAP
            value: mongodb-config
          - name: HEADLESS_AGENT
            value: "true"
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          image: quay.io/mongodb/mongodb-agent:12.0.15.7646-1
          imagePullPolicy: Always
          name: mongodb-agent
          readinessProbe:
            exec:
              command:
              - /opt/scripts/readinessprobe
            failureThreshold: 40
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /opt/scripts
            name: agent-scripts
          - mountPath: /var/lib/automation/config
            name: automation-config
            readOnly: true
          - mountPath: /data
            name: data-volume
          - mountPath: /var/log/mongodb-mms-automation/healthstatus
            name: healthstatus
          - mountPath: /var/log/mongodb-mms-automation
            name: logs-volume
          - mountPath: /var/lib/mongodb-mms-automation/authentication
            name: mongodb-keyfile
          - mountPath: /tmp
            name: tmp
        dnsPolicy: ClusterFirst
        initContainers:
        - command:
          - cp
          - version-upgrade-hook
          - /hooks/version-upgrade
          image: quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.6
          imagePullPolicy: Always
          name: mongod-posthook
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /hooks
            name: hooks
        - command:
          - cp
          - /probes/readinessprobe
          - /opt/scripts/readinessprobe
          image: quay.io/mongodb/mongodb-kubernetes-readinessprobe:1.0.12
          imagePullPolicy: Always
          name: mongodb-agent-readinessprobe
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /opt/scripts
            name: agent-scripts
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext:
          fsGroup: 2000
          runAsNonRoot: true
          runAsUser: 2000
        serviceAccount: XXX
        serviceAccountName: XXX
        terminationGracePeriodSeconds: 30
        volumes:
        - emptyDir: {}
          name: agent-scripts
        - name: automation-config
          secret:
            defaultMode: 416
            secretName: mongodb-config
        - emptyDir: {}
          name: healthstatus
        - emptyDir: {}
          name: hooks
        - emptyDir: {}
          name: mongodb-keyfile
        - emptyDir: {}
          name: tmp
    updateStrategy:
      type: RollingUpdate
    volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        name: data-volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 70G
        storageClassName: ebs-sc
        volumeMode: Filesystem
      status:
        phase: Pending
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        name: logs-volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10G
        storageClassName: ebs-sc
        volumeMode: Filesystem
      status:
        phase: Pending
  status:
    availableReplicas: 1
    collisionCount: 0
    currentReplicas: 1
    currentRevision: mongodb-arb-5f6bc75bb8
    observedGeneration: 8
    readyReplicas: 1
    replicas: 1
    updateRevision: mongodb-arb-5f6bc75bb8
    updatedReplicas: 1
kind: List
metadata:
  resourceVersion: ""
  • The agent clusterconfig of the faulty members:
    • kubectl exec -it mongo-0 -c mongodb-agent -- cat /var/lib/automation/config/cluster-config.json
{"version":32,"processes":[{"name":"mongodb-0","disabled":false,"hostname":"mongodb-0.mongodb-svc.mongodb-xxx.svc.cluster.local","args2_6":{"net":{"port":27017},"repl
ication":{"replSetName":"mongodb"},"storage":{"dbPath":"/data","wiredTiger":{"engineConfig":{"journalCompressor":"zlib"}}}},"featureCompatibilityVersion":"6.0","processTy
pe":"mongod","version":"6.0.17","authSchemaVersion":5},{"name":"mongodb-1","disabled":false,"hostname":"mongodb-1.mongodb-svc.mongodb-xxx.svc.cluster.local","args2_6"
:{"net":{"port":27017},"replication":{"replSetName":"mongodb"},"storage":{"dbPath":"/data","wiredTiger":{"engineConfig":{"journalCompressor":"zlib"}}}},"featureCompatibil
ityVersion":"6.0","processType":"mongod","version":"6.0.17","authSchemaVersion":5},{"name":"mongodb-2","disabled":false,"hostname":"mongodb-2.mongodb-svc.mongodb-xxx.
svc.cluster.local","args2_6":{"net":{"port":27017},"replication":{"replSetName":"mongodb"},"storage":{"dbPath":"/data","wiredTiger":{"engineConfig":{"journalCompressor":"
zlib"}}}},"featureCompatibilityVersion":"6.0","processType":"mongod","version":"6.0.17","authSchemaVersion":5},{"name":"mongodb-arb-0","disabled":false,"hostname":"mongod
b-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local","args2_6":{"net":{"port":27017},"replication":{"replSetName":"mongodb"},"storage":{"dbPath":"/data","wiredTiger":{"
engineConfig":{"journalCompressor":"zlib"}}}},"featureCompatibilityVersion":"6.0","processType":"mongod","version":"6.0.17","authSchemaVersion":5}],"replicaSets":[{"_id":
"mongodb","members":[{"_id":0,"host":"mongodb-0","arbiterOnly":false,"votes":1,"priority":1},{"_id":1,"host":"mongodb-1","arbiterOnly":false,"votes":1,"priority":1},{"_id
":2,"host":"mongodb-2","arbiterOnly":false,"votes":1,"priority":1},{"_id":100,"host":"mongodb-arb-0","arbiterOnly":true,"votes":1,"priority":1}],"protocolVersion":"1","nu
mberArbiters":1}],"auth":{"usersWanted":[{"mechanisms":[],"roles":[{"role":"clusterAdmin","db":"admin"},{"role":"userAdminAnyDatabase","db":"admin"}],"user":"admin-user",
"db":"admin","authenticationRestrictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"WioeMJQXT8w9Coif5Fq3gqV1OfeDi64Bvq/maw==","serverKey":"iK8SJLwCzmk95+mUePC
3wrpGw29Tfx9vN+ZCKSMPKMM=","storedKey":"Z33GU9ix2W++nlnkFBIbYP7kEATZ/6sDVQqdhEd+tT0="},"scramSha1Creds":{"iterationCount":10000,"salt":"Q9mmbNXpyLDRtYmoln1xgA==","serverK
ey":"AmaNP+YmbrNf23l8URaZAZKKOz0=","storedKey":"0d8SscAfTMph+2aW416TXB1/UZw="}},{"mechanisms":[],"roles":[{"role":"readWrite","db":"xxx"}],"user":"xxx-prod-user","db":"ad
min","authenticationRestrictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"GOkTsWgdrct5KSSQtTHC20myEJM76v5OMEGXOA==","serverKey":"ksqF9YIWnI50+bQJhjl0/zA1a0H
0UpcNnzxnFEjciV4=","storedKey":"GyxjpwCp9hTHsK5CX2ObkIs73NP2zL1VrwbCQTLDvGE="},"scramSha1Creds":{"iterationCount":10000,"salt":"4RAcRyAxnRCQQhcHWRDA2w==","serverKey":"59K
Q8PQV/rS4zxSuVca/tQbDNWw=","storedKey":"v68O4bx8u7/RNIks1WBvmXIJ+H8="}},{"mechanisms":[],"roles":[{"role":"readWrite","db":"xxx"}],"user":"xxx-prod-user","db":"admin","au
thenticationRestrictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"vYX0jOTF0NvPPmpQm1oz/b7v1/sAnFOMWdm5Pg==","serverKey":"MWuSeUedUk33YD57g/pVw+kV89vQK8OmTib
RLl2hR0U=","storedKey":"ffXuxQ5HTf0FcH2FdcNvKigWSO/TgPdF0elXk9iYX3E="},"scramSha1Creds":{"iterationCount":10000,"salt":"zV90H0Z2XJ8sCiupCSK3PQ==","serverKey":"IwdxN4BVrGS
qyLPXDhrbZKFsbtc=","storedKey":"e5XxJTwdueUyUyJd3ioqQFuEKbc="}},{"mechanisms":[],"roles":[{"role":"readWrite","db":"xxx"},{"role":"readWrite","db":"xxx"}],"user":"xxx-
user","db":"admin","authenticationRestrictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"0HwrZJa9FIMCy5r4erCq7o2gb/RSaHofCV1XMw==","serverKey":"4FTg/fstci+8W
6BZE4jfyXpLJr9/f4zsuDiKrLnBcgg=","storedKey":"lThVu1E2tv14Q7H58DYYNK1jlqXaIZDCp/Omp44wR1A="},"scramSha1Creds":{"iterationCount":10000,"salt":"gDxOqOLC16/e/WvhWSGDdA==","s
erverKey":"q6SKd30cOY+PFQnqRFMpdgmNTFA=","storedKey":"9swpmpNkjLofRRRprZWGrCBoolk="}},{"mechanisms":[],"roles":[{"role":"dbAdmin","db":"admin"},{"role":"userAdminAnyDatab
ase","db":"admin"},{"role":"readWrite","db":"xxx"},{"role":"readWrite","db":"xxx"},{"role":"readWrite","db":"local"}],"user":"xxx-user","db":"admin","authenticationRe
strictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"m40BqXb1jbVT4NIN9NjAdbTdQp84O6KtEbRjgA==","serverKey":"hztnbCDXJs0zBcwtaJsLquRtEgCHDykKj04SaQ3eLn8=","st
oredKey":"cftguOpTPNr4QYvL2XtrRlybPzi96CgzGoZ91EVZg2g="},"scramSha1Creds":{"iterationCount":10000,"salt":"SofiWm+P4s3RwvvIxflOOQ==","serverKey":"73knk0VrQPm6PWSYM5PFYwUK1
lA=","storedKey":"wnYGbRIv2qPtcpv3j4r+lUX8x/4="}},{"mechanisms":[],"roles":[{"role":"read","db":"cps"},{"role":"read","db":"xxx"},{"role":"changestreamrole","db":"admin
"}],"user":"xxx-user","db":"admin","authenticationRestrictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"wzzV+4JFqwIGw1mAzRucb1oiIcYYR/gdcwZyJw==","serverKey
":"Ty105G/oxXhrv9UwgIqqXHO7ZxM5LYW9T/mta7uiQYo=","storedKey":"63E/7kQy2g4O/MUd0a62q8pQNBtITkJ74dUagrRESO4="},"scramSha1Creds":{"iterationCount":10000,"salt":"R/PWGnO94tyg
xdNjIavdkQ==","serverKey":"Gr8vtcLPyb/pR/h8GRivOrY6/sE=","storedKey":"kwWqu/dnDzNTDOEu03TcgeDWxPY="}}],"disabled":false,"authoritativeSet":false,"autoAuthMechanisms":["SC
RAM-SHA-256"],"autoAuthMechanism":"SCRAM-SHA-256","deploymentAuthMechanisms":["SCRAM-SHA-256"],"autoUser":"mms-automation","key":"8tQDoV1eZdKJvpc7cA8rtu939Glj1IgsL9CNE1nf
7SuZMFw8Te47PmhA9Z1NPi27cRw5+bs16kenEAPP82V7v51Xcv5Q9xPZKUxltKlc3t9cfq2Q7Il42DJsrjhQUhne5lKNghWLRHPSFVb8IHbuImgPcvu7mPz6VsYClu6Lno5ewW3ziIvilIW/2xvpxqG0qz4jvz5/cmtTWeNn7V
JzNOYwYurWdFfvdUDL+Z+kQqcbsa95SSYA8217h6aKE2guOwlVpK0VZBYCPACg+ID1dARawAHG7xCA92lFttymLfgu8kbUXeW6RxBsgwz5iuOXjiIrm8XpWhpWHLJNplf5YaGsqBMIbRlH3tAXGv6auqLaiGup3+kQXDJNwC7J
uaa5F0FGXg+PdQPMOH4xv2SZy0zGHh988CaEtXhBVWiQ06FhnNWyxziLCl8BGJpCbD2bsGiiUBcUHvxkCybARhguLYdnS60+tlJcMIr3rpt7MTgRuHhwki0gX1KcVmEe+tPeg57RdqcVcEEqqHYwc4Ghkk/PF/10BlsO0NiUZm
JxZqow7ffSRZHtZ/VKW2og6CZp2V3BaYZmzYwHn5XFFRCDNUu8mbwvtySQVSlVVY4GbKRkgepYsWrYGc20yPH7Hzni9b8N0zCmX8HPy5icn8+jf4z7BRw=","keyfile":"/var/lib/mongodb-mms-automation/authent
ication/keyfile","keyfileWindows":"%SystemDrive%\\MMSAutomation\\versions\\keyfile","autoPwd":"eX-rXNR2PB_ytwagyylk"},"tls":{"CAFilePath":"","clientCertificateMode":"OPTI
ONAL"},"mongoDbVersions":[{"name":"6.0.17","builds":[{"platform":"linux","url":"","gitVersion":"","architecture":"amd64","flavor":"rhel","minOsVersion":"","maxOsVersion":
"","modules":[]},{"platform":"linux","url":"","gitVersion":"","architecture":"amd64","flavor":"ubuntu","minOsVersion":"","maxOsVersion":"","modules":[]},{"platform":"linu
x","url":"","gitVersion":"","architecture":"aarch64","flavor":"ubuntu","minOsVersion":"","maxOsVersion":"","modules":[]},{"platform":"linux","url":"","gitVersion":"","arc
hitecture":"aarch64","flavor":"rhel","minOsVersion":"","maxOsVersion":"","modules":[]}]}],"backupVersions":[],"monitoringVersions":[],"options":{"downloadBase":"/var/lib/
mongodb-mms-automation"}}
  • The agent health status of the faulty members:
    • kubectl exec -it mongo-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
{"statuses":{"mongodb-arb-0":{"IsInGoalState":false,"LastMongoUpTime":1725903583,"ExpectedToBeUp":true,"ReplicationStatus":-1}},"mmsStatus":{"mongodb-arb-0":{"name":"mong
odb-arb-0","lastGoalVersionAchieved":-1,"plans":[{"automationConfigVersion":32,"started":"2024-09-09T17:37:58.413084367Z","completed":null,"moves":[{"move":"Start","moveD
oc":"Start the process","steps":[{"step":"StartFresh","stepDoc":"Start a mongo instance  (start fresh)","isWaitStep":false,"started":"2024-09-09T17:37:58.413103457Z","com
pleted":"2024-09-09T17:38:02.701441838Z","result":"success"}]},{"move":"WaitRsInit","moveDoc":"Wait for the replica set to be initialized by another member","steps":[{"st
ep":"WaitRsInit","stepDoc":"Wait for the replica set to be initialized by another member","isWaitStep":true,"started":"2024-09-09T17:38:02.701493829Z","completed":null,"r
esult":"wait"}]},{"move":"WaitFeatureCompatibilityVersionCorrect","moveDoc":"Wait for featureCompatibilityVersion to be right","steps":[{"step":"WaitFeatureCompatibilityV
ersionCorrect","stepDoc":"Wait for featureCompatibilityVersion to be right","isWaitStep":true,"started":null,"completed":null,"result":""}]}]}],"errorCode":0,"errorString
":""}}}
  • The verbose agent logs of the faulty members:
    • kubectl exec -it mongo-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/automation-agent-verbose.log
[2024-09-09T17:40:52.041+0000] [.info] [src/director/director.go:tracef:806] <mongodb-arb-0> [17:40:52.041] because
[All the following are true:
    ['currentState.Up' = true]
    ['currentState.CanRsInit' = false]
    ['desiredState.ReplSetConf' != <nil> ('desiredState.ReplSetConf' = ReplSetConfig{id=mongodb,version=0,commitmentStatus=false,configsvr=false,protocolVersion=1,forceProtocolVersion=false,writeConcernMajorityJournalDefault=,members={id:0,HostPort:mongodb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:1,HostPort:mongodb-1.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:2,HostPort:mongodb-2.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:100,HostPort:mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:truePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},settings=map[]})]
    ['currentState.ReplSetConf' = <nil>]
]
[2024-09-09T17:40:52.041+0000] [.info] [src/director/director.go:planAndExecute:575] <mongodb-arb-0> [17:40:52.041] Step=WaitRsInit as part of Move=WaitRsInit in plan failed : <mongodb-arb-0> [17:40:52.041] Postcondition not yet met for step WaitRsInit because
['currentState.ReplSetConf' = <nil>].
 Recomputing a plan...
[2024-09-09T17:40:52.362+0000] [.warn] [metrics/collector/util.go:getPingStatus:84] <hardwareMetricsCollector> [17:40:52.362] Failed to fetch replStatus for mongodb-arb-0 : <hardwareMetricsCollector> [17:40:52.362] Error executing WithClientFor() for cp=mongodb-arb-0.mongodb-svc.mongodb-surplus.svc.cluster.local:27017 (local=false) connectMode=SingleConnect : <hardwareMetricsCollector> [17:40:52.362] Error running command for runCommandWithTimeout(dbName=admin, cmd=[{replSetGetStatus 1}]) : result={} identityUsed=__system@local[[MONGODB-CR/SCRAM-SHA-1 SCRAM-SHA-256]][668] : (NotYetInitialized) no replset config has been received
[2024-09-09T17:40:52.678+0000] [.info] [src/config/config.go:ReadClusterConfig:433] [17:40:52.678] Retrieving cluster config from /var/lib/automation/config/cluster-config.json...
[2024-09-09T17:40:52.678+0000] [.info] [main/components/agent.go:LoadClusterConfig:277] [17:40:52.678] clusterConfig unchanged
[2024-09-09T17:40:53.072+0000] [.info] [src/mongoclientservice/mongoclientservice.go:func1:1619] [17:40:53.072] Testing auth with username __system db=local to mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) connectMode=SingleConnect ipversion=0 tls=false
[2024-09-09T17:40:53.081+0000] [.info] [src/mongoctl/processctl.go:GetKeyHashes:2080] <mongodb-arb-0> [17:40:53.081] Able to successfully auth to mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) using desired auth key
[2024-09-09T17:40:53.108+0000] [.info] [src/mongoctl/processctl.go:Update:3555] <mongodb-arb-0> [17:40:53.108] <DB_WRITE> Updated with query map[] and update [{$set [{agentFeatures [StateCache]} {nextVersion 32}]}] and upsert=true on local.clustermanager
[2024-09-09T17:40:53.125+0000] [.info] [src/director/director.go:computePlan:278] <mongodb-arb-0> [17:40:53.125] ... process has a plan : WaitRsInit,WaitFeatureCompatibilityVersionCorrect
[2024-09-09T17:40:53.125+0000] [.info] [src/director/director.go:tracef:806] <mongodb-arb-0> [17:40:53.125] Running step: 'WaitRsInit' of move 'WaitRsInit'
[2024-09-09T17:40:53.125+0000] [.info] [src/director/director.go:tracef:806] <mongodb-arb-0> [17:40:53.125] because
[All the following are true:
    ['currentState.Up' = true]
    ['currentState.CanRsInit' = false]
    ['desiredState.ReplSetConf' != <nil> ('desiredState.ReplSetConf' = ReplSetConfig{id=mongodb,version=0,commitmentStatus=false,configsvr=false,protocolVersion=1,forceProtocolVersion=false,writeConcernMajorityJournalDefault=,members={id:0,HostPort:mongodb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:1,HostPort:mongodb-1.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:2,HostPort:mongodb-2.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:100,HostPort:mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:truePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},settings=map[]})]
    ['currentState.ReplSetConf' = <nil>]
]
[2024-09-09T17:40:53.125+0000] [.info] [src/director/director.go:planAndExecute:575] <mongodb-arb-0> [17:40:53.125] Step=WaitRsInit as part of Move=WaitRsInit in plan failed : <mongodb-arb-0> [17:40:53.125] Postcondition not yet met for step WaitRsInit because
['currentState.ReplSetConf' = <nil>].
 Recomputing a plan...
[2024-09-09T17:40:53.364+0000] [.warn] [metrics/collector/util.go:getPingStatus:84] <hardwareMetricsCollector> [17:40:53.364] Failed to fetch replStatus for mongodb-arb-0 : <hardwareMetricsCollector> [17:40:53.364] Error executing WithClientFor() for cp=mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) connectMode=SingleConnect : <hardwareMetricsCollector> [17:40:53.364] Error running command for runCommandWithTimeout(dbName=admin, cmd=[{replSetGetStatus 1}]) : result={} identityUsed=__system@local[[MONGODB-CR/SCRAM-SHA-1 SCRAM-SHA-256]][668] : (NotYetInitialized) no replset config has been received
[2024-09-09T17:40:53.718+0000] [.info] [src/config/config.go:ReadClusterConfig:433] [17:40:53.718] Retrieving cluster config from /var/lib/automation/config/cluster-config.json...
[2024-09-09T17:40:53.718+0000] [.info] [main/components/agent.go:LoadClusterConfig:277] [17:40:53.718] clusterConfig unchanged
[2024-09-09T17:40:54.157+0000] [.info] [src/mongoclientservice/mongoclientservice.go:func1:1619] [17:40:54.157] Testing auth with username __system db=local to mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) connectMode=SingleConnect ipversion=0 tls=false
[2024-09-09T17:40:54.166+0000] [.info] [src/mongoctl/processctl.go:GetKeyHashes:2080] <mongodb-arb-0> [17:40:54.166] Able to successfully auth to mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) using desired auth key
[2024-09-09T17:40:54.191+0000] [.info] [src/mongoctl/processctl.go:Update:3555] <mongodb-arb-0> [17:40:54.191] <DB_WRITE> Updated with query map[] and update [{$set [{agentFeatures [StateCache]} {nextVersion 32}]}] and upsert=true on local.clustermanager
[2024-09-09T17:40:54.203+0000] [.info] [src/director/director.go:computePlan:278] <mongodb-arb-0> [17:40:54.203] ... process has a plan : WaitRsInit,WaitFeatureCompatibilityVersionCorrect
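
The loop above shows the arbiter stuck in WaitRsInit: its mongod is up, but 'currentState.ReplSetConf' stays <nil>, meaning no existing member ever adds the arbiter to the live replica set configuration. One way to confirm this from a data-bearing member; a sketch that assumes you can authenticate with the admin-user credentials ('<password>' is a placeholder) and that mongosh is available in the mongod container:

# a healthy add would print mongodb-arb-0 with arbiterOnly true, matching _id 100 in cluster-config.json
kubectl exec -it mongodb-0 -c mongod -n mongodb-surplus -- \
  mongosh -u admin-user -p '<password>' --authenticationDatabase admin --quiet \
  --eval 'rs.conf().members.forEach(m => print(m._id, m.host, m.arbiterOnly))'

If mongodb-arb-0 is missing from that output while the automation config above already lists it, the agents and the live replica set disagree, which matches the goal-state mismatch in the operator logs.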
  • You might not have the verbose ones, in that case the non-verbose agent logs:
    • kubectl exec -it mongo-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/automation-agent.log
[2024-09-09T17:37:58.248+0000] [header.info] [::0]        GitCommitId = 25bb5320d7087c7aa24eb6118df217a028238723
[2024-09-09T17:37:58.248+0000] [header.info] [::0]  AutomationVersion = 12.0.15.7646
[2024-09-09T17:37:58.248+0000] [header.info] [::0]          localhost = mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local
[2024-09-09T17:37:58.249+0000] [header.info] [::0] ErrorStateSleepTime = 10s
[2024-09-09T17:37:58.249+0000] [header.info] [::0] GoalStateSleepTime = 10s
[2024-09-09T17:37:58.249+0000] [header.info] [::0] NotGoalStateSleepTime = 1s
[2024-09-09T17:37:58.249+0000] [header.info] [::0]     PlanCutoffTime = 300000
[2024-09-09T17:37:58.249+0000] [header.info] [::0]       TracePlanner = false
[2024-09-09T17:37:58.249+0000] [header.info] [::0]               User = mongodb
[2024-09-09T17:37:58.249+0000] [header.info] [::0]         Go version = go1.18.5
[2024-09-09T17:37:58.249+0000] [header.info] [::0]         MmsBaseURL =
[2024-09-09T17:37:58.249+0000] [header.info] [::0]         MmsGroupId =
[2024-09-09T17:37:58.249+0000] [header.info] [::0]          HttpProxy =
[2024-09-09T17:37:58.249+0000] [header.info] [::0] DisableHttpKeepAlive = false
[2024-09-09T17:37:58.249+0000] [header.info] [::0]        HttpsCAFile =
[2024-09-09T17:37:58.249+0000] [header.info] [::0] TlsRequireValidMMSServerCertificates = true
[2024-09-09T17:37:58.249+0000] [header.info] [::0] TlsMMSServerClientCertificate =
[2024-09-09T17:37:58.249+0000] [header.info] [::0] KMIPProxyCertificateDir = /tmp
[2024-09-09T17:37:58.249+0000] [header.info] [::0] EnableLocalConfigurationServer = false
[2024-09-09T17:37:58.249+0000] [header.info] [::0] DialTimeoutSeconds = 40
[2024-09-09T17:37:58.249+0000] [header.info] [::0] KeepUnusedMongodbVersions = false
[2024-09-09T17:37:58.249+0000] [header.info] [::0] DisallowDowngrades = false
[2024-09-09T17:37:59.378+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:37:59.378] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:37:59.479+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:37:59.479] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:00.430+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:00.430] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:00.531+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:00.531] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:01.461+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:01.461] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:01.569+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:01.569] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:02.385+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:02.385] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:02.487+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:02.487] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
nammn (Collaborator) commented Sep 9, 2024

@KarooolisZi thanks for opening the issue! We will try to have a look at it.

nammn (Collaborator) commented Sep 9, 2024

@KarooolisZi can you also please supply the agent health status file and the agent logs? (The GitHub issue template describes how to retrieve them.)

KarooolisZi (Author) commented

Hi @nammn, I have updated the issue with the required information.

KarooolisZi (Author) commented

@nammn any updates?

KarooolisZi (Author) commented

Due to the reason mentioned above, my 3-member set crashed. I can't revive it because the first node comes back as a secondary (all members had the same priority), and I can't initiate an election because the primary is never recreated.
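
One possible escape hatch here, independent of the operator, is MongoDB's documented forced reconfiguration from a surviving member. A sketch, assuming the admin-user credentials still work, at least one data-bearing member starts, and '<password>' is a placeholder; a forced reconfig can cause rollbacks, so use it with care:

# run against a surviving member: drop the stuck arbiter from the config and force-apply it
kubectl exec -it mongodb-0 -c mongod -n mongodb-surplus -- \
  mongosh -u admin-user -p '<password>' --authenticationDatabase admin --eval '
    cfg = rs.conf();
    cfg.members = cfg.members.filter(m => !m.arbiterOnly);  // remove the arbiter entry
    rs.reconfig(cfg, {force: true});  // force is required while the set has no primary
  '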

KarooolisZi (Author) commented

@nammn any updates?

github-actions bot commented Nov 16, 2024

This issue is being marked stale because it has been open for 60 days with no activity. Please comment if this issue is still affecting you. If there is no change, this issue will be closed in 30 days.

github-actions bot added the stale label Nov 16, 2024
KarooolisZi (Author) commented

@nammn Hello, is there any progress?

GotoUnsigned commented

Hey, I'm facing the same issue. Is there any news?

GotoUnsigned commented

@KarooolisZi Hey, I just found a way to make it work. A painful one, but it does work.
You need to delete the previously created PVCs of the MongoDB replica set and redeploy with the arbiter. After that, the arbiter is in the replica set's config.

Maybe it's only the config file of each replica set member that you need to delete? I just deleted all the PVCs and it worked (a sketch of the steps follows below).
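
For anyone attempting this workaround, here is a sketch of the steps. The PVC names follow the <claim>-<pod> pattern from the statefulset dumps above, and deleting the data-volume claims destroys the data on those volumes, so take a backup or dump first:

# delete the MongoDBCommunity resource so the operator stops reconciling, then drop the claims
kubectl delete mdbc mongodb -n mongodb-surplus
kubectl delete pvc -n mongodb-surplus \
  data-volume-mongodb-0 data-volume-mongodb-1 data-volume-mongodb-2 \
  logs-volume-mongodb-0 logs-volume-mongodb-1 logs-volume-mongodb-2

# redeploy with spec.arbiters: 1 already set in the manifest
kubectl apply -f database.yaml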
