kubectl-wait-pods-any-ephemeral phase never completes #577

Open
mattmceuen opened this issue Jun 9, 2021 · 2 comments
Labels
bug Something isn't working priority/low Items that are considered non-critical for functionality, such as quality of life improvements size s

@mattmceuen
Contributor

When running the airshipctl gating locally, the phase plan progresses until the kubectl-wait-pods-any-ephemeral phase, which launches a generic container to wait for at least one pod to exist. The airshipctl output shows that pods are found and that the script appears to run to the end. However, airshipctl never reports the phase as complete and stalls there.

I get the same behavior consistently whether I run the phase plan, or re-run the kubectl-wait-pods-any-ephemeral phase by itself.
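
When reproducing this, wrapping the single-phase run in a deadline makes the stall easier to detect than watching the terminal. A minimal sketch, assuming GNU coreutils `timeout` is installed; the helper name `run_with_deadline` and the 600-second limit are hypothetical choices, not part of airshipctl:

```shell
# Hypothetical helper: run a command under a deadline and report a stall.
# Assumes GNU coreutils `timeout`, which exits with status 124 on expiry.
run_with_deadline() {
  limit="$1"
  shift
  rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "command did not finish within ${limit}s: $*" 1>&2
  fi
  return "$rc"
}

# Example invocation (not executed here):
# run_with_deadline 600 airshipctl phase run kubectl-wait-pods-any-ephemeral
```

A return status of 124 would indicate the stall described above; any other status is whatever the wrapped command returned.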

The generic container completes successfully from Docker's standpoint:

02bee10c7880        localhost/toolbox                                  "/usr/local/bin/conf…"   17 minutes ago      Exited (0) 17 minutes ago                          ecstatic_feistel
45620b49f7f4        localhost/toolbox                                  "/usr/local/bin/conf…"   25 minutes ago      Exited (0) 25 minutes ago                          serene_euclid
25131f421281        localhost/toolbox                                  "/usr/local/bin/conf…"   About an hour ago   Exited (0) About an hour ago                       elastic_kalam
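
The exit status can also be read mechanically rather than from the table. A minimal sketch; `exit_code_from_status` is a hypothetical helper that parses the STATUS column text as it appears in the listing above. With a live Docker daemon, `docker inspect --format '{{.State.ExitCode}}' <container>` should report the same value directly.

```shell
# Hypothetical helper: extract the exit code from a `docker ps -a` STATUS
# string such as "Exited (0) 17 minutes ago". Prints nothing when the
# status is not an "Exited (...)" entry (for example "Up 3 minutes").
exit_code_from_status() {
  printf '%s\n' "$1" | sed -n 's/^Exited (\([0-9][0-9]*\)).*/\1/p'
}

# Example: prints the code for one of the statuses listed above.
exit_code_from_status "Exited (0) 17 minutes ago"
```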

Airshipctl phase run output:

madgin@piseag-01:~$ airshipctl phase run kubectl-wait-pods-any-ephemeral  
{"Message":"starting generic container","Operation":"GenericContainerStart","Timestamp":"2021-06-09T16:08:28.567785119-05:00","Type":"GenericContainerEvent"}
[airshipctl] 2021/06/09 16:08:28 Starting container with image: 'localhost/toolbox', cmd: '[]'
[airshipctl] 2021/06/09 21:08:30 Filtering input bundle by Group: , Version: , Kind: 
+ N=0
+ MAX_RETRY=30
+ DELAY=60
+ '[' 0 -ge 30 ]
+ kubectl --context ephemeral-cluster --request-timeout 10s get pods --all-namespaces -o name
+ wc -l
+ '[' 7 -ge 1 ]
+ kubectl --context ephemeral-cluster --request-timeout 10s get pods --all-namespaces
NAMESPACE     NAME                                READY   STATUS    RESTARTS   AGE
kube-system   coredns-66bff467f8-dtr68            0/1     Pending   0          8m41s
kube-system   coredns-66bff467f8-pjctm            0/1     Pending   0          8m41s
kube-system   etcd-ephemeral                      1/1     Running   0          8m51s
kube-system   kube-apiserver-ephemeral            1/1     Running   0          8m51s
kube-system   kube-controller-manager-ephemeral   1/1     Running   0          8m51s
kube-system   kube-proxy-8lh24                    1/1     Running   0          8m41s
kube-system   kube-scheduler-ephemeral            1/1     Running   0          8m51s
+ break
+ '[' 0 -ge 30 ]
(... never progresses from this point)

The Docker container logs match the above, but also append the resource list:

apiVersion: config.kubernetes.io/v1alpha1
kind: ResourceList
items: []
functionConfig:
  apiVersion: v1
  data:
    script: |
      #!/bin/sh
      # Licensed under the Apache License, Version 2.0 (the "License");
      # you may not use this file except in compliance with the License.
      # You may obtain a copy of the License at
      #
      #     http://www.apache.org/licenses/LICENSE-2.0
      #
      # Unless required by applicable law or agreed to in writing, software
      # distributed under the License is distributed on an "AS IS" BASIS,
      # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      # See the License for the specific language governing permissions and
      # limitations under the License.
      set -xe
      N=0
      MAX_RETRY=30
      DELAY=60
      until [ "$N" -ge ${MAX_RETRY} ]
      do
          if [ "$(kubectl --context $KCTL_CONTEXT \
                    --request-timeout 10s \
                    get pods \
                    --all-namespaces -o name | wc -l)" -ge "1" ]; then
            kubectl --context $KCTL_CONTEXT --request-timeout 10s get pods --all-namespaces 1>&2
            break
        fi
        N=$((N+1))
        echo "$N: Retrying to get any pods" 1>&2
        sleep ${DELAY}
      done
      if [ "$N" -ge ${MAX_RETRY} ]; then
        echo "Could not get any pods" 1>&2
        exit 1
      fi
  kind: ConfigMap
  metadata:
    annotations:
      config.kubernetes.io/path: configmap_kubectl-wait-pods-any.yaml
    name: kubectl-wait-pods-any
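
The trace in the phase output appears consistent with this script running to completion: the last `+ '[' 0 -ge 30 ]` line matches the closing `if [ "$N" -ge ${MAX_RETRY} ]` check, after which the script would exit 0. A minimal sketch that replays the same loop outside airshipctl, assuming a stubbed `kubectl` shell function (hypothetical, it prints one fake pod name) in place of a real cluster, with `DELAY` shortened for local runs:

```shell
# Stub standing in for the real kubectl (assumption: no cluster is needed
# to exercise the control flow). It ignores its arguments.
kubectl() {
  echo "pod/etcd-ephemeral"
}

# Same control flow as the ConfigMap script, wrapped in a function
# (hypothetical name) so the result can be inspected afterwards.
wait_pods_any() {
  N=0
  MAX_RETRY=30
  DELAY=1
  until [ "$N" -ge "${MAX_RETRY}" ]; do
    if [ "$(kubectl --context "$KCTL_CONTEXT" --request-timeout 10s \
              get pods --all-namespaces -o name | wc -l)" -ge 1 ]; then
      kubectl --context "$KCTL_CONTEXT" --request-timeout 10s \
        get pods --all-namespaces 1>&2
      break
    fi
    N=$((N+1))
    echo "$N: Retrying to get any pods" 1>&2
    sleep "${DELAY}"
  done
  if [ "$N" -ge "${MAX_RETRY}" ]; then
    echo "Could not get any pods" 1>&2
    return 1
  fi
  return 0
}

KCTL_CONTEXT=ephemeral-cluster
```

If `wait_pods_any` returns 0 on the first iteration here as well, that would point away from the script itself and toward how the container's completion is picked up, which matches the `Exited (0)` status Docker reports.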
@mattmceuen mattmceuen added bug Something isn't working triage Needs evaluation by project members labels Jun 9, 2021
@matthew-fuller
Contributor

I can take a stab at this one. Please assign this to me.

@matthew-fuller
Contributor

Hmm... maybe I just got lucky (or the problem has been resolved since this issue was created), but I haven't been able to reproduce this locally either by running the deploy-gating plan or by running the kubectl-wait-pods-any-ephemeral phase separately. In each case, the phase completes as expected as long as there's at least one pod in the cluster.

I'll check with @mattmceuen to see if this is still an issue, and if so, we can compare environments to see what's different.

@jezogwza jezogwza added priority/low Items that are considered non-critical for functionality, such as quality of life improvements and removed triage Needs evaluation by project members labels Jun 16, 2021
@jezogwza jezogwza added this to the Future milestone Jun 16, 2021