i have a question about nodes #2

jalpino-talo · 2022-08-27T04:46:17Z

Hello, first of all, thank you very much for the code; it is an excellent project. I followed the README to deploy the EKS cluster and everything works in principle: I can connect to the cluster and see the nodes, but they appear with NotReady status and the pods do not start.

From the logs I have read, it seems to be related to the VPC not connecting, I think because of permissions. Here are the logs I have:

I0827 04:29:02.250306 9 node_lifecycle_controller.go:868] Node ip-10-0-82-117.us-east-2.compute.internal is NotReady as of 2022-08-27 04:29:02.250300267 +0000 UTC m=+440.443833299. Adding it to the Taint queue.

W0827 04:26:05.630675 10 authorization.go:193] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.

I0827 04:26:05.636204 10 aws.go:1297] Zone not specified in configuration file; querying AWS metadata service

E0827 04:26:05.678379 10 leaderelection.go:330] error retrieving resource lock kube-system/cloud-controller-manager: Get "https://172.16.48.18:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager?timeout=5s": dial tcp 172.16.48.18:443: connect: connection refused

I0827 04:33:18.463511 11 factory.go:209] "Unable to schedule pod; no fit; waiting" pod="kube-system/coredns-5948f55769-fd6lq" err="0/2 nodes are available: 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate."

I0827 04:36:18.468766 11 factory.go:209] "Unable to schedule pod; no fit; waiting" pod="kube-system/coredns-5948f55769-fd6lq" err="0/2 nodes are available: 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate."

I0827 01:43:17.749524 11 factory.go:205] "Unable to schedule pod; no nodes are registered to the cluster; waiting" pod="kube-system/coredns-5db97b446d-p5ksx"

"Removed node in listed group from NodeTree" node="10.240.79.157" zone=""

E0827 01:19:12.931004 11 tagging_controller.go:221] Error in getting instanceID for node 10.240.79.157, error: Invalid format for AWS instance ()

W0827 01:19:11.701642 10 actual_state_of_world.go:539] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="10.240.79.157" does not exist

FOR ME THIS IS THE MOST RELEVANT ERROR

I0827 01:16:10.065973 10 node_lifecycle_controller.go:868] Node ip-10-0-113-236.us-east-2.compute.internal is NotReady as of 2022-08-27 01:16:10.065968407 +0000 UTC m=+30464.742746419. Adding it to the Taint queue.
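
In case it helps with triage, here is a minimal sketch (not part of the project) that groups log lines like the ones above by severity and source file and lists the nodes reported as NotReady. It assumes the usual klog header layout (severity letter, MMDD date, time, thread id, `file:line]`, message); the names `parse_klog`, `summarize`, and `SAMPLE` are hypothetical and exist only for this illustration.

```python
import re
from collections import Counter

# Assumed klog header: <I|W|E|F><MMDD> <hh:mm:ss.micros> <thread id> <file>:<line>] <message>
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<src>[^\s:]+):(?P<line>\d+)\] (?P<msg>.*)$"
)
# Message emitted by node_lifecycle_controller.go in the logs of this issue.
NODE_RE = re.compile(r"Node (?P<node>\S+) is NotReady")


def parse_klog(line):
    """Return the header fields and message as a dict, or None if the line does not match."""
    match = KLOG_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count entries per (severity, source file) and collect the names of NotReady nodes."""
    by_source = Counter()
    not_ready = set()
    for line in lines:
        entry = parse_klog(line)
        if entry is None:
            continue  # skip anything without a klog header
        by_source[(entry["sev"], entry["src"])] += 1
        node = NODE_RE.search(entry["msg"])
        if node:
            not_ready.add(node.group("node"))
    return by_source, not_ready


# Three lines copied from the logs in this issue.
SAMPLE = [
    "I0827 04:29:02.250306 9 node_lifecycle_controller.go:868] Node ip-10-0-82-117.us-east-2.compute.internal is NotReady as of 2022-08-27 04:29:02.250300267 +0000 UTC m=+440.443833299. Adding it to the Taint queue.",
    "W0827 04:26:05.630675 10 authorization.go:193] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.",
    "E0827 01:19:12.931004 11 tagging_controller.go:221] Error in getting instanceID for node 10.240.79.157, error: Invalid format for AWS instance ()",
]

if __name__ == "__main__":
    counts, nodes = summarize(SAMPLE)
    for (severity, source), count in sorted(counts.items()):
        print(severity, source, count)
    print("NotReady nodes:", sorted(nodes))
```

Grouping by source file makes it easier to see which component (scheduler, cloud controller manager, node lifecycle controller) is producing the errors, instead of reading the raw stream line by line.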

I should also mention that I had to add a key pair resource and attach it to the group, because the one referenced in the project fails with an error saying it does not exist. Maybe that is related to the correct permissions not being granted.

I also have a script that enables the cluster logs in CloudWatch, which I can submit in a PR.

I also added code to manage the cluster with an SSO user, which I can submit in a PR as well.

thanks,
