etcd notifies readiness to systemd before version becomes available #9007

Closed
discordianfish opened this issue Dec 13, 2017 · 7 comments

@discordianfish
Contributor

discordianfish commented Dec 13, 2017

When etcd starts up, it notifies systemd that it's ready to accept requests. At this point though, requests to /version can still return an empty struct.

I filed coreos/bugs#2286 suspecting an issue with the systemd unit, but that doesn't appear to be the case. Lemme copy the relevant bits:

Specifically, I'm running kubeadm in a unit file with After=etcd-member.service, yet sometimes it gets started too early and fails with:

Dec 13 12:52:01 ip-172-20-144-6 kubeadm[858]:         [ERROR ExternalEtcdVersion]: couldn't parse external etcd version "": Version string empty
@gyuho
Contributor

gyuho commented Dec 14, 2017

Double-checked the etcd code, and everything looks correct. We start the client listener and the version handler first, and only then notify systemd.
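The ordering described above can be sketched as follows. This is not etcd's code (etcd is written in Go); it is a minimal Python illustration of the sd_notify datagram protocol, and the function names `sd_notify` and `serve` are made up for this example. It assumes the service runs under a unit with Type=notify, which is what makes systemd wait for READY=1 before starting units ordered After= it.

```python
import os
import socket


def sd_notify(state: str = "READY=1") -> bool:
    """Send a state string to the socket named in $NOTIFY_SOCKET.

    Returns False when the variable is unset, i.e. when the process
    was not started by systemd with a notify socket.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True


def serve(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Bind and listen first, then report readiness (hypothetical helper)."""
    listener = socket.create_server((host, port))  # bound and listening from here on
    sd_notify("READY=1")
    return listener
```

Note that readiness in this sense only means the listener is up. It says nothing about whether the handlers behind it already have meaningful data to return.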

Any possibility that etcd stopped when kubeadm sends requests to /version?

After= only checks the unit start, but not the successful activation. Requires= seems more strict?

If this unit gets activated, the units listed here will be activated as well. If one of the other units fails to activate, and an ordering dependency After= on the failing unit is set, this unit will not be started.

https://www.freedesktop.org/software/systemd/man/systemd.unit.html

@xiang90
Contributor

xiang90 commented Dec 14, 2017

@discordianfish Can you provide the etcd version you use, and the steps to reproduce the problem?

@discordianfish
Contributor Author

Thanks for the quick response! I'm using the etcd-wrapper on CoreOS 1520.8.0. If etcd had been stopped at this point, I would have gotten a different error. I'll try Requires= and see if it at least makes the problem more obvious, and I'll collect logs for both kubeadm and etcd-member to see how the events line up. Will update you next time I get to work on this.

@discordianfish
Contributor Author

So yeah, with Requires= it fails because etcd takes a few restarts to come up, so I definitely need some more coordination to fix my problem. That being said, there is definitely some scenario where etcd responds with an empty version struct, because that seems to be the only path to this error. Otherwise this call: https://github.com/kubernetes/kubernetes/blob/38e33513126e2090b578531b7c3919348bf6b167/cmd/kubeadm/app/preflight/checks.go#L740 would already fail, either on the status check: https://github.com/kubernetes/kubernetes/blob/38e33513126e2090b578531b7c3919348bf6b167/cmd/kubeadm/app/preflight/checks.go#L825 or on unmarshalling: https://github.com/kubernetes/kubernetes/blob/38e33513126e2090b578531b7c3919348bf6b167/cmd/kubeadm/app/preflight/checks.go#L829
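One possible form of that extra coordination is to poll /version until it reports a non-empty version before starting kubeadm. The sketch below is only an illustration under assumptions: the helper names are hypothetical, the URL assumes etcd's default client port on localhost, and the "etcdserver" field name is taken from the JSON that etcd's /version endpoint normally returns. It also shows why an empty body such as `{}` passes both the status check and JSON decoding, yet still yields an empty version string.

```python
import json
import time
import urllib.request


def parse_version(body: bytes) -> str:
    """Return the "etcdserver" field of a /version body, or "" if absent."""
    try:
        data = json.loads(body)
    except ValueError:  # malformed JSON or undecodable bytes
        return ""
    if not isinstance(data, dict):
        return ""
    value = data.get("etcdserver")
    return value if isinstance(value, str) else ""


def wait_for_version(fetch, timeout=60.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it yields a body with a non-empty version.

    fetch is any zero-argument callable returning the raw response body.
    OSError (connection refused, timeouts, HTTP errors raised by urllib)
    is treated as "not ready yet".
    """
    deadline = clock() + timeout
    while True:
        try:
            version = parse_version(fetch())
        except OSError:
            version = ""
        if version:
            return version
        if clock() >= deadline:
            raise TimeoutError("etcd did not report a version in time")
        sleep(interval)


def http_fetch(url="http://127.0.0.1:2379/version"):
    """Fetch the version endpoint; the URL is an assumed default."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read()
```

A wrapper script could call `wait_for_version(http_fetch)` and only then start kubeadm, which sidesteps the question of exactly when systemd considers etcd ready.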

@discordianfish
Contributor Author

I assume it wouldn't work to make etcd signal ready only if it reached quorum? That would make my setup "just work": itskoko/kubecfn#4

@xiang90
Contributor

xiang90 commented Jan 16, 2018

@discordianfish

Please provide an isolated way to reproduce the problem as the etcd github issue guide suggests. If this is not an etcd issue, we should close it.

@discordianfish
Contributor Author

discordianfish commented Jan 17, 2018

So the actual bug here is that etcd seems to respond with an empty version when it's starting up. But since this isn't my root problem, I won't have time to reproduce this. Will close this issue for now.
