service startup order - elasticsearch #3

Open
emanuil-tolev opened this issue Jun 20, 2013 · 1 comment

@emanuil-tolev
Contributor

Recently, cl2 must have gone down. supervisord started properly, but it started before elasticsearch did, or at least before elasticsearch had finished starting up. This put our supervised software, all of which (currently) relies on ES, into FATAL states.

However, running supervisorctl reload started everything fine, since ES had actually started, just too late.

options:
1/ put ES under supervisord: may be painful, as ES already has good daemonisation of its own

2/ make sure supervisord starts after elasticsearch in the init.d order: may be problematic, as elasticsearch doesn't just need to "start starting up", it needs to have started up "properly" before software can connect to it

3/ delay supervisord startup by 30-60 seconds: simple and a bit brittle, but shown to work in other cases

4/ start ES under supervisord with task dependencies (if supported), i.e. oag, oag-celery and oag-celery-flower depend on ES, so don't even try starting them until ES has been "RUNNING" for a minute (or until a certain condition holds, e.g. port 9200 responds).
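The "wait until a condition is true" part of option 4 can be sketched as a small polling helper. This is a minimal Python sketch, assuming the condition is "something accepts TCP connections on port 9200"; the names port_open and wait_for are hypothetical, and the timeout and interval defaults are arbitrary. Note that a TCP check only proves a process is listening, not that ES can actually serve requests yet.

```python
import socket
import time
from typing import Callable


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, etc.
        return False


def wait_for(
    condition: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll condition() every `interval` seconds until it returns True
    or `timeout` seconds have elapsed. Returns whether it succeeded.
    `sleep` and `clock` are injectable so the loop can be tested."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical usage: block until something listens on localhost:9200.
#   ok = wait_for(lambda: port_open("localhost", 9200))
```

Injecting sleep and clock keeps the helper trivially testable; in real use the defaults (time.sleep, time.monotonic) apply.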

Given the time available, I'll do 3) for now. Maybe 4) later, which is essentially a fancier 3) with retries and conditions.

@markmacgillivray
Member

ES probably does start before supervisord, but when it starts up it takes a while to make the indices available if there are a lot of them. Until then it responds to requests with 500s, so our services fail to start.

So the best option may be to have supervisord call a script that checks whether or not ES is up and running and, if it isn't, starts it and then tries again after 60 seconds. (As you say, 3 for now, maybe 4 if useful.)
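A minimal Python sketch of such a wrapper script, under assumptions the thread does not state: ES answers at http://localhost:9200/, "up and running" means an HTTP 2xx response (so the 500s returned while indices are still loading count as not ready), and the script's job is to hold off launching the supervised service rather than to start ES itself. The file name wait_for_es.py and the names es_ready, wait_until_ready and main are hypothetical.

```python
import http.client
import os
import sys
import time
from typing import Callable, List, Optional
import urllib.request

# Hypothetical defaults: adjust to the real ES address and retry policy.
ES_URL = "http://localhost:9200/"
RETRY_INTERVAL = 60  # seconds, as suggested in the thread


def es_ready(url: str = ES_URL, timeout: float = 5.0) -> bool:
    """True only if ES answers with a 2xx status. A 500 while indices
    are still loading counts as "not ready", as does no answer at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # urllib's HTTPError (raised for e.g. 500) and URLError are both
        # OSError subclasses, as are refused connections and timeouts.
        return False


def wait_until_ready(
    check: Callable[[], bool] = es_ready,
    retry_interval: float = RETRY_INTERVAL,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it succeeds, sleeping between attempts.
    max_attempts=None means retry forever."""
    attempt = 0
    while True:
        attempt += 1
        if check():
            return True
        if max_attempts is not None and attempt >= max_attempts:
            return False
        print(f"ES not ready (attempt {attempt}), retrying in {retry_interval}s",
              file=sys.stderr)
        sleep(retry_interval)


def main(argv: List[str]) -> None:
    wait_until_ready()
    # Replace this process with the real service, so supervisord ends up
    # supervising the service itself rather than a lingering wrapper.
    os.execvp(argv[0], argv)


# Only act when a service command was supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

In supervisord this would mean pointing each dependent program's command at the wrapper, e.g. command=python wait_for_es.py followed by the service's usual command line (paths hypothetical). Because exec keeps the same PID, supervisord keeps tracking the right process. One caveat: supervisord marks a program RUNNING once it has stayed up for startsecs, so the program shows as RUNNING while the wrapper is still waiting on ES.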
