Recently, cl2 must have gone down. supervisord had started properly, but before elasticsearch, or at least before elasticsearch had finished starting up. This resulted in FATAL states for our supervised software, all of which (currently) relies on ES.
However, running supervisorctl reload started everything fine, as ES had actually started - just too late.
options:
1/ put ES under supervisord - may be painful, already has good daemonisation of its own
2/ make sure supervisord starts after elasticsearch in the init.d order - may be problematic, as elasticsearch not only needs to "start starting up" but also start up "properly" for software to be able to connect to it
3/ delay supervisord startup by 30-60 seconds - simple, a bit brittle but shown to work in other cases
4/ start ES under supervisord with task dependency (if this is supported) - i.e. oag, oag-celery and oag-celery-flower depend on ES, so don't even bother trying them until ES has been "RUNNING" for a minute (or until a certain condition is true, e.g. port 9200 responds).
Based on time, will do 3) for now. Maybe 4) later, which is essentially a fancy 3) with retries and conditions.
ES probably does start before supervisord, but when ES starts up it takes a while to make the indices available, if there are a lot of them. In this case it will respond with 500s to requests, and so our services would not start successfully.
So the best option may be a script that supervisord calls: it checks whether ES is up and running, and if not, starts it and tries again after 60 seconds. (As you say, this is 3 for now, maybe 4 if useful.)
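A minimal sketch of the "check, then retry" part of that idea, in Python. It only waits for ES and does not try to start it. The function names (`es_is_ready`, `wait_for_es`), the default timings and the URL are assumptions for illustration, not anything already in our setup. Because connection failures and 500 responses are both treated as "not ready", it should cover the case where ES is listening but the indices are not yet available.

```python
import http.client
import time
import urllib.error
import urllib.request


def es_is_ready(url, timeout=5.0):
    """Return True if an HTTP GET on `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, or an HTTP error status such as 500
        # (urlopen raises HTTPError, a URLError subclass, for those).
        return False


def wait_for_es(url, max_wait=300.0, interval=5.0,
                probe=es_is_ready, sleep=time.sleep):
    """Poll `probe(url)` every `interval` seconds.

    Returns True as soon as the probe succeeds, or False once roughly
    `max_wait` seconds of sleeping have passed without success.
    `probe` and `sleep` are parameters only so they can be swapped out
    when trying this without a real ES node.
    """
    waited = 0.0
    while True:
        if probe(url):
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

A wrapper could call something like `sys.exit(0 if wait_for_es("http://localhost:9200/") else 1)` before launching the real process, so that the dependent program is only started once the check passes. The retry count and the interval would need tuning against how long ES actually takes on cl2.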