Elasticsearch stop/start/restart #1990

Merged
merged 23 commits into from
Aug 6, 2018
Changes from 21 commits
Commits
23 commits
b923dd9
Add action tags
pr33thi Jul 30, 2018
54ddeac
Create framework for new elasticsearch commands
pr33thi Jul 30, 2018
d8ef759
Move everything into plays, call the play instead of the module directly
pr33thi Jul 30, 2018
c02a114
Call es_rolling_restart when elasticsearch service command is called
pr33thi Jul 30, 2018
5cd16e9
[pv-deploy] Created new encrypted secrets file
pr33thi Jul 30, 2018
aeb21b2
Stop and start pillows along with es
pr33thi Jul 31, 2018
5a25dc6
Exit if error in stopping/starting pillows
pr33thi Jul 31, 2018
7ec1c42
Cleanup
pr33thi Jul 31, 2018
80d21be
Remove debug statement
pr33thi Aug 1, 2018
9fa7e23
Rename es_instances to es_pids
pr33thi Aug 1, 2018
bed0323
Use the cli ask function
pr33thi Aug 1, 2018
e9e1724
Change indentation level of ask prompt
pr33thi Aug 1, 2018
db13c7e
Merge branch 'master' into pv/es-2
pr33thi Aug 1, 2018
786e839
Change elif to if so that both get hit for restarts
pr33thi Aug 1, 2018
9328828
Fix process killing
pr33thi Aug 1, 2018
4de9d7b
Remove .travis/secrets.tar.enc
pr33thi Aug 1, 2018
2222719
Remove restart-elasticsearch
pr33thi Aug 1, 2018
ecdfce5
Update docs
pr33thi Aug 1, 2018
6fe068b
Add back the origin secrets.tar.enc file
pr33thi Aug 1, 2018
9a8c2ae
Start es before pillows, and consolidate pillow code
pr33thi Aug 2, 2018
26605f1
Use stop and start strings instead of the action variable, for restart
pr33thi Aug 2, 2018
beb719c
Merge branch 'master' into pv/es-2
pr33thi Aug 6, 2018
ba1d401
Specify the Pillow service directly
pr33thi Aug 6, 2018
2 changes: 1 addition & 1 deletion docs/basics/0002-installation.md
@@ -78,7 +78,7 @@ [email protected]:~$ commcare-cloud -h
usage: commcare-cloud [-h] [--control]

{64-test,development,echis,icds,icds-new,pna,production,softlayer,staging,swiss}
{bootstrap-users,ansible-playbook,django-manage,aps,tmux,ap,validate-environment-settings,restart-elasticsearch,deploy-stack,service,update-supervisor-confs,update-users,ping,migrate_couchdb,lookup,run-module,update-config,mosh,after-reboot,ssh,downtime,fab,update-local-known-hosts,migrate-couchdb,run-shell-command}
{bootstrap-users,ansible-playbook,django-manage,aps,tmux,ap,validate-environment-settings,deploy-stack,service,update-supervisor-confs,update-users,ping,migrate_couchdb,lookup,run-module,update-config,mosh,after-reboot,ssh,downtime,fab,update-local-known-hosts,migrate-couchdb,run-shell-command}
...

```
27 changes: 0 additions & 27 deletions docs/changelog/index.md

This file was deleted.

31 changes: 4 additions & 27 deletions docs/commcare-cloud/commands/index.md
@@ -10,7 +10,7 @@ All `commcare-cloud` commands take the following form:
```
commcare-cloud [--control]
<env>
{bootstrap-users,ansible-playbook,django-manage,aps,tmux,ap,validate-environment-settings,restart-elasticsearch,deploy-stack,service,update-supervisor-confs,update-users,ping,migrate_couchdb,lookup,run-module,update-config,copy-files,mosh,after-reboot,ssh,downtime,fab,update-local-known-hosts,list-dbs,migrate-couchdb,run-shell-command}
{bootstrap-users,ansible-playbook,django-manage,aps,tmux,ap,validate-environment-settings,deploy-stack,service,update-supervisor-confs,update-users,ping,migrate_couchdb,lookup,run-module,update-config,copy-files,mosh,after-reboot,ssh,downtime,fab,update-local-known-hosts,list-dbs,migrate-couchdb,run-shell-command}
...
```

@@ -716,29 +716,6 @@ for more detail in what can go here.
authenticate using the pem file (or prompt for root password if there is no pem file)


#### `restart-elasticsearch`

Do a rolling restart of elasticsearch.

```
commcare-cloud <env> restart-elasticsearch [--use-factory-auth]
```

**This command is deprecated.** Use

```
commcare-cloud <env> service elasticsearch restart
```

instead.

##### Optional Arguments

###### `--use-factory-auth`

authenticate using the pem file (or prompt for root password if there is no pem file)


#### `bootstrap-users`

Add users to a set of new machines as root.
@@ -865,8 +842,8 @@ Manage services.
```
commcare-cloud <env> service [--only PROCESS_PATTERN]

{celery,commcare,couchdb,couchdb2,elasticsearch,formplayer,kafka,nginx,pillowtop,postgresql,rabbitmq,redis,riakcs,touchforms,webworker}
[{celery,commcare,couchdb,couchdb2,elasticsearch,formplayer,kafka,nginx,pillowtop,postgresql,rabbitmq,redis,riakcs,touchforms,webworker} ...]
{celery,commcare,couchdb,couchdb2,elasticsearch,elasticsearch-classic,formplayer,kafka,nginx,pillowtop,postgresql,rabbitmq,redis,riakcs,touchforms,webworker}
[{celery,commcare,couchdb,couchdb2,elasticsearch,elasticsearch-classic,formplayer,kafka,nginx,pillowtop,postgresql,rabbitmq,redis,riakcs,touchforms,webworker} ...]
{start,stop,restart,status,help}
```

@@ -888,7 +865,7 @@ service and the `pgbouncer` service. We'll call the actual services

##### Positional Arguments

###### `{celery,commcare,couchdb,couchdb2,elasticsearch,formplayer,kafka,nginx,pillowtop,postgresql,rabbitmq,redis,riakcs,touchforms,webworker}`
###### `{celery,commcare,couchdb,couchdb2,elasticsearch,elasticsearch-classic,formplayer,kafka,nginx,pillowtop,postgresql,rabbitmq,redis,riakcs,touchforms,webworker}`

The name of the service group(s) to apply the action to.
There is a preset list of service groups that are supported.
@@ -6,26 +6,43 @@
  retries: 20
  delay: 3
  changed_when: result.stdout.find('"acknowledged":true') != -1
  tags: action_stop

- name: stop node
  become: true
  service: name=elasticsearch state=stopped
  tags: action_stop

- name: wait for a few seconds for ES to stop
  pause: seconds=10
  tags: action_stop

- name: get es instances to kill
  shell: "ps aux | pgrep -f 'elasticsearc[h]'"
  register: es_pids
  failed_when: es_pids.rc != 0 and es_pids.rc != 1
  tags: action_stop

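- name: kill elasticsearch instances
  shell: "pkill -f 'elasticsearc[h]'"
  # pgrep exits 0 when it matched at least one process, so that is when a kill is needed.
  # The previous condition (rc neither 0 nor 1) could never be true once the task above passed.
  when: es_pids.rc == 0
  tags: action_stop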

- name: start node
  become: true
  service: name=elasticsearch state=started
  tags: action_start

- debug: msg="Sometimes we try to start the node too soon. If hung start node manually"
  tags: action_start

- name: wait for node to restart
  shell: "curl -I -s -m 2 http://{{es_host}}:9200 | head -n 1"
  register: result
  until: result.stdout == "HTTP/1.1 200 OK"
  retries: 200
  delay: 3
  tags: action_start

- name: enable cluster routing
  shell: "curl -XPUT {{es_host}}:9200/_cluster/settings -d '{\"transient\" : {\"cluster.routing.allocation.enable\" : \"all\" }}'"
@@ -34,10 +51,12 @@
  retries: 20
  delay: 3
  changed_when: result.stdout.find('"acknowledged":true') != -1
  tags: action_start

- name: wait for cluster to stabilize
  shell: "curl -s -m 2 {{es_host}}:9200/_cat/health | cut -d ' ' -f 4"
  register: result
  until: result.stdout.find("green") != -1
  retries: 200
  delay: 30
  tags: action_start
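
The `until` / `retries` / `delay` settings in the playbook amount to polling with a bounded number of attempts. As a rough Python sketch of that pattern (not part of this PR: `wait_until` and `parse_cat_health_status` are hypothetical helper names, and the health line in the docstring is made up), the final "wait for cluster to stabilize" task could be mimicked like this:

```python
import time


def parse_cat_health_status(line):
    """Return the 4th space-separated field of a `_cat/health` line.

    Mirrors `cut -d ' ' -f 4` from the playbook, e.g.
    "1533570000 15:40:00 es green 3 3" gives "green" (illustrative input).
    """
    fields = line.strip().split(" ")
    return fields[3] if len(fields) > 3 else ""


def wait_until(check, retries, delay, sleep=time.sleep):
    """Call check() up to `retries` times, pausing `delay` seconds between tries.

    Returns True as soon as check() is truthy, False if every attempt fails.
    `sleep` is injectable (an assumption of this sketch) so it can be tested
    without real waiting.
    """
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False
```

With `retries=200` and `delay=30`, as in the last task, such a loop would give up after roughly 100 minutes of waiting for a green cluster.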
26 changes: 0 additions & 26 deletions src/commcare_cloud/commands/ansible/ansible_playbook.py
@@ -210,32 +210,6 @@ def run(self, args, unknown_args):
        return AnsiblePlaybook(self.parser).run(args, unknown_args, always_skip_check=True)


class RestartElasticsearch(_AnsiblePlaybookAlias):
    command = 'restart-elasticsearch'
    help = """
    Do a rolling restart of elasticsearch.

    **This command is deprecated.** Use

    ```
    commcare-cloud <env> service elasticsearch restart
    ```

    instead.
    """

    def run(self, args, unknown_args):
        args.playbook = 'es_rolling_restart.yml'
        if not ask('Have you stopped all the elastic pillows?', strict=True, quiet=args.quiet):
            return 0 # exit code
        puts(colored.yellow(
            "This will cause downtime on the order of seconds to minutes,\n"
            "except in a few cases where an index is replicated across multiple nodes."))
        if not ask('Do a rolling restart of the ES cluster?', strict=True, quiet=args.quiet):
            return 0 # exit code
        return AnsiblePlaybook(self.parser).run(args, unknown_args)


class BootstrapUsers(_AnsiblePlaybookAlias):
    command = 'bootstrap-users'
    help = """
49 changes: 48 additions & 1 deletion src/commcare_cloud/commands/ansible/service.py
@@ -2,6 +2,7 @@
from abc import ABCMeta, abstractmethod, abstractproperty
from collections import defaultdict, OrderedDict
from itertools import groupby
import sys

import attr
import six
@@ -14,6 +15,7 @@
    get_celery_workers,
    get_pillowtop_processes
)
from commcare_cloud.cli_utils import ask
from commcare_cloud.commands.ansible.run_module import run_ansible_module
from commcare_cloud.commands.command_base import CommandBase, Argument
from commcare_cloud.environment.main import get_environment
@@ -274,10 +276,54 @@ class Nginx(AnsibleService):
    inventory_groups = ['proxy']


class Elasticsearch(AnsibleService):
class ElasticsearchClassic(AnsibleService):
    name = 'elasticsearch-classic'
    service_name = 'elasticsearch'
    inventory_groups = ['elasticsearch']


class Elasticsearch(ServiceBase):
    name = 'elasticsearch'
    service_name = 'elasticsearch'
    inventory_groups = ['elasticsearch']

    def execute_action(self, action, host_pattern=None, process_pattern=None):
        if action == 'status':
            return ElasticsearchClassic(self.environment, self.ansible_context).execute_action(action, host_pattern, process_pattern)
        else:
            if not ask(
                    "This function does more than stop and start the elasticsearch service. "
                    "For that, use elasticsearch-classic."
                    "\nStop will: stop pillows, stop es, and kill -9 if any processes still exist "
                    "after a period of time. "
                    "\nStart will start pillows and start elasticsearch "
                    "\nRestart is a stop followed by a start.\n Continue?", strict=False):
                return 0 # exit code
            if action == 'stop' or action == 'restart':
                self._act_on_pillows(action='stop')
                self._run_rolling_restart_yml(tags='action_stop')

            if action == 'start' or action == 'restart':
                self._run_rolling_restart_yml(tags='action_start')

Reviewer:
It looks like Elasticsearch is being started after pillows are started, if I'm reading this correctly. We probably want to start Elasticsearch first, and then start pillows.

Contributor Author:
How come we want to do it in that order?

Reviewer:
Most pillows populate data into Elasticsearch, so between the pillows starting and ES starting, the pillows will just be creating errors. They will retry eventually, but there's no point in starting a service if the service it depends on is down.

Contributor Author:
That makes sense, done here: 9a8c2ae
(also refactored the pillow start/stop code in there)

                self._act_on_pillows(action='start')

    def _act_on_pillows(self, action):
        # Used to stop or start pillows
        ansible_context = AnsibleContext(None)
        service = SERVICES_BY_NAME['pillowtop'](self.environment, ansible_context)

Contributor:
Any reason not to reference it directly rather than looking it up in the dict?

Contributor Author:
Good idea, ba1d401

        exit_code = service.run(action=action)
        if not exit_code == 0:
            print("ERROR while trying to {} pillows. Exiting.".format(action))
            sys.exit(1)

    def _run_rolling_restart_yml(self, tags):
        from commcare_cloud.commands.ansible.ansible_playbook import run_ansible_playbook
        run_ansible_playbook(environment=self.environment,
                             playbook='es_rolling_restart.yml',
                             ansible_context=AnsibleContext(args=None),
                             unknown_args=['--tags={}'.format(tags)],
                             skip_check=True)
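
To make the ordering settled in the review thread easier to see, here is a small sketch of the step sequence that `execute_action` appears to produce for each action. It is an illustration only: `plan_steps` is a hypothetical function, not part of commcare-cloud, and the tuples merely stand in for the real `_act_on_pillows` and `_run_rolling_restart_yml` calls.

```python
def plan_steps(action):
    """Return the ordered (target, argument) steps for an Elasticsearch action.

    Pillows are stopped before Elasticsearch and started after it, because
    most pillows write into Elasticsearch and would only log errors while
    it is down.
    """
    if action not in ("stop", "start", "restart"):
        raise ValueError("unsupported action: {}".format(action))

    steps = []
    if action in ("stop", "restart"):
        steps.append(("pillowtop", "stop"))
        steps.append(("es_rolling_restart.yml", "--tags=action_stop"))
    # A plain `if` (not `elif`) so that 'restart' runs both halves.
    if action in ("start", "restart"):
        steps.append(("es_rolling_restart.yml", "--tags=action_start"))
        steps.append(("pillowtop", "start"))
    return steps
```

Calling `plan_steps('restart')` yields the stop steps followed by the start steps, which matches the "Restart is a stop followed by a start" wording in the prompt.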


class Couchdb(AnsibleService):
    name = 'couchdb'
@@ -491,6 +537,7 @@ def get_processes_by_host(all_hosts, process_descriptors, process_pattern=None):
    Couchdb2,
    RabbitMq,
    Elasticsearch,
    ElasticsearchClassic,
    Redis,
    Riakcs,
    Kafka,
3 changes: 1 addition & 2 deletions src/commcare_cloud/commcare_cloud.py
@@ -19,7 +19,7 @@

from .commands.ansible.ansible_playbook import (
    AnsiblePlaybook,
    UpdateConfig, AfterReboot, RestartElasticsearch, BootstrapUsers, DeployStack,
    UpdateConfig, AfterReboot, BootstrapUsers, DeployStack,
    UpdateUsers, UpdateSupervisorConfs, UpdateLocalKnownHosts,
)
from commcare_cloud.commands.ansible.service import Service
@@ -53,7 +53,6 @@
    DeployStack,
    UpdateConfig,
    AfterReboot,
    RestartElasticsearch,
    BootstrapUsers,
    UpdateUsers,
    UpdateSupervisorConfs,