(Minor?) Small "race" in EC2 discovery of master nodes #237

Open
ankon opened this issue Aug 24, 2015 · 3 comments

Comments

@ankon
Contributor

ankon commented Aug 24, 2015

Just got this now, and from what I see it is not in any way fatal as such, but to me it smells like there might be an issue either in ES itself, or in the discovery logic:

Node1:

[2015-08-24 16:35:06,991][INFO ][node                     ] [i-e9264844] version[1.7.1], pid[2426], build[b88f43f/2015-07-29T09:54:16Z]
[2015-08-24 16:35:06,992][INFO ][node                     ] [i-e9264844] initializing ...
[2015-08-24 16:35:13,188][INFO ][plugins                  ] [i-e9264844] loaded [lang-mvel, cloud-aws, mapper-attachments, mongodb-river], sites [head, river-mongodb]
[2015-08-24 16:35:13,329][INFO ][env                      ] [i-e9264844] using [1] data paths, mounts [[/opt/collaborne/data (/dev/xvdd)]], net usable_space [31.9gb], net total_space [31.9gb], types [xfs]
[2015-08-24 16:35:20,532][WARN ][script                   ] [i-e9264844] deprecated setting [script.disable_dynamic] is set, replace with fine-grained scripting settings (e.g. script.inline, script.indexed, script.file)
[2015-08-24 16:35:20,748][INFO ][node                     ] [i-e9264844] initialized
[2015-08-24 16:35:20,748][INFO ][node                     ] [i-e9264844] starting ...
[2015-08-24 16:35:20,964][INFO ][transport                ] [i-e9264844] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.0.20.233:9300]}
[2015-08-24 16:35:21,137][INFO ][discovery                ] [i-e9264844] elasticsearch/1UatDfVlSfyyE9R7M7xAbQ
[2015-08-24 16:35:27,074][INFO ][cluster.service          ] [i-e9264844] new_master [i-e9264844][1UatDfVlSfyyE9R7M7xAbQ][ip-10-0-20-233][inet[/10.0.20.233:9300]]{aws_availability_zone=eu-west-1a}, reason: zen-disco-join (elected_as_master)
[2015-08-24 16:35:27,113][INFO ][http                     ] [i-e9264844] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.0.20.233:9200]}
[2015-08-24 16:35:27,114][INFO ][node                     ] [i-e9264844] started
[2015-08-24 16:35:27,125][INFO ][gateway                  ] [i-e9264844] recovered [0] indices into cluster_state
[2015-08-24 16:35:28,413][INFO ][repositories             ] [i-e9264844] put repository [collaborne-data]

Node2:

[2015-08-24 16:35:05,897][INFO ][node                     ] [i-591ff6f5] version[1.7.1], pid[2423], build[b88f43f/2015-07-29T09:54:16Z]
[2015-08-24 16:35:05,898][INFO ][node                     ] [i-591ff6f5] initializing ...
[2015-08-24 16:35:12,344][INFO ][plugins                  ] [i-591ff6f5] loaded [lang-mvel, cloud-aws, mapper-attachments, mongodb-river], sites [head, river-mongodb]
[2015-08-24 16:35:12,472][INFO ][env                      ] [i-591ff6f5] using [1] data paths, mounts [[/opt/collaborne/data (/dev/xvdd)]], net usable_space [31.9gb], net total_space [31.9gb], types [xfs]
[2015-08-24 16:35:19,667][WARN ][script                   ] [i-591ff6f5] deprecated setting [script.disable_dynamic] is set, replace with fine-grained scripting settings (e.g. script.inline, script.indexed, script.file)
[2015-08-24 16:35:19,903][INFO ][node                     ] [i-591ff6f5] initialized
[2015-08-24 16:35:19,904][INFO ][node                     ] [i-591ff6f5] starting ...
[2015-08-24 16:35:20,244][INFO ][transport                ] [i-591ff6f5] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.0.21.123:9300]}
[2015-08-24 16:35:20,475][INFO ][discovery                ] [i-591ff6f5] elasticsearch/_Vuq_rgbTG2GcNtVsmVM_w
[2015-08-24 16:35:26,977][INFO ][discovery.ec2            ] [i-591ff6f5] failed to send join request to master [[i-e9264844][1UatDfVlSfyyE9R7M7xAbQ][ip-10-0-20-233][inet[/10.0.20.233:9300]]{aws_availability_zone=eu-west-1a}], reason [RemoteTransportException[[i-e9264844][inet[/10.0.20.233:9300]][internal:discovery/zen/join]]; nested: ElasticsearchIllegalStateException[Node [[i-e9264844][1UatDfVlSfyyE9R7M7xAbQ][ip-10-0-20-233][inet[/10.0.20.233:9300]]{aws_availability_zone=eu-west-1a}] not master for join request from [[i-591ff6f5][_Vuq_rgbTG2GcNtVsmVM_w][ip-10-0-21-123][inet[/10.0.21.123:9300]]{aws_availability_zone=eu-west-1b}]]; ], tried [3] times
[2015-08-24 16:35:31,848][INFO ][cluster.service          ] [i-591ff6f5] detected_master [i-e9264844][1UatDfVlSfyyE9R7M7xAbQ][ip-10-0-20-233][inet[/10.0.20.233:9300]]{aws_availability_zone=eu-west-1a}, added {[i-e9264844][1UatDfVlSfyyE9R7M7xAbQ][ip-10-0-20-233][inet[/10.0.20.233:9300]]{aws_availability_zone=eu-west-1a},}, reason: zen-disco-receive(from master [[i-e9264844][1UatDfVlSfyyE9R7M7xAbQ][ip-10-0-20-233][inet[/10.0.20.233:9300]]{aws_availability_zone=eu-west-1a}])
[2015-08-24 16:35:32,220][INFO ][http                     ] [i-591ff6f5] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.0.21.123:9200]}
[2015-08-24 16:35:32,223][INFO ][node                     ] [i-591ff6f5] started

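Note the first "failed to send join request" in Node2's logs: the timing is awfully close to when the master actually says it's a master. Node2 logs the failure at 16:35:26,977 and Node1 logs its election at 16:35:27,074, so the join request was rejected roughly 100 ms before Node1 considered itself master. The times should be comparable, since both nodes run ntp.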
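
The gap between the two relevant log lines can be checked directly from their timestamps. The following is a minimal sketch, not part of the original report: the helper name `parse_ts` is made up for illustration, and the two log lines are abbreviated copies of the Node2 join failure and the Node1 `new_master` line above.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "[YYYY-MM-DD HH:MM:SS,mmm]" prefix of an Elasticsearch 1.x log line.
LOG_TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def parse_ts(line):
    """Return the timestamp at the start of a log line (hypothetical helper)."""
    match = LOG_TS.match(line)
    if match is None:
        raise ValueError("no timestamp found in: " + line)
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


# Abbreviated copies of the two log lines quoted in this issue.
node2_join_failed = "[2015-08-24 16:35:26,977][INFO ][discovery.ec2] [i-591ff6f5] failed to send join request to master"
node1_elected = "[2015-08-24 16:35:27,074][INFO ][cluster.service] [i-e9264844] new_master"

# Integer milliseconds between the rejected join and the election log line.
gap_ms = (parse_ts(node1_elected) - parse_ts(node2_join_failed)) // timedelta(milliseconds=1)
print(gap_ms)
```

A positive value means the join failure was logged before the election, which only says something about ordering if the clocks on both nodes agree to within that margin.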

This is with elasticsearch-cloud-aws 2.7.0.

@whybangbang

Can you show your configuration? Maybe we have the same problem.

@ankon
Contributor Author

ankon commented Aug 31, 2015

My configuration is fairly basic:

cloud:
        aws:
                protocol: https
                region: eu-west-1
        node:
                auto_attributes: true

discovery:
        type: ec2
        ec2:
                host_type: private_ip

Note though that the issue in this ticket isn't a problem as such: Elasticsearch worked out the cluster configuration after all. It's mainly that there is a chance that something in either cloud-aws or the core Elasticsearch code performs operations in the wrong order.
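
To make the suspected ordering problem concrete, here is a toy model of it. This is only a sketch under loud assumptions: `ToyNode`, `NotMasterError`, and `join_with_retry` are invented for illustration and are not Elasticsearch or cloud-aws code, and the retry count and the point at which the election completes are chosen arbitrarily. It only shows how a join request can be rejected when it arrives before the target has marked itself master, and then succeed on a later attempt.

```python
class NotMasterError(Exception):
    """Raised when a join request reaches a node that is not (yet) master."""


class ToyNode:
    """Hypothetical stand-in for a node; not the real Elasticsearch class."""

    def __init__(self, name):
        self.name = name
        self.is_master = False
        self.members = [name]

    def handle_join(self, joiner):
        # Reject joins until this node has finished electing itself.
        if not self.is_master:
            raise NotMasterError(
                "%s not master for join request from %s" % (self.name, joiner)
            )
        self.members.append(joiner)


def join_with_retry(target, joiner, max_attempts, before_attempt=None):
    """Try to join `target`; return the attempt number that succeeded, or None."""
    for attempt in range(1, max_attempts + 1):
        if before_attempt is not None:
            before_attempt(attempt)  # hook used to simulate time passing
        try:
            target.handle_join(joiner)
            return attempt
        except NotMasterError:
            continue
    return None


node1 = ToyNode("i-e9264844")


def elect_late(attempt):
    # Assumption for the demo: node1 completes its election just before attempt 2.
    if attempt == 2:
        node1.is_master = True


attempts_needed = join_with_retry(node1, "i-591ff6f5", max_attempts=3, before_attempt=elect_late)
print(attempts_needed, node1.members)
```

In this model the first attempt fails purely because of ordering, which matches the symptom described here: a harmless rejected join followed by a successful cluster formation.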

@whybangbang

Have you solved it? We have a problem and I don't know how to solve it. Can you help me? We have to deploy a plugin to the production environment, but when we stop a node, the log shows an error: "TransportService is closed stopped can't send request". When we restart the node, it takes a long time, up to 1 minute, until the cluster finds the node. We don't know what to do. We use the AWS plugin with elasticsearch 1.5.2.

The config is:

discovery.type: "ec2"
discovery.ec2.host_type: "private_ip"
discovery.ec2.ping_timeout: "30s"
If discovery.ec2.ping_timeout is short, the cluster can't form.
