-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
set maintenance mode or stop corosync/pacemaker on update #563
Conversation
block do | ||
%x{#{command}} | ||
node.run_state['found_ha_packages'] = $?.exitstatus | ||
Chef::Log.debug("Run #{command}, got exit status #{node.run_state['found_ha_packages']}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/StringLiteralsInInterpolation: Prefer double-quoted strings inside interpolations. (https://github.com/SUSE/style-guides/blob/master/Ruby.md#stylestringliteralsininterpolation)
@@ -91,6 +120,12 @@ | |||
end # block | |||
end # ruby_block | |||
|
|||
%w(corosync pacemaker).each do |s| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/WordArray: Use []
for an array of words. (https://github.com/SUSE/style-guides/blob/master/Ruby.md#stylewordarray)
only_if do | ||
ha = node.run_state["found_ha_packages"] | ||
is_cluster = !search(:node, "roles:pacemaker-cluster-member").empty? | ||
return !ha && is_cluster && node.run_state["needs_update"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont return here
end # ruby_block | ||
|
||
service "pacemaker" do | ||
action :start | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not if is a remote node
needs rebase + adam patch + some changes around |
::Chef::Recipe.send(:include, CrowbarPacemaker::MaintenanceModeHelpers) | ||
::Chef::Resource.send(:include, CrowbarPacemaker::MaintenanceModeHelpers) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/EmptyLines: Extra blank line detected.
::Chef::Recipe.send(:include, CrowbarPacemaker::MaintenanceModeHelpers) | ||
::Chef::Resource.send(:include, CrowbarPacemaker::MaintenanceModeHelpers) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/EmptyLines: Extra blank line detected.
@aspiers ready for re-review. Spent most of the day testing the different scenarios and it seems to work perfectly. On a node part of a cluster, non-remote: Sadly, there is some code duplication in regards to https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/libraries/maintenance_mode_helpers.rb#L60 This is because the helper on crowbar-pacemaker is already using an So instead we create several resources that get triggered in the case that maintenance mode is required due to the late evaluation of the variables. I could not find a better way of doing it while respecting the chef workflow. |
service s do | ||
action :stop | ||
only_if { node.run_state["found_ha_packages"] } | ||
not_if { node[:pacemaker][:is_remote] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should make sure that node[:pacemaker]
exists; If I'm not missing sometthing, this could could be executed at non-pacemaker nodes as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Missing exceptions for non-cluster nodes in some of the resources! |
true | ||
end | ||
only_if do | ||
ha = node.run_state["found_ha_packages"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha
seems bit confusing, could you rename this variable? (Or just not create it at all and directly do a check for run_state at line 104)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
end | ||
only_if do | ||
ha = node.run_state["found_ha_packages"] | ||
is_cluster = !search(:node, "roles:pacemaker-cluster-member").empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wtf, this is wrong. I wanted to check if the node is part of a cluster, this will just search for any nodes that have that role ¬_¬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, true. Look also at the same search used earlier.
@@ -17,6 +17,9 @@ | |||
# limitations under the License. | |||
# | |||
|
|||
::Chef::Recipe.send(:include, CrowbarPacemaker::MaintenanceModeHelpers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious - how is this different from ::Chef::Recipe.include CrowbarPacemaker::MaintenanceModeHelpers
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chef-ism. http://lists.opscode.com/sympa/arc/chef/2011-05/msg00286.html
Actually, I think that due the restructuring of the resources in my last update this is not required, I'll retry with the usual include
to see
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed anymore, reverting to the usual include
@Itxaka I'm still a bit confused how this is supposed to handle the remote case? |
@aspiers Lets see, for remote case, there should be no pacemaker/corosync services runnig so it wont stop them. The question is, for remote nodes, when upgrading them, do we need to set the node into maintenance mode as well? The HA guide does not mention remote nodes so Im a bit lost in there and what to do with them. |
… nodes (bsc#983617) Make the update barclamp aware of HA nodes and deal with them properly. When HA packages need to be updated, stop the HA services before the update and start them again after the package have been updated. When normal packages are updated, set the node in maintenance mode as mentioned on the HA guide. Also do not stop or set maintenance mode if the node is a remote_node. Updates on normal nodes should not be affected by this changes.
end | ||
end | ||
|
||
["corosync", "pacemaker"].each do |s| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HA docs only say to stop pacemaker, why are both here? Could you add a comment?
command += '|egrep -q "corosync|pacemaker"' | ||
system("zypper #{command}") | ||
# exit 0: found, 1 not found | ||
node.run_state["found_ha_packages"] = $?.exitstatus ? true : false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with hound here, I don't think $? is a cryptic perlism.
But I think the logic is wrong - $?.exitstatus produces the numeric exit code, and grep will return 1 if no matches were found, so 1 ? true : false
results in true, so the pacemaker services get stopped even when it's not necessary.
Also, if needs_update ends up being false you could probably skip the extra zypper call.
true | ||
end | ||
only_if do | ||
is_cluster = node.role? "pacemaker-cluster-member" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With crowbar/crowbar-ha#187 this could probably include pacemaker-remote roles too, yes?
# HA packages are NOT gonna be updated | ||
# And Node is part of a cluster | ||
# And there is packages to update | ||
execute "crm --wait node maintenance" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add the remote node name following what crowbar/crowbar-ha#187 did
Revisiting this due to Bug 1061834 – installing updates causes HA failures. @Itxaka Any chance you could amend the commit message to contain the full bsc URL? Thanks! |
Ummm, seems that the branch for this PR migth have been removed somehow, so I would need to create a different PR with that branch covered somehow I guess? |
doesnt seem to be needed feel free to restore if so |
As per all the docs by pacemaker, the services should be stopped for a software update of the cluster stack. If other packages are being updated but not the cluster stack, then it's enough to set maintenance mode.
FIXME: however if the update is happening on a remote node, then the
crm
command is unavailable so we need to set maintenance mode another way. UPDATE: something like crowbar/crowbar-ha#187 might fix this.Depends on
crowbar/crowbar-ha#183