set maintenance mode or stop corosync/pacemaker on update #563

Itxaka · 2016-07-22T10:06:17Z (edited by aspiers)

As per all the Pacemaker documentation, the services should be stopped for a software update of the cluster stack. If other packages are being updated but not the cluster stack, setting maintenance mode is enough.

FIXME: however, if the update is happening on a remote node, the crm command is unavailable, so we need to set maintenance mode another way. UPDATE: something like crowbar/crowbar-ha#187 might fix this.

Depends on ~~crowbar/crowbar-ha#183~~

houndci-bot · 2016-07-22T10:06:20Z

chef/cookbooks/updater/recipes/default.rb

+        block do
+          %x{#{command}}
+          node.run_state['found_ha_packages'] = $?.exitstatus
+          Chef::Log.debug("Run #{command}, got exit status #{node.run_state['found_ha_packages']}.")


Style/StringLiteralsInInterpolation: Prefer double-quoted strings inside interpolations. (https://github.com/SUSE/style-guides/blob/master/Ruby.md#stylestringliteralsininterpolation)
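Hound's suggestion only changes the quoting of the hash key inside the interpolation. A minimal sketch, assuming a plain Hash stands in for Chef's `node.run_state` and the shell builtin `true` stands in for the real command:

```ruby
# Plain Hash standing in for Chef's node.run_state (assumption, no Chef needed).
run_state = {}
command = "true" # hypothetical stand-in for the zypper query used in the PR

%x{#{command}}                                 # run the command, discard its output
run_state["found_ha_packages"] = $?.exitstatus # numeric exit status of that command

# Double-quoted key inside the interpolation, as Hound recommends.
message = "Run #{command}, got exit status #{run_state["found_ha_packages"]}."
```

Note that the stored value is an Integer exit status, not a boolean.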

houndci-bot · 2016-07-22T10:40:30Z

chef/cookbooks/updater/recipes/default.rb

@@ -91,6 +120,12 @@
      end # block
    end # ruby_block

+    %w(corosync pacemaker).each do |s|


Style/WordArray: Use [] for an array of words. (https://github.com/SUSE/style-guides/blob/master/Ruby.md#stylewordarray)
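The WordArray change is purely stylistic: both literals build the same Array of Strings. A tiny check, where the loop body is only a stand-in for the `service s do ... end` resource:

```ruby
# Both literals produce the same Array; the SUSE style guide prefers brackets.
percent_form = %w(corosync pacemaker)
bracket_form = ["corosync", "pacemaker"]

handled = []
bracket_form.each { |s| handled << s } # stand-in for declaring `service s`
```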

Itxaka · 2017-04-12T10:17:27Z

chef/cookbooks/updater/recipes/default.rb

+      only_if do
+        ha = node.run_state["found_ha_packages"]
+        is_cluster = !search(:node, "roles:pacemaker-cluster-member").empty?
+        return !ha && is_cluster && node.run_state["needs_update"]


Don't return here.
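Why `return` is a problem in a guard: the block is stored and called later, after the code that declared it has finished, so `return` has nothing to return from. A minimal sketch in plain Ruby, assuming a hypothetical `FakeResource` that stores `only_if` blocks the way a Chef resource stores its guards:

```ruby
# Hypothetical stand-in for a Chef resource: guards are stored and called later.
class FakeResource
  def initialize
    @guards = []
  end

  def only_if(&block)
    @guards << block
    self
  end

  def should_run?
    @guards.all?(&:call)
  end
end

# Guard using `return`: it tries to return from this method, which has
# already returned by the time the guard runs, so Ruby raises LocalJumpError.
def declare_with_return(state)
  FakeResource.new.only_if do
    return !state[:found_ha_packages] && state[:needs_update]
  end
end

# Guard whose last expression is the result: this is what the guard should do.
def declare_with_value(state)
  FakeResource.new.only_if do
    !state[:found_ha_packages] && state[:needs_update]
  end
end

state = { found_ha_packages: false, needs_update: true }
```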

Itxaka · 2017-04-12T10:17:38Z

chef/cookbooks/updater/recipes/default.rb

    end # ruby_block

+    service "pacemaker" do
+      action :start
+    end


Not if it is a remote node.

Itxaka · 2017-04-12T11:33:40Z

Needs a rebase + Adam's patch + some changes around.

houndci-bot · 2017-04-17T16:41:28Z

chef/cookbooks/updater/recipes/default.rb

+::Chef::Recipe.send(:include, CrowbarPacemaker::MaintenanceModeHelpers)
+::Chef::Resource.send(:include, CrowbarPacemaker::MaintenanceModeHelpers)
+
+


Style/EmptyLines: Extra blank line detected.

houndci-bot · 2017-04-17T16:41:28Z

chef/cookbooks/updater/recipes/default.rb

+::Chef::Recipe.send(:include, CrowbarPacemaker::MaintenanceModeHelpers)
+::Chef::Resource.send(:include, CrowbarPacemaker::MaintenanceModeHelpers)
+
+


Style/EmptyLines: Extra blank line detected.

Itxaka · 2017-04-17T16:49:33Z

@aspiers ready for re-review.

Spent most of the day testing the different scenarios and it seems to work perfectly.

On a node that is part of a cluster (non-remote):
When there are packages to update but no HA packages -> sets maintenance mode before the update, then Chef lifts maintenance mode at the end of the run
When there are HA packages to update -> stops the services (pacemaker/corosync), then starts pacemaker after the update
When there are no packages to update -> does nothing, duh!
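The scenarios above, plus the remote-node and non-cluster rules from the commit message, can be summarised as a small decision function. This is a sketch only; `update_strategy` and its return symbols are hypothetical and not part of the PR:

```ruby
# Hypothetical helper summarising the update behaviour described above.
def update_strategy(cluster_member:, remote:, needs_update:, ha_packages:)
  # Nothing to update: do nothing.
  return :nothing unless needs_update
  # Normal (non-cluster) nodes and remote nodes: plain update, no stop, no maintenance.
  return :plain_update if !cluster_member || remote
  # HA stack is being updated: stop corosync/pacemaker, start pacemaker afterwards.
  return :stop_and_restart_cluster_services if ha_packages
  # Only non-HA packages: maintenance mode, lifted by Chef at the end of the run.
  :maintenance_mode
end
```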

Sadly, there is some code duplication with regard to https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/libraries/maintenance_mode_helpers.rb#L60

This is because the helper in crowbar-pacemaker already uses an execute resource, which cannot be called from a ruby_block.
Unfortunately, we need to wrap the crm maintenance mode call in a Chef resource, because the previous steps must have run first to obtain the variables that decide whether to enable maintenance mode, and you cannot wrap an execute resource inside a ruby_block (actually you probably can, but it's a huge, terrible hack).

So instead we create several resources that are triggered when maintenance mode is required, relying on the late evaluation of the variables.

I could not find a better way of doing it while respecting the chef workflow.
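The "late evaluation" point can be illustrated without Chef. In this sketch, an array of hashes with lambdas is a hypothetical stand-in for Chef's resource collection: declaring resources happens first (compile phase), running them happens later (converge phase), so only a guard evaluated at converge time can see a value an earlier resource stored in `run_state`:

```ruby
run_state = {}  # stand-in for node.run_state
log = []
resources = []  # stand-in for Chef's resource collection

# Compile phase: resources are only declared here.
resources << {
  name: "check for updates",
  action: -> { run_state["needs_update"] = true }, # stand-in for the zypper check
  guard: -> { true }
}

# Reading run_state eagerly at this point sees nothing: no resource has run yet.
eager_value = run_state["needs_update"]

resources << {
  name: "set maintenance mode",
  action: -> { log << "crm --wait node maintenance" },
  guard: -> { run_state["needs_update"] } # evaluated lazily, at converge time
}

# Converge phase: each guard is evaluated right before its resource runs.
executed = []
resources.each do |r|
  next unless r[:guard].call
  r[:action].call
  executed << r[:name]
end
```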

jsuchome · 2017-04-18T08:54:53Z

chef/cookbooks/updater/recipes/default.rb

+      service s do
+        action :stop
+        only_if { node.run_state["found_ha_packages"] }
+        not_if { node[:pacemaker][:is_remote] }


I think you should make sure that node[:pacemaker] exists; if I'm not missing something, this code could be executed on non-pacemaker nodes as well.
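A minimal sketch of a nil-safe check, assuming plain Hashes stand in for Chef node attributes; `remote_node?` is a hypothetical helper that could back a guard such as `not_if { remote_node?(node) }`:

```ruby
# Hypothetical helper: true only if the pacemaker attributes exist and mark
# this node as remote; non-pacemaker nodes simply return false.
def remote_node?(node)
  pacemaker = node[:pacemaker]
  !pacemaker.nil? && pacemaker[:is_remote] == true
end

non_pacemaker_node = {}                              # no :pacemaker attributes at all
cluster_node = { pacemaker: { is_remote: false } }
remote_node = { pacemaker: { is_remote: true } }
```

The unguarded form `node[:pacemaker][:is_remote]` raises NoMethodError on `non_pacemaker_node`, because it calls `[]` on nil.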

Itxaka · 2017-04-18T09:08:24Z

Missing exceptions for non-cluster nodes in some of the resources!

jsuchome · 2017-04-18T09:09:39Z

chef/cookbooks/updater/recipes/default.rb

+        true
+      end
+      only_if do
+        ha = node.run_state["found_ha_packages"]


ha seems a bit confusing; could you rename this variable? (Or just not create it at all and check run_state directly at line 104.)

Itxaka · 2017-04-18T11:08:28Z

chef/cookbooks/updater/recipes/default.rb

+      end
+      only_if do
+        ha = node.run_state["found_ha_packages"]
+        is_cluster = !search(:node, "roles:pacemaker-cluster-member").empty?


wtf, this is wrong. I wanted to check whether the node is part of a cluster; this just searches for any nodes that have that role ¬_¬

Oh, true. Look also at the same search used earlier.
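The difference between the two checks, sketched with plain data standing in for Chef search results and the current node (an assumption; no Chef server involved):

```ruby
all_nodes = [
  { name: "node1", roles: ["pacemaker-cluster-member"] },
  { name: "node2", roles: [] }
]
current = all_nodes.find { |n| n[:name] == "node2" } # the node running the recipe

# Roughly what !search(:node, "roles:pacemaker-cluster-member").empty? answers:
# does ANY node have the role?
any_node_is_member = all_nodes.any? { |n| n[:roles].include?("pacemaker-cluster-member") }

# What the guard actually needs, as node.role? answers it: does THIS node have the role?
this_node_is_member = current[:roles].include?("pacemaker-cluster-member")
```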

aspiers · 2017-04-18T13:47:39Z

chef/cookbooks/updater/recipes/default.rb

@@ -17,6 +17,9 @@
 # limitations under the License.
 #

+::Chef::Recipe.send(:include, CrowbarPacemaker::MaintenanceModeHelpers)


Just curious - how is this different from ::Chef::Recipe.include CrowbarPacemaker::MaintenanceModeHelpers ?

Chef-ism. http://lists.opscode.com/sympa/arc/chef/2011-05/msg00286.html

Actually, I think that due to the restructuring of the resources in my last update this is not required; I'll retry with the usual include to see.

Not needed anymore, reverting to the usual include
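For context on the Chef-ism: before Ruby 2.1, `Module#include` was private, so code that mixed a module into another class from outside had to call it via `send`. Since Ruby 2.1 it is public and both forms work. A sketch with stand-in names (not the real crowbar-pacemaker module):

```ruby
# Stand-in for CrowbarPacemaker::MaintenanceModeHelpers (hypothetical contents).
module MaintenanceModeHelpersStandIn
  def maintenance_mode_hint
    "crm --wait node maintenance"
  end
end

class RecipeStandIn; end
class ResourceStandIn; end

# Works on any Ruby version, including those where Module#include was private.
RecipeStandIn.send(:include, MaintenanceModeHelpersStandIn)

# Works on Ruby >= 2.1, where Module#include is public.
ResourceStandIn.include(MaintenanceModeHelpersStandIn)
```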

aspiers · 2017-04-18T14:10:01Z

@Itxaka I'm still a bit confused about how this is supposed to handle the remote case.

Itxaka · 2017-04-18T14:17:17Z

@aspiers Let's see: in the remote case, there should be no pacemaker/corosync services running, so it won't stop them.

The question is: when upgrading remote nodes, do we need to put them into maintenance mode as well? The HA guide does not mention remote nodes, so I'm a bit lost on what to do with them.

… nodes (bsc#983617) Make the update barclamp aware of HA nodes and deal with them properly. When HA packages need to be updated, stop the HA services before the update and start them again after the packages have been updated. When normal packages are updated, put the node into maintenance mode as described in the HA guide. Also, do not stop the services or set maintenance mode if the node is a remote_node. Updates on normal nodes should not be affected by these changes.

cmurphy · 2017-04-27T15:37:14Z

chef/cookbooks/updater/recipes/default.rb

+      end
+    end
+
+    ["corosync", "pacemaker"].each do |s|


The HA docs only say to stop pacemaker, why are both here? Could you add a comment?

cmurphy · 2017-04-27T15:44:33Z

chef/cookbooks/updater/recipes/default.rb

+        command += '|egrep -q "corosync|pacemaker"'
+        system("zypper #{command}")
+        # exit 0: found, 1 not found
+        node.run_state["found_ha_packages"] = $?.exitstatus ? true : false


I disagree with hound here; I don't think $? is a cryptic perlism.

But I think the logic is wrong: $?.exitstatus produces the numeric exit code, and grep returns 1 if no matches were found. Since every Integer (including 0 and 1) is truthy in Ruby, 1 ? true : false evaluates to true, so the pacemaker services get stopped even when it's not necessary.

Also, if needs_update ends up being false you could probably skip the extra zypper call.
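A minimal sketch of the bug and one possible fix, assuming `printf` stands in for the `zypper` listing so the pipeline can run anywhere; `grep -Eq` is equivalent to the `egrep -q` used in the PR:

```ruby
require "shellwords"

# Run the same kind of pipeline as the PR and return grep's exit status:
# 0 if an HA package name matched, 1 if not.
def grep_ha(listing)
  system("printf '%s\\n' #{Shellwords.escape(listing)} | grep -Eq 'corosync|pacemaker'")
  $?.exitstatus
end

status = grep_ha("libfoo-1.0") # no HA package in this listing

# Buggy: any Integer, including 1, is truthy, so this is always true.
buggy_found = status ? true : false

# Fixed: only exit status 0 means grep found a match.
fixed_found = status.zero?
```

Alternatively, `system` itself already returns `true` for exit status 0 and `false` for a non-zero status (or `nil` if the command could not be run), so its return value can be stored directly.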

cmurphy · 2017-04-28T09:52:01Z

chef/cookbooks/updater/recipes/default.rb

+        true
+      end
+      only_if do
+        is_cluster = node.role? "pacemaker-cluster-member"


With crowbar/crowbar-ha#187 this could probably include pacemaker-remote roles too, yes?

cmurphy · 2017-04-28T10:03:36Z

chef/cookbooks/updater/recipes/default.rb

+    # HA packages are NOT gonna be updated
+    # And Node is part of a cluster
+    # And there is packages to update
+    execute "crm --wait node maintenance" do


Could add the remote node name following what crowbar/crowbar-ha#187 did
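A sketch of both review suggestions: accepting remote nodes in the cluster check and naming the node explicitly in the crm call. The helper names are hypothetical, and the `pacemaker-remote` role name is an assumption about what crowbar/crowbar-ha#187 uses; check `crm node help` for the exact `maintenance` syntax on the target crmsh version:

```ruby
# Assumed role names; "pacemaker-remote" is a guess at the remote role.
CLUSTER_ROLES = ["pacemaker-cluster-member", "pacemaker-remote"].freeze

# Hypothetical helper: true if the node has any cluster-related role.
def cluster_node?(roles)
  (roles & CLUSTER_ROLES).any?
end

# Hypothetical helper: builds the maintenance command, optionally naming the node.
def maintenance_command(node_name = nil)
  cmd = ["crm", "--wait", "node", "maintenance"]
  cmd << node_name if node_name
  cmd.join(" ")
end
```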

aspiers · 2017-10-05T13:21:21Z

Revisiting this due to Bug 1061834 – installing updates causes HA failures. @Itxaka Any chance you could amend the commit message to contain the full bsc URL? Thanks!

Itxaka · 2017-10-05T13:25:38Z

Ummm, it seems the branch for this PR might have been removed somehow, so I guess I would need to create a different PR covering that branch?

Itxaka · 2017-10-05T13:48:08Z

Moved to #1353. @cmurphy's comments are still unresolved on that PR.

Itxaka · 2019-08-16T12:23:01Z

Doesn't seem to be needed; feel free to restore it if it is.

Itxaka added the wip and do not merge yet labels Jul 22, 2016

houndci-bot reviewed Jul 22, 2016

Itxaka assigned nkrinner Jul 22, 2016

houndci-bot reviewed Jul 22, 2016

Itxaka commented Apr 12, 2017

Itxaka added the do not merge yet label Apr 12, 2017

houndci-bot reviewed Apr 17, 2017

Itxaka removed the do not merge yet label Apr 17, 2017

Itxaka requested review from vuntz, jsuchome and s-t-e-v-e-n-k April 17, 2017 16:41

jsuchome reviewed Apr 18, 2017

Itxaka added the do not merge yet label Apr 18, 2017

jsuchome reviewed Apr 18, 2017

Itxaka unassigned nkrinner Apr 18, 2017

Itxaka commented Apr 18, 2017

Itxaka removed the do not merge yet label Apr 18, 2017

aspiers reviewed Apr 18, 2017

cmurphy requested changes Apr 28, 2017

jsuchome added the needs backport to SOC7 (stable/4.0) label Jun 15, 2017

dirkmueller removed the needs backport to SOC7 (stable/4.0) label Oct 5, 2017

Itxaka closed this Aug 16, 2019
