Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SUSE Support and crmsh #175

Closed
marcelmamula opened this issue Dec 20, 2023 · 22 comments
Closed

SUSE Support and crmsh #175

marcelmamula opened this issue Dec 20, 2023 · 22 comments

Comments

@marcelmamula
Copy link
Contributor

Hello Team,

I am working on sap-linuxlab (community.sap_install) and our plan is to make sure that role sap_ha_pacemaker_cluster, which consumes fedora.linux_system_roles, is correctly working on SUSE systems.

I noticed that groundwork for adoption of non-pcs steps was already done thanks to Sean in #122, so adoption would consist of:

  • splitting of main.yml based on os_family, because it contains pcs related tasks directly (non-generic names)
  • creating variables for Suse
  • creating separate tasks under shell_crmsh for crmsh, corosync, sbd
  • replicating cluster setup for SUSE from sap-linuxlab/community.sles-for-sap/cluster

There are few things that I wanted to ask for clarification, before proceeding with any changes in fork:

  1. Are there any ongoing efforts for SUSE support, or is this free range to pickup?
  2. ha_cluster and idempotency?
@richm
Copy link
Contributor

richm commented Dec 20, 2023

Are there any ongoing efforts for SUSE support, or is this free range to pickup?

Not that I know of - @sean-freeman or @berndfinger might know.

ha_cluster and idempotency?

The role is not currently idempotent for all operations - not sure which are idempotent and which are not - I don't know if you can make the crmsh implementation idempotent without making the pcs implemetation idempotent

@sean-freeman
Copy link
Contributor

@richm I had started porting some code in line with the mentioned Ansible Role draft (keeping to the scope of ha_cluster to accept config and execute, i.e. not adding any of the commands for a specific platform). However there were other priorities in the SAP LinuxLab Open-Source initiative that needed attention first, so I would probably classify this as free range.

I would probably suggest that ha_cluster is psuedo-idempotent in the sense that, if you execute with the same variables you will arrive at the same outcome. However if on multiple runs some variable is changed, it will alter the config file, and the cibadmin will (without mercy or checks) replace any existing config with that new definition.

@marcelmamula
Copy link
Contributor Author

Thank you both for your comments, I will proceed with with work on my end to see how this can be implemented.

@tomjelinek
Copy link
Member

if you execute with the same variables you will arrive at the same outcome. However if on multiple runs some variable is changed, it will alter the config file, and the cibadmin will (without mercy or checks) replace any existing config with that new definition.

That's correct. When you run the role repeatedly with the same variables, you'll get the same outcome and the role should not even restart cluster daemons and thus it should not disrupt cluster operations. However, the role removes any cluster configuration not specified in the role variables. This applies to CIB, corosync and other cluster components. Changing some components' (corosync, sbd) configuration results in restarting those daemons.

@marcelmamula
Copy link
Contributor Author

@sean-freeman @tomjelinek @berndfinger Lot of idempotency is coming from fact that cluster config will be rebuilt using all those CIB tasks and then remain untouched if they are same.

How would this work combined with sap_install if you re-run them both @sean-freeman ? sap_ha_pacemaker_cluster is where all resources and pacemaker config is created and I dont see anywhere ha_cluster_resource_primitives or __sap_ha_pacemaker_cluster_resource_primitives are defined in examples, so I would assume it would result in CIB being replaced regardless.

If this destructive behavior in combination with sap_ha_pacemaker_cluster is expected, then there is not much value for us in implementing CIB steps under crmsh.

@sean-freeman
Copy link
Contributor

sean-freeman commented Jan 26, 2024

@tomjelinek Thinking about my comment / on re-visit to this GH Issue, it is a good idea to give the end-user a choice on how ha_cluster handles config changes?

The following would warrant a separate GH Issue to be raised.

Perhaps "Feature Request: ha_cluster_config_change_then_halt" would protect an end-user in case of config drift / the variables changed by accident (for whatever reason), and the end-user performs re-run but does not want to replace the working config. By default I imagine such a variable would be false, to preserve the "psuedo-idempotence" and not create a breaking change.

In case such a feature exists already and I cannot see it, please excuse me and please correct me.

As @marcelmamula perceives, and other end-users may also, the behaviour of ha_cluster could be considered "destructive" as there is no safety net in case any variable was changed.


@marcelmamula Best to move any continued discussion specific to SAP, into the sap_install GH Issues or Discussions. Will answer here for posterity.

On it's own, the Ansible Role sap_ha_pacemaker_cluster is relatively benign but does the heavy lifting on inputs/config that should be executed, and all the execution to configure Linux Pacemaker is done by Ansible Role ha_cluster (at the moment some is done from pcs PCMK Shell and other commands at the cib PCMK homogenous binaries level).

The Ansible Role sap_ha_pacemaker_cluster really does 2 things only:

  1. Prepare input variables for Ansible Role ha_cluster to be executed successfully
  2. Prepare OS dependencies (i.e. OS Packages for Fence/Resource Agents and CLIs) required for SAP and the chosen target Infrastructure Platform

The 'private' var (prefix __) referenced __sap_ha_pacemaker_cluster_resource_primitives, is generated on-the-fly based upon an end-user request to configure different SAP Software scenarios (SAP HANA, SAP NetWeaver ABAP ASCS/ERS etc), and a given target Infrastructure Platform (e.g. AWS, GCP, MS AZ, IBM Cloud, IBM PowerVC etc etc).

So the same applies.... if an end-user:

  • does not change variables for sap_ha_pacemaker_cluster, then ha_cluster is executed and nothing changes (as @tomjelinek confirms)
  • accidentally alters variables for sap_ha_pacemaker_cluster, then ha_cluster is executed and the new config pushed.... which could be destructive (that would depend on exactly what vars were altered).

@marcelmamula
Copy link
Contributor Author

Thank you @sean-freeman for lengthy explanation even though I managed to get to explanations in meantime.

@tomjelinek @sean-freeman I was asking about destructiveness and idempotence mainly because whole cib creation sequence is using pcs working on temporary configuration xml, leaving whole cluster intact. This is not possible with SUSE crm unless you start doing some extensive jinja2 changes without running crm configure

But end is most likely same, because even crm_diff at end of cib build considers all changes as changes and pushes them as patch.

@sean-freeman
Copy link
Contributor

sean-freeman commented Jan 26, 2024

@marcelmamula Can you expand your explanation? From what I investigated (a long time ago), any pcs command has an equal crmsh command?

Looking in the repository, there are a limited number of pcs shell commands in use. See equivilant below for crmsh:

# PCS Shell                   # CRMSH Shell

pcs cluster                   crm node
pcs constraint <type>         crm configure <type>
pcs property                  crm configure property
pcs qdevice                   crm cluster init qdevice
pcs quorum                    corosync-quorumtool / corosync-qnetd-tool
pcs resource                  crm ra
pcs status                    crm status
pcs stonith                   crm ra

Or do you mean it is not possible for crmsh to create/export each command to XML configurations and build the whole configuration to push to cib?

@marcelmamula
Copy link
Contributor Author

Yes @sean-freeman , My comment was mainly about -f to do changes in xml separate from live cluster cib.

@sean-freeman
Copy link
Contributor

@marcelmamula Isn't 'Shadow CRB usage' the equivalent ?

@marcelmamula
Copy link
Contributor Author

marcelmamula commented Jan 26, 2024

@sean-freeman You are correct, shadow cib is possible, but it is limited to interactive crm configure mode, because you have to do all changes before commit. I tried using it on command line as ansible would, but shadow was not kept after commands.

Update: I am testing with -c and it seems to be working, I will update if it is usable.
crm configure cib new shd
crm -c shd configure show

@marcelmamula
Copy link
Contributor Author

@tomjelinek I have started going through Contribution documentation and I have hit snag with tox availability on SUSE. Would it be fine with doing linting check and leave rest of tests for git actions when PR is submitted?

Also ha_cluster is not using FQDN module names so all new tasks I added contain them to avoid linting errors. Are you planning to switch to FQDN roles in future?

@tomjelinek
Copy link
Member

@marcelmamula As long as your code passes CI eventually, it doesn't really matter to me whether you test locally or use git actions in a PR.

I think that not using FQDN names in a role for its own modules is a system roles convention.

@richm Could you comment on these two questions?

@marcelmamula
Copy link
Contributor Author

@tomjelinek Thank you.
My FQDN question was mainly about usual ansible.builtin modules that are triggering warning.

@tomjelinek
Copy link
Member

@sean-freeman

Perhaps "Feature Request: ha_cluster_config_change_then_halt" would protect an end-user in case of config drift / the variables changed by accident (for whatever reason), and the end-user performs re-run but does not want to replace the working config. By default I imagine such a variable would be false, to preserve the "psuedo-idempotence" and not create a breaking change.

In case such a feature exists already and I cannot see it, please excuse me and please correct me.

This looks like Ansible check mode to me. The ha_cluster role supports check mode, even though some tasks are impossible to be done in check mode. You also need to know a bit about how the role works, that it builds configuration in temp files and then pushes it to cluster / nodes.

@tomjelinek
Copy link
Member

@marcelmamula Sorry for being late to the pcs -f / crm -c discussion. Does the crm -c approach work for you?

I don't think you need to strictly follow pcs at all costs. Some parts of the role are very pcs specific, e.g. pcsd configuration. There was no demand for crmsh support when the role got started, so it wasn't designed with crmsh in mind.

@marcelmamula
Copy link
Contributor Author

marcelmamula commented Jan 30, 2024

@marcelmamula Sorry for being late to the pcs -f / crm -c discussion. Does the crm -c approach work for you?

I don't think you need to strictly follow pcs at all costs. Some parts of the role are very pcs specific, e.g. pcsd configuration. There was no demand for crmsh support when the role got started, so it wasn't designed with crmsh in mind.

@tomjelinek I have tested out multiple approaches and settled down with crm -c and crm_diff on cib vs shadow. It seems to be working good including rc == 1 check.

Example for cib group:

ansible.builtin.command:
    cmd: |
      crm -c {{ __ha_cluster_crm_shadow }} configure group
      {{ resource_group.id | quote }} {% for resource in resource_group.resource_ids %}
      {{ resource | quote }} {% endfor %} \
      {% if resource_group.meta_attrs[0].attrs | default(False) %}
        meta {% for attr in resource_group.meta_attrs[0].attrs -%}
          {{ attr.name | quote }}={{ attr.value | quote }} {% endfor %}
      {% endif %}

There are some pcs specific like tickets, sets and others so I will leave them as empty placeholders for now or remove completely wherever possible.

@richm
Copy link
Contributor

richm commented Jan 30, 2024

@tomjelinek I have started going through Contribution documentation and I have hit snag with tox availability on SUSE.

you can't install python pip, then do pip install --user tox, on SUSE?

Would it be fine with doing linting check and leave rest of tests for git actions when PR is submitted?

Yes, I suppose so, but it will be painful to debug, if you find some odd issue, and have to iterate repeatedly by pushing new code to a PR, and cannot reproduce the error locally.

Also ha_cluster is not using FQDN module names so all new tasks I added contain them to avoid linting errors. Are you planning to switch to FQDN roles in future?

In general, you can use the short name for Ansible built-in modules/plugins, and use the FQCN for non-built-in modules/plugins. The Ansible team have told me that short names for built-ins will be supported for the foreseeable future. If/when Ansible changes to require the use of FQCN for everything, we will have tooling to automatically convert everything, because we will have to do that for the other 29 system roles at the same time.

@richm
Copy link
Contributor

richm commented Jan 30, 2024

@tomjelinek Thank you. My FQDN question was mainly about usual ansible.builtin modules that are triggering warning.

That should not be happening with ansible-lint because we suppress that warning - https://github.com/linux-system-roles/ha_cluster/blob/main/.ansible-lint#L16
What is the message you are seeing? Is it from ansible-lint?

@marcelmamula
Copy link
Contributor Author

@richm @tomjelinek
FQDN:
It seems my lint config was inherited from top folder, which does not have fqcn-builtins. Do you have any issue with me keeping all new files with fqdn?

Tox/Testing:
pip3 tox can be installed fine, issue starts with Use yum or dnf to install standard-test-roles-inventory-qemu since this is in fedora repo only. That package seems to be requirement to actually test using tox.

@richm
Copy link
Contributor

richm commented Jan 30, 2024

@richm @tomjelinek FQDN: It seems my lint config was inherited from top folder,

Not sure what you mean by that.

which does not have fqcn-builtins. Do you have any issue with me keeping all new files with fqdn?

Yes, you can use ansible.builtin.xxx instead of xxx. That should not cause any issues.

Tox/Testing: pip3 tox can be installed fine, issue starts with Use yum or dnf to install since this is in fedora repo only. That package seems to be requirement to actually test using tox.

It is a requirement to use qemu. But it is possible to install the qemu and kvm and seabios packages separately. So if you want to be able to run

tox -e qemu-ansible-core-2.16 -- --image-file /path/to/suse.qcow2 tests/tests_crmsh.yml

then you will need to install the qemu, kvm, seabios, whatever other packages you need to run qcow2 based VMs on your development platform.

You do not need standard-test-roles-inventory-qemu in order to run tox -e collection,ansible-lint-collection

@marcelmamula
Copy link
Contributor Author

Thank you @sean-freeman @richm @tomjelinek for #186.
I will close this issue and further changes will go separately from this original issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants