-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the steps to reboot the computes after update. #2587
base: main
Are you sure you want to change the base?
Conversation
Thanks for the PR! ❤️ |
09bc6f0
to
0641dd5
Compare
Current tested with ping test running in the background and found not loss of connectivity. |
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4f12a60d8dc64dad92d4f7b5f5bec990 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 40m 14s |
0641dd5
to
a22e794
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/bf06493eaeea430c898bf25520dfdd04 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 32s |
a22e794
to
4a5a0f2
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/caf50b2e762a4aaeb3326c9399dffd15 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 21m 02s |
c056d3c
to
85e4367
Compare
PATH: "{{ cifmw_path | default(ansible_env.PATH) }}" | ||
ansible.builtin.command: >- | ||
{{ cifmw_update_oc_cmd_prefix }} | ||
get openstackdataplanedeployment {{ cifmw_reboot_dep_name }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get openstackdataplanedeployment {{ cifmw_reboot_dep_name }} | |
wait openstackdataplanedeployment {{ cifmw_reboot_dep_name }} | |
--for=condition=ready | |
--timeout={{ cifmw_update_timeout_reboot }}m |
With oc wait ansible log is more readable, as there no retires logged to output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Done.
edpm_reboot_strategy: force | ||
ansibleLimit: {{ cifmw_update_hypervisor_short_name }} | ||
|
||
- name: Apply the OpenStackDataPlaneDeployment CR to trigger a reboot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- name: Apply the OpenStackDataPlaneDeployment CR to trigger a reboot | |
- name: Create the OpenStackDataPlaneDeployment CR to trigger a reboot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one comment about waiting for reboot to complete
This PR is stale because it has been for over 15 days with no activity. |
/retest |
This sequence implements reboot of the compute nodes after the update. We have one instance created. If the hypervisor being rebooted has the instance that instance will be live-migrated to another hypervisor before the reboot and migrated back to that original hypervisor after the reboot. Some basic sanity checks are performed after the reboot and before the migration back to ensure that the necessary services are up and running. During the reboot we start two scripts. One monitors and log the reboot of the hypervisors. The other log where the instance is currently running. The log files can be found in `~/ci-framework-data/tests/update/` in `monitor_servers.log` and `monitor_vm_placement.log` respectively. A note about node evacuation. We are still using node evaction from the nova cli. This command has not been ported to the openstack cli. There's a discussion about it [on launchpad](https://bugs.launchpad.net/python-openstackclient/+bug/2055552). The official documentation mention only the live-migration path, but as we also use the live-migration in the test sequence that part is covered. We still expect customer to use the nova cli as it's way more user friendly and is still currently working. Closes: https://issues.redhat.com/browse/OSPRH-8937
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some minor suggestions.
- name: Define command for OpenStack client interactions | ||
ansible.builtin.set_fact: | ||
cifmw_update_openstack_cmd: >- | ||
oc rsh -n {{ cifmw_update_namespace }} openstackclient openstack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a variable, not a default in the vars/main.yaml file.
set -o pipefail; | ||
{{ cifmw_update_openstack_cmd }} volume service list -f json | | ||
jq -r -c '.[] | select(.Binary | contains("cinder-volume")) | .Host' | ||
register: storage_backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you already added changed_when: false
to some tasks I guess it will be good to add it here too.
|
||
- name: Construct date string for the Reboot CR name | ||
ansible.builtin.set_fact: | ||
cifmw_update_cr_date: "{{ lookup('pipe', 'date +%Y%m%d%H%S') }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's used one is usually less confusing to place it in the vars section of the task that uses it.
--- | ||
- name: Define command prefix for OpenShift operations | ||
ansible.builtin.set_fact: | ||
cifmw_update_oc_cmd_prefix: "oc -n {{ cifmw_update_namespace }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another one that can be an internal var.
- name: Create the OpenStackDataPlaneDeployment CR used for reboot | ||
ansible.builtin.copy: | ||
dest: "{{ cifmw_update_artifacts_basedir }}/{{ cifmw_reboot_dep_name }}.yaml" | ||
content: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building a yaml (or json) with plain string manipulation is asking for trouble. Why not using to_nice yaml like this?
vars:
_content:
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
name: "{{ cifmw_reboot_dep_name }}"
namespace: "{{ cifmw_update_namespace }}"
spec:
nodeSets: "{{ cifmw_update_node_sets.stdout | split('\n') }}"
servicesOverride:
- reboot-os
ansibleExtraVars:
edpm_reboot_strategy: force
ansibleLimit: {{ cifmw_update_hypervisor_short_name }}
ansible.builtin.copy:
dest: "{{ cifmw_update_artifacts_basedir }}/{{ cifmw_reboot_dep_name }}.yaml"
content: "{{ _content | to_nice_yaml }}
This sequence implements reboot of the compute nodes after the update.
We have one instance created. If the hypervisor being rebooted has
the instance that instance will be live-migrated to another hypervisor
before the reboot and migrated back to that original hypervisor after
the reboot.
Some basic sanity checks are performed after the reboot and before the
migration back to ensure that the necessary services are up and
running.
During the reboot we start two scripts. One monitors and log the
reboot of the hypervisors. The other log where the instance is
currently running. The log files can be found in
~/ci-framework-data/tests/update/
inmonitor_servers.log
andmonitor_vm_placement.log
respectively.A note about node evacuation. We are still using node evaction from
the nova cli. This command has not been ported to the openstack
cli. There's a discussion about it on launchpad.
The official documentation mention only the live-migration path, but
as we also use the live-migration in the test sequence that part is
covered. We still expect customer to use the nova cli as it's way
more user friendly and is still currently working.
Closes: https://issues.redhat.com/browse/OSPRH-8937