Nova adoption ffu (no extra cell) #192
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,7 +6,12 @@ | |
|
||
## Variables | ||
|
||
(There are no shell variables necessary currently.) | ||
Define the shell variables used in the fast-forward upgrade steps below.
The values are illustrative; use values that are correct for your environment:
|
||
```bash | ||
PODIFIED_DB_ROOT_PASSWORD=$(oc get -o json secret/osp-secret | jq -r .data.DbRootPassword | base64 -d) | ||
``` | ||
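If the secret lookup fails (wrong project, missing key), the variable silently ends up empty and the later `mysql` call fails with a less obvious error. A minimal guard could look like the sketch below; the helper name `require_var` is hypothetical and not part of the adoption docs.

```bash
# Hypothetical helper: fail early when a required shell variable is unset or empty.
# Usage: require_var VARIABLE_NAME
require_var() {
    # eval-based indirect lookup of the variable whose name is in $1;
    # only pass trusted, literal variable names to this function.
    eval "_require_var_value=\${$1:-}"
    if [ -z "$_require_var_value" ]; then
        echo "ERROR: $1 is empty or unset" >&2
        return 1
    fi
}
# Example: require_var PODIFIED_DB_ROOT_PASSWORD || exit 1
```

Running the guard right after the variable definitions stops the procedure before any database command is attempted with an empty password.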
|
||
## Pre-checks | ||
|
||
|
@@ -308,3 +313,124 @@ EOF | |
``` | ||
oc wait --for condition=Ready osdpns/openstack --timeout=30m | ||
``` | ||
|
||
## Nova compute services fast-forward upgrade from Wallaby to Antelope | ||
|
||
A rolling upgrade of Nova compute services cannot be done during adoption
in lock-step with the Nova control plane services, because the two are
managed independently: by EDPM Ansible and by Kubernetes operators.
The Nova service operator and the OpenStack Dataplane operator ensure that
the upgrades can proceed independently of each other by configuring
`[upgrade_levels]compute=auto` for Nova services. Nova control plane
services apply the change right after the CR is patched. Nova compute EDPM
services catch up with the same config change later, through an Ansible
deployment.
|
||
> **NOTE**: Additional orchestration around the FFU workarounds
> configuration for the Nova compute EDPM service is subject to future changes.
|
||
* Wait for the cell1 Nova compute EDPM services version to be updated (it may take some time):
|
||
```bash | ||
oc exec -it mariadb-openstack-cell1 -- mysql --user=root --password=${PODIFIED_DB_ROOT_PASSWORD} \ | ||
-e "select a.version from nova_cell1.services a join nova_cell1.services b where a.version!=b.version and a.binary='nova-compute';" | ||
``` | ||
The above query should return an empty result as a completion criterion. | ||
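Since the query has to be repeated until it comes back empty, a small polling loop saves re-running it by hand. The sketch below mirrors the `retries: 20` / `delay: 6` settings of the Ansible task in this PR; the helper name `wait_for_empty_output` is hypothetical.

```bash
# Hypothetical helper: re-run a command until it exits 0 AND prints nothing on stdout.
# Usage: wait_for_empty_output RETRIES DELAY_SECONDS COMMAND [ARGS...]
wait_for_empty_output() {
    local retries="$1" delay="$2" attempt=1 output
    shift 2
    while [ "$attempt" -le "$retries" ]; do
        # An empty result means no nova-compute row with a mismatching version is left.
        if output="$("$@")" && [ -z "$output" ]; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "ERROR: command still returns rows after ${retries} attempts" >&2
    return 1
}
# Example (not run here), using the query shown above:
#   wait_for_empty_output 20 6 oc exec mariadb-openstack-cell1 -- \
#     mysql --user=root --password="${PODIFIED_DB_ROOT_PASSWORD}" -e "<version query>"
```

The example drops `-it` on the assumption that a TTY is not needed for a non-interactive query; only stdout is inspected, so warnings printed on stderr do not count as rows.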
|
||
* Remove pre-FFU workarounds for Nova control plane services: | ||
|
||
```bash
oc patch openstackcontrolplane openstack -n openstack --type=merge --patch '
spec:
  nova:
    template:
      cellTemplates:
        cell0:
          conductorServiceTemplate:
            customServiceConfig: |
              [workarounds]
              disable_compute_service_check_for_ffu=false
        cell1:
          metadataServiceTemplate:
            customServiceConfig: |
              [workarounds]
              disable_compute_service_check_for_ffu=false
          conductorServiceTemplate:
            customServiceConfig: |
              [workarounds]
              disable_compute_service_check_for_ffu=false
      apiServiceTemplate:
        customServiceConfig: |
          [workarounds]
          disable_compute_service_check_for_ffu=false
      metadataServiceTemplate:
        customServiceConfig: |
          [workarounds]
          disable_compute_service_check_for_ffu=false
      schedulerServiceTemplate:
        customServiceConfig: |
          [workarounds]
          disable_compute_service_check_for_ffu=false
'
```
|
||
* Wait for Nova control plane services' CRs to become ready: | ||
|
||
```bash | ||
oc wait --for condition=Ready --timeout=300s Nova/nova | ||
``` | ||
|
||
* Remove pre-FFU workarounds for Nova compute EDPM services: | ||
|
||
```bash
oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: nova-compute-ffu
  namespace: openstack
data:
  20-nova-compute-cell1-ffu-cleanup.conf: |
    [workarounds]
    disable_compute_service_check_for_ffu=false
Comment on lines +383 to +395:
Add a note that this is currently needed due to a bug in the config handling in the edpm_nova role. The proper solution is removing the

I think I'm confused now. Please see my comment above.
||
---
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneService
metadata:
  name: nova-compute-ffu
  namespace: openstack
spec:
  label: nova.compute.ffu
  configMaps:
    - nova-compute-ffu
  secrets:
    - nova-cell1-compute-config
    - nova-migration-ssh-key
  playbook: osp.edpm.nova
---
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: openstack-nova-compute-ffu
  namespace: openstack
spec:
  nodeSets:
    - openstack
  servicesOverride:
    - nova-compute-ffu
EOF
```
|
||
* Wait for Nova compute EDPM service to become ready: | ||
|
||
```bash | ||
oc wait --for condition=Ready osdpd/openstack-nova-compute-ffu --timeout=5m | ||
``` | ||
|
||
* Run Nova DB online migrations to complete FFU: | ||
|
||
```bash | ||
oc exec -it nova-cell0-conductor-0 -- nova-manage db online_data_migrations | ||
oc exec -it nova-cell1-conductor-0 -- nova-manage db online_data_migrations | ||
``` | ||
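The two commands above run independently, so a failure in cell0 does not stop the cell1 run. A fail-fast variant could be sketched as follows; the helper names `run_online_migrations` and `oc_exec` are hypothetical, and the sketch assumes `oc` is already logged in to the project that holds the Nova pods.

```bash
# Hypothetical helper: run the online data migrations in each given conductor pod,
# stopping at the first failure. RUNNER is the command used to exec into a pod.
# Usage: run_online_migrations RUNNER POD [POD...]
run_online_migrations() {
    local runner="$1" pod
    shift
    for pod in "$@"; do
        echo "Running online data migrations in ${pod}"
        if ! "$runner" "$pod" nova-manage db online_data_migrations; then
            echo "ERROR: online data migrations failed in ${pod}" >&2
            return 1
        fi
    done
}

# Thin wrapper so that the runner is a single word.
oc_exec() {
    local pod="$1"
    shift
    oc exec "$pod" -- "$@"
}
# Example (not run here):
#   run_online_migrations oc_exec nova-cell0-conductor-0 nova-cell1-conductor-0
```

Passing the runner as an argument keeps the loop testable without a cluster and makes it easy to swap `oc exec` for `oc rsh`, which the Ansible tasks in this PR use.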
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,121 @@ | ||
- name: set podified MariaDB copy shell vars
  no_log: "{{ use_no_log }}"
  ansible.builtin.set_fact:
    mariadb_copy_shell_vars: |
      PODIFIED_DB_ROOT_PASSWORD="{{ podified_db_root_password }}"
|
||
- name: wait for cell1 Nova compute EDPM services version updated
  ansible.builtin.shell: |
    {{ shell_header }}
    {{ oc_header }}
    {{ mariadb_copy_shell_vars }}
    oc rsh mariadb-openstack-cell1 mysql --user=root --password=${PODIFIED_DB_ROOT_PASSWORD} \
      -e "select a.version from nova_cell1.services a join nova_cell1.services b where a.version!=b.version and a.binary='nova-compute';"
  register: records_check_results
The conductor log shows that the issue is earlier than the compute adoption. So the new k8s control plane has a wrong / incomplete DB setup, as the conductor cannot talk to its DB. Wondering how the db sync on that same DB was run successfully.

The Nova CR status is Ready, so there was a successful db sync run.

I checked the logs and dumps but I don't see why the cell1 conductor cannot connect to the DB. Unfortunately all the passwords are masked in must gather so I cannot check those. @marios if you have a held node with this issue then I can check the creds there.

@gibizer thanks for having a look, sorry we were trying to get to a solution and missed your comments. In the end it was a DNS issue resolved with https://github.com/openstack-k8s-operators/data-plane-adoption/pull/218/files. Green run there if you want to poke at logs: https://review.rdoproject.org/zuul/build/87df5976f8814ea9a319eea1caececb2/artifacts

The green result nova-compute logs look good to me!
||
  until: records_check_results.rc == 0 and records_check_results.stdout_lines | length == 0
  retries: 20
  delay: 6
|
||
- name: remove pre-FFU workarounds for Nova control plane services
  ansible.builtin.shell: |
    {{ shell_header }}
    {{ oc_header }}
    oc patch openstackcontrolplane openstack -n openstack --type=merge --patch '
    spec:
      nova:
        template:
          cellTemplates:
            cell0:
              conductorServiceTemplate:
                customServiceConfig: |
                  [workarounds]
                  disable_compute_service_check_for_ffu=false
            cell1:
              metadataServiceTemplate:
                customServiceConfig: |
                  [workarounds]
                  disable_compute_service_check_for_ffu=false
              conductorServiceTemplate:
                customServiceConfig: |
                  [workarounds]
                  disable_compute_service_check_for_ffu=false
          apiServiceTemplate:
            customServiceConfig: |
              [workarounds]
              disable_compute_service_check_for_ffu=false
          metadataServiceTemplate:
            customServiceConfig: |
              [workarounds]
              disable_compute_service_check_for_ffu=false
          schedulerServiceTemplate:
            customServiceConfig: |
              [workarounds]
              disable_compute_service_check_for_ffu=false
    '
|
||
- name: Wait for Nova control plane services' CRs to become ready
  ansible.builtin.include_role:
    name: nova_adoption
    tasks_from: wait.yaml
|
||
- name: remove pre-FFU workarounds for Nova compute EDPM services
  ansible.builtin.shell: |
    {{ shell_header }}
    {{ oc_header }}
    oc apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: nova-compute-ffu
      namespace: openstack
    data:
      20-nova-compute-cell1-ffu-cleanup.conf: |
        [workarounds]
        disable_compute_service_check_for_ffu=false
    ---
    apiVersion: dataplane.openstack.org/v1beta1
    kind: OpenStackDataPlaneService
    metadata:
      name: nova-compute-ffu
      namespace: openstack
    spec:
      label: nova.compute.ffu
      configMaps:
        - nova-compute-ffu
      secrets:
        - nova-cell1-compute-config
        - nova-migration-ssh-key
      playbook: osp.edpm.nova
    ---
    apiVersion: dataplane.openstack.org/v1beta1
    kind: OpenStackDataPlaneDeployment
    metadata:
      name: openstack-nova-compute-ffu
      namespace: openstack
    spec:
      nodeSets:
        - openstack
      servicesOverride:
        - nova-compute-ffu
    EOF
|
||
- name: wait for Nova compute EDPM services to become ready
  ansible.builtin.shell: |
    {{ shell_header }}
    {{ oc_header }}
    oc wait --for condition=Ready osdpd/openstack-nova-compute-ffu --timeout=5m
  register: nova_ffu_edpm_result
  until: nova_ffu_edpm_result is success
  retries: 10
  delay: 6
|
||
- name: run Nova DB migrations to complete Wallaby->Antelope FFU
  ansible.builtin.shell: |
    {{ shell_header }}
    {{ oc_header }}
    oc rsh nova-cell0-conductor-0 nova-manage db online_data_migrations
    oc rsh nova-cell1-conductor-0 nova-manage db online_data_migrations
  register: nova_exec_result
  until: nova_exec_result is success
  retries: 10
  delay: 6
here and below: you don't need an explicit
disable_compute_service_check_for_ffu=false
as false is the default. I suggest just dropping the content of the customServiceConfig
field.

The edpm-nova config cleanup bug mentioned below does not affect the k8s control plane.

How am I supposed to drop this section for the EDPM side of things? I'm not certain patching osdpservices is a valid approach.
Neither can we do "removing the nova-compute-ffu from the OpenStackDataPlaneService and doing a deployment".
This is the first place we refer to it for deployment, so there is nothing to remove.
We could remove the nova-extra-config service from the existing osdpns and make it deploy the standard nova service. But is patching osdpns an antipattern?
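One possible reading of the first suggestion in this thread, for the control plane side only, is to reset `customServiceConfig` to an empty string instead of setting the option to `false`. The sketch below only prints such a merge patch; the function name is hypothetical, and whether an empty string is the right way to clear the field in the `OpenStackControlPlane` CR is an assumption that has not been checked against the operator. It would also wipe any other custom configuration stored in those fields.

```bash
# Hypothetical helper: print a merge patch that empties customServiceConfig
# for the Nova control plane templates touched by the pre-FFU workaround.
nova_ffu_cleanup_patch() {
    cat <<'EOF'
spec:
  nova:
    template:
      cellTemplates:
        cell0:
          conductorServiceTemplate:
            customServiceConfig: ""
        cell1:
          metadataServiceTemplate:
            customServiceConfig: ""
          conductorServiceTemplate:
            customServiceConfig: ""
      apiServiceTemplate:
        customServiceConfig: ""
      metadataServiceTemplate:
        customServiceConfig: ""
      schedulerServiceTemplate:
        customServiceConfig: ""
EOF
}
# Example (not run here):
#   oc patch openstackcontrolplane openstack -n openstack --type=merge \
#     --patch "$(nova_ffu_cleanup_patch)"
```

This does not address the EDPM side raised in the last comment, where the cleanup depends on how the edpm_nova role handles leftover config files.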