Nova: workloads adoption, mariadb pre-/post-checks #193

Merged


bogdando
Contributor

@bogdando bogdando commented Nov 6, 2023

  • MariaDB pre-/post-checks:

Get services topology specific configuration in
pull_openstack_configuration. Add missing role for that as well.

Split mariadb source vs podified env vars shell headers to use them
in corresponding places.

Update and fix the composition of the services pre-check list to
execute it before stopping services.

Update and fix the composition of the list of the services to be
stopped (cannot pull data from stopped services).

Verify no dataplane disruptions during the FFU/adoption process.

Verify Nova services still control pre-created VM workload after
FFU/adoption is done.

Replace oc run with oc exec in post checks. Fix 'oc rsh' which breaks
expected formatting for 'mysql -rs', when it is called from console vs
called by ansible shell module. Use instead 'oc exec' to keep post
checks working the same way in docs and executed via ansible tests.

  • Pre-launch test VM instance in dev env role.

Add a test role for development environment and mention it
in the docs TOC as well.

Defer pingtest until the network adoption is done.
Defer cinder volumes until the adoption of workloads with volumes is done.
Defer Nova RBD backend until the ceph-to-ceph workloads adoption is done.

Use a copy of the install_yamls shell script to bootstrap a VM.
We cannot use it via the install_yamls devsetup make target because:

  • we need to wrap it with shell/ocp/etc headers
  • we need to use bash aliases ref to openstack commands instead of
    direct openstack CLI syntax
  • we should avoid SSH-ing to the source cloud VMs (standalone/computes)
    during EDPM adoption
  • install_yamls is going to be deprecated anyway
  • its FIP ping test doesn't work and depends on the network adoption
    being done
  • its cinder volume/backup/snapshot/attach test doesn't work yet,
    which is OK for this simple workload adoption case.

Jira: OSPRH-3123
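
A minimal sketch of the `oc rsh` to `oc exec` switch described in the MariaDB bullet above. The pod name `openstack-galera-0` and the `GALERA_POD`, `DB_ROOT_PASSWORD` and `OC` variables are assumptions for illustration and are not taken from this PR; adjust them to the actual environment.

```shell
# Sketch: run a query in the podified MariaDB through `oc exec`, so that
# `mysql -rs` output looks the same from a console and from the Ansible
# shell module. All variable names and the default pod name are hypothetical.
GALERA_POD="${GALERA_POD:-openstack-galera-0}"
OC="${OC:-oc}"   # set OC=echo to preview the command instead of running it

podified_mysql() {
    # $1: SQL statement to execute
    # -r (raw) and -s (silent) give plain, script-friendly output
    "$OC" exec "$GALERA_POD" -- \
        mysql -rs -uroot -p"${DB_ROOT_PASSWORD:-}" -e "$1"
}
```

For example, `podified_mysql 'SHOW DATABASES;'` would list the databases, and `OC=echo podified_mysql 'SHOW DATABASES;'` only prints the arguments that would be passed to `oc`.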


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/473545be17aa41b384f26384d83f777c

data-plane-adoption-github-rdo-centos-9-extracted-crc FAILURE in 39m 19s

@bogdando bogdando force-pushed the mariadb_nova_adoption_checks branch from 07ae90c to 9844aa6 Compare November 8, 2023 16:37

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/376d35127a8b4a8190c8496b80e02030

data-plane-adoption-github-rdo-centos-9-extracted-crc FAILURE in 1h 30m 31s

@bogdando bogdando force-pushed the mariadb_nova_adoption_checks branch 2 times, most recently from 762b0ea to 3a00eeb Compare November 10, 2023 13:38

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/1425ecdf411a405ea509f230e0c6b581

data-plane-adoption-github-rdo-centos-9-extracted-crc FAILURE in 1h 36m 49s

@bogdando bogdando force-pushed the mariadb_nova_adoption_checks branch 4 times, most recently from 4e82073 to dadc737 Compare November 13, 2023 15:34

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/3e81a27e46b94c56aadd3f0ed8ddf18a

data-plane-adoption-github-rdo-centos-9-extracted-crc FAILURE in 1h 28m 47s

@bogdando bogdando force-pushed the mariadb_nova_adoption_checks branch from dadc737 to 5f62ac7 Compare November 14, 2023 16:45

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 193,5f62ac7fae161dc8d78cb124ecf0260012cdb24d

@bogdando bogdando force-pushed the mariadb_nova_adoption_checks branch from 5f62ac7 to 22e75c5 Compare November 14, 2023 17:00

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 193,22e75c549f0bf02c320a2427a369ff37875fcb55

@bogdando bogdando force-pushed the mariadb_nova_adoption_checks branch 2 times, most recently from 45827b4 to 56a6f3a Compare November 15, 2023 12:45

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/f9dc476e714f44a6a71634c4b68d7903

data-plane-adoption-github-rdo-centos-9-extracted-crc FAILURE in 1h 36m 03s

@bogdando bogdando force-pushed the mariadb_nova_adoption_checks branch from 56a6f3a to 069c725 Compare November 15, 2023 14:51

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/92c7e343f8ca44bb8c93636b125eb0b1

data-plane-adoption-github-rdo-centos-9-extracted-crc FAILURE in 1h 28m 13s

@bogdando bogdando force-pushed the mariadb_nova_adoption_checks branch 5 times, most recently from 827f4ad to 24c0311 Compare November 16, 2023 16:29

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/5359a9f1f84d4a2d820c8f5c9bc21ae2

data-plane-adoption-github-rdo-centos-9-extracted-crc FAILURE in 1h 35m 18s

@bogdando bogdando force-pushed the mariadb_nova_adoption_checks branch 2 times, most recently from 0131735 to a1240c4 Compare November 17, 2023 12:34
@cescgina
Contributor

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/4f30d597691341bb8fe7c26cc85b468e

data-plane-adoption-osp-17-to-extracted-crc FAILURE in 2h 08m 10s

@fultonj
Contributor

fultonj commented Jan 29, 2024

the prev job failure was unrelated: "Invalid volume: Volume 00f833f9-8f94-4041-908e-ff1dbf3bf1f3 status must be available, but current status is: backing-up. (HTTP 400) (Request-ID: req-38853a0c-a3f9-47b1-86e0-3efbd41c21f2)"

created https://issues.redhat.com/browse/OSPRH-4217

@bogdando you asked me to look at this, but would you please give me more context on what is happening here and what you need from me?

Does the following describe it?

  • You have a workload on an overcloud deployed by tripleo which is using ceph RBD for glance, cinder, nova
  • This job runs an adoption to move that overcloud to one managed by k8s-operators and you're testing if the workload survives
  • the results indicate that part of the workload, a cinder volume, has a status of backing up instead of active

Do you have any idea why the volume is in a state of backing up?

@bogdando
Contributor Author

This tries to adopt a workload from ceph to local storage, which won't work.

@bogdando
Contributor Author

bogdando commented Jan 30, 2024

Does the following describe it?

  • You have a workload on an overcloud deployed by tripleo which is using ceph RBD for glance, cinder, nova

yes. I need some help with configuring it the same way post-adoption, for EDPM, in order to adopt that workload.
Ideally, I'd defer ceph to ceph adoption of workloads for now, leaving it for future work, after this simplistic case is done. But I'm not sure how we should modify the install_yamls make standalone target to configure ceph and local files backend for Nova. Then we need to change the test VM creation script to use the local backend instead. @SeanMooney FYI.

  • This job runs an adoption to move that overcloud to one managed by k8s-operators and you're testing if the workload survives

correct - whether it survives shutdown and power on actions

  • the results indicate that part of the workload, a cinder volume, has a status of backing up instead of active

that happens during an initial bootstrap of a test VM in the source cloud, we can omit this for the time being
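
A rough sketch of the "survives shutdown and power on actions" check mentioned above, assuming the `openstack` CLI (or an alias for it) is available in the shell. The function names, the retry count and the `POLL_INTERVAL` variable are hypothetical and do not come from this PR.

```shell
# Sketch: verify that Nova still controls a pre-created VM by cycling its
# power state and polling the reported status. Names are illustrative only.
wait_for_server_status() {
    # $1: server name or ID, $2: expected status, $3: max attempts (default 30)
    server="$1"; expected="$2"; attempts="${3:-30}"
    while [ "$attempts" -gt 0 ]; do
        status="$(openstack server show "$server" -f value -c status)"
        [ "$status" = "$expected" ] && return 0
        attempts=$((attempts - 1))
        sleep "${POLL_INTERVAL:-5}"
    done
    echo "server $server is $status, expected $expected" >&2
    return 1
}

check_server_power_cycle() {
    # $1: server name or ID; fails if any step does not reach its status
    openstack server stop "$1" && wait_for_server_status "$1" SHUTOFF &&
        openstack server start "$1" && wait_for_server_status "$1" ACTIVE
}
```

Running `check_server_power_cycle <server>` against the adopted cloud would then fail the job if the VM can no longer be stopped and started through Nova.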

@bogdando
Contributor Author

recheck

@fultonj
Contributor

fultonj commented Jan 30, 2024

This tries to adopt a workload from ceph to local storage, which won't work.

Right. That should not work. Initially deploying with nova on local storage, as per the following, should work:

openstack-k8s-operators/install_yamls#712


This change depends on a change that failed to merge.

Change openstack-k8s-operators/install_yamls#712 is needed.

@fultonj
Contributor

fultonj commented Jan 30, 2024

Does the following describe it?

  • You have a workload on an overcloud deployed by tripleo which is using ceph RBD for glance, cinder, nova

yes. I need some help with configuring it the same way post-adoption, for EDPM, in order to adopt that workload. Ideally, I'd defer ceph to ceph adoption of workloads for now, leaving it for future work, after this simplistic case is done. But I'm not sure how we should modify the install_yamls make standalone target to configure ceph and local files backend for Nova. Then we need to change the test VM creation script to use the local backend instead. @SeanMooney FYI.

So we'll initially configure Nova with local disk, not the Ceph VMs pool.

openstack-k8s-operators/install_yamls#712 (review)

  • This job runs an adoption to move that overcloud to one managed by k8s-operators and you're testing if the workload survives

correct - whether it survives shutdown and power on actions

  • the results indicate that part of the workload, a cinder volume, has a status of backing up instead of active

that happens during an initial bootstrap of a test VM in the source cloud, we can omit this for the time being

Why would something like this:

 openstack server create --flavor $FLAV --volume $VOL_ID --nic net-id=$NET my-pet-vm

result in something like this?

 openstack volume backup create $VOL_ID

I'm not sure this theory on what's happening is correct.

But as per:

openstack-k8s-operators/install_yamls#712 (review)

We're not going to find out in this PR and we'll investigate what's really going on later via https://issues.redhat.com/browse/OSPRH-4287

@bogdando
Contributor Author

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/74479d574eb64c1193d6a368fca81257

data-plane-adoption-osp-17-to-extracted-crc RETRY_LIMIT in 14m 54s

@bogdando
Contributor Author

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/2d1d8b0cf34b4eb3a32cfe701f884cb3

data-plane-adoption-osp-17-to-extracted-crc FAILURE in 55m 04s

@bogdando
Contributor Author

this fails differently now, and very early: 'Cloud standalone was not found', for TASK [development_environment : pre-launch test VM instance]

@bogdando
Contributor Author

bogdando commented Jan 31, 2024

Why would something like this:

 openstack server create --flavor $FLAV --volume $VOL_ID --nic net-id=$NET my-pet-vm

result in something like this?

 openstack volume backup create $VOL_ID

this is how we have historically done that in the adoption docs: https://github.com/openstack-k8s-operators/data-plane-adoption/blob/main/docs_dev/assemblies/development_environment.adoc?plain=1#L283 after https://github.com/openstack-k8s-operators/data-plane-adoption/blob/main/docs_dev/assemblies/development_environment.adoc?plain=1#L267

let's leave it out of this PR scope for now. agreed

@fultonj
Contributor

fultonj commented Jan 31, 2024

Why would something like this:

 openstack server create --flavor $FLAV --volume $VOL_ID --nic net-id=$NET my-pet-vm

result in something like this?

 openstack volume backup create $VOL_ID

this is how we have historically done that in the adoption docs: https://github.com/openstack-k8s-operators/data-plane-adoption/blob/main/docs_dev/assemblies/development_environment.adoc?plain=1#L283 after https://github.com/openstack-k8s-operators/data-plane-adoption/blob/main/docs_dev/assemblies/development_environment.adoc?plain=1#L267

Ah ha! Then, one solution is for the job to wait for the backup to complete.

let's leave it out of this PR scope for now. agreed

ok.
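
One way the "wait for the backup to complete" idea above could look, as a sketch only: poll the volume until it leaves `backing-up` and reports `available`, which is the status the earlier error message says is required. The function name, the attempt limit and `POLL_INTERVAL` are assumptions, not something defined in this PR.

```shell
# Sketch: block until a cinder volume is "available" again (for example after
# `openstack volume backup create`), or give up after a number of attempts.
wait_for_volume_available() {
    # $1: volume ID, $2: max attempts (default 60)
    vol="$1"; attempts="${2:-60}"
    while [ "$attempts" -gt 0 ]; do
        status="$(openstack volume show "$vol" -f value -c status)"
        case "$status" in
            available) return 0 ;;
            error*) echo "volume $vol entered status $status" >&2; return 1 ;;
        esac
        attempts=$((attempts - 1))
        sleep "${POLL_INTERVAL:-5}"
    done
    echo "timed out waiting for volume $vol (last status: $status)" >&2
    return 1
}
```

Calling `wait_for_volume_available "$VOL_ID"` between the backup step and the step that needs the volume would avoid racing against the backup, at the cost of a longer job.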

@bogdando
Contributor Author

recheck deps changed


Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 193,f4fa5f86976d172c79f66771d8b59ba8f879f7ab

* MariaDB pre-/post-checks:

Get services topology specific configuration in
pull_openstack_configuration. Add missing role for that as well.

Split mariadb source vs podified env vars shell headers to use them
in corresponding places.

Update and fix the composition of the services pre-check list to
execute it before stopping services.

Update and fix the composition of the list of the services to be
stopped (cannot pull data from stopped services).

Verify no dataplane disruptions during the FFU/adoption process.

Verify Nova services still control pre-created VM workload after
FFU/adoption is done.

Replace oc run with oc exec in post checks. Fix 'oc rsh' which breaks
expected formatting for 'mysql -rs', when it is called from console vs
called by ansible shell module. Use instead 'oc exec' to keep post
checks working the same way in docs and executed via ansible tests.

* Pre-launch test VM instance in dev env role.

Add a test role for development environment and mention it
in the docs TOC as well.

Defer pingtest until the network adoption is done.
Defer cinder volumes until the adoption of workloads with volumes is done.
Defer Nova RBD backend until the ceph-to-ceph workloads adoption is done.

Use a copy of the install_yamls shell script to bootstrap a VM.
We cannot use it via the install_yamls devsetup make target because:
* we need to wrap it with shell/ocp/etc headers
* we need to use bash aliases ref to openstack commands instead of
  direct openstack CLI syntax
* we should avoid SSH-ing to the source cloud VMs (standalone/computes)
  during EDPM adoption
* install_yamls is going to be deprecated anyway
* its FIP ping test doesn't work and depends on the network adoption
  being done
* its cinder volume/backup/snapshot/attach test doesn't work yet,
  which is OK for this simple workload adoption case.

Signed-off-by: Bohdan Dobrelia <[email protected]>
@bogdando bogdando force-pushed the mariadb_nova_adoption_checks branch from f4fa5f8 to 28faef4 Compare January 31, 2024 15:33

This change depends on a change that failed to merge.

Change openstack-k8s-operators/install_yamls#712 is needed.

@bogdando
Contributor Author

recheck deps changed


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/aa2de90009cd412fa612bd098a2de78e

data-plane-adoption-osp-17-to-extracted-crc FAILURE in 1h 25m 13s

@bogdando
Contributor Author

bogdando commented Feb 1, 2024

recheck error: unable to default to a user name: the server is currently unable to handle the request (get users.user.openshift.io ~)

@bogdando
Contributor Author

bogdando commented Feb 1, 2024

This iteration of workloads adoption is good to go! PTAL folks

mkdocs.yml (review thread resolved)
Contributor

@jistr jistr left a comment


+1 but leaving space for @gibizer and @SeanMooney to review.

@@ -90,3 +90,65 @@ Once it's done, you should have into your local path a directory per services su
▾ glance/
▾ keystone/
----

== Get services topology specific configuration

JFYI: We'll be probably moving the OS Diff content from the "pull openstack configuration" into a new doc right before "EDPM adoption". This new content would stay here though. cc @matbu @davidjpeacock.


@jistr ack ok, no problem from my side.

mkdocs.yml (review thread resolved)
Contributor

@gibizer gibizer left a comment


my nits can be fixed in a follow up

@jistr jistr merged commit bee0c70 into openstack-k8s-operators:main Feb 2, 2024
3 checks passed
@bogdando bogdando deleted the mariadb_nova_adoption_checks branch March 6, 2024 16:45