
Deploy an arbitrary number of glanceAPI #384

Merged

Conversation


@fmount fmount commented Dec 5, 2023

This patch introduces a new API interface that allows specifying an arbitrary number of glanceAPI instances. By exposing this ability at the interface level, the glance-operator becomes responsible for reconciling the deployed instances in terms of lifecycle, as well as ensuring we always have a selected API registered to the keystone image service.
As part of this change we are now able to:

  • grow or shrink the API deployment
  • update the keystoneEndpoint to select which instance should be registered in the keystone catalog
  • propagate extraVolumes to a single instance, as is done for both Cinder and Manila
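
A minimal sketch of what the new interface could look like. The field names (`glanceAPIs` as a map of instances, a top-level `keystoneEndpoint` selector) follow the PR description, but the exact schema is an assumption; check the samples under config/samples for the authoritative layout:

```yaml
apiVersion: glance.openstack.org/v1beta1
kind: Glance
metadata:
  name: glance
spec:
  # the instance selected here is the one registered in the keystone catalog
  keystoneEndpoint: api0
  # map of glanceAPI instances: add or remove entries to grow or shrink
  # the deployment
  glanceAPIs:
    api0:
      replicas: 1
    api1:
      replicas: 1
```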

Demo: https://asciinema.org/a/625117

@openshift-ci openshift-ci bot requested a review from lewisdenny December 5, 2023 12:59
@openshift-ci openshift-ci bot added the approved label Dec 5, 2023
@fmount fmount changed the title from "Deploy an arbitrary number of glanceAPI" to "WIP - Deploy an arbitrary number of glanceAPI" Dec 5, 2023
@fmount fmount force-pushed the list_of_glanceapi branch from 6386e06 to 30604dd on December 5, 2023 12:59
api/v1beta1/glance_webhook.go (review thread, outdated, resolved)

fmount commented Dec 5, 2023

/retest-required

api/v1beta1/glance_types.go (review thread, outdated, resolved)
api/v1beta1/glance_webhook.go (review thread, outdated, resolved)
@fmount fmount force-pushed the list_of_glanceapi branch from 1205701 to da826a4 on December 5, 2023 22:28

fmount commented Dec 6, 2023

I think there's a valid issue in tempest, and it might be related to endpoint creation:

2023-12-05 23:01:10.201 130 WARNING urllib3.connectionpool [-] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5c2f9fa160>: Failed to establish a new connection: [Errno -2] Name or service not known')': /
2023-12-05 23:01:10.205 130 WARNING urllib3.connectionpool [-] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5c2f9fa5e0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /
2023-12-05 23:01:10.209 130 WARNING urllib3.connectionpool [-] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5c2f9fa700>: Failed to establish a new connection: [Errno -2] Name or service not known')': /
2023-12-05 23:01:10.217 130 ERROR config_tempest.constants [-] Request on service 'image' with url 'http://glance-default-external-public.openstack.svc:9292' failed: urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='glance-default-external-public.openstack.svc', port=9292): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5c2f9fa880>: Failed to establish a new connection: [Errno -2] Name or service not known'))
2023-12-05 23:01:10.220 130 CRITICAL tempest [-] Unhandled error: urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='glance-default-external-public.openstack.svc', port=9292): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5c2f9fa880>: Failed to establish a new connection: [Errno -2] Name or service not known'))
2023-12-05 23:01:10.220 130 ERROR tempest Traceback (most recent call last):
2023-12-05 23:01:10.220 130 ERROR tempest   File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 169, in _new_conn
2023-12-05 23:01:10.220 130 ERROR tempest     conn = connection.create_connection(
2023-12-05 23:01:10.220 130 ERROR tempest   File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
2023-12-05 23:01:10.220 130 ERROR tempest     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
2023-12-05 23:01:10.220 130 ERROR tempest   File "/usr/lib64/python3.9/socket.py", line 954, in getaddrinfo
2023-12-05 23:01:10.220 130 ERROR tempest     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
2023-12-05 23:01:10.220 130 ERROR tempest socket.gaierror: [Errno -2] Name or service not known
2023-12-05 23:01:10.220 130 ERROR tempest 
2023-12-05 23:01:10.220 130 ERROR tempest During handling of the above exception, another exception occurred:

@fmount fmount force-pushed the list_of_glanceapi branch 4 times, most recently from 1e6225b to c15cbc5 on December 6, 2023 11:15
@fmount fmount force-pushed the list_of_glanceapi branch from 8b62168 to b0aa0f3 on December 6, 2023 22:37
Kuttl tests have been updated to reflect the new GlanceAPI definition.
As an additional point, the copy && deploy pattern has been removed:
now, for each sample layout in config/samples, an associated kuttl test
can be built and run.
So far only single and multiple have been moved, but the idea is to have
a follow-up change to build edge, base and multiple.

Signed-off-by: Francesco Pantano <[email protected]>
The logic that handles the apiDeployment has been reworked and moved to
a dedicated function.
This makes the flow of the main reconcile loop clearly visible in the
code.
In addition, webhooks have been updated so they no longer prevent the
deployment of additional glanceAPIs later in time.

Signed-off-by: Francesco Pantano <[email protected]>
We have the ability to grow the deployment and add more glanceAPI
instances as needed, as well as select which one should be used as
the main instance registered in the keystone catalog. With this patch
we now have the ability to clean up an instance that no longer
belongs to the main CR. The endpoints registered in the Status are
removed as well.

Signed-off-by: Francesco Pantano <[email protected]>
If AppSelector does not reflect the general service name, it can be
hard to retrieve all the resources associated with a given Glance
CR. For this reason, this patch reworks the label assignment to
align Glance with the work done in the other operators.

Signed-off-by: Francesco Pantano <[email protected]>
This patch fixes a few issues in the label and selector assignment and
goes back to the previous deployment workflow, so we don't add too much
wrapping to the existing logic.
The delete flow has been updated as well: the Finalizer that prevented
the glanceAPI from being removed is gone, and glanceAPICleanup is not
run within the for loop.

Signed-off-by: Francesco Pantano <[email protected]>
Both internal and public endpoints are based on a pattern that uses
instance.Name. However, this looks odd to the end user, as the
URL appears in the form service-name-type-type.DOMAIN. This
change simplifies endpoint Name creation by removing the extra
"type" in the name.

Signed-off-by: Francesco Pantano <[email protected]>

fmount commented Dec 13, 2023

/retest-required


fmount commented Dec 13, 2023

/test glance-operator-build-deploy-tempest


fmount commented Dec 13, 2023

/retest

Now that we have the ability to define multiple Glance instances, we
can propagate extraVolumes not only to the Glance component as a whole
(or, logically, to the GlanceAPI subgroup), but also to a specific
instance (e.g. api0). As is already done for both Cinder and Manila
(where this mechanism is in place), the code has been updated to
reflect this way of propagating volumes.

Signed-off-by: Francesco Pantano <[email protected]>
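
A hedged sketch of this per-instance propagation, following the extraMounts pattern already used by Cinder and Manila. The volume/secret names (ceph0, ceph-conf-files-0) are illustrative, and the exact field nesting is an assumption; check the operator's samples for the real schema:

```yaml
spec:
  extraMounts:
    - name: v1
      region: r1
      extraVol:
        - propagation:
            - api0          # target only this glanceAPI instance
          volumes:
            - name: ceph0
              secret:
                secretName: ceph-conf-files-0
          mounts:
            - name: ceph0
              mountPath: /etc/ceph
              readOnly: true
```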

fmount commented Dec 14, 2023

Although I'm still trying to get the CI green on PR #587 and predict any potential failure
with patch [2], I think this change is ready to go.
I tested it with and without network isolation, with multiple instances deployed through
the openstack-operator (see [1]). As an additional test, I deployed multiple ceph
clusters (in Pod form [4]) and used extraVolumes to attach each of them to
a different glanceAPI.

As the demo [1] shows:

- api0 => upload an image => ceph0
- patch the `openstackcontrolplane` to switch endpoint to api1
- api1 => upload an image => ceph1
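
The endpoint switch in the middle step could look like the following excerpt of the `OpenStackControlPlane` spec. The nesting under `glance.template` is an assumption based on the openstack-operator patch [2]; only the `keystoneEndpoint` field comes from this PR:

```yaml
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
spec:
  glance:
    template:
      # switch the keystone-registered API from api0 to api1
      keystoneEndpoint: api1
```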

The three steps above are basically the common part of the tests I made. I'm going
to create a doc patch as a follow-up to better describe the design choices and how
the whole flow works, for both the glanceAPI lifecycle and the endpoint management.
As a follow-up, we might add more conditions for the edge type API, so that we
can skip any public endpoint creation (even if not exposed through a Route).
By doing this we can prevent them from being registered in keystone.

@abays if you are ok with this patch, once it is approved we should rebase/merge [2]
to avoid issues and inconsistencies with the API.
Both kuttl and tempest tests prove that the covered use cases work as expected
(see also the Prow job logs in [2]).
I'll keep the do-not-merge/hold label to block merging until you've checked [2]
(which of course should be rebased on this one, while right now kuttl uses my custom
bundle [4] based on this commit).

[1] https://asciinema.org/a/626515
[2] openstack-k8s-operators/openstack-operator#587
[3] openstack-k8s-operators/install_yamls#654
[4] https://quay.io/repository/fpantano/glance-operator-bundle:3598c3d1fad3


@abays abays left a comment


/lgtm


openshift-ci bot commented Dec 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abays, fmount

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fmount added a commit to fmount/data-plane-adoption that referenced this pull request Dec 14, 2023
We recently introduced the ability to deploy a list of glanceAPIs
in the OpenStackControlPlane. This feature is not required in the
adoption context, but it's critical to adapt the Glance deployment
to the new API.
It might become relevant in the future as DCN environments are
adopted.

Depends-On: openstack-k8s-operators/glance-operator#384
Depends-On: openstack-k8s-operators/openstack-operator#587

Signed-off-by: Francesco Pantano <[email protected]>

fmount commented Dec 15, 2023

Given that this change is ready and the openstack-operator patch is green [1], I'm removing the hold label and merging this change.

[1] https://review.rdoproject.org/zuul/buildset/f120eaadda9344ab8efe4c0db053e15d


fmount commented Dec 15, 2023

/unhold

@openshift-merge-bot openshift-merge-bot bot merged commit 9d0a576 into openstack-k8s-operators:main Dec 15, 2023
7 checks passed
fmount added a commit to fmount/install_yamls that referenced this pull request Jan 18, 2024
We recently introduced a feature that allows deploying an arbitrary
number of GlanceAPIs [1].
To ease testing and to get closer to an edge scenario simulation, one
approach is to have the ability to deploy multiple Ceph pods:
they bring different secrets that can be propagated to a subset of
ctlplane components.
This patch introduces the new 'CEPH_CLUSTERS' variable, which is used
within the bash script to (optionally) deploy multiple Ceph Pods.

[1] openstack-k8s-operators/glance-operator#384

Signed-off-by: Francesco Pantano <[email protected]>
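
A hypothetical sketch of how a `CEPH_CLUSTERS` count could drive the deployment loop; the real install_yamls script differs, and the function name and echoed commands are placeholders for illustration only:

```shell
#!/bin/bash
# Hypothetical sketch: iterate CEPH_CLUSTERS times, giving each Ceph
# Pod its own name and secret that can later be propagated to a
# subset of ctlplane components via extraMounts.
deploy_ceph_clusters() {
  local count=${1:-1}
  for i in $(seq 0 $((count - 1))); do
    # placeholder for the actual Pod/secret creation commands
    echo "deploying ceph-$i with secret ceph-conf-files-$i"
  done
}

deploy_ceph_clusters "${CEPH_CLUSTERS:-2}"
```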