Clean up scaled down services based on hostname #775
Conversation
Hi @auniyal61. Thanks for your PR. I'm waiting for an openstack-k8s-operators member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/6427ab4248be40098acfafd812f4e77f ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 49m 03s
test/functional/base_test.go
Outdated
@@ -177,7 +177,7 @@ func (f *NovaAPIFixture) ServicesList(w http.ResponseWriter, r *http.Request) {
	"id": 4,
	"binary": "nova-conductor",
	"disabled_reason": "test4",
-	"host": "host2",
+	"host": "nova-conductor-1",
If we have a single conductor replica then its host name is always suffixed with -0.
Also if you look at the service list in your deployment:
oc exec openstackclient -- openstack compute service list
+--------------------------------------+----------------+------------------------+----------+---------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+--------------------------------------+----------------+------------------------+----------+---------+-------+----------------------------+
| abb15fa7-380b-4baf-85d1-adf25febae54 | nova-conductor | nova-cell0-conductor-0 | internal | enabled | up | 2024-05-28T16:08:00.000000 |
| 5044f9ba-ace0-407e-b5f3-ae26af8ca53b | nova-scheduler | nova-scheduler-0 | internal | enabled | up | 2024-05-28T16:08:08.000000 |
| 2fdf49f6-65d6-4584-a2f7-82711b717db7 | nova-conductor | nova-cell1-conductor-0 | internal | enabled | up | 2024-05-28T16:08:00.000000 |
+--------------------------------------+----------------+------------------------+----------+---------+-------+----------------------------+
then you will see that there are conductors for each cell (cell0 and cell1 in the default deployment). We have a separate statefulset per cell for the conductors as they need different DB and message bus configuration and they can be scaled independently.
oc get StatefulSet | grep conductor
nova-cell0-conductor 1/1 4m51s
nova-cell1-conductor 1/1 4m31s
oc get pods -lservice=nova-conductor
NAME READY STATUS RESTARTS AGE
nova-cell0-conductor-0 1/1 Running 0 5m24s
nova-cell1-conductor-0 1/1 Running 0 5m4s
Bottom line: to make this test data realistic I would use nova-cell0-conductor-0.
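A minimal sketch of the fixture entry from the diff above with that host value (everything else unchanged):

	"id": 4,
	"binary": "nova-conductor",
	"disabled_reason": "test4",
	"host": "nova-cell0-conductor-0",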
Later I realized that we might have more complications in the test env around the host names of the services.
The hostname of the service in the nova cell DB comes from the name of the Pod running that service binary. The name of the Pod consists of the name of the StatefulSet and the index of the replica. The name of the StatefulSet is set by the nova-operator. For the scheduler it is set by the scheduler controller in
nova-operator/pkg/novascheduler/deployment.go, line 105 in 3f20f39:
Name: instance.Name,
and the name of the NovaScheduler CR itself is set by the Nova controller in nova-operator/controllers/nova_controller.go, line 1171 in 3f20f39:
Name: instance.Name + "-scheduler",
However, in the functional test env we don't run the openstack-operator, just the nova-operator, and the test code creates the nova related CRs directly. To avoid parallel test case executions interfering with each other, the names of the CRs created by the test are randomly generated. E.g. here we create a NovaScheduler CR:
DeferCleanup(th.DeleteInstance, CreateNovaScheduler(novaNames.SchedulerName, spec))
and the passed-in novaNames struct has a random UUID, generated for each test case, included in the names of the CRs:
nova-operator/test/functional/base_test.go
Lines 714 to 725 in 3f20f39
func GetNovaNames(novaName types.NamespacedName, cellNames []string) NovaNames {
	// NOTE(bogdando): use random UUIDs instead of static "nova" part of names.
	// These **must** replicate existing Nova*/Dataplane controllers suffixing/prefixing logic.
	// While dynamic UUIDs also provide enhanced testing coverage for "synthetic" cases,
	// which could not be caught for normal names with static "nova" prefixes.
	novaAPI := types.NamespacedName{
		Namespace: novaName.Namespace,
		Name:      fmt.Sprintf("%s-api", novaName.Name),
	}
	novaScheduler := types.NamespacedName{
		Namespace: novaName.Namespace,
		Name:      fmt.Sprintf("%s-scheduler", novaName.Name),
So in the functional env the NovaScheduler CR will have a name like <random-uuid>-scheduler, and therefore the StatefulSet name will also be like that.
So most likely we cannot store hard coded test data here any more; instead we need to generate the test data from the test case, where we know the current UUID for that test case.
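A minimal sketch of that, reusing the novaNames helper from the snippets above (the exact place where the test would use this value is an assumption):

// Derive the expected service host from the per-test random CR name instead of
// hard coding it; with a single replica the StatefulSet pod gets the -0 suffix.
expectedSchedulerHost := fmt.Sprintf("%s-0", novaNames.SchedulerName.Name) // e.g. "<random-uuid>-scheduler-0"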
@@ -609,13 +609,14 @@ func (r *NovaConductorReconciler) cleanServiceFromNovaDb(
	secret corev1.Secret,
	l logr.Logger,
) error {
+	replica_count := int(*instance.Spec.Replicas)
Go uses camelCase instead of snake_case for identifiers. The full style guide is here: https://go.dev/doc/effective_go
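For example (both forms appear in this PR's diffs):

// not idiomatic Go:
replica_count := int(*instance.Spec.Replicas)
// idiomatic Go:
replicaCount := int(*instance.Spec.Replicas)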
ack, fixed name style
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/a47d546ccc2047608b0a9b03de6afa68 ✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 20m 22s
/ok-to-test
recheck [1] https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openstack-k8s-operators_nova-operator/775/pull-ci-openstack-k8s-operators-nova-operator-main-functional/1795823352099639296
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/604dc6fe07db461d8037cae93d64cf82 ✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 07m 39s
/test functional
@auniyal61 while the patch is in draft the prow jobs do not run automatically when the PR is updated. You can always trigger them manually though.
Without looking at the change, I checked the kuttl test results where we have scale down tests, and they are passing.
Btw, we can simplify the kuttl test now. Previously it needed to wait until the service was considered down by nova, as only down services were deleted. But we don't need that wait any more. I think steps 4, 5 and 6 can be deleted from here https://github.com/openstack-k8s-operators/nova-operator/tree/main/test/kuttl/test-suites/default/scale-tests and the content of https://github.com/openstack-k8s-operators/nova-operator/blob/main/test/kuttl/test-suites/default/scale-tests/06-assert.yaml can be added directly to step 3 https://github.com/openstack-k8s-operators/nova-operator/blob/main/test/kuttl/test-suites/default/scale-tests/03-assert.yaml
controllers/common.go
Outdated
// name index start from 0
// which means if replicaCount is 1, only nova-scheduler-0 is valid case
// so delete >= 1
if service.State == "down" || hostIndex >= int(replicaCount) {
Instead of handling services in the down state, we should make sure that cleanServiceFromNovaDb is only called after the deployment is fully up to date. You can do that by adding a condition before calling cleanServiceFromNovaDb in
nova-operator/controllers/novascheduler_controller.go
Lines 276 to 286 in ab95f15
result, err = r.ensureDeployment(ctx, h, instance, inputHash, serviceAnnotations)
if (err != nil || result != ctrl.Result{}) {
	return result, err
}
// clean up nova services from nova db should be always a last step in reconcile
// to make sure that
err = r.cleanServiceFromNovaDb(ctx, h, instance, secret, Log)
if err != nil {
	Log.Error(err, "Failed cleaning services from nova db")
}
instance.Status.ReadyCount = ss.GetStatefulSet().Status.ReadyReplicas
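The quoted line shows that ReadyCount is populated from the StatefulSet's ReadyReplicas. A sketch of the suggested guard, placed right after ensureDeployment in the snippet above (exact wording and placement are assumptions):

result, err = r.ensureDeployment(ctx, h, instance, inputHash, serviceAnnotations)
if (err != nil || result != ctrl.Result{}) {
	return result, err
}

// only clean up services once every requested replica is ready,
// i.e. the deployment is fully up to date
if *instance.Spec.Replicas != instance.Status.ReadyCount {
	Log.Info("Waiting for ReadyCount to equal Replicas before deleting scaled in services from the nova DB")
	return ctrl.Result{}, nil
}

err = r.cleanServiceFromNovaDb(ctx, h, instance, secret, Log)
if err != nil {
	Log.Error(err, "Failed cleaning services from nova db")
}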
@@ -609,13 +609,14 @@ func (r *NovaConductorReconciler) cleanServiceFromNovaDb(
	secret corev1.Secret,
	l logr.Logger,
) error {
+	replicaCount := int(*instance.Spec.Replicas)
I would drop the int cast and instead change the function parameter in cleanNovaServiceFromNovaDb to be int32. But keep the pointer deref so that cleanNovaServiceFromNovaDb does not need to deal with nil pointers.
Btw, even though instance.Spec.Replicas is a pointer, we know it will not be nil: when the instance is read from etcd the field is defaulted to 1 if not provided. This behavior is defined by the kubebuilder comments on the field definition:
nova-operator/api/v1beta1/novascheduler_types.go
Lines 32 to 37 in ab95f15
// +kubebuilder:validation:Optional
// +kubebuilder:default=1
// +kubebuilder:validation:Maximum=32
// +kubebuilder:validation:Minimum=0
// Replicas of the service to run
Replicas *int32 `json:"replicas"`
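A sketch of what the suggested signature change could look like, together with a call site (the parameter list beyond replicaCount is an illustration based on the diffs later in this review, not the final code):

func cleanNovaServiceFromNovaDb(
	computeClient *gophercloud.ServiceClient,
	serviceName string,
	l logr.Logger,
	replicaCount int32,
) error {
	// list the services for serviceName and delete the ones scaled in,
	// keeping replicas 0..replicaCount-1
	return nil
}

// caller side: no int cast needed; the deref is safe because of the kubebuilder default above
err := cleanNovaServiceFromNovaDb(computeClient, "nova-scheduler", l, *instance.Spec.Replicas)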
@@ -663,11 +663,12 @@ func (r *NovaSchedulerReconciler) cleanServiceFromNovaDb(
	secret corev1.Secret,
	l logr.Logger,
) error {
+	replicaCount := int(*instance.Spec.Replicas)
ditto about casting.
done
controllers/common.go
Outdated
hostSplits = hostSplits[len(hostSplits)-2:]
host, hostIndexStr := hostSplits[0], hostSplits[1]

if strings.Contains(serviceName, host) {
This should not be needed. We are already querying only the given service type from nova at L123. So when we call this from the scheduler controller we only list the scheduler services from the db.
It is possible that the functional test code did not simulate this precisely, I need to check...
Yeah, it was a shortcoming of the test code only. I pushed improvements to the nova API fixture to return only the services which use the requested binary: #778
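A hypothetical sketch of what that fixture filtering could look like (the field and type names here are assumptions; the real change is in #778):

// inside the fixture's ServicesList handler: keep only the services whose
// binary matches the ?binary= query parameter, if one was sent
binary := r.URL.Query().Get("binary")
filtered := []APIFixtureService{}
for _, service := range f.Services {
	if binary == "" || service.Binary == binary {
		filtered = append(filtered, service)
	}
}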
ack
controllers/common.go
Outdated
hostSplits := strings.Split(service.Host, "-")
hostSplits = hostSplits[len(hostSplits)-2:]
host, hostIndexStr := hostSplits[0], hostSplits[1]
This is OK for the scheduler case, as a Nova deployment always has a single set of scheduler replicas. However, for the conductor this is different: for each NovaCell we have a separate set of conductor replicas, and the different cells can have different conductor replica counts.
This also means that the delete logic for a given cell should only delete conductor services for the given cell. Right now this code deletes conductor services regardless of which cell they belong to.
E.g. I changed the default deployment in a way that cell0 has 2 conductor replicas and cell1 has 3:
❯ oc get OpenStackControlPlane -o yaml| grep conductorServiceTemplate -B4 -A2
cell0:
cellDatabaseAccount: nova-cell0
cellDatabaseInstance: openstack
cellMessageBusInstance: rabbitmq
conductorServiceTemplate:
customServiceConfig: ""
replicas: 2
--
cell1:
cellDatabaseAccount: nova-cell1
cellDatabaseInstance: openstack-cell1
cellMessageBusInstance: rabbitmq-cell1
conductorServiceTemplate:
customServiceConfig: ""
replicas: 3
❯ oc get pods | grep conductor
nova-cell0-conductor-0 1/1 Running 0 113m
nova-cell0-conductor-1 1/1 Running 0 2m14s
nova-cell1-conductor-0 1/1 Running 0 113m
nova-cell1-conductor-1 1/1 Running 0 2m14s
nova-cell1-conductor-2 1/1 Running 0 2m14s
❯ openstack compute service list
+--------------------------------------+----------------+------------------------+----------+---------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+--------------------------------------+----------------+------------------------+----------+---------+-------+----------------------------+
| 26e569f7-66ab-4a87-9bfa-e48e01835f54 | nova-conductor | nova-cell1-conductor-0 | internal | enabled | up | 2024-05-31T14:05:13.000000 |
| bb3164f0-c655-40c1-beb8-1592df024d10 | nova-conductor | nova-cell1-conductor-2 | internal | enabled | up | 2024-05-31T14:05:11.000000 |
| ced135e2-7242-4e36-bf59-3ae29382b62d | nova-conductor | nova-cell1-conductor-1 | internal | enabled | up | 2024-05-31T14:05:11.000000 |
| 4530443c-b5bc-4cfa-8d50-d880bbe74b09 | nova-conductor | nova-cell0-conductor-0 | internal | enabled | up | 2024-05-31T14:05:19.000000 |
| 396ce09f-cc4f-4e95-a0d3-4535dc2023de | nova-scheduler | nova-scheduler-0 | internal | enabled | up | 2024-05-31T14:05:12.000000 |
| 14bf553c-5958-4d60-bba9-40035f602a8f | nova-conductor | nova-cell0-conductor-1 | internal | enabled | up | 2024-05-31T14:05:11.000000 |
+--------------------------------------+----------------+------------------------+----------+---------+-------+----------------------------+
❯ oc get NovaConductor -o custom-columns=NAME:.metadata.name,CELL:.spec.cellName,REPLICAS:.spec.replicas
NAME CELL REPLICAS
nova-cell0-conductor cell0 2
nova-cell1-conductor cell1 3
Now if I change the conductor replica count in cell0 from 2 to 1, I don't want nova-cell1-conductor-1 or nova-cell1-conductor-2 to be deleted; only nova-cell0-conductor-1 should be deleted.
So the delete logic needs to depend on the cell name as well. You can read that from the conductor instance as instance.Spec.CellName.
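A sketch of the two call sites under that suggestion (the parameter names follow the discussion further down in this review; treat it as an illustration):

// conductor controller: only clean up services belonging to this conductor's cell
err := cleanNovaServiceFromNovaDb(
	computeClient, "nova-conductor", l, *instance.Spec.Replicas, instance.Spec.CellName)

// scheduler controller: not cell scoped, so it passes an empty cell name
err = cleanNovaServiceFromNovaDb(
	computeClient, "nova-scheduler", l, *instance.Spec.Replicas, "")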
Patch #778 fixes the test for names; we do not need to extract host_name anymore, so that code is removed.
if instance.Spec.Replicas != &instance.Status.ReadyCount {
	return ctrl.Result{}, nil
}
// for instance.Spec.NovaServiceBase.Replicas
need to remove
/test functional
@@ -676,7 +676,7 @@ var _ = Describe("NovaScheduler controller cleaning", func() {
	condition.ReadyCondition,
	corev1.ConditionTrue,
)
-Expect(novaAPIFixture.HasRequest("DELETE", "/compute/os-services/3", "")).To(BeTrue())
+Expect(novaAPIFixture.HasRequest("DELETE", "/compute/os-services/3", "binary=nova-scheduler")).To(BeTrue())
So this is failing because of our added condition for replicaCount:
now control never reaches cleanServiceFromNovaDb.
Here you added binary=nova-scheduler, but the controller does not send a binary query param with the DELETE request; it only does that in the GET request when querying the list of services:
- GET: nova-operator/controllers/common.go, lines 119 to 123 in 845b371:
opts := services.ListOpts{
	Binary: serviceName,
}
allPages, err := services.List(computeClient, opts).AllPages()
- DELETE: nova-operator/controllers/common.go, line 135 in 845b371:
rsp := services.Delete(computeClient, service.ID)
if you remove that change and do the fix of the condition above then the functional test passes:
2024-06-04T10:49:59.346+0200 INFO Controllers.NovaScheduler Deployment is not ready {"controller": "novascheduler", "controllerGroup": "nova.openstack.org", "controllerKind": "NovaScheduler", "NovaScheduler": {"name":"435d1006-78aa-481a-bb1f-0-scheduler","namespace":"586295de-5de7-48a9-b231-59ccd7b0b02b"}, "namespace": "586295de-5de7-48a9-b231-59ccd7b0b02b", "name": "435d1006-78aa-481a-bb1f-0-scheduler", "reconcileID": "87264e31-b83b-4dcb-9b52-99db1db29b34", "Status": {"replicas":0,"availableReplicas":0}}
2024-06-04T10:49:59.346+0200 INFO Controllers.NovaScheduler Waiting for ReadyCount to equal Replicas before deleting scaled in services from the nova DB. {"controller": "novascheduler", "controllerGroup": "nova.openstack.org", "controllerKind": "NovaScheduler", "NovaScheduler": {"name":"435d1006-78aa-481a-bb1f-0-scheduler","namespace":"586295de-5de7-48a9-b231-59ccd7b0b02b"}, "namespace": "586295de-5de7-48a9-b231-59ccd7b0b02b", "name": "435d1006-78aa-481a-bb1f-0-scheduler", "reconcileID": "87264e31-b83b-4dcb-9b52-99db1db29b34", "replicas": 1, "ReadyCount": 0}
2024-06-04T10:49:59.430+0200 INFO ---Test--- Simulated statefulset success {"on": {"name":"435d1006-78aa-481a-bb1f-0-scheduler","namespace":"586295de-5de7-48a9-b231-59ccd7b0b02b"}}
2024-06-04T10:49:59.430+0200 INFO ---Test--- ExpectCondition {"type": "Ready", "expected status": "True", "on": {"name":"435d1006-78aa-481a-bb1f-0-scheduler","namespace":"586295de-5de7-48a9-b231-59ccd7b0b02b"}}
2024-06-04T10:49:59.430+0200 INFO Controllers.NovaScheduler Reconciling {"controller": "novascheduler", "controllerGroup": "nova.openstack.org", "controllerKind": "NovaScheduler", "NovaScheduler": {"name":"435d1006-78aa-481a-bb1f-0-scheduler","namespace":"586295de-5de7-48a9-b231-59ccd7b0b02b"}, "namespace": "586295de-5de7-48a9-b231-59ccd7b0b02b", "name": "435d1006-78aa-481a-bb1f-0-scheduler", "reconcileID": "6eddc6b3-2f9d-4879-8284-97db2c579739"}
2024-06-04T10:49:59.440+0200 INFO Controllers.NovaScheduler StatefulSet 435d1006-78aa-481a-bb1f-0-scheduler - updated {"controller": "novascheduler", "controllerGroup": "nova.openstack.org", "controllerKind": "NovaScheduler", "NovaScheduler": {"name":"435d1006-78aa-481a-bb1f-0-scheduler","namespace":"586295de-5de7-48a9-b231-59ccd7b0b02b"}, "namespace": "586295de-5de7-48a9-b231-59ccd7b0b02b", "name": "435d1006-78aa-481a-bb1f-0-scheduler", "reconcileID": "6eddc6b3-2f9d-4879-8284-97db2c579739"}
2024-06-04T10:49:59.440+0200 INFO Controllers.NovaScheduler Deployment is ready {"controller": "novascheduler", "controllerGroup": "nova.openstack.org", "controllerKind": "NovaScheduler", "NovaScheduler": {"name":"435d1006-78aa-481a-bb1f-0-scheduler","namespace":"586295de-5de7-48a9-b231-59ccd7b0b02b"}, "namespace": "586295de-5de7-48a9-b231-59ccd7b0b02b", "name": "435d1006-78aa-481a-bb1f-0-scheduler", "reconcileID": "6eddc6b3-2f9d-4879-8284-97db2c579739"}
2024-06-04T10:49:59.440+0200 INFO ---Test--- OpenStack API request {"method": "GET", "URI": "/identity/"}
2024-06-04T10:49:59.443+0200 INFO ---Test--- OpenStack API request {"method": "GET", "URI": "/compute/os-services/?binary=nova-scheduler"}
2024-06-04T10:49:59.444+0200 INFO ---Test--- OpenStack API request {"method": "DELETE", "URI": "/compute/os-services/3"}
2024-06-04T10:49:59.444+0200 INFO Controllers.NovaScheduler Deleted service {"controller": "novascheduler", "controllerGroup": "nova.openstack.org", "controllerKind": "NovaScheduler", "NovaScheduler": {"name":"435d1006-78aa-481a-bb1f-0-scheduler","namespace":"586295de-5de7-48a9-b231-59ccd7b0b02b"}, "namespace": "586295de-5de7-48a9-b231-59ccd7b0b02b", "name": "435d1006-78aa-481a-bb1f-0-scheduler", "reconcileID": "6eddc6b3-2f9d-4879-8284-97db2c579739", "service": {"binary":"nova-scheduler","disabled_reason":"","forced_down":false,"host":"nova-scheduler-1","state":"down","status":"enabled","zone":"internal"}}
So the test case of nova-scheduler works because when the novascheduler controller creates the StatefulSet with Replicas 1, ReadyReplicas is 0 and the cleaning does not happen. Then the test code at L672 simulates that the StatefulSet is ready, which sets ReadyReplicas to 1 as well. The change of the StatefulSet triggers a new reconcile on the NovaScheduler, as it owns the StatefulSet, and in that reconcile your new condition (with my fix) will allow the service deletion to happen.
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/9a72a2a6a9844a538f2aaac3cf477375 ✔️ openstack-k8s-operators-content-provider SUCCESS in 4h 29m 59s
force-pushed from b8bd2ba to 36025c8
A couple of fixes, but I think we are close to being able to land this.
@@ -784,6 +784,9 @@ var _ = Describe("NovaConductor controller cleaning", func() {
	Expect(novaAPIServer.HasRequest("DELETE", "/compute/os-services/5", "")).To(BeTrue())
- ID 5 is nova-cell0-conductor-0 so that should not be deleted by the controller.
controllers/common.go
Outdated
// name index start from 0
// which means if replicaCount is 1, only nova-scheduler-0 is valid case
// so delete >= 1
if service.State == "down" || hostIndex >= int(replicaCount) {
We don't have to wait for the service to be marked down by nova. We know that the pod is deleted, as we waited for instance.Spec.Replicas == instance.Status.ReadyCount before we started the delete. So you can drop the down check.
This will make the test pass with the asserts I proposed below.
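A sketch of the resulting check, keeping the variable names from the quoted snippets (the surrounding loop is omitted):

// pod name indexes start at 0, so with replicaCount replicas only hosts
// ...-0 up to ...-(replicaCount-1) are valid; anything above was scaled in.
// No need to check for the down state: the pod is already gone because the
// cleanup only starts once Spec.Replicas == Status.ReadyCount.
if hostIndex >= int(replicaCount) {
	rsp := services.Delete(computeClient, service.ID)
	if err := rsp.ExtractErr(); err != nil {
		return err
	}
	l.Info("Deleted service", "service", service)
}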
controllers/common.go
Outdated
@@ -115,6 +117,8 @@ func cleanNovaServiceFromNovaDb(
	computeClient *gophercloud.ServiceClient,
	serviceName string,
	l logr.Logger,
+	replicaCount int32,
+	conductorCellName ...string,
I would name this simply cellName, and would not try to make it optional via ellipsis. In general Go does not support optional parameters, and if we do need them then we need to pass structs around with pointer fields for optionality, or use a builder pattern. Here I think it is OK to have a cellName string parameter; the scheduler can pass in "" as it is not in a cell.
controllers/common.go
Outdated
if strings.Contains(service.Host, "conductor") {
	if !strings.Contains(service.Host, conductorCellName[0]) {
		// delete only if target conductor service is from same target cell
		continue
	}
}
I think this can be simplified to:
replicaCount int32,
- conductorCellName ...string,
+ cellName string,
) error {
opts := services.ListOpts{
Binary: serviceName,
@@ -135,11 +135,10 @@ func cleanNovaServiceFromNovaDb(
}
for _, service := range allServices {
- if strings.Contains(service.Host, "conductor") {
- if !strings.Contains(service.Host, conductorCellName[0]) {
- // delete only if target conductor service is from same target cell
- continue
- }
+ // delete only if serviceHost is for our cell. If no cell is
+ // provided (e.g. scheduler cleanup case) then this check is a noop
+ if !strings.Contains(service.Host, cellName) {
+ continue
}
/test functional
got
I have one more request for the test, and please amend the commit message and PR description. The current title is not really correct. I suggest using the first sentence from the main commit message as the title.
- Update service objects in functional test api_fixture
- Adds a check to wait for readycount to become equal to replica
- Delete service host from nova DB by hostname on replica count update
Looks good to me. Thank you @auniyal61 !
lgtm
@@ -280,6 +280,10 @@ func (r *NovaConductorReconciler) Reconcile(ctx context.Context, req ctrl.Request
	return ctrl.Result{}, err
}

+	if *instance.Spec.Replicas != instance.Status.ReadyCount {
In a follow up we can also handle the generation.
Yeah, I need to look at it again when #739 lands. Either we need a direct generation check here, or we can just rely on DeploymentReady only being set if the generation matches.
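A possible shape of such a direct generation guard (whether the CR status exposes ObservedGeneration like this is an assumption):

// skip the DB cleanup until the status reflects the latest spec generation
if instance.Generation != instance.Status.ObservedGeneration {
	return ctrl.Result{}, nil
}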
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: auniyal61, mrkisaolamb. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/8e48a4b416894f6195f517b7f2e35544 ✔️ openstack-k8s-operators-content-provider SUCCESS in 4h 43m 18s
/test functional
The tempest test test_server_connectivity_live_migration failed while verifying downtime of the VM after live migration.
This did not run the rdoproject.org/github-check job again.
recheck
the rdo jobs need "recheck"
Yepp, we have seen that before, so that is unrelated.
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/074a26a3358e4cacbeeaeb08fc5151cf ✔️ openstack-k8s-operators-content-provider SUCCESS in 4h 51m 11s
recheck
SSH issue, logs:
It seems like CI is failing everywhere. But I haven't checked the other results yet. Let's see if the recheck helps
ovn deployment failed on the EDPM node:
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/9463417c42a34402846f6154cf0c5070 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 38m 36s
recheck
Merged f61fdd2 into openstack-k8s-operators:main
Implements: OSPRH-6912