
Share cleanup job #337

Merged

Conversation

fmount
Collaborator

@fmount fmount commented Sep 23, 2024

When a backend config is removed, the "service" database entry in manila needs to be adjusted to account for the removal. This patch introduces a Job that runs a manila-manage command every time a share is scaled down. The main Job definition is now more flexible: a TTL can be passed to customize how long the Job is kept after it finishes.

Jira: https://issues.redhat.com/browse/OSPRH-6526
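
As a rough illustration of the TTL part only (not the actual patch): a minimal sketch using a plain Kubernetes batch/v1 Job, where the helper name cleanupJob, the namespace, the image, and the manila-manage invocation are all placeholders rather than the operator's real code.

```go
package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// cleanupJob is a hypothetical helper (not the operator's actual code) that
// builds a batch/v1 Job and attaches a TTL so Kubernetes garbage-collects the
// Job object `ttl` seconds after it finishes.
func cleanupJob(ttl int32) *batchv1.Job {
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "manila-service-cleanup", // placeholder name
			Namespace: "openstack",              // placeholder namespace
		},
		Spec: batchv1.JobSpec{
			// Controls how long the finished Job is kept around.
			TTLSecondsAfterFinished: &ttl,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyOnFailure,
					Containers: []corev1.Container{{
						Name:  "manila-service-cleanup",
						Image: "manila-api-image", // placeholder image
						// The exact manila-manage invocation is assumed here
						// for illustration only.
						Command: []string{"/bin/bash", "-c", "manila-manage service cleanup"},
					}},
				},
			},
		},
	}
}

func main() {
	job := cleanupJob(120)
	fmt.Println(job.Name, *job.Spec.TTLSecondsAfterFinished)
}
```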

Contributor

openshift-ci bot commented Sep 23, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fmount

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fmount
Collaborator Author

fmount commented Sep 23, 2024

Holding for now to review and discuss this further.

@fmount
Collaborator Author

fmount commented Sep 23, 2024

/test functional

@silvacarloss
Contributor

silvacarloss commented Sep 23, 2024

Looks good from a manila point of view and the removal makes sense! Thank you for working on this. @gouthampacha might want to take a look and share his thoughts too :)

fmount added 2 commits October 4, 2024 13:54
When a share is deleted from the top-level CR, its service reference is not
deleted, and manila still sees it as down/not available. For this reason,
every time a scale-down operation is executed, we need to run a job that
performs a service cleanup, removing everything marked as down.

Jira: https://issues.redhat.com/browse/OSPRH-6526

Signed-off-by: Francesco Pantano <[email protected]>
When a backend config is removed, the "service" database entry in manila needs
to be adjusted to account for the removal. This patch introduces a Job that
runs a manila-manage command every time a share is scaled down. The main Job
definition is now more flexible: a TTL can be passed to customize how long the
Job is kept after it finishes.

Signed-off-by: Francesco Pantano <[email protected]>
@@ -48,6 +48,8 @@ const (
ManilaPublicPort int32 = 8786
// ManilaInternalPort -
ManilaInternalPort int32 = 8786
// ManilaServiceCleanupDelay -
ManilaServiceCleanupDelay int32 = 120
fmount (Collaborator Author)

@gouthampacha we delay according to this parameter.

@@ -62,12 +70,13 @@ func DbSyncJob(instance *manilav1.Manila, labels map[string]string, annotations
},
}

-args := []string{"-c", DBSyncCommand}
+delayCommand := fmt.Sprintf("sleep %d", delay)
+args := []string{"-c", fmt.Sprintf("%s && %s", delayCommand, jobCommand)}
fmount (Collaborator Author)

@gouthampacha the resulting command is the concatenation of the delayCommand and the jobCommand.
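
A tiny worked example of that concatenation (all values are hypothetical: the 120-second delay mirrors the ManilaServiceCleanupDelay constant above, and the manila-manage invocation is assumed for illustration):

```go
package main

import "fmt"

func main() {
	// Hypothetical values: delay mirrors ManilaServiceCleanupDelay (120s);
	// jobCommand stands in for whatever command the Job actually runs.
	delay := int32(120)
	jobCommand := "manila-manage service cleanup"
	delayCommand := fmt.Sprintf("sleep %d", delay)
	args := []string{"-c", fmt.Sprintf("%s && %s", delayCommand, jobCommand)}
	// The container entrypoint then runs: sleep 120 && manila-manage service cleanup
	fmt.Println(args[1])
}
```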

@fmount
Collaborator Author

fmount commented Oct 4, 2024

@gouthampacha @silvacarloss I ran a few more local tests related to the need to delay the job execution; here is a summary.

## Check Initial deploy

We have two shares: share0 connected to native_cephfs, and share1 backed by ceph-nfs.

sh-5.1$ openstack share service list
+----+------------------+---------------------+------+---------+-------+----------------------------+
| ID | Binary           | Host                | Zone | Status  | State | Updated At                 |
+----+------------------+---------------------+------+---------+-------+----------------------------+
|  1 | manila-scheduler | hostgroup           | nova | enabled | up    | 2024-10-04T11:41:41.814368 |
|  3 | manila-share     | hostgroup@cephfsnfs | nova | enabled | up    | 2024-10-04T11:41:48.227259 |
|  4 | manila-share     | hostgroup@cephfs    | nova | enabled | up    | 2024-10-04T11:41:43.552743 |
+----+------------------+---------------------+------+---------+-------+----------------------------+


sh-5.1$ openstack share pool list
+----------------------------+-----------+-----------+--------+
| Name                       | Host      | Backend   | Pool   |
+----------------------------+-----------+-----------+--------+
| hostgroup@cephfsnfs#cephfs | hostgroup | cephfsnfs | cephfs |
| hostgroup@cephfs#cephfs    | hostgroup | cephfs    | cephfs |
+----------------------------+-----------+-----------+--------+

We clean up share0, removing the native cephfs share backend.

## Pods (manila-share-share0 is not there anymore)

manila-api-0                                                   2/2     Running     0          20m
manila-db-sync-k5skw                                           0/1     Completed   0          4m3s
manila-scheduler-0                                             2/2     Running     0          9m55s
manila-service-cleanup-n5b5h655-9f7kl                          1/1     Running     0          9s
manila-share-share1-0                                          2/2     Running     0          3m47s

## Cleanup job

The job starts and waits for 120 seconds before running the command:
[stack@osp-storage-04 ~]$ oc logs -f manila-service-cleanup-n5b5h655-9f7kl
Cleaned up service hostgroup@cephfs
## Check services

$ openstack share service list                                                                                                                                                                                                                                            

+----+------------------+---------------------+------+---------+-------+----------------------------+
| ID | Binary           | Host                | Zone | Status  | State | Updated At                 |
+----+------------------+---------------------+------+---------+-------+----------------------------+
|  1 | manila-scheduler | hostgroup           | nova | enabled | up    | 2024-10-04T11:46:21.871310 |
|  3 | manila-share     | hostgroup@cephfsnfs | nova | enabled | up    | 2024-10-04T11:46:28.299363 |
+----+------------------+---------------------+------+---------+-------+----------------------------+

The previous "manila-share: hostgroup@cephfs" entry is not there anymore.

## Cleanup pool list

Due to an existing bug related to cached data, we need to restart manila-scheduler.

$ oc rollout restart statefulset manila-scheduler

$ openstack share pool list
+----------------------------+-----------+-----------+--------+
| Name                       | Host      | Backend   | Pool   |
+----------------------------+-----------+-----------+--------+
| hostgroup@cephfsnfs#cephfs | hostgroup | cephfsnfs | cephfs |
+----------------------------+-----------+-----------+--------+

With the current behavior I think we're good to go. WRT documentation, I think we need to mention that we might need to restart the scheduler. @gouthampacha do we have any upstream bug or Jira to fix the scheduler?

@fmount
Collaborator Author

fmount commented Oct 4, 2024

/test functional


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a6dbd5c6a6e94ba69592d7a48f9c342e

openstack-k8s-operators-content-provider FAILURE in 8m 59s
⚠️ manila-operator-kuttl SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider (non-voting)
⚠️ manila-operator-tempest SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider (non-voting)

If the share cleanup job is executed right after the share deletion, it is
possible that the other manila components have not yet been notified about the
event, and the job then has no effect. This patch introduces a parameter for
the ManilaJob that delays the command. A constant has been defined to properly
identify the delay time associated with the service cleanup.

Signed-off-by: Francesco Pantano <[email protected]>
@fmount
Collaborator Author

fmount commented Oct 4, 2024

/test functional

@gouthampacha
Contributor

> With the current behavior I think we're good to go. WRT documentation, I think we need to mention that we might need to restart the scheduler. @gouthampacha do we have any upstream bug or Jira to fix the scheduler?

Agreed; as a known issue: https://bugs.launchpad.net/manila/+bug/1915094

@gouthampacha
Contributor

/lgtm

Thank you @fmount

@openshift-ci openshift-ci bot added the lgtm label Oct 7, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit e746e34 into openstack-k8s-operators:main Oct 7, 2024
9 checks passed