Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add volumeattach/volumedetach API #4359

Merged
merged 1 commit into from
Nov 6, 2024

Conversation

zedi-pramodh
Copy link

Longhorn volumes are attached to a node where the app is running. Since the volumes we create in eve are RWO, ie only one node can attach at a time, during failover we need to make sure the node that failed is detached from the volume.

The node that is taking over will automatically attach to the volume if no other node is attached. Also, generally its very safe to detach and then attach to avoid any potential corruption.

@eriknordmark
Copy link
Contributor

I assume there will be a follow-up PR to call this APIs. Is that correct?

@andrewd-zededa can you review?

@zedi-pramodh
Copy link
Author

I assume there will be a follow-up PR to call this APIs. Is that correct?

@andrewd-zededa can you review?

That's correct. We are trying to split PRs so that they are independent and can compile, easy to review too.

Copy link
Contributor

@andrewd-zededa andrewd-zededa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM mostly, I think we need a some comments around the volumeattachment logging to ensure they are not downgraded to Function/Trace in the future.

// Get the Kubernetes clientset
clientset, err := GetClientSet()
if err != nil {
err = fmt.Errorf("failed to get clientset: %v", err)
Copy link
Contributor

@rene rene Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new error it's being created here, why you just don't propagate the original error (from the GetClientSet() call)? Understood, I was looking to line 355, now I think you should use the same approach used here there... I will mark with a comment....

@zedi-pramodh
Copy link
Author

Can someone bless this one, if there are no further concerns.

@OhmSpectator
Copy link
Member

LGTM mostly, I think we need a some comments around the volumeattachment logging to ensure they are not downgraded to Function/Trace in the future.

@andrewd-zededa, @zedi-pramodh was this comment addressed?

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zedi-pramodh
Copy link
Author

LGTM mostly, I think we need a some comments around the volumeattachment logging to ensure they are not downgraded to Function/Trace in the future.

@andrewd-zededa, @zedi-pramodh was this comment addressed?

I don't understand what Andrew meant actually :)

Copy link
Member

@OhmSpectator OhmSpectator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @andrewd-zededa it's OK, as I cannot review it properly myself. Hence, I am approving. Nevertheless, I left several questions to better understand the code.

return "", err
}
// Get the PersistentVolumeClaim (PVC)
pvc, err := clientset.CoreV1().PersistentVolumeClaims(EVEKubeNameSpace).Get(context.TODO(), pvcName, metav1.GetOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just curious what is the context.TODO() in this case and will it be changed later...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm a little bit worried about the amount of dereferencing in one line. Could be hard to debug if one of them fails.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The err should contain the actual error if something fails.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The err variable will catch only the error returned by the final .Get method call, not by each method in the chain (clientset.CoreV1(), PersistentVolumeClaims(EVEKubeNameSpace)). If there's a subtle issue in one of those intermediate steps, like clientset or EVEKubeNameSpace, you might not catch it right away.
But it's a minor for the PR.

return "", "", err
}
// List all VolumeAttachments and find the one corresponding to the PV
volumeAttachments, err := clientset.StorageV1().VolumeAttachments().List(context.TODO(), metav1.ListOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a synchronous call to the API? Are we sure we will not have issues with the watchdog in this case if the requests are handled longer than expected? This is a generic question for all API calls here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its is synchronous call, but I do not anticipate it takes even a second to get the result. Number of volumes we have in our deployments are in single digits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, and what about the other API calls in the file? Like .Delete()... So we are pretty sure all of them run fast enough, yep?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far we have not seen any issues with watchdog for the all kubernetes calls. Also I thought domainmgr/volumemgr executes the jobs in asynchronous worker threads and should not block anything to trigger watchdog. Is that not right ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also Kubernetes API calls execution are asynchronous by default. So only sync call is to issue the command and return back after that any action will happen in background.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any more questions ? Or can this be merged ?

@eriknordmark
Copy link
Contributor

This needs to be rebased on master - the SPDX check workflow has issues with misdetecting issues.

Longhorn volumes are attached to a node where the app is running.
Since the volumes we create in eve are RWO, ie only one node can attach at a time,
during failover we need to make sure the node that failed is detached from the volume.

The node that is taking over will automatically attach to the volume if no other node is
attached. Also, generally its very safe to detach and then attach to avoid any potential corruption.

Signed-off-by: Pramodh Pallapothu <[email protected]>
@zedi-pramodh
Copy link
Author

This needs to be rebased on master - the SPDX check workflow has issues with misdetecting issues.

Rebased to master and amended the commit

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-run tests

@OhmSpectator
Copy link
Member

The same Eden issue =((((

@zedi-pramodh
Copy link
Author

The same Eden issue =((((

Those should be completely unrelated. This code path is not even executed yet.

@eriknordmark eriknordmark merged commit c53838a into lf-edge:master Nov 6, 2024
64 of 75 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants