Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[occm] Get IP addresses of neutron subports #2228

Closed
wants to merge 0 commits into from

Conversation

jingczhang
Copy link
Contributor

@jingczhang jingczhang commented May 2, 2023

Vlan-aware VMs are commonly used in telecom.
Those VMs are plugged into flat networks and use neutron trunk.
https://docs.openstack.org/neutron/latest/admin/config-trunking.html

Currently, getAttachedInterfacesByID() only returns neutron ports directly attached to VM.
For vlan-aware VMs, IP addresses are assigned on neutron subports.
Subports are attached to the trunk; they are not attached to the VM directly.
This pull request improves getAttachedInterfacesByID() to return IP addresses of neutron subports when they exist.
Without this change, Kubernetes is unable to get IP addresses of vlan-aware VMs.
This change is transparent to VMs not using neutron trunk.

Special notes for reviewers:
This is a re-attempt of the following PR which was idle out:
#1535
The code has been in production for two years now.

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 2, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 2, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @jingczhang. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 2, 2023
@jichenjc
Copy link
Contributor

jichenjc commented May 3, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 3, 2023
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
@jichenjc
Copy link
Contributor

jichenjc commented May 3, 2023

and do you think we can consider add a functional test to this?

Copy link
Contributor

@mdbooth mdbooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if I've missed something really obvious, but this doesn't look right to me. I can't see how getSubInterfaces() in particular can work. Is there some code missing which populates interfaces? Even If it did work it would have potentially catastrophic performance impacts on some clouds due to the nested API calls. I don't think we should merge this with its current code structure.

I wonder if we can do something much more efficient with https://docs.openstack.org/api-ref/network/v2/index.html?expanded=list-trunks-detail,show-trunk-details-detail#show-trunk-details?

@dulek Can you take a look at this when you're around?

pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
@jingczhang
Copy link
Contributor Author

and do you think we can consider add a functional test to this?

Yes, we will look into adding test case.

Copy link
Contributor

@mdbooth mdbooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely needs unit tests and comments. It's very hard to understand what it's doing.

pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
@jingczhang
Copy link
Contributor Author

Unit test TestNodeAddressesWithSubports() is added, it will be in the next PR update. I will try to upload the update soon.

@jingczhang
Copy link
Contributor Author

PR is updated to address all the comments received so far. Please kindly review to make sure.
Jing

@jingczhang
Copy link
Contributor Author

Hi @mdbooth, I think the last PR update has addressed all your comments, can you please check?
Hi all reviewers, please let me know if you have additional comments? also, I think I have replied all comments, if there is pending comments I haven't addressed (esp not addressed from the review procedure point of view) please let me know.

@mdbooth
Copy link
Contributor

mdbooth commented May 10, 2023

@jingczhang Hi, thanks for updating. I will try to re-review as soon as possible, hopefully tomorrow.

A quick glance suggests the comments and unit tests aren't quite what I was hoping for. When I review properly I'll try to find some pointers to good examples.

I would like to see us merge a fix for this issue, and I think we can do it in this PR. My concern is that this code runs a lot and any issues in it have the potential to cause issues in the OpenStack control plane. I'm also aware that it can affect different deployments differently (e.g. a small cluster running against neutron config X may be fine, but a large cluster running against neutron config Y may fail catastrophically). I want to be sure we aren't going to break anybody's deployment by adding this, but I'm definitely in favour of where it's going.

Copy link
Contributor

@dulek dulek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks fairly okay after the round of Matt's comments getting addressed. I'm dropping a few comments, but nothing too serious from my side.

pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
@k8s-ci-robot
Copy link
Contributor

@jingczhang: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
openstack-cloud-csi-manila-e2e-test 9a7db2c link true /test openstack-cloud-csi-manila-e2e-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mdbooth
Copy link
Contributor

mdbooth commented May 11, 2023

/assign

Copy link
Contributor

@mdbooth mdbooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still many things here, but the biggest one is: this code should not add cost for users who don't need it. Achieving that may require some refactoring.

pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
pkg/openstack/instances.go Outdated Show resolved Hide resolved
listOpts := trunks.ListOpts{
PortID: iface.PortID,
}
allPages, err := trunks.List(network, listOpts).AllPages()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to check if trunks are enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already checked, API error is returned if trunk plugin is not enable, the API error stops the flow and returns to caller with no subports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, thanks for the pointer. The patch is built for trunk plugin is configured (normal case) and trunk is used (vlan-aware VM use case), checking for if trunk plugin is configured was not needed so not considered. It seems this checking won't be necessary after baseline code restructure per @mdbooth.

@@ -708,6 +819,10 @@ func getAttachedInterfacesByID(client *gophercloud.ServiceClient, serviceID stri
if mc.ObserveRequest(err) != nil {
return interfaces, err
}

subInterfaces, err := getSubInterfaces(interfaces, network)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want this to be free if trunks aren't in use, but it's not: it's still adding at least 1 additional API call per attached port (list trunks by port id). This seems like a very high cost for a feature most people aren't going to be using.

I wonder if we do need to refactor all of this to fetch ports by device id instead of attachinterfaces, which would allow us to use trunk details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really, the getSubInterfaces() returns upon the first trunk API error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, reading this again and realizing you refer to CPU cycles spent on checking if trunk is configured, this cost depends on number of VMs and ports per VM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so much CPU cycles as API calls. Remember we're running this loop for every node in every cluster every 10 minutes by default. At only 100 nodes that's still every 6 seconds on average. Additional API calls here are expensive.

pkg/openstack/openstack_test.go Outdated Show resolved Hide resolved
@jingczhang
Copy link
Contributor Author

Thanks @mdbooth and @dulek for the code comments. I will address them first. Additional unit tests will follow after that.

@jingczhang
Copy link
Contributor Author

PR is updated per last round code comments.
Hi @mdbooth, I need pointer for writing the 1st unit test you've requested, it is not obvious to me how it could be done, please advise.
<===
We need test coverage for getAttachedInterfacesByID(), e.g.:
Trunking is disabled

Copy link
Contributor

@mdbooth mdbooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about this, the more I think we should refactor the surrounding code before doing this. The way you've approached this is probably the only reasonable way to do this with the existing code structure. However, it adds an additional api call per attached interface for every node in your cloud, every 10 minutes, whether you're using subports or not. Given that nobody is currently using subports in this way and most people probably never will, this feels like an unreasonably large cost.

I think we need to fetch ports directly from neutron by deviceID in the first instance. This would also show us subports if there are any, meaning the cost of the feature would only be paid by systems which are using it.

In short:

  • Refactor nodeAddresses() as above
  • Rewrite this PR to use the refactored nodeAddresses

I think the feature is good, but I would prefer that this PR didn't merge in its current form due to its impact on the majority of users.

@jingczhang
Copy link
Contributor Author

Hi @mdbooth, thank you very much for your review and plan. We have the interest to upstream this so as to get rid of the "out of tree" rebase trouble. Looking forward to retry this PR after your code restructure then.

@mdbooth
Copy link
Contributor

mdbooth commented May 16, 2023

Unfortunately although I think this code could do with a refactor with or without the addition of subports, it's not on my todo list right now 😞

@mdbooth
Copy link
Contributor

mdbooth commented May 23, 2023

FYI I've added the following:

Additionally, @pierreprinetti has been noodling with a gophercloud/utils library to allow us to fetch multiple specified ports in a single API call: gophercloud/utils#190.

With all of these in place, I believe:

  • subport support adds no overhead for anybody not using it
  • subport support requires only 1 additional API call per nodeAddresses() for anybody who is using it (to fetch all detected subports)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 31, 2023
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mdbooth
Copy link
Contributor

mdbooth commented Jun 1, 2023

@jingczhang both of the above mentioned changes are now merged, and I've bumped the gophercloud dep to 1.4.0 which includes the trunk_details patch. #2250 has almost certainly caused the merge conflict in this patch so you're probably going to have to re-work it even for your downstream fork. By using trunk_details you should now be able to implement this with zero cost for anybody not using the feature. Are you likely to update the PR?

@jingczhang
Copy link
Contributor Author

@jingczhang both of the above mentioned changes are now merged, and I've bumped the gophercloud dep to 1.4.0 which includes the trunk_details patch. #2250 has almost certainly caused the merge conflict in this patch so you're probably going to have to re-work it even for your downstream fork. By using trunk_details you should now be able to implement this with zero cost for anybody not using the feature. Are you likely to update the PR?

Thank you for the information. Sure, I will update the PR.

@jingczhang jingczhang closed this Jun 6, 2023
@jingczhang jingczhang force-pushed the dev/trunk-support branch 2 times, most recently from 0490264 to fd113d2 Compare June 6, 2023 20:02
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 6, 2023
@jingczhang
Copy link
Contributor Author

Hi @mdbooth, I have rebased the code and created a new PR, please review and comments. We are on an older version of Kubernetes, I am in the process of porting the change and getting it tested.
#2270

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants