-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[occm] Get IP addresses of neutron subports #2228
Conversation
Hi @jingczhang. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
and do you think we can consider add a functional test to this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies if I've missed something really obvious, but this doesn't look right to me. I can't see how getSubInterfaces()
in particular can work. Is there some code missing which populates interfaces? Even If it did work it would have potentially catastrophic performance impacts on some clouds due to the nested API calls. I don't think we should merge this with its current code structure.
I wonder if we can do something much more efficient with https://docs.openstack.org/api-ref/network/v2/index.html?expanded=list-trunks-detail,show-trunk-details-detail#show-trunk-details?
@dulek Can you take a look at this when you're around?
Yes, we will look into adding test case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely needs unit tests and comments. It's very hard to understand what it's doing.
Unit test TestNodeAddressesWithSubports() is added, it will be in the next PR update. I will try to upload the update soon. |
PR is updated to address all the comments received so far. Please kindly review to make sure. |
Hi @mdbooth, I think the last PR update has addressed all your comments, can you please check? |
@jingczhang Hi, thanks for updating. I will try to re-review as soon as possible, hopefully tomorrow. A quick glance suggests the comments and unit tests aren't quite what I was hoping for. When I review properly I'll try to find some pointers to good examples. I would like to see us merge a fix for this issue, and I think we can do it in this PR. My concern is that this code runs a lot and any issues in it have the potential to cause issues in the OpenStack control plane. I'm also aware that it can affect different deployments differently (e.g. a small cluster running against neutron config X may be fine, but a large cluster running against neutron config Y may fail catastrophically). I want to be sure we aren't going to break anybody's deployment by adding this, but I'm definitely in favour of where it's going. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks fairly okay after the round of Matt's comments getting addressed. I'm dropping a few comments, but nothing too serious from my side.
@jingczhang: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still many things here, but the biggest one is: this code should not add cost for users who don't need it. Achieving that may require some refactoring.
pkg/openstack/instances.go
Outdated
listOpts := trunks.ListOpts{ | ||
PortID: iface.PortID, | ||
} | ||
allPages, err := trunks.List(network, listOpts).AllPages() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need to check if trunks are enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already checked, API error is returned if trunk plugin is not enable, the API error stops the flow and returns to caller with no subports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, thanks for the pointer. The patch is built for trunk plugin is configured (normal case) and trunk is used (vlan-aware VM use case), checking for if trunk plugin is configured was not needed so not considered. It seems this checking won't be necessary after baseline code restructure per @mdbooth.
pkg/openstack/instances.go
Outdated
@@ -708,6 +819,10 @@ func getAttachedInterfacesByID(client *gophercloud.ServiceClient, serviceID stri | |||
if mc.ObserveRequest(err) != nil { | |||
return interfaces, err | |||
} | |||
|
|||
subInterfaces, err := getSubInterfaces(interfaces, network) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want this to be free if trunks aren't in use, but it's not: it's still adding at least 1 additional API call per attached port (list trunks by port id). This seems like a very high cost for a feature most people aren't going to be using.
I wonder if we do need to refactor all of this to fetch ports by device id instead of attachinterfaces, which would allow us to use trunk details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really, the getSubInterfaces() returns upon the first trunk API error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, reading this again and realizing you refer to CPU cycles spent on checking if trunk is configured, this cost depends on number of VMs and ports per VM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not so much CPU cycles as API calls. Remember we're running this loop for every node in every cluster every 10 minutes by default. At only 100 nodes that's still every 6 seconds on average. Additional API calls here are expensive.
PR is updated per last round code comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I think about this, the more I think we should refactor the surrounding code before doing this. The way you've approached this is probably the only reasonable way to do this with the existing code structure. However, it adds an additional api call per attached interface for every node in your cloud, every 10 minutes, whether you're using subports or not. Given that nobody is currently using subports in this way and most people probably never will, this feels like an unreasonably large cost.
I think we need to fetch ports directly from neutron by deviceID in the first instance. This would also show us subports if there are any, meaning the cost of the feature would only be paid by systems which are using it.
In short:
- Refactor nodeAddresses() as above
- Rewrite this PR to use the refactored nodeAddresses
I think the feature is good, but I would prefer that this PR didn't merge in its current form due to its impact on the majority of users.
Hi @mdbooth, thank you very much for your review and plan. We have the interest to upstream this so as to get rid of the "out of tree" rebase trouble. Looking forward to retry this PR after your code restructure then. |
Unfortunately although I think this code could do with a refactor with or without the addition of subports, it's not on my todo list right now 😞 |
FYI I've added the following:
Additionally, @pierreprinetti has been noodling with a gophercloud/utils library to allow us to fetch multiple specified ports in a single API call: gophercloud/utils#190. With all of these in place, I believe:
|
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@jingczhang both of the above mentioned changes are now merged, and I've bumped the gophercloud dep to 1.4.0 which includes the trunk_details patch. #2250 has almost certainly caused the merge conflict in this patch so you're probably going to have to re-work it even for your downstream fork. By using trunk_details you should now be able to implement this with zero cost for anybody not using the feature. Are you likely to update the PR? |
Thank you for the information. Sure, I will update the PR. |
0490264
to
fd113d2
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Vlan-aware VMs are commonly used in telecom.
Those VMs are plugged into flat networks and use neutron trunk.
https://docs.openstack.org/neutron/latest/admin/config-trunking.html
Currently, getAttachedInterfacesByID() only returns neutron ports directly attached to VM.
For vlan-aware VMs, IP addresses are assigned on neutron subports.
Subports are attached to the trunk; they are not attached to the VM directly.
This pull request improves getAttachedInterfacesByID() to return IP addresses of neutron subports when they exist.
Without this change, Kubernetes is unable to get IP addresses of vlan-aware VMs.
This change is transparent to VMs not using neutron trunk.
Special notes for reviewers:
This is a re-attempt of the following PR which was idle out:
#1535
The code has been in production for two years now.
Release note: