Support configurable microversions for optional features like multi-attachable volumes #1448

Closed
lentzi90 opened this issue Jan 17, 2023 · 19 comments · Fixed by #1567
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@lentzi90
Contributor

/kind feature

Specifically, we have a use case for running CAPO in an OpenStack environment where all volumes are multi-attachable. This requires Nova microversion >2.73 if I'm not mistaken. (Currently CAPO is hard-coded to 2.53.)

Describe the solution you'd like

Ideally, I think CAPO should work similarly to other OpenStack clients in that users should be able to specify the default microversions used. This is normally done either with environment variables or in the clouds.yaml:

A user of the REST layer may set the default microversion by setting {service_type}_default_microversion in clouds.yaml or OS_{service_type|upper}_DEFAULT_MICROVERSION environment variable.
https://docs.openstack.org/openstacksdk/rocky/user/microversions.html
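For illustration, a clouds.yaml following the key pattern quoted above might look like the snippet below (the cloud name and values are made up, and as noted next, gophercloud and therefore CAPO do not currently honor this key):

```yaml
clouds:
  mycloud:
    auth:
      auth_url: https://keystone.example.com/v3
      project_name: demo
      username: demo
    # Default microversion for the compute (Nova) service, per the
    # openstacksdk convention {service_type}_default_microversion.
    compute_default_microversion: "2.73"
```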

However, it seems that gophercloud is currently lacking support for this (gophercloud/utils#178).
But this is the way I would like to see things work. We would add support for it in gophercloud and then use the specified version (if any) in CAPO, instead of hard coding it.

const NovaMinimumMicroversion = "2.53"

If no default version is specified, we could fall back to the minimum supported version in CAPO. Although it would be nice to try to detect the latest version that works instead.

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 17, 2023
@jichenjc
Contributor

Ideally, I think CAPO should work similarly to other OpenStack clients in that users should be able to specify the default microversions used

Yes, I think it's a reasonable approach.

But this is the way I would like to see things work. We would add support for it in gophercloud and then use the specified version (if any) in CAPO, instead of hard coding it.

Yes, we should add that; I guess there are two use cases so far.

If no default version is specified, we could fall back to the minimum supported version in CAPO. Although it would be nice to try to detect the latest version that works instead.

Based on my former experience with microversions, I think it's not that easy to detect the latest version. Even if we want to compare, we still have to know which version added the support, and that's still hard-coded.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2023
@mdbooth
Contributor

mdbooth commented Apr 18, 2023

I think I've said this elsewhere, but just in case: a brief recap.

Microversions define different APIs with different interfaces and different semantics. They are usually (but not always) pretty similar to what came before, but pretty similar != the same. IOW users don't get to pick supported microversions: developers have to develop against microversions, because they're different APIs.

If we want to support multiple microversions we need to have support in code for multiple microversions. If there's some code which doesn't work with microversion Y then we need to guard that code appropriately.

That isn't to say we shouldn't do it. Just that:

  • It can't be user-configurable
  • It's expensive to support and test

If we do it, we should confine it to specific parts of the code. We don't want to change the base microversion everywhere without a lot of testing.

@dulek
Contributor

dulek commented Apr 19, 2023

To me, microversions can be used to detect support for a feature. In this case we'd enable multi-attachable volumes if the available microversion meets or exceeds 2.73, and then use 2.73 for such requests directly (since technically a later microversion could remove support for that too). So it's not support for any microversion; it's support for a certain microversion to do a certain thing supported by that microversion.
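A minimal sketch of the comparison this implies (hypothetical helper names, not CAPO code). Note that microversions must be compared numerically on their major and minor parts; lexical string comparison would rank "2.100" below "2.99":

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMicroversion splits a microversion like "2.73" into its major
// and minor components.
func parseMicroversion(v string) (major, minor int, err error) {
	parts := strings.SplitN(v, ".", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("invalid microversion %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// supportsAtLeast reports whether the server's maximum microversion
// meets or exceeds the version a feature requires.
func supportsAtLeast(serverMax, required string) (bool, error) {
	sMaj, sMin, err := parseMicroversion(serverMax)
	if err != nil {
		return false, err
	}
	rMaj, rMin, err := parseMicroversion(required)
	if err != nil {
		return false, err
	}
	return sMaj > rMaj || (sMaj == rMaj && sMin >= rMin), nil
}

func main() {
	ok, _ := supportsAtLeast("2.90", "2.73")
	fmt.Println(ok) // true: a server capped at 2.90 can serve 2.73 requests
}
```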

@jichenjc
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 20, 2023
@lentzi90
Contributor Author

Microversions define different APIs with different interfaces and different semantics. They are usually (but not always) pretty similar to what came before, but pretty similar != the same. IOW users don't get to pick supported microversions: developers have to develop against microversions, because they're different APIs.

If we want to support multiple microversions we need to have support in code for multiple microversions. If there's some code which doesn't work with microversion Y then we need to guard that code appropriately.

That isn't to say we shouldn't do it. Just that:

* It can't be user-configurable

* It's expensive to support and test

If we do it, we should confine it to specific parts of the code. We don't want to change the base microversion everywhere without a lot of testing.

I may not be aware of the full picture here, but my understanding of this is quite different.
Probably the core question here is who the "user" is in this case. The user must be able to set the version, but is it CAPO that is the user or is it the user of CAPO that is the user?

For reference, here is a document about the microversion support in OpenStack. It says the following:

Users of the REST API are able to, on a per request basis, decide which version of the API they want to use (assuming the deployer supports the version they want).

So if CAPO is the user, then CAPO decides the version. But if I'm the user, then I decide the version. In my opinion CAPO is just the tool I use to talk to the API, and so it does not get to decide the version. CAPO may not support all versions though, and that is fine. However, CAPO in itself relies on gophercloud for talking to the API, so I would say that is where the supported/not supported decision should come from.

The other part of this is about support and test. I'm a bit conflicted here honestly, but I'm leaning in the direction of pushing more responsibility onto the user (of CAPO). We can have a default version that is well tested and supported. But users that accept the risk must be able to configure the version without making code changes. There are so many different OpenStack providers out there. Some require a specific minimum version because of their configuration (e.g. required multi-attachable volumes); others run very old versions and cannot support anything recent.

I guess what I'm saying is, please allow the version to be set by the user (through the clouds.yaml). This way of setting the version is already supported by other clients. Just as with these other clients, the user must understand the implications when setting specific versions or risk all kinds of issues. But if we don't allow the user to set the version, then we make CAPO unusable in many environments.

Also, the current behavior is quite surprising if you are used to configuring these things in clouds.yaml. I can set a version, it will not produce any errors, but a completely different version will be used anyway.

@mdbooth
Contributor

mdbooth commented Apr 26, 2023

The 'user' in the context of that documentation is the entity making the REST API call, so in that context the user is CAPO. The 'user' (i.e. CAPO) gets to pick the API contract they're expecting.

The problem with giving that choice to our 'user' is that CAPO expects a certain API contract. If somebody changes that, the CAPO developers (i.e. we) need to have considered it in advance: it's a different API. Setting it in clouds.yaml is useful for a user to define the contract used by their shell tools: they are making their own choice about which API contract they are using. But if some tool is going to consume the output, then that tool had better support the microversion they picked, or it won't work. That's the situation we're in.

Gophercloud doesn't know about it, btw. Gophercloud just marshals and unmarshals whatever is in the request and response. It basically supports a superset of everything and leaves it up to the caller to know what's required in the request and will be returned in the response for a given microversion.

Here's an idea for how we could do this, though:

We define a 'base' microversion. This is the absolute minimum we support at all and the default microversion for all code not covered by a feature gate. It's hard-coded, and not user-configurable.

We define a set of feature gates which require a higher microversion, e.g. tags. Each feature is associated with a microversion. Code for that feature will:

  • Be conditional on the feature being enabled
  • Execute API calls for only that feature with the higher microversion

All feature gates will be enabled by default, but can be disabled individually on the command line.

CI probably only tests 'everything enabled'.
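A minimal sketch of what that feature-gate mapping might look like (all names here are hypothetical, not CAPO's actual implementation):

```go
package compute

// NovaBaseMicroversion is the hard-coded minimum described above; all
// code outside a feature gate would run against this version.
const NovaBaseMicroversion = "2.53"

// Feature identifies an optional capability gated on a microversion.
type Feature string

const (
	FeatureServerTags  Feature = "ServerTags"
	FeatureMultiAttach Feature = "MultiAttachVolumes"
)

// featureMicroversions maps each gated feature to the microversion its
// API calls must be made with.
var featureMicroversions = map[Feature]string{
	FeatureServerTags:  "2.53",
	FeatureMultiAttach: "2.73", // the version discussed in this issue
}

// enabledFeatures would be populated from command-line flags; per the
// proposal above, everything defaults to on.
var enabledFeatures = map[Feature]bool{
	FeatureServerTags:  true,
	FeatureMultiAttach: true,
}

// MicroversionFor returns the microversion to use for a feature's API
// calls, falling back to the base version if the feature is disabled
// or unknown.
func MicroversionFor(f Feature) (string, bool) {
	if enabledFeatures[f] {
		if mv, ok := featureMicroversions[f]; ok {
			return mv, true
		}
	}
	return NovaBaseMicroversion, false
}
```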

@mdbooth
Contributor

mdbooth commented Apr 26, 2023

To me, microversions can be used to detect support for a feature. In this case we'd enable multi-attachable volumes if the available microversion meets or exceeds 2.73, and then use 2.73 for such requests directly (since technically a later microversion could remove support for that too). So it's not support for any microversion; it's support for a certain microversion to do a certain thing supported by that microversion.

Good point. If we did the feature gate thing we would already have a map of features to microversions. If we detected the maximum microversion supported by the server it should be trivial to automatically disable non-supported features.
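Combining the two earlier sketches (the feature-gate map and the supportsAtLeast helper, both hypothetical and in the same made-up package), auto-disabling could be a few lines:

```go
// disableUnsupportedFeatures turns off any gated feature whose
// required microversion exceeds what the server advertises as its
// maximum supported microversion.
func disableUnsupportedFeatures(serverMax string) error {
	for f, mv := range featureMicroversions {
		ok, err := supportsAtLeast(serverMax, mv)
		if err != nil {
			return err
		}
		if !ok {
			enabledFeatures[f] = false
		}
	}
	return nil
}
```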

@stephenfin
Contributor

stephenfin commented Apr 27, 2023

EDIT: After writing the below, I realised @mdbooth is effectively saying the same thing. Leaving here for posterity.

Microversions define different APIs with different interfaces and different semantics. They are usually (but not always) pretty similar to what came before, but pretty similar != the same. IOW users don't get to pick supported microversions: developers have to develop against microversions, because they're different APIs.
If we want to support multiple microversions we need to have support in code for multiple microversions. If there's some code which doesn't work with microversion Y then we need to guard that code appropriately.
That isn't to say we shouldn't do it. Just that:

* It can't be user-configurable

* It's expensive to support and test

If we do it, we should confine it to specific parts of the code. We don't want to change the base microversion everywhere without a lot of testing.

I may not be aware of the full picture here, but my understanding of this is quite different. Probably the core question here is who the "user" is in this case. The user must be able to set the version, but is it CAPO that is the user or is it the user of CAPO that is the user?

It's a bit weird to think about semantically, but I'd argue they're both users. Both openstackclient and openstacksdk allow you to request a microversion, but to actually support that microversion they need to be able to understand it. I think it's the same for CAPO. We can allow humans to set arbitrary microversions, but if we do then we're opening ourselves up to bugs. Microversions can change absolutely anything - the fields or headers required in a request and provided in a response, the status codes of responses, or even the existence of the API - and unless we plan to evolve CAPO in lock-step with nova's microversion evolution, we can't be sure that we will make a request in the correct format or understand what we get back.

For reference, here is a document about the microversion support in OpenStack. It says the following:

Users of the REST API are able to, on a per request basis, decide which version of the API they want to use (assuming the deployer supports the version they want).

So if CAPO is the user, then CAPO decides the version. But if I'm the user, then I decide the version. In my opinion CAPO is just the tool I use to talk to the API, and so it does not get to decide the version. CAPO may not support all versions though, and that is fine. However, CAPO in itself relies on gophercloud for talking to the API, so I would say that is where the supported/not supported decision should come from.

You the human aren't talking to the (nova) API: you're talking to CAPO. I think CAPO has to be opinionated for the reasons above.

The other part of this is about support and test. I'm a bit conflicted here honestly, but I'm leaning in the direction of pushing more responsibility onto the user (of CAPO). We can have a default version that is well tested and supported. But users that accept the risk must be able to configure the version without making code changes. There are so many different OpenStack providers out there. Some require a specific minimum version because of their configuration (e.g. required multi-attachable volumes); others run very old versions and cannot support anything recent.

I'm not sure why we wouldn't add that logic to CAPO and upgrade/downgrade capabilities depending on what we're running against. This is what tools like openstackclient and libraries like openstacksdk do. Nova (and virtually all other OpenStack services) expose their supported versions via the API root, so it's certainly possible to know what we can do ahead of time. You can check against your own cluster with openstack versions show.
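A rough sketch of that discovery (using only the standard library to stay self-contained; in CAPO the request would go through an authenticated client, and the endpoint here is made up). The versioned API root returns a document advertising the minimum and maximum supported microversions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// versionDoc matches the relevant fields of the standard OpenStack
// version document served at a versioned endpoint root.
type versionDoc struct {
	Version struct {
		ID         string `json:"id"`
		MinVersion string `json:"min_version"` // e.g. "2.1"
		MaxVersion string `json:"version"`     // e.g. "2.90"
	} `json:"version"`
}

// discoverComputeMicroversions fetches the min/max microversions the
// server supports.
func discoverComputeMicroversions(endpoint string) (min, max string, err error) {
	resp, err := http.Get(endpoint)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()

	var doc versionDoc
	if err := json.NewDecoder(resp.Body).Decode(&doc); err != nil {
		return "", "", err
	}
	return doc.Version.MinVersion, doc.Version.MaxVersion, nil
}

func main() {
	// Hypothetical endpoint; in practice this comes from the service catalog.
	min, max, err := discoverComputeMicroversions("https://compute.example.com/v2.1/")
	if err != nil {
		panic(err)
	}
	fmt.Printf("server supports microversions %s..%s\n", min, max)
}
```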

(Also, fwiw, 2.73 is pretty old (from Train) - I'd suspect this feature to be pretty widely supported these days)

I guess what I'm saying is, please allow the version to be set by the user (through the clouds.yaml). This way of setting the version is already supported by other clients. Just as with these other clients, the user must understand the implications when setting specific versions or risk all kinds of issues. But if we don't allow the user to set the version, then we make CAPO unusable in many environments.

Per above, I don't think this is the thing to do. I'd be all for dynamic version configuration, but I don't think we should blindly use what a user gives us unless we want to specifically support all microversions (which ideally requires good testing).

Also, the current behavior is quite surprising if you are used to configuring these things in clouds.yaml. I can set a version, it will not produce any errors, but a completely different version will be used anyway.

We should log that we ignore this.

@lentzi90
Contributor Author

lentzi90 commented May 2, 2023

Thanks for the comments! I guess I will have to swallow it and start working on those feature gates 🙂

Just for the record though, I'm not saying we should claim to support any user-configurable microversion. I'm saying that CAPO already works with multiple versions, and that the only thing preventing people from using it is that the version is hard-coded. (We have forks for this sole purpose.) So I suggest allowing the user to set the version, while acknowledging that this may break things.

That said, I agree that having feature gates for this sounds like a good way forward to provide explicit support for different versions. My question then would be how to handle versions that don't impact CAPO directly, and how to indicate the version to use?

As an example, the multi-attachable volumes didn't require any changes to CAPO at all, except bumping the microversion. So let's say there is a server that requires a specific version. This version is not the default hard-coded version in CAPO, and it doesn't impact CAPO in itself. Is there then a reason to have a feature gate for it, and if not, how would I go about configuring CAPO to use this version?

Detecting what the server requires is of course one way to know what version to use, at least. But it is not clear to me how to handle the feature gates to allow CAPO to go outside of the well-tested hard-coded version. Would every microversion get its own feature flag (e.g. --feature-gate-nova-2.73)?

@mdbooth
Contributor

mdbooth commented May 2, 2023

Thanks for the comments! I guess I will have to swallow it and start working on those feature gates 🙂

Just for the record though, I'm not saying we should claim to support any user-configurable microversion. I'm saying that CAPO already works with multiple versions, and that the only thing preventing people from using it is that the version is hard-coded. (We have forks for this sole purpose.) So I suggest allowing the user to set the version, while acknowledging that this may break things.

In contrast, when we bumped for tag support I think we had to fix a couple of things. It's really pot luck.

That said, I agree that having feature gates for this sounds like a good way forward to provide explicit support for different versions. My question then would be how to handle versions that don't impact CAPO directly, and how to indicate the version to use?

As an example, the multi-attachable volumes didn't require any changes to CAPO at all, except bumping the microversion. So let's say there is a server that requires a specific version. This version is not the default hard-coded version in CAPO, and it doesn't impact CAPO in itself. Is there then a reason to have a feature gate for it, and if not, how would I go about configuring CAPO to use this version?

We want the microversion in use to be deterministic and tested, so if a section of code executes, it always executes with the same, tested microversion. If that microversion isn't available, it doesn't execute. The microversion used can be set for every individual API call, so different sections of code can and will use different microversions.

Unless we want some kind of microversion guard for all code, we're going to want a 'base' microversion which is the default. All code would execute against this microversion unless specifically raised. This would be the minimum microversion supported under any circumstances.

The multi-attach example is a really interesting edge case, though. From memory, the call itself didn't change, only the semantics of the result. In this case we'd want to support it at two different microversions, but they could both execute the same code. The microversions supported would still be explicit, deterministic, and tested, though.

Detecting what the server requires is of course one way to know what version to use, at least. But it is not clear to me how to handle the feature gates to allow CAPO to go outside of the well-tested hard-coded version. Would every microversion get its own feature flag (e.g. --feature-gate-nova-2.73)?

I wonder if rather than a feature gate we'd do better with some framework code for executing a block of code under a different explicit microversion, complete with a guard and options not to execute if it's not supported. We'd want to play with this, though. It could very quickly make lots of code unreadable.
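A rough sketch of such a framework helper, assuming gophercloud v1's per-client Microversion field and the supportsAtLeast comparison sketched earlier in this thread (the helper and error names are hypothetical):

```go
package compute

import (
	"errors"

	"github.com/gophercloud/gophercloud"
)

// ErrMicroversionUnsupported is returned when the server cannot serve
// the microversion a block of code requires.
var ErrMicroversionUnsupported = errors.New("required microversion not supported by server")

// withMicroversion runs fn with the service client temporarily pinned
// to an explicit, tested microversion, restoring the base version
// afterwards. serverMax would come from version discovery.
func withMicroversion(client *gophercloud.ServiceClient, required, serverMax string, fn func() error) error {
	ok, err := supportsAtLeast(serverMax, required)
	if err != nil {
		return err
	}
	if !ok {
		return ErrMicroversionUnsupported
	}

	base := client.Microversion
	client.Microversion = required
	defer func() { client.Microversion = base }()

	return fn()
}
```

Mutating a shared client like this is not goroutine-safe, so a real implementation would more likely clone the client per call; either way, the readability concern above stands.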

@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024
@lentzi90
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024
@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2024
@lentzi90
Contributor Author

Waiting for gophercloud v2
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2024
@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 18, 2024
@stephenfin
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 18, 2024
@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 16, 2024
@lentzi90
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 24, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in CAPO Roadmap Nov 29, 2024