[New ephemeral]: aws_eks_cluster_auth should be turned into an ephemeral resource #40343

Open
erpel opened this issue Nov 28, 2024 · 18 comments · May be fixed by #40660
Labels
new-ephemeral-resource Introduces a new ephemeral resource. service/eks Issues and PRs that pertain to the eks service.

Comments

@erpel
Contributor

erpel commented Nov 28, 2024

Description

The data source aws_eks_cluster_auth causes the token to be saved in the plan, where it can expire before the apply. This is discussed in #13189. The new ephemeral resources in Terraform 1.10 should address this perfectly:

Requested Resource(s) and/or Data Source(s)

  • ephemeral aws_eks_cluster_auth

Potential Terraform Configuration

data "aws_eks_cluster" "example" {
  name = "example"
}

ephemeral "aws_eks_cluster_auth" "example" {
  name = "example"
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.example.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.example.certificate_authority[0].data)
  token                  = ephemeral.aws_eks_cluster_auth.example.token
}

References

Original issue: #13189.

Documentation:

Example of other ephemeral resources in terraform-provider-aws:

Would you like to implement a fix?

None


Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

@github-actions github-actions bot added service/eks Issues and PRs that pertain to the eks service. needs-triage Waiting for first response or review from a maintainer. labels Nov 28, 2024
@johnsonaj johnsonaj added new-ephemeral-resource Introduces a new ephemeral resource. and removed needs-triage Waiting for first response or review from a maintainer. labels Dec 2, 2024
@bryantbiggs
Contributor

Shouldn't you use the exec() method of the respective provider (kubernetes, helm, etc.)? That solves two problems: no credentials persisted to the state file, and no stale credentials.

@erpel
Contributor Author

erpel commented Dec 5, 2024

Using the exec parameter does solve the issue. I imagine that this is what many do today, including my organization. It does add reliance on an additional executable, though, and therefore reduces portability. The executable is another component that needs to be maintained in CI environments where it might otherwise not be needed.
I believe implementing this using ephemeral resources would be worthwhile.

@bryantbiggs
Contributor

bryantbiggs commented Dec 5, 2024

Do users (humans) access your clusters today (i.e., via kubectl)? And what do you use for CI today?

@TechIsCool

At a different company, but we have a thin wrapper around kubectl and use both GitHub Actions and Jenkins to manage our clusters. We would love to see this supported, as we have had tokens expire multiple times on the first build of a heavy EKS cluster that takes hours to provision.

@bryantbiggs
Contributor

If you're using kubectl, are you using aws eks update-kubeconfig --name xxx as well?

@gamunu

gamunu commented Dec 11, 2024

@bryantbiggs Running the AWS CLI is not possible in managed services like Terraform Cloud.

@erpel
Contributor Author

erpel commented Dec 11, 2024

Users do have access using kubectl and whatever else they like to use that works with the provided kubeconfig files. We're not using the AWS CLI to manage kubeconfig files, though not for technical reasons, IIRC.

CI is mainly Atlantis plus some pipelines running on GitLab CI. Both can be made to work with additional tooling, but the simpler solution is preferable to me.

@bschaatsbergen
Member

bschaatsbergen commented Dec 20, 2024

Hi!

Thank you for reporting this issue. Terraform 1.10 supports referencing ephemeral resource attributes directly in provider configurations. Having an ephemeral variant of aws_eks_cluster_auth would improve the security posture of Terraform users working with Amazon EKS and the Kubernetes or Helm provider, as the temporarily obtained IAM token would no longer be persisted to state.

If I recall correctly, both the Helm and Kubernetes providers offer an exec block that allows you to obtain a token by invoking a binary at runtime, and I believe this is considered a good practice nowadays too (I'm no Amazon EKS expert, unlike @bryantbiggs).

The exec block typically looks like this:

exec {
  api_version = "client.authentication.k8s.io/v1beta1"
  args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster.name]
  command     = "aws"
}

However, this isn’t very native to Terraform, as CI/CD environments must manage an additional binary and ensure it’s available at runtime—one more thing to keep track of!

I’m not a subject matter expert on the eks command-line utility, but based on its implementation https://github.com/aws/aws-cli/blob/develop/awscli/customizations/eks/get_token.py#L129, it seems quite similar to ours. Do you know if there are any differences, @bryantbiggs? I’d be happy to dive deeper into this as we should weigh these things carefully to provide the most secure and robust authentication mechanism possible.

FWIW, using an ephemeral resource:

ephemeral "aws_eks_cluster_auth" "example" {
  name = data.aws_eks_cluster.example.id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.example.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.example.certificate_authority[0].data)
  token                  = ephemeral.aws_eks_cluster_auth.example.token
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.example.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.example.certificate_authority[0].data)
    token                  = ephemeral.aws_eks_cluster_auth.example.token
  }
}

Over in #40660 I'm prototyping this, and will provide an update soon. Thanks again!

@bryantbiggs
Contributor

In the respective providers, the exec() method is documented as the preferred approach for managed offerings with short token expirations: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs#exec-plugins

The implementation looks similar to what is done in the awscli, but the risk is that the implementation changes while the interface doesn't. The contract to users is that aws eks update-kubeconfig ... is stable, but what happens in the background is an implementation detail that is free to change, and most likely will across our offerings of EKS, local EKS clusters on Outposts, etc.

@lorengordon
Contributor

lorengordon commented Dec 20, 2024

The exec method is also irritating in that it creates a tight coupling between the environment's shell setup for aws profiles and the kubernetes provider. I often have several aws providers and lots of profiles, and a different aws credential in my shell than what is being used by the desired aws provider. So, no. I don't care that shelling out with exec is the "preferred approach". It's wrong, and a bad option. The real problem is that there isn't yet a mechanism for renewing the credential via the aws_eks_cluster_auth data source and the kubernetes and helm providers. At least making the data source ephemeral buys us a little time, since the value would update between plan-time vs apply-time.

@bryantbiggs
Contributor

That just sounds like you need to rethink your configs and maybe refactor a bit

@lorengordon
Contributor

lorengordon commented Dec 20, 2024

That just sounds like you need to rethink your configs and maybe refactor a bit

Negative. The configs are fine. It's just a lot of environments and customers. Please don't deflect.

@bryantbiggs
Contributor

So why are profiles causing an issue - you can specify a profile in the AWS provider and you can pass "--profile", "foo" to the exec()

I'm not seeing the issue

@bryantbiggs
Contributor

At least making the data source ephemeral buys us a little time, since the value would update between plan-time vs apply-time.

This doesn't make sense.

The difference between the exec() in the provider versus the data source is that with the data source, the GET request is made ASAP, which may be long before it's needed. With the provider's exec(), the GET request isn't made until the first resource for that provider is encountered in the graph.

To give a more concrete example: cluster upgrades. If you use the static token route, you are almost guaranteed to hit the expired-token issue, because the control plane takes some time to update (8-10 minutes in the case of EKS), plus some additional time for node groups/etc., before you get to the point where the token is actually used for Kubernetes/Helm resources. If you use the exec() method, that token isn't requested until after the control plane has upgraded, at the point where it is first needed.
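
To make that timing concrete, here is a minimal sketch of the upgrade scenario (the cluster name, version, variables, and the trailing config map are illustrative only, not taken from this thread):

variable "cluster_role_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_eks_cluster" "example" {
  name     = "example"
  role_arn = var.cluster_role_arn
  version  = "1.31" # bumping this triggers a control-plane upgrade that takes roughly 8-10 minutes

  vpc_config {
    subnet_ids = var.subnet_ids
  }
}

# With the plain data source, this token is fetched as soon as the data source is read,
# typically long before any Kubernetes resource is applied.
data "aws_eks_cluster_auth" "example" {
  name = aws_eks_cluster.example.name
}

provider "kubernetes" {
  host                   = aws_eks_cluster.example.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.example.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.example.token
}

# Only reached after the cluster upgrade (and any node group updates) finish; a token
# obtained up front, valid for roughly 15 minutes, has likely expired by this point.
resource "kubernetes_config_map" "app" {
  metadata {
    name = "app"
  }

  data = {
    version = "v1"
  }
}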

@lorengordon
Contributor

lorengordon commented Dec 20, 2024

So why are profiles causing an issue - you can specify a profile in the AWS provider and you can pass "--profile", "foo" to the exec()

That's the exact definition of the tight coupling I'm talking about. Everyone (and every CI service) that runs the config then needs that exact named profile defined in their shell config (and of course, as everyone else is complaining about, also needs the aws cli installed). That doesn't work for teams larger than a single person, and it's all in place of simply chaining from the aws provider config, which already has loads of options for resolving an AWS credential.

At least making the data source ephemeral buys us a little time, since the value would update between plan-time vs apply-time.

This doesn't make sense.

It does make sense. The ephemeral data source resolves one credential at plan time, then executes again at apply time and resolves another credential with a new expiration period. For execution models that save the plan and then apply that saved plan at a future time, this is a very big deal.

@bryantbiggs
Contributor

So you said:

I often have several aws providers and lots of profiles, and a different aws credential in my shell than what is being used by the desired aws provider.

So you are already "tightly coupled" between your AWS profiles and the AWS provider, no? Because you have to specify the profile you want to use on the respective provider, such as:

provider "aws" {
  profile = "foo"
  region  = "us-east-1"
}

With an exec provider that looks like:

provider "kubernetes" {
  host                   = yyy
  cluster_ca_certificate = base64decode(xxx)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    args        = ["eks", "get-token", "--cluster-name", "example", "--profile", "foo"]
    command     = "aws"
  }
}

Or if you are managing your profiles out of band (i.e., loading the respective creds into the environment using the profile), it's just:

provider "aws" {
  region  = "us-east-1"
}

Which means your exec is:

provider "kubernetes" {
  host                   = yyy
  cluster_ca_certificate = base64decode(xxx)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    args        = ["eks", "get-token", "--cluster-name", "example"]
    command     = "aws"
  }
}

So I don't see how the exec() method creates a "tighter coupling" than what is already present

@lorengordon
Contributor

That again requires the aws cli to be present (simply setting the profile in the aws provider does not require the aws cli), and specific named profiles to be configured on every system that executes that config. That's not reasonable. More typically, the resolution of the credential is up to the user or the executing environment. The "profile" is just an input variable, not hardcoded. Maybe it's null, maybe it's not. It is up to the user how to resolve the credential for the target account and role. Or perhaps the aws provider is configured to support assume_role blocks and chained roles. The AWS credential chain can be very complicated (as can other cloud providers'), and simply saying "shell out to an external command" significantly restricts that credential chain, as well as imposing weird external requirements on the execution environment.
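
As a rough, hypothetical sketch of the kind of configuration being described (the role ARN variable and region are placeholders): the AWS provider below resolves its credentials entirely through an assume_role block, while an exec block in the Kubernetes provider would shell out to the AWS CLI, which resolves credentials on its own and never sees that session.

variable "workload_role_arn" {
  type = string # hypothetical role assumed by the provider
}

# Credentials are resolved inside the provider: ambient credentials
# (env vars, SSO, instance profile, ...) assume the workload role.
provider "aws" {
  region = "us-east-1"

  assume_role {
    role_arn     = var.workload_role_arn
    session_name = "terraform"
  }
}

# An exec block in the kubernetes provider cannot reuse the session resolved above;
# the same role chain would have to be reproduced out of band for the CLI,
# e.g. via profiles or environment variables.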

I get the hesitation and push back on supporting options native to the cloud providers... The Kubernetes provider doesn't want to have to bundle SDKs for every cloud provider and maintain those code paths, and the Terraform community doesn't want to introduce cross-provider dependencies. But the limitation just makes the user experience kinda awful and limits the use cases.

Regardless, that's all off topic. An ephemeral data source for aws_eks_cluster_auth does have value, since it would return a different token at plan time and apply time. That definitely addresses a couple use cases that currently fall over when configuring the Kubernetes provider using aws_eks_cluster_auth.
