Support for EC2 IAM roles in a way that allows us to safely share version history #3023

Open
1 task
vito opened this issue Jan 10, 2019 · 22 comments
Labels: core/creds, enhancement, goal (a long-term goal identifying outcomes and prerequisites), security

Comments

@vito
Member

vito commented Jan 10, 2019

Prerequisites:

What challenge are you facing?

With #2386 we started on a bunch of work to reduce the footprint of resources by sharing check containers and version history globally for equivalent resource definitions.

Resource definitions are considered equivalent if their source: (interpolated with credentials and hashed for safety) and type are equivalent.

However, as was uncovered in #3002, there is one situation where the hashed interpolated source is not enough to determine whether version history should be shared: resources using IAM roles. These resources forgo putting credentials in source: in favor of using EC2-configured IAM roles to grant anything that runs on the workers access to the AWS resource automatically.

This has pretty scary implications on version history sharing. Because the source: does not contain the credentials, all it would take is one person that does have access (via their own workers) to successfully check, and then anyone else could configure the same source: and see the same version history without even having to configure the IAM roles.
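
A minimal Python sketch of why this leaks, assuming equivalence is decided by hashing the type together with the interpolated source as described above. The function `resource_config_key`, the JSON canonicalisation, and the choice of SHA-256 are illustrative assumptions, not Concourse's actual implementation; the bucket name and credentials are made up.

```python
import hashlib
import json


def resource_config_key(resource_type, interpolated_source):
    """Hypothetical key under which equivalent resource definitions share history."""
    canonical = json.dumps(
        {"type": resource_type, "source": interpolated_source},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Team A's workers have an IAM role; team B's workers do not.
# Neither source contains credentials, so the two sources are identical.
team_a_source = {"bucket": "private-artifacts", "regexp": "artifact-(.*).tgz"}
team_b_source = {"bucket": "private-artifacts", "regexp": "artifact-(.*).tgz"}

# With static credentials in the source, the interpolated source differs.
team_c_source = dict(
    team_a_source, access_key_id="AKIAEXAMPLE", secret_access_key="example"
)

key_a = resource_config_key("s3", team_a_source)
key_b = resource_config_key("s3", team_b_source)
key_c = resource_config_key("s3", team_c_source)

print(key_a == key_b)  # same key: team B would see team A's version history
print(key_a == key_c)  # different key: history is not shared
```

Under these assumptions the credential-free sources collide even though only one team can actually reach the bucket, which is the leak described here.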

Thankfully, they at least won't have access to the fetched bits. The get step would have to run on their own workers, which wouldn't have access, and so there would be no cache to re-use and no ability to fetch the bits. However, this is still a dangerous information leak.

What would make this better?

I'm not sure! But the way resources use IAM roles today runs afoul of a couple of Concourse anti-patterns:

  • anti/worker-state: because the operator is configuring the workers specifically for particular workloads. This could backfire if they start to use those same workers to run un-trusted code like pull requests. Running on a worker should not automatically grant access to sensitive data!
  • anti/multi-source-of-truth with a hint of anti/contributor-burden: each resource that deals with AWS now needs to support two ways of being configured: IAM roles and static configuration.

Is there some way we can make this relationship with IAM roles more explicit?

For example, instead of configuring the workers, could the operator configure the ATC with named IAM roles and explicitly permit certain teams to access those IAM roles, perhaps leveraging our existing credential manager support?

To be honest, my understanding of IAM roles is fairly loose, but as long as they can be named, we should be able to support this even in a multi-tenant environment, by configuring the ATC with something like this:

concourse web \
  --iam-secrets-team-roles team-name:role1

Then, assuming that the operator has already configured the web EC2 to have those named roles, pipelines belonging to team-name could use this credential manager like so:

resources:
- name: my-bucket
  type: s3
  source:
    access_key_id: ((role1.access_key_id))
    secret_access_key: ((role1.secret_access_key))

This credential manager would interpolate by fetching from the named role.
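
As a rough illustration of that lookup, here is a Python sketch. The `team-name:role1` flag format is the hypothetical one proposed above; `parse_team_roles`, `resolve_role_var`, and the `fetch_role_credentials` callback are names invented for this sketch and do not exist in Concourse.

```python
def parse_team_roles(flag_values):
    """Parse hypothetical `team-name:role` flag values into a team -> roles map."""
    allowed = {}
    for value in flag_values:
        team, sep, role = value.partition(":")
        if not sep or not team or not role:
            raise ValueError(f"expected team:role, got {value!r}")
        allowed.setdefault(team, set()).add(role)
    return allowed


def resolve_role_var(allowed, team, var_path, fetch_role_credentials):
    """Resolve a var such as `role1.access_key_id` for a team, or refuse."""
    role, _, field = var_path.partition(".")
    if role not in allowed.get(team, set()):
        raise PermissionError(f"team {team!r} may not use role {role!r}")
    return fetch_role_credentials(role)[field]


def fake_fetch(role):
    """Stand-in for whatever would fetch credentials from the named role."""
    return {"access_key_id": f"{role}-key", "secret_access_key": f"{role}-secret"}


allowed = parse_team_roles(["team-name:role1", "other-team:role2"])
print(resolve_role_var(allowed, "team-name", "role1.access_key_id", fake_fetch))
```

The point of the sketch is only that authorization happens against explicit operator configuration before any credential is fetched; a team asking for a role it was not granted gets an error rather than a value.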

Assuming this works (big assumption as I have no experience with IAM roles), this would fix both anti-patterns:

  • no more anti/worker-state because all configuration is centralized to the web node and explicitly managed by the ATC.
  • no more anti/multi-source-of-truth because the resource only has to care about statically configured credentials, just as it does today.

One obvious downside is that Concourse currently only supports configuring one credential manager at a time. But that is probably something to discuss in concourse/rfcs#5.

@vito
Member Author

vito commented Jan 10, 2019

@jduv Hey! I'd be especially interested in your thoughts on this as you submitted both concourse/semver-resource#85 and concourse/s3-resource#115 for enabling IAM roles in two of our resource types (thanks! 🙂). Unfortunately I'm not comfortable with merging either until we have a plan for the security challenge described in this issue.

Supporting IAM roles directly from resources has been a controversial topic in the past, but I understand that there's a big need for it, so it'd be great if we could find common ground (either with this proposal, if it even works, or some other approach).

@tlwr I noticed in #2951 that you use team-scoped workers so that you can safely use IAM roles, so this may be of interest. Would the proposed solution make sense for you?

vito removed the triage label Jan 10, 2019
vito changed the title from "Explicit support for things like IAM roles in a way that allows us to safely share version history" to "Support for IAM roles in a way that allows us to safely share version history" Jan 10, 2019
vito pinned this issue Jan 10, 2019
@vito
Member Author

vito commented Jan 10, 2019

Looks like EC2 only allows one IAM role per instance, which kind of ruins this idea as you would then only be able to have one role for all teams. At least with per-team workers with their own IAM roles, you could have one per team, so this isn't as flexible. Hmm...

@vito
Member Author

vito commented Jan 10, 2019

Maybe what we're really missing here is a credential manager for generating these temporary credentials based on configured policies. Similar to what could be done with Vault's AWS backend: https://www.vaultproject.io/docs/secrets/aws/index.html ... only without requiring them to deploy and maintain a Vault instance.

@tlwr
Contributor

tlwr commented Jan 10, 2019

Background

Our relationship with Concourse has its genesis in GOV.UK PaaS, which is a Cloud Foundry deployment for UK government services; specifically, it is Cloud Foundry running on AWS. Our multi-tenant Concourse is separate from the PaaS project and is internal to our org.

As such we recommend to our users that they should use forked resources, for instance:

These forks have support for using IAM instance profiles.


Sharing version history

I don't think I have enough context to appreciate why sharing version history is useful (other than general efficiency). Although I can imagine it would be useful for Wings or other large Concourse deployments where the teams are less isolated from each other. For our use-case we will not have a worker which is not allocated to a team.


Antipatterns

anti/worker-state - I would argue that this isn't worker state as such, but team-state or team-isolation. Workers aren't really stateful when using IAM instance profiles; it is more about worker-authorization, which is desirable in a multi-tenant environment where workers are isolated from each other using teams and separate VMs.

As an org we have guidance that we should not run untrusted code (i.e. code not on master or production branches) on the same VMs as trusted code. We use concourse-teams more like roles, so teamA-deploy takes code from master and deploys through environments all the way to production. teamA-build is a separate team with separate VMs and a separate IAM instance profile.

Doing something like this:

concourse web \
  --iam-secrets-team-roles team-name:role1

and then

resources:
- name: my-bucket
  type: s3
  source:
    access_key_id: ((role1.access_key_id))
    secret_access_key: ((role1.secret_access_key))

as you suggested would work. ATC can act as some form of IAM proxy but it would have to be implemented through the EC2 node (or ECS task) assuming roles; i.e.

[atc] team-a-s3-resource needs to be checked
[atc] looks up role_name for team-a => $team_a_role_name
[atc] makes call to AWS STS assume-role $team_a_role_name (returns key_id, secret_key, session_token)
[atc] schedules check on available worker using sts assume role credentials

The impact of this is that the node running ATC needs to be able to assume all of the tenant IAM roles, although this is no big deal.
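
The flow above could be sketched in Python roughly as follows. This is an assumption-laden illustration: `sts_client` is expected to behave like boto3's STS client (an `assume_role` method taking `RoleArn` and `RoleSessionName` and returning a `Credentials` dictionary), while `credentials_for_team`, `source_for_check`, the session naming scheme, and the `FakeSTS` stub are invented here so the sketch runs without AWS access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporaryCredentials:
    access_key_id: str
    secret_access_key: str
    session_token: str


def credentials_for_team(sts_client, team_role_arns, team):
    """Look up the team's role and exchange it for temporary credentials."""
    role_arn = team_role_arns[team]  # KeyError when no role is configured
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"concourse-{team}",  # hypothetical naming scheme
    )
    creds = response["Credentials"]
    return TemporaryCredentials(
        creds["AccessKeyId"], creds["SecretAccessKey"], creds["SessionToken"]
    )


def source_for_check(source, creds):
    """Return a copy of the resource source with the credentials filled in."""
    return {
        **source,
        "access_key_id": creds.access_key_id,
        "secret_access_key": creds.secret_access_key,
        "session_token": creds.session_token,
    }


class FakeSTS:
    """Stand-in for an STS client; returns made-up values."""

    def assume_role(self, RoleArn, RoleSessionName):
        return {
            "Credentials": {
                "AccessKeyId": "ASIAEXAMPLE",
                "SecretAccessKey": "example-secret",
                "SessionToken": f"token-for-{RoleArn}",
            }
        }


team_role_arns = {"team-a": "arn:aws:iam::123456789012:role/team-a"}
creds = credentials_for_team(FakeSTS(), team_role_arns, "team-a")
check_source = source_for_check({"bucket": "my-bucket"}, creds)
print(sorted(check_source))
```

Because the credentials end up in the source, the check can be scheduled on any available worker, which is the property the log lines above rely on.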

The resource also has to include the session token:

resources:
- name: my-bucket
  type: s3
  source:
    access_key_id: ((aws_sts.access_key_id))
    secret_access_key: ((aws_sts.secret_access_key))
    session_token: ((aws_sts.session_token))

Contributor burden

The above suggestion is perhaps too AWS specific but would make it easier for people writing resources and does solve the problem at scale.

If this is a feature/pattern/mechanism that would be appreciated I would be happy to contribute towards it, although my knowledge and context is mainly AWS specific and I am not that up to speed with the codebase other than the minimum required to write alphagov/terraform-provider-concourse.

Obviously it is onerous for resources to have to implement the wheel for each cloud provider. I would suggest tenant-burden as another anti-pattern of which to be cognisant.

Having these credentials injected (as you described - with my modification to include the session token) would make it easy to write resources which interact with AWS services, but also for tenants writing pipelines.

Being able to automatically generate a set of STS credentials within a pipeline (that works for each cloud provider) would be quite helpful within tasks as well as resources.

@analytically

@vito K8s faced a similar issue, check out https://github.com/jtblin/kube2iam#readme

@analytically

Also check out https://github.com/uswitch/kiam

@vito
Member Author

vito commented Jan 12, 2019

@tlwr Thanks! That makes sense - so it sounds like this would be an STS credential manager backend of some sort.

@analytically Thanks for the pointers! I think we might be able to just prevent resource containers from reaching any local network addresses once we have a solution here and have resources consistently just using their JSON input as the source of truth. That would be a separate challenge to tackle later though since I'm sure there are resources which require it at the moment.

@tlwr
Contributor

tlwr commented Jan 13, 2019

It would be nice to segregate the IAM role of the worker instance from the IAM permissions which the containers run in. This would, for people running concourse with VM multitenancy, allow the convenience of IAM roles instead of having to do secrets management manually.

I.e. when configuring concourse you set (as above) --team-iam-role team-a:team-a-iam-role-name or similar. Following AWS convention, when scheduling a resource/task then the scheduler would have to generate a set of STS credentials and pass them in using the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN.

This approach does bring with it the can-of-worms of STS token durations - i.e. how long to generate a token for, and how to configure this - if possible?
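
To make the duration question concrete, here is a small Python sketch of two decisions a scheduler would have to make: how long a token to request, and when an existing token is too close to expiry to hand to new work. The function names and the five-minute safety margin are assumptions for illustration; the 15-minute floor and 12-hour ceiling mirror the limits STS documents for AssumeRole, and a role's own maximum session duration may be lower.

```python
from datetime import datetime, timedelta, timezone

MIN_DURATION = timedelta(minutes=15)  # shortest session STS will issue
MAX_DURATION = timedelta(hours=12)    # upper bound; the role may allow less


def clamp_duration(requested):
    """Keep a requested token lifetime inside the allowed window."""
    return max(MIN_DURATION, min(requested, MAX_DURATION))


def needs_refresh(expiration, now, min_remaining=timedelta(minutes=5)):
    """True when the credentials expire too soon to start new work with them."""
    return expiration - now < min_remaining


issued = datetime(2019, 1, 13, 12, 0, tzinfo=timezone.utc)
expiration = issued + clamp_duration(timedelta(hours=1))

print(needs_refresh(expiration, issued))
print(needs_refresh(expiration, issued + timedelta(minutes=58)))
```

A long-running task would still outlive a short token under this scheme, so the margin only helps with deciding whether to reuse credentials at scheduling time.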

As Mathias mentioned above this is basically kube2iam but for Concourse.


Would it be sensible to split this out into two separate issues:

  • Segregate tenants to separate VM/workers (it sounds like 5.0 release perhaps regresses existing functionality - although I'm not sure)
  • Support for STS and IAM roles when starting containers

@vito
Member Author

vito commented Jan 15, 2019

@tlwr A kube2iam style solution won't solve the problem of the source: not containing credentials, and I'm still wary of having resources configured in two different ways. To be honest I'm not wild about the idea as it sounds like a lot of AWS-specific work to be implemented with a lot of moving parts. 🙁

So far I'm still kinda sold on the idea of an STS credential manager backend. It's a bit of work upfront but it doesn't feel like that much more work compared to configuring team workers with IAM roles.

> it sounds like 5.0 release perhaps regresses existing functionality - although I'm not sure

Not sure what you mean - #2951 will be part of 5.0 and should preserve existing functionality.

@tlwr
Contributor

tlwr commented Jan 15, 2019

Good to know about #2951 I wasn't sure if it was going to be part of 5.0, but now double checking it is part of the 5.0 milestone 😅

STS credential manager backend would be excellent, not sure what the API would be like on the pipeline level (I'm not sure about other cloud providers and am unsure if they have a similar API to AWS STS) but it would make writing resources much easier.

Also having STS credentials with the ((aws.access_key_id)) (or whatever it ends up being) means that there is a lot less magic going on which is good, it makes it easier to understand what is going on.

@vito
Member Author

vito commented Jan 15, 2019

> Good to know about #2951 I wasn't sure if it was going to be part of 5.0, but now double checking it is part of the 5.0 milestone 😅

Phew, just checking as that's the only one that sprung to mind. 🙂

> STS credential manager backend would be excellent, not sure what the API would be like on the pipeline level (I'm not sure about other cloud providers and am unsure if they have a similar API to AWS STS) but it would make writing resources much easier.

Hmm, I suppose the simplest thing would be ((role_name.access_key_id)). One 'pro' is it leaves the satisfying of the var flexible, since it doesn't explicitly mention STS - there could just as easily be a role_name provided by Vault or local vars. As for which team can access which role, I figure that can just be statically configured on boot-up as part of the STS credential manager configuration (similar to the flags in the initial issue body).

Now that I think of it, though, this would at minimum require that Concourse fetch the role_name credential only once per source. Take the following example:

source:
  access_key_id: ((role_name.access_key_id))
  secret_access_key: ((role_name.secret_access_key))
  session_token: ((role_name.session_token))

As of today, that will actually fetch role_name 3 times, thus ending up with mismatched access keys, secret keys, and session tokens. So, totally useless, heh. This is something we planned to do but never got around to doing (and apparently never wrote an issue for). Vault has a need for this, too, not just STS.
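
A minimal Python sketch of the fetch-once behaviour, assuming a simplified `((name.field))` syntax. `interpolate_source` and `issue_credentials` are hypothetical names; the fake issuer returns a fresh, numbered credential set on every call to imitate a backend that mints new temporary credentials per request.

```python
import itertools
import re

# Simplified var syntax for this sketch: ((name.field))
VAR_PATTERN = re.compile(r"\(\(([^.()]+)\.([^.()]+)\)\)")


def interpolate_source(source, fetch_secret):
    """Interpolate ((name.field)) vars, fetching each named secret at most once."""
    fetched = {}

    def lookup(match):
        name, field = match.group(1), match.group(2)
        if name not in fetched:
            fetched[name] = fetch_secret(name)  # the only backend call per name
        return fetched[name][field]

    return {key: VAR_PATTERN.sub(lookup, value) for key, value in source.items()}


counter = itertools.count(1)


def issue_credentials(name):
    """Fake backend: every call mints a new, mutually consistent credential set."""
    n = next(counter)
    return {
        "access_key_id": f"key-{n}",
        "secret_access_key": f"secret-{n}",
        "session_token": f"token-{n}",
    }


source = {
    "access_key_id": "((role_name.access_key_id))",
    "secret_access_key": "((role_name.secret_access_key))",
    "session_token": "((role_name.session_token))",
}
resolved = interpolate_source(source, issue_credentials)
print(resolved)
```

With the per-source cache all three fields come from the same issued set; without it, each field would trigger its own call and the three values would belong to different sets.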

> Also having STS credentials with the ((aws.access_key_id)) (or whatever it ends up being) means that there is a lot less magic going on which is good, it makes it easier to understand what is going on.

Agreed! 🙌

@linusguan

Just my two cents about supporting IAM Role:

This feature request is not about coupling services to any vendor's infrastructure; it is about following security best practice when you support a new resource, no matter whether it comes from Cloud Foundry, AWS, or others.

Concourse's credential management backed by Amazon SSM or AWS Secrets Manager already supports IAM roles, and I think the default behavior of using the IAM role if no access keys are set is great.


A quick intro for anyone that is not familiar with AWS security. Using access key and secrets for a server to access AWS resources is a huge red flag when you are being reviewed by cyber security. The reason is that access key and secret pair is designed for human users for making calls using their local machine not for production servers.
For servers on premise or on AWS, AWS Security Token Service should be used and IAM Role(Instance Profile is a container for an IAM Role to be assigned to a EC2 VM) is an incarnation of that. IAM Role increases security by passing you a temporary credential for each api call and it will rotate the temporary credential from every 1 hour to every 12 hours (configurable).
AWS provide sdk to handle this so you don't have to interact with low level apis with STS.

@tlwr
Contributor

tlwr commented Jan 19, 2019

Yeah there are a few tensions:

  • Concourse workers ideally having no state whatsoever - simplifies the scheduling

  • Security best practice

  • Trying to avoid resources having to re-implement credential handling

Although

source:
  access_key_id: ((syntax_for_fetching_from_dynamic_cred_handler))
  ...etc

seems like a good solution to all three, at the expense of some cloud-provider-specific implementations within Concourse.

The easiest implementation of above would probably use the IAM role of the web EC2 node, which would assume a role for each team, so there aren't long-lived credentials being passed around.

@eedwards-sk

> The reason is that access key and secret pair is designed for human users for making calls using their local machine not for production servers.
> For servers on premise or on AWS, AWS Security Token Service should be used and IAM Role(Instance Profile is a container for an IAM Role to be assigned to a EC2 VM) is an incarnation of that. IAM Role increases security by passing you a temporary credential for each api call and it will rotate the temporary credential from every 1 hour to every 12 hours (configurable).
> AWS provide sdk to handle this so you don't have to interact with low level apis with STS.

I don't believe this is entirely correct. The credentials you get from the host is really just a call to the special local metadata service (169.etc.etc) which results in an aws access key id and secret access key. So machines absolutely should and do use aws AKID/SAK in production.

I agree though that when possible, they should be temporary and come from a source that can rotate them.

> As for which team can access which role, I figure that can just be statically configured on boot-up as part of the STS credential manager configuration (similar to the flags in the initial issue body).

Right now you can somewhat dynamically control which teams can access which vault params through vault itself. Once the credential management setup is done, I control team param access by storing it in appropriate paths, e.g.:

/concourse/TEAM_NAME/foo_param

This is not ideal, because I have to copy the same secret to multiple paths if I want it to be shared by some teams but not others, but it does still mean I can change which teams can use which params purely through Vault configuration, without any configuration of the web server.

If you have mapping of roles to teams in the static config, now you have to update its config and restart the web server to change that, which seems like a regression in terms of how I would want to use a credential manager.


While people are definitely using IAM roles on the host today to break some concourse best practices, alignment on the right way to get akid/sak into resources/tasks would be cool.

If the web server is going to possibly talk to STS or grab credentials from roles to pass to workers, it's going to need the credentials to do so (a role which has a bunch of AssumeRole permissions).

Ideally web itself would support getting its credential from a host role, so that I don't have to configure a static credential in the web process' environment.

Similar to how web supports approle with vault today, ideally concourse will manage its authentication with the cred provider (vault/sts) by rotating its underlying credential automatically.

@vito
Member Author

vito commented Mar 15, 2019

For those following along at home, I've proposed concourse/rfcs#21, which hopes to lay the groundwork for this, but I do need to get more hands-on with IAM/STS in particular and try to crank out an example to see if team vs. global configuration is something that works well, particularly with regard to EC2 IAM roles.

@itsdalmo
Contributor

itsdalmo commented Mar 22, 2019

For our Concourse we use a lambda function to dump temporary STS credentials for each team into Secrets Manager. Works nicely for STS credentials, but it would of course be better if this was baked into Concourse 😄

One thing I'm a bit wary of is adding more privileges to the ATC's instance profile, since the same process/instance is also hosting the web interface and is exposed to public/end-user traffic. Would it be more secure if the ATC (scheduler, secrets managers etc) was running on a separate instance from the web interface?

@martin82

> The reason is that access key and secret pair is designed for human users for making calls using their local machine not for production servers.
> For servers on premise or on AWS, AWS Security Token Service should be used and IAM Role(Instance Profile is a container for an IAM Role to be assigned to a EC2 VM) is an incarnation of that. IAM Role increases security by passing you a temporary credential for each api call and it will rotate the temporary credential from every 1 hour to every 12 hours (configurable).
> AWS provide sdk to handle this so you don't have to interact with low level apis with STS.
>
> I don't believe this is entirely correct. The credentials you get from the host is really just a call to the special local metadata service (169.etc.etc) which results in an aws access key id and secret access key. So machines absolutely should and do use aws AKID/SAK in production.
>
> I agree though that when possible, they should be temporary and come from a source that can rotate them.

I think the credentials you get from the host through the local metadata service are temporary and automatically rotated. See "Retrieving Security Credentials from Instance Metadata".
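
For reference, the document served by the metadata service under `/latest/meta-data/iam/security-credentials/<role-name>` carries a session token and an expiry alongside the key pair. The Python sketch below parses a made-up sample of that shape rather than calling the endpoint; all values are invented, and `parse_instance_credentials` is a name chosen for this illustration.

```python
import json
from datetime import datetime, timezone

# Made-up sample in the shape of the instance metadata credentials document.
SAMPLE_RESPONSE = """{
  "Code": "Success",
  "LastUpdated": "2019-03-22T10:00:00Z",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAEXAMPLE",
  "SecretAccessKey": "example-secret",
  "Token": "example-session-token",
  "Expiration": "2019-03-22T16:00:00Z"
}"""


def parse_instance_credentials(body):
    """Extract the temporary credentials and their expiry from the document."""
    doc = json.loads(body)
    if doc.get("Code") != "Success":
        raise ValueError(f"unexpected status: {doc.get('Code')!r}")
    expiration = datetime.strptime(
        doc["Expiration"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "access_key_id": doc["AccessKeyId"],
        "secret_access_key": doc["SecretAccessKey"],
        "session_token": doc["Token"],
        "expiration": expiration,
    }


creds = parse_instance_credentials(SAMPLE_RESPONSE)
now = datetime(2019, 3, 22, 12, 0, tzinfo=timezone.utc)
print(creds["expiration"] - now)
```

The presence of `Token` and `Expiration` is what distinguishes these from a long-lived access key pair.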

vito mentioned this issue Jan 7, 2020
vito added the goal label Feb 24, 2020
vito changed the title from "Support for IAM roles in a way that allows us to safely share version history" to "Support for EC2 IAM roles in a way that allows us to safely share version history" Feb 24, 2020
@vito
Member Author

vito commented Mar 25, 2020

Here are my current thoughts on this:

If we allow var_sources to be implemented by Prototypes (concourse/rfcs#37), we could implement an IAM var source prototype which acquires AWS credentials through the worker's EC2 IAM role:

prototypes:
# pull in prototype which supports `read` action
- name: ec2-iam-role
  type: registry-image
  source:
    repository: some-generous-soul/ec2-iam-role-prototype

var_sources:
# define a var source which will run on workers which have EC2 IAM role access
- name: worker-iam
  type: ec2-iam-role
  # tags: [...] if needed

resources:
- name: some-artifact
  type: s3
  source:
    bucket: some-artifacts
    regexp: artifact-(.*).tgz
    access_key_id: ((worker-iam:s3access.access_key_id)) # reads credentials using worker-iam var source
    secret_access_key: ((worker-iam:s3access.secret_access_key))
    session_token: ((worker-iam:s3access.session_token))

This fixes the biggest problem with the proposal as it was before, because now the EC2 IAM role configuration is set up on the workers - which can be per-team - rather than globally on the web node, which affects the entire cluster.
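
For readers unfamiliar with the var syntax in the example, `((worker-iam:s3access.access_key_id))` names a var source, a secret path, and a field. The Python sketch below splits a reference into those three parts using a deliberately simplified grammar; `parse_var_ref` and `VarRef` are hypothetical names, and Concourse's real parser is not reproduced here.

```python
import re
from typing import NamedTuple, Optional

# Simplified grammar for this sketch: ((source:path.field)), where the
# source prefix and the field suffix are both optional.
VAR_REF = re.compile(
    r"^\(\((?:(?P<source>[^:.()]+):)?(?P<path>[^:.()]+)(?:\.(?P<field>[^:.()]+))?\)\)$"
)


class VarRef(NamedTuple):
    source: Optional[str]  # var source name, e.g. "worker-iam"
    path: str              # secret path, e.g. "s3access"
    field: Optional[str]   # field within the secret, e.g. "access_key_id"


def parse_var_ref(text):
    """Split a ((source:path.field)) reference into its parts."""
    match = VAR_REF.match(text.strip())
    if match is None:
        raise ValueError(f"not a var reference: {text!r}")
    return VarRef(match["source"], match["path"], match["field"])


print(parse_var_ref("((worker-iam:s3access.access_key_id))"))
print(parse_var_ref("((role1.access_key_id))"))
```

In the pipeline above, the `worker-iam` part is what routes the lookup to the prototype-backed var source instead of the cluster-wide credential manager.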

This idea would require us to have a secure method for Prototypes to return credentials to Concourse - so we may have to amend the RFC to not use files on disk, and instead use a secured protocol.

@vito
Member Author

vito commented Jun 9, 2020

Follow-up: a few days ago I pushed a revision to the Prototypes interface that - I hope - will allow it to be safely used for credential management. The idea is to just encrypt sensitive information in the response using a key provided in the request.

The Encryption section goes over the mechanics.

Looking forward to feedback on this approach! If it sounds good I think we're pretty much unblocked, and just need to get all of these RFCs merged. 🙂

@djoyahoy

djoyahoy commented Aug 6, 2021

Hi all. I'm wondering how IAM Roles for Service Accounts might fit into this discussion: https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/?

Curious if you all have any thoughts.

That's currently how we have AWS configured to talk to the credential manager (AWS Secrets Manager).

surminus added a commit to ably-forks/registry-image-resource that referenced this issue Jul 21, 2022
Concourse do not allow using instance IAM roles because it conflicts
with their multi-tenancy designs[1]. Multiple "teams" can use the same
instance, and using IAM instance roles means that this could be
considered insecure.

This is not applicable to our use case, so we are fine to use it. This
removes the requirement to pass in access keys and just assume we want
to use instance roles instead.

[1] concourse/concourse#3023
@ChrisJBurns

I too am wondering about the IRSA integration. I have raised it here: #8716, but it would be great to get this as a feature.

@jduv

jduv commented Jul 20, 2023

I'm not sure if this helps at all (I haven't dug into the complete thread above), but we routinely assume other roles in containers to do work like taking DB snapshots in RDS pre-migration. It's done via the CLI like so:

      # Assume the target role; the CLI prints the temporary credentials as JSON
      temp_role=$(aws sts assume-role \
                          --role-arn "arn:aws:iam::${DEPLOY_TO_ACCOUNT}:role/<role_name>" \
                          --role-session-name "<session name>")
      # jq -r emits raw strings, so the quotes no longer need stripping with xargs
      export AWS_ACCESS_KEY_ID=$(echo "$temp_role" | jq -r .Credentials.AccessKeyId)
      export AWS_SECRET_ACCESS_KEY=$(echo "$temp_role" | jq -r .Credentials.SecretAccessKey)
      export AWS_SESSION_TOKEN=$(echo "$temp_role" | jq -r .Credentials.SessionToken)

As such, I'm sure there are APIs that would allow you to do this in code and utilize the AWS auth toolchain. We do this for other AWS accounts and our own accounts. Perhaps you pass the role you want to use to the resource and it's up to the caller to ensure it's well formed?
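
As one way of doing the same extraction in code, the Python sketch below turns the JSON that `aws sts assume-role` prints into shell export statements. It works on a made-up sample of that output rather than calling AWS; `export_lines` is a hypothetical helper, and the sample ARN and credential values are invented.

```python
import json
import shlex

# Made-up sample in the shape of `aws sts assume-role` output.
SAMPLE_OUTPUT = """{
  "Credentials": {
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "SessionToken": "example-session-token",
    "Expiration": "2023-07-20T13:00:00Z"
  },
  "AssumedRoleUser": {
    "AssumedRoleId": "AROAEXAMPLE:snapshot-session",
    "Arn": "arn:aws:sts::123456789012:assumed-role/example-role/snapshot-session"
  }
}"""


def export_lines(assume_role_output):
    """Turn assume-role JSON output into shell `export` statements."""
    creds = json.loads(assume_role_output)["Credentials"]
    mapping = {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }
    # shlex.quote guards against characters the shell would interpret.
    return [f"export {name}={shlex.quote(value)}" for name, value in mapping.items()]


lines = export_lines(SAMPLE_OUTPUT)
print("\n".join(lines))
```

This keeps the role assumption in the caller's hands, as suggested, while the resource or task only ever sees the three standard environment variables.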

I'm also well versed in IAM roles and temporary security tokens in AWS, so happy to provide any context I can.

10 participants