Move to IAM instance roles, instead of copying local creds to EC2 #59

Open
wants to merge 1 commit into
base: master
from


ssm1th commented Jul 9, 2020

Switch to using IAM instance roles in the interest of security, and to better support federated AWS API access.

Tested with the main repo's Centos8 branch and the AWS CLI installed on the mgmt node. This will need further testing and updates to the Ansible provisioning scripts.
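For context on the mechanism: an instance role is an ordinary IAM role whose trust policy allows the EC2 service to assume it, and it reaches the mgmt node through an instance profile. The PR's own role definition isn't reproduced here, so the following is only an illustrative sketch of such a trust policy, built as a Python dict and serialised to the JSON that IAM expects:

```python
import json

# Trust policy for an EC2 instance role: only the EC2 service may assume it.
# Illustrative sketch; not the definition used in this PR.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def trust_policy_json() -> str:
    """Serialise the trust policy as it would be supplied when creating the role."""
    return json.dumps(EC2_TRUST_POLICY, indent=2)


if __name__ == "__main__":
    print(trust_policy_json())
```

The permissions the mgmt node actually gets come from a separate policy attached to the role; the trust policy only controls who may assume it.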

milliams (Member) commented Jul 9, 2020

This looks good and I agree that using IAM roles is definitely a more secure solution.

However, the reservation I had before, and the reason we didn't go this way in the first place, is that the management node is also the login node. This means that if authentication happens only at the node level, then that role can be assumed by any Unix user on that machine. While we don't expect any users of these machines to be malicious, it feels like a potential security hole.
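To make the concern concrete: on EC2, an instance role's temporary credentials are served by the instance metadata service (IMDS) at 169.254.169.254, and nothing in that exchange identifies which Unix user is asking. The sketch below follows the documented IMDSv2 token flow using only the Python standard library; it is illustrative and not part of the PR, and `fetch_role_credentials` only works when run on an instance with a role attached. The AWS CLI and SDKs do roughly this under the hood when no other credentials are configured.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254/latest"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """IMDSv2 step 1: PUT for a session token. No AWS credentials are needed."""
    return urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """IMDSv2 step 2: GET a metadata path, presenting the session token."""
    return urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_credentials(timeout: float = 2.0) -> dict:
    """Return the instance role's temporary credentials as a dict.

    Any local process that can reach 169.254.169.254 can do this,
    regardless of which Unix user it runs as.
    """
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    path = "iam/security-credentials/"
    # This listing returns the name of the role attached to the instance.
    with urllib.request.urlopen(metadata_request(path, token), timeout=timeout) as resp:
        role_name = resp.read().decode().splitlines()[0]
    # Keys include AccessKeyId, SecretAccessKey, Token and Expiration.
    with urllib.request.urlopen(metadata_request(path + role_name, token), timeout=timeout) as resp:
        return json.loads(resp.read())
```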

Is there a way with IAM instance roles to have a second layer of authorisation such that only, say, root can assume the role but other users can't? I imagine this may be possible with some firewalld rules but is there a more natural way?
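One way to approximate the firewall idea is a local rule that rejects outbound traffic to the metadata service from every UID except root, using the iptables `owner` match (valid in the OUTPUT chain). This is a sketch only: it builds the command for review rather than running it, it is expressed as a plain iptables rule (a firewalld host would need it added as a direct rule or equivalent), and anyone with root or sudo on the node can still obtain the credentials.

```python
import shlex
from typing import List

IMDS_ADDR = "169.254.169.254"


def imds_root_only_rule(allowed_uid: int = 0) -> List[str]:
    """iptables argv rejecting outbound TCP to IMDS from all UIDs except allowed_uid.

    '!' negates the --uid-owner test, so only allowed_uid's traffic is let through.
    """
    return [
        "iptables", "--append", "OUTPUT",
        "--protocol", "tcp",
        "--destination", IMDS_ADDR,
        "--match", "owner", "!", "--uid-owner", str(allowed_uid),
        "--jump", "REJECT",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command for review; applying it requires root.
    print(shlex.join(imds_root_only_rule()))
```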

milliams added the enhancement label Jul 9, 2020
ssm1th (Author) commented Jul 9, 2020

Hey Matt, your point on it being more secure is really the key here.

As it stands:

  • An access key/secret key pair tied to an individual, likely with elevated IAM privileges attached, is copied to a machine in EC2. In theory it could be intercepted or read by other privileged users or by malicious services on that machine, which could in turn cause a whole host of problems: spinning up resources, revoking access, and so on. These hard-coded credentials could also be moved around or leaked by mistake by the user of the mgmt node. AWS's IAM best-practices documentation also advises against sharing credentials this way: https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#sharing-credentials
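To check for this situation on an existing mgmt node, one could look for long-lived keys in the AWS shared credentials file (`~/.aws/credentials` by default). The helper below is hypothetical and uses only the standard library; it treats a profile with an access key pair but no `aws_session_token` as long-lived, which is a heuristic rather than an AWS rule.

```python
import configparser
from pathlib import Path
from typing import List


def profiles_with_static_keys(credentials_text: str) -> List[str]:
    """Return profile names that hold an access key pair without a session token."""
    # Interpolation is disabled so '%' characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    return [
        name
        for name in parser.sections()
        if parser.has_option(name, "aws_access_key_id")
        and parser.has_option(name, "aws_secret_access_key")
        and not parser.has_option(name, "aws_session_token")
    ]


if __name__ == "__main__":
    path = Path.home() / ".aws" / "credentials"
    if path.exists():
        print(profiles_with_static_keys(path.read_text()))
```

With an instance role in place, this would ideally find nothing for the users on the node.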

The proposed change:

  • IAM authentication for the mgmt node is handled by an EC2 instance role, so no hard-coded credentials are needed on the EC2 machine itself. The role's privileges are restricted (least-privilege access), so any malicious user or service on the machine can only perform the actions that role allows, should they assume it. This could probably be refined further by predetermining the process for spinning up worker/compute nodes, which would let the instance role be restricted even more. That would require further workflow changes, but it is arguably a good next step to investigate: for example, instead of being able to call a generic ec2:RunInstances as it can now, the instance role could only change the instance count of a predetermined Auto Scaling group that has a predetermined maximum number of instances. As in the current setup, the onus is on the user to keep their SSH keys secure and not to grant access to their mgmt nodes to potentially malicious users or services.
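A sketch of that next step, assuming the compute nodes were moved into a predetermined Auto Scaling group (the region, account ID and group name below are placeholders, not from the PR). The role may change only that group's desired capacity, and the group's own MaxSize bounds the instance count, in place of a general ec2:RunInstances grant:

```python
import json


def scale_only_policy(region: str, account_id: str, asg_name: str) -> dict:
    """Least-privilege policy: change the desired capacity of one Auto Scaling group only."""
    # '*' stands in for the group's UUID, which changes if the group is recreated.
    asg_arn = (
        f"arn:aws:autoscaling:{region}:{account_id}:autoScalingGroup:*:"
        f"autoScalingGroupName/{asg_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ScaleComputeGroupOnly",
                "Effect": "Allow",
                "Action": "autoscaling:SetDesiredCapacity",
                "Resource": asg_arn,
            },
            {
                # Auto Scaling Describe actions don't support resource-level permissions.
                "Sid": "ReadGroupState",
                "Effect": "Allow",
                "Action": "autoscaling:DescribeAutoScalingGroups",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(scale_only_policy("eu-west-1", "123456789012", "compute-nodes"), indent=2))
```

The design choice is to move the upper bound out of the credentials and into the group definition, so even a misused role cannot launch more than MaxSize instances.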

There isn't really an AWS-native way to restrict which users or processes on an instance can use its role, because AWS has no insight into what happens inside the OS. Restricting access from within the OS could probably be achieved to some degree, as you say, but again it is something that could in theory be worked around.

I think the best approach is to use an IAM instance role for the mgmt node, and update the workflow to allow that instance role to be even more prescriptive and restricted.

The final thing you could look into is using Systems Manager to handle connectivity and to run commands on the mgmt node, instead of standard SSH, but that is a whole separate discussion :)

Labels
enhancement New feature or request
2 participants