David Lin's Cloud Custodian Policies

Policies in Production

Policy Description
offhours.yml Starts and stops instances during offhours via Lambda function. Instances filtered on presence of maid_offhours tag or StartAfterHours/StopAfterHours custom tags. (See Offhour Examples)
unused-sgroup-audit.yml Retrieves all unused security groups that match regex, deletes them, then sends notifications.
s3-public-audit.yml Sends notification when public S3 bucket is created.
copy-instance-tags.yml Periodically copies tags from EC2 instances to respective EBS volumes.
public-instance-audit.yml Sends notification when EC2 instance is launched with a Public IP address or attached to a Public subnet.
mfa-audit.yml Sends reminder to Slack channel so users who are in the Administrators group don't forget to enable MFA to comply with business security policies. If MFA remains disabled after 5 days of the user create date, console access is disabled and access keys are deleted.
termination-protection-audit.yml Sends email and Slack notification when EC2 instances in whitelist are found with termination protection disabled.

Policies in Test

Policy Description
new-user-audit.yml Retrieves IAM users in specified group with MFA disabled in the last 30 days
termination-protection-list.yml Retrieves list of all EC2 instances with termination protection enabled.
security-groups-unused.yml Retrieves unused security groups using regex
stopped-instances.yml Retrieves list of all stopped instances in specific VPC. Can be further customized to match other criteria.
security-groups-unused-notify.yml Retrieves unused security groups using regex and notifies via email
iam.yml Retrieves IAM users using regex
mfa.yml Retrieves IAM users with MFA enabled
roles.yml Retrieves unused roles on EC2, Lambda, and ECS
admin-group.yml Retrieves users in the group named 'Administrators'
mfa-unused.yml Retrieves users who have MFA disabled in the group named 'Administrators'
emailer.yml Sends email notification via Simple Email Service (SES) using notify action
ebs-garbage-collection.yml Deletes all unattached volumes
ebs-garbage-collection-lambda.yml Deletes all unattached volumes using Lambda function
public-subnet-instance-audit-notify.yml Sends email notification via SES when EC2 instance launches in a public subnet
public-subnet-instance-audit-whitelist.yml Lambda that sends email notification via SES when EC2 instance launches in a public subnet and is NOT in the whitelist
mark-unused-sgroups.yml Marks unused security groups for deletion after N days; to be used with delete-marked-sgroups.yml
delete-marked-sgroups.yml Unmarks used security groups that were marked for deletion, then deletes the remaining marked security groups
slack-notify.yml Slack example

Cloud Custodian Architecture and AWS Services

Getting Started

Quick Install
*** Install repository ***
$ git clone https://github.com/capitalone/cloud-custodian

*** Install dependencies (with virtualenv) ***
$ virtualenv c7n_mailer
$ source c7n_mailer/bin/activate
$ cd cloud-custodian/tools/c7n_mailer
# No sudo inside the virtualenv: sudo would run the system pip instead of the virtualenv's pip.
$ pip install -r requirements.txt
$ pip install sendgrid

*** Install extensions ***
$ python setup.py develop

*** Verify Installation ***
$ c7n-mailer
$ custodian

*** Upgrade AWS CLI ***
$ sudo pip install awscli --upgrade

For more info, check out Cloud Custodian on GitHub

Usage

Getting Started
Cloud Custodian must be run within a virtual environment.

$ cd ~
$ source c7n_mailer/bin/activate
$ cd cloudcustodian  # this is the IE/cloudcustodian repo where all the policies reside

As a test, try
$ custodian run -s out mfa.yml
$ custodian report -s out mfa.yml --format grid

Cloud Custodian will create a log file in the ~/cloudcustodian/out/ subdirectory IF there are any matches. 

Environment Settings

mailer.yml
# Which queue should we listen to for messages
queue_url: https://sqs.us-east-1.amazonaws.com/1234567890/sandbox

# Default from address
from_address: <your-from-address-goes-here>

# Tags that we should look at for address information
contact_tags:
  - OwnerContact
  - OwnerEmail
  - SNSTopicARN

# Standard Lambda Function Config
region: us-east-1
role: arn:aws:iam::1234567890:role/CloudCustodianRole
slack_token: xoxb-bot_token_string_goes_here

Cloud Custodian Lambda AWS Role
Note: Depending on your use case, additional permissions may be needed.
Cloud Custodian will log a message after invocation if that is the case.

Trust relationship:
"Service": "lambda.amazonaws.com"

General policy permissions:
iam:PassRole
iam:ListAccountAliases
iam:ListUsers
iam:GetCredentialReport
iam:GenerateCredentialReport
ses:SendEmail
ses:SendRawEmail
lambda:CreateFunction
lambda:ListTags
lambda:GetFunction
lambda:AddPermission
lambda:ListFunctions
lambda:UpdateFunctionCode
events:DescribeRule
events:PutRule
events:ListTargetsByRule
events:PutTargets
tag:GetResources
logs:CreateLogGroup
logs:CreateLogStream
autoscaling:DescribeLaunchConfigurations
s3:GetBucketLocation
s3:GetBucketTagging
s3:GetBucketPolicy
s3:GetReplicationConfiguration
s3:GetBucketVersioning
s3:GetBucketNotification  
s3:GetLifecycleConfiguration
s3:ListAllMyBuckets
s3:GetBucketAcl
s3:GetBucketWebsite
s3:GetBucketLogging 
s3:DeleteBucket 
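
The trust relationship and the permission list above can be turned into IAM policy documents. The following is a minimal sketch, not the role definition used in this repo: the helper names `trust_policy` and `permissions_policy` are hypothetical, and `Resource: "*"` is an assumption made for illustration that should be narrowed where possible.

```python
import json


def trust_policy(service="lambda.amazonaws.com"):
    """Trust relationship that lets the given AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def permissions_policy(actions, resource="*"):
    """Single-statement policy allowing the listed actions.

    resource="*" is an assumption for illustration only.
    Duplicate actions are dropped and the list is sorted for stable output.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(set(actions)),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # A few actions taken from the list above; extend as needed.
    sample_actions = ["iam:ListUsers", "ses:SendEmail", "events:PutRule"]
    print(json.dumps(trust_policy(), indent=2))
    print(json.dumps(permissions_policy(sample_actions), indent=2))
```

The printed JSON can be pasted into the IAM console or saved to files for the AWS CLI.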

Slack OAuth Permissions for App with Bot User
incoming-webhook
channels:history
channels:read
chat:write:bot
chat:write:user
groups:history
groups:read
im:write
users:read
users:read.email

Schemas Used

security-group
(custodian) [hostname]$ custodian schema security-group
aws.security-group:
  actions: [auto-tag-user, delete, invoke-lambda, mark, mark-for-op, normalize-tag,
    notify, patch, put-metric, remove-permissions, remove-tag, rename-tag, tag, tag-trim,
    unmark, untag]
  filters: [and, default-vpc, diff, egress, event, ingress, json-diff, locked, marked-for-op,
    not, or, stale, tag-count, unused, used, value]
iam-user
(custodian) [hostname]$ custodian schema iam-user
aws.iam-user:
  actions: [delete, invoke-lambda, notify, put-metric, remove-keys]
  filters: [access-key, and, credential, event, group, mfa-device, not, or, policy,
    value]
iam-role
(custodian) [hostname]$ custodian schema iam-role
aws.iam-role:
  actions: [invoke-lambda, notify, put-metric]
  filters: [and, event, has-inline-policy, has-specific-managed-policy, no-specific-managed-policy,
    not, or, unused, used, value]
ec2
(custodian) [hostname]$ custodian schema ec2
aws.ec2:
  actions: [auto-tag-user, autorecover-alarm, invoke-lambda, mark, mark-for-op, modify-security-groups,
    normalize-tag, notify, put-metric, reboot, remove-tag, rename-tag, resize, set-instance-profile,
    snapshot, start, stop, tag, tag-trim, terminate, unmark, untag]
  filters: [and, default-vpc, ebs, ephemeral, event, health-event, image, image-age, instance-age,
    instance-uptime, marked-for-op, metrics, network-location, not, offhour, onhour, or,
    security-group, singleton, state-age, subnet, tag-count, termination-protected, value]

Artifacts

security-groups-unused.yml
(custodian) [hostname]$ custodian run --dryrun -s . security-groups-unused.yml
2018-04-13 20:02:01,043: custodian.policy:INFO policy: security-groups-unused resource:security-group region:us-east-1 count:29 time:0.30

(custodian) [hostname]$ more ./security-groups-unused/resources.json | grep 'GroupName\|GroupId'
(custodian) [hostname]$ more ./security-groups-unused/resources.json | grep GroupName\"\:
    "GroupName": "rds-launch-wizard-5",
    "GroupName": "rds-launch-wizard",
    "GroupName": "rds-launch-wizard-2",
    "GroupName": "launch-wizard-17",
    "GroupName": "launch-wizard-5",
    "GroupName": "launch-wizard-7",
    "GroupName": "launch-wizard-6",
    "GroupName": "launch-wizard-1",
    "GroupName": "rds-launch-wizard-4",
    "GroupName": "launch-wizard-4",
    "GroupName": "launch-wizard-2",
    "GroupName": "launch-wizard-3",
    etc.
iam.yml
(custodian) [ec2-user@ip-10-100-0-195 custodian]$ custodian run --dryrun -s . iam.yml
2018-04-13 22:51:05,472: custodian.policy:INFO policy: iam-user-filter-policy resource:iam-user region:us-east-1 count:1 time:0.01

(custodian) [hostname]$ more ./iam-user-filter-policy/resources.json | grep UserName\"\:
    "UserName": "david.lin",
mfa.yml
(custodian) [hostname]$ custodian run --dryrun mfa.yml -s .
2018-04-13 23:47:40,901: custodian.policy:INFO policy: mfa-user-filter-policy resource:iam-user region:us-east-1 count:15 time:0.01

(custodian) [hostname]$ more ./mfa-user-filter-policy/resources.json | grep UserName\"\:
    "UserName": "username_1",
    "UserName": "username_2",
    "UserName": "username_3",
    "UserName": "username_4",
     etc.
roles.yml
(custodian) [hostname]$ custodian run --dryrun roles.yml -s .
2018-04-14 07:11:22,425: custodian.policy:INFO policy: iam-roles-unused resource:iam-role region:us-east-1 count:55 time:1.92

(custodian) [hostname]$ more ./iam-roles-unused/resources.json | grep RoleName
    "RoleName": "AmazonSageMaker-ExecutionRole-20180412T161207",
    "RoleName": "autotag-AutoTagExecutionRole-KA3LH5ARKJ2E",
    "RoleName": "autotag-AutoTagMasterRole-3VSL2AF3480E",
    "RoleName": "AWS-Cloudera-Infrastructu-ClusterLauncherInstanceR-1HUTDQJUYVGVE",
    etc.
admin-group.yml
(custodian) [hostname]$ custodian run --dryrun admin-group.yml -s .
2018-04-14 07:54:08,198: custodian.policy:INFO policy: iam-users-in-admin-group resource:iam-user region:us-east-1 count:14 time:3.67

(custodian) [hostname]$ more ./iam-users-in-admin-group/resources.json | grep UserName
    "UserName": "username_1",
    "UserName": "username_2",
    "UserName": "username_3",
    "UserName": "username_4",
    etc.
mfa-unused.yml
(custodian) [hostname]$ custodian run --dryrun mfa-unused.yml -s .
2018-04-14 08:13:07,214: custodian.policy:INFO policy: mfa-unused resource:iam-user region:us-east-1 count:2 time:2.54

(custodian) [ec2-user@ip-10-100-0-195 custodian]$ more ./mfa-unused/resources.json | grep UserName
    "UserName": "username_1",
    "UserName": "username_2"
emailer.yml
(custodian) [hostname]$ custodian run -s . emailer.yml
2018-04-23 22:25:12,614: custodian.policy:INFO policy: mfa-unused resource:iam-user region:us-east-1 count:2 time:8.41
2018-04-23 22:25:12,812: custodian.actions:INFO sent message:71ba67dd-731a-4734-bf63-15991754249e policy:mfa-unused template:default.html count:2
2018-04-23 22:25:12,813: custodian.policy:INFO policy: mfa-unused action: notify resources: 2 execution_time: 0.20
public-subnet-instance-audit-notify.yml
(custodian) $ custodian run -s . public-subnet-instance-audit-notify.yml
2018-05-04 01:07:56,937: custodian.policy:INFO Provisioning policy lambda public-subnet-instance-audit-notification
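
The `more ... | grep` pipelines in the examples above can also be done in Python, which avoids matching on quoting details. This is a small sketch that assumes the layout shown in the transcripts: a `resources.json` file holding a JSON list of resources, inside a folder named after the policy under the output directory. The function name `resource_values` is hypothetical.

```python
import json
from pathlib import Path


def resource_values(output_dir, policy_name, field):
    """Return `field` from every resource a policy run matched.

    Reads <output_dir>/<policy_name>/resources.json and skips
    resources that do not carry the requested field.
    """
    path = Path(output_dir) / policy_name / "resources.json"
    resources = json.loads(path.read_text())
    return [resource[field] for resource in resources if field in resource]
```

For example, after `custodian run --dryrun mfa-unused.yml -s .`, calling `resource_values(".", "mfa-unused", "UserName")` would list the matched user names.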

Usage Considerations

Offhour Examples
-------------------------------------------------------
Option 1: Using a Single Tag with key = "maid_offhours"
-------------------------------------------------------
# up mon-fri from 7am-7pm; eastern time
off=(M-F,19);on=(M-F,7);tz=est
# up mon-fri from 6am-9pm; up sun from 10am-6pm; pacific time
off=[(M-F,21),(U,18)];on=[(M-F,6),(U,10)];tz=pt



---------------------------------------------------------------------
Option 2: Using Tags with Names "StartAfterHours" and "StopAfterHours"
---------------------------------------------------------------------
# Using key "StartAfterHours"
# up mon-fri starting 7am; eastern time
on=(M-F,7);tz=est

#Using key "StopAfterHours"
# off mon-fri after 5pm; pacific time
off=(M-F,17);tz=pt



Important Note: When you stop an instance, the data on any instance store volumes is erased. 
                Therefore, if you have any data on instance store volumes that you want to 
                keep, be sure to back it up to persistent storage.

More Examples : http://capitalone.github.io/cloud-custodian/docs/quickstart/offhours.html#offhours
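
To make the tag format above easier to read, here is a minimal Python sketch that splits a tag value into its `on`, `off`, and `tz` parts. It is not Cloud Custodian's own parser; the function name `parse_offhours_tag` and the returned dictionary layout are assumptions made for illustration, and it does no validation of day names or hours.

```python
import re

# Matches one "(days,hour)" pair such as "(M-F,19)" or "(U,18)".
PAIR = re.compile(r"\(([^,()]+),(\d{1,2})\)")


def parse_offhours_tag(value):
    """Split a value like 'off=(M-F,19);on=(M-F,7);tz=est' into its parts."""
    parsed = {"on": [], "off": [], "tz": None}
    for part in value.strip().split(";"):
        if not part:
            continue
        key, _, raw = part.partition("=")
        key = key.strip().lower()
        if key == "tz":
            parsed["tz"] = raw.strip()
        elif key in ("on", "off"):
            # A single pair and a bracketed list of pairs are handled alike.
            parsed[key] = [(days, int(hour)) for days, hour in PAIR.findall(raw)]
        else:
            raise ValueError(f"unexpected key {key!r} in offhours tag")
    return parsed


if __name__ == "__main__":
    print(parse_offhours_tag("off=(M-F,19);on=(M-F,7);tz=est"))
    print(parse_offhours_tag("off=[(M-F,21),(U,18)];on=[(M-F,6),(U,10)];tz=pt"))
```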

Other Misc Usage Considerations

copy-tag and tag-team policies require additional enhancements that were added to c7n/tags.py. A modified version that tracks these changes can be found here.

emailer.yml requires the custodian mailer described here.

ebs-garbage-collection.yml can be run across all regions with the --region all option.

For example:

 custodian run --dryrun -s out --region all ebs-garbage-collection.yml
More

offhours.yml is run as a Lambda with a CloudWatch periodic scheduler. It filters for EC2 instances tagged with "maid_offhours" and obeys the rules set in the tag's value, per the Cloud Custodian Offhours Policy. Any on/off/tz values specified in the policy are overridden by the EC2 instance's maid_offhours tag, so whatever onhour/offhour you set in the policy has no effect.


Troubleshooting Tips

Use 'custodian validate' to find syntax errors
Check 'name' of policy doesn't contain spaces
Check SQS to see if Custodian payload is entering the queue
Check the cloud-custodian-mailer lambda CloudWatch rule schedule (every 5 minutes by default)
Check Lambda error logs (this requires CloudWatch logging)
Check that the role for the lambda(s) has adequate permissions
Remember to update the cloud-custodian-mailer lambda when making changes to a policy that uses notifications
Clear the cache if you encounter errors due to stale information (rm ~/.cache/cloud-custodian.cache)

Log Messages

If you see the following CloudWatch log when sending notifications via Slack, ignore it:

[WARNING]	2018-06-06T23:42:21.321Z	413b5506-69e3-11e8-8a8c-6f167e23dc1a	Error: An error occurred (InvalidCiphertextException) when calling the Decrypt operation: Unable to decrypt slack_token with kms, will assume plaintext.

Canned Code Cheatsheet

Invoking Lambda Functions
# Triggered by a CloudTrail event
mode:
  type: cloudtrail
  role: arn:aws:iam::929292782238:role/CloudCustodian
  events:
    - CreateBucket

# Run on a fixed schedule
mode:
  type: periodic
  role: arn:aws:iam::929292782238:role/CloudCustodian
  schedule: "rate(15 minutes)"

# Run on a schedule while assuming a role in another account
mode:
  type: periodic
  schedule: "rate(1 day)"
  role: arn:aws:iam::123456789012:role/lambda-role
  execution-options:
    assume_role: arn:aws:iam::123123123123:role/target-role
    metrics_enabled: false

Sending Notifications via SES and Slack
actions:
 - type: notify
   template: default.html
   slack_template: slack-default
   template_format: 'html'
   priority_header: '5'
   subject: 'Security Audit: Unused Security Groups'
   to:
     - <your-email-address-goes-here>
     - slack://#<slack-channel-name>
   owner_absent_contact:
     - <your-email-address-goes-here>
   transport:
     type: sqs
     queue: https://sqs.us-east-1.amazonaws.com/1234567890/cloud-cloudcustodian
Filtering with regex and whitelist
filters:
  - not:
    - type: value
      key: "tag:Name"
      value: (MyJenkinsInstance|MyCloudCustodianInstance)
      op: regex
  - and:
    - type: subnet 
      key: "tag:Name"
      value: "david.lin-subnet" 
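
As a rough illustration of what the `not` block with `op: regex` does, the sketch below evaluates the same whitelist pattern against a resource's Name tag in plain Python. It assumes the regex is matched from the start of the tag value and case-insensitively; that behavior, the helper names, and the resource dictionary shape are assumptions for illustration, so check the Cloud Custodian filter documentation before relying on them.

```python
import re

WHITELIST = r"(MyJenkinsInstance|MyCloudCustodianInstance)"


def tag_value(resource, key):
    """Look up a tag on a resource shaped like {'Tags': [{'Key': ..., 'Value': ...}]}."""
    for tag in resource.get("Tags", []):
        if tag["Key"] == key:
            return tag["Value"]
    return None


def passes_whitelist_filter(resource, pattern=WHITELIST):
    """True when the resource is NOT whitelisted, i.e. the `not` block keeps it."""
    name = tag_value(resource, "Name")
    whitelisted = (
        isinstance(name, str)
        and re.match(pattern, name, re.IGNORECASE) is not None
    )
    return not whitelisted
```

Under these assumptions an instance named `MyJenkinsInstance` is dropped by the filter, while an untagged instance or one named `web-01` is kept and goes on to the subnet check.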

Updating Latest Merges to Master

From your virtualenv

cd ~/cloud-custodian
git pull
python setup.py install

This updates the library in your virtualenv so that schema validation uses the latest fixes and updates.

Running Policy as Cron Job

See Example

crontab

  $ crontab -l
  # Run job every day at 5 pm PST.
  # Clean log at 23:00 PST on the first day of every month to save disk space.
  0 17 * * * /home/ubuntu/cloudcustodian/cron/mfa-audit.sh > /home/ubuntu/cloudcustodian/logs/mfa-audit.log 2>&1
  0 23 1 * * /home/ubuntu/cloudcustodian/cron/cleanlogs.sh
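
In a crontab entry the first field is the minute, so `* 17 * * *` matches all 60 minutes of the 17:00 hour, whereas `0 17 * * *` fires once at 17:00. The Python sketch below counts how often an expression fires on a given day to make that difference visible. It is a simplified matcher written for illustration (the function names are hypothetical): it handles only `*`, single numbers, ranges, and comma lists, and it ignores step values, month/day names, and cron's special handling when both day-of-month and day-of-week are restricted.

```python
from datetime import datetime, timedelta


def field_matches(field, value):
    """Match one cron field supporting '*', 'a', 'a-b' and comma lists."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def runs_per_day(expression, day=datetime(2018, 6, 1)):
    """Count the minutes of `day` on which a five-field cron expression fires."""
    minute, hour, dom, month, dow = expression.split()
    count = 0
    for offset in range(24 * 60):
        t = day + timedelta(minutes=offset)
        cron_dow = (t.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
        if (
            field_matches(minute, t.minute)
            and field_matches(hour, t.hour)
            and field_matches(dom, t.day)
            and field_matches(month, t.month)
            and field_matches(dow, cron_dow)
        ):
            count += 1
    return count


if __name__ == "__main__":
    print(runs_per_day("* 17 * * *"))  # every minute of the 17:00 hour
    print(runs_per_day("0 17 * * *"))  # once, at 17:00
```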
  

mfa-audit.sh

  $ pwd
  /home/ubuntu/cloudcustodian-policies/cron
  $ more mfa-audit.sh
  #!/bin/bash
  PATH=/home/ubuntu/bin:/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
  export PATH
  source c7n_mailer/bin/activate
  echo "Running policy..."
  c7n-mailer --config /home/ubuntu/cloudcustodian-policies/mailer.yml --update-lambda && custodian run -c /home/ubuntu/cloudcustodian-policies/mfa-audit.yml -s output
  echo "MFA policy run completed"
  

cleanlogs.sh

  $ pwd
  /home/ubuntu/cloudcustodian-policies/cron
  $ more cleanlogs.sh
  #!/bin/bash
  PATH=/home/ubuntu/bin:/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
  export PATH
  echo "Cleaning logs ..."
  rm /home/ubuntu/cloudcustodian/logs/mfa-audit.log
  echo "Log files deleted!"
  

Useful Tool: Quick simple editor for cron schedule expressions.

Lambda Policies 101

Lambda policies can get confusing in a hurry. My advice: RTFM before diving into the weeds!

Lambda Policies

Supported Lambda Mode Types:

  • cloudtrail
  • ec2-instance-state
  • periodic
  • config-rule

When using execution-options:
- Metrics are pushed using the assumed role, which may or may not be desired. Use 'metrics_enabled: false' to disable them.
- The mode must be periodic, because each mode type restricts where policy executions can run:
  -- Config: May run in a different region but NOT cross-account
  -- Event: Only run in the SAME region and account
  -- Periodic: May run in a different region AND different account (this is the most flexible)
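
The restrictions listed above can be written down as a small lookup table. This Python sketch only restates those three rules; the keys `config`, `event`, and `periodic` and the function `can_run` are hypothetical names chosen for illustration, not Cloud Custodian identifiers.

```python
# Where each mode family may execute, as described in the notes above.
MODE_SCOPE = {
    "config": {"cross_region": True, "cross_account": False},
    "event": {"cross_region": False, "cross_account": False},
    "periodic": {"cross_region": True, "cross_account": True},
}


def can_run(mode, same_region, same_account):
    """Return True if a policy in `mode` may execute against the target."""
    scope = MODE_SCOPE[mode]
    if not same_region and not scope["cross_region"]:
        return False
    if not same_account and not scope["cross_account"]:
        return False
    return True


if __name__ == "__main__":
    # Only periodic may target a different region AND a different account.
    for mode in MODE_SCOPE:
        print(mode, can_run(mode, same_region=False, same_account=False))
```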

Cross-Account Notes

  • Cross account is supported in the c7n_org tool via the c7n-org CLI command.
  • c7n-org supports multiple regions via the --region option (i.e. --region all).
  • c7n-org supports scheduled and event-based runs across multiple accounts concurrently.
  • c7n-org can manage policies across different accounts and restrict the execution of policy by tag (and type like "dev" or "prod").
  • c7n_org includes a tool that auto generates the config file c7n-org uses for accounts using the aws organizations API.
  • To run policies across multiple AWS accounts, create roles in the cross-accounts that trust a 'primary/governance' account and from the primary/governance account create an instance profile that has the STS assume role to switch to N other accounts.
  • c7n-org gets credentials from the [default] section of the ~/.aws/credentials and ~/.aws/config files. Support for profile as part of the account config was later introduced in Feb 2018.
    • how about using profiles within the config file?
    • how about using an instance profile if attached to EC2 instance that c7n-org is run on?
    • The cache file can handle multiple regions but you need a separate cache for each account (i.e. --cache /home/custodian/.accountname.cache)
  • Policies can be run locally on EC2 instance or via Lambdas (or containers on k8s/ECS although I haven't tried this)

Cross-Account Questions

  • How are Lambda policies run across accounts?
  • How is Lambda policy sprawl managed across accounts?

General Policy Notes

Cloud Custodian policies can be run

  • serverless as separate Lambdas per account per region
  • as EC2 instance via cron job
  • as EC2 instance via c7n-org
  • as container via ECS Fargate c7n-org
  • Cross-account Lambda policies were not supported per Issue #1071, but support was recently added per Issue #2533
  • Cross-account CloudWatch events are supported per Issue #2005, but require an AWS CloudWatch footprint in each cross-account, which can be stood up using CloudFormation

Resources

Custom msg-templates for c7n_mailer
Slack API and Token
Using ec2-instance-state, lessons around roles, how to view lambda logs, and more
How does garbage collection get enforced?
EC2 Offhours Support
Example offhours support
Lambda Support
AWS CloudWatch Schedule Rules
iam-user feature enhancement
Offhours Examples
CloudWatch Rules Expressions
Adding Custom Fields to Reports