[IT-3984] Update for Schematic app #4

Merged
merged 11 commits into from
Nov 12, 2024
4 changes: 0 additions & 4 deletions .github/workflows/aws-deploy.yaml
@@ -25,9 +25,6 @@ on:
required: true
type: string
default: "dev"
secrets-location:
type: string
default: "local"

jobs:
deploy:
@@ -55,4 +52,3 @@ jobs:
run: cdk deploy --all --concurrency 5 --require-approval never
env:
ENV: ${{ inputs.environment }}
SECRETS: ${{ inputs.secrets-location }}
1 change: 0 additions & 1 deletion .github/workflows/deploy-dev.yaml
@@ -16,4 +16,3 @@ jobs:
role-to-assume: "arn:aws:iam::631692904429:role/sagebase-github-oidc-sage-bionetworks-schematic-infra"
role-session-name: ${{ github.repository_owner }}-${{ github.event.repository.name }}-${{ github.run_id }}
environment: "dev"
secrets-location: "ssm"
1 change: 0 additions & 1 deletion .github/workflows/deploy-prod.yaml
@@ -16,4 +16,3 @@ jobs:
role-to-assume: "arn:aws:iam::878654265857:role/sagebase-github-oidc-sage-bionetworks-schematic-infra"
role-session-name: ${{ github.repository_owner }}-${{ github.event.repository.name }}-${{ github.run_id }}
environment: "prod"
secrets-location: "ssm"
1 change: 0 additions & 1 deletion .github/workflows/deploy-stage.yaml
@@ -16,4 +16,3 @@ jobs:
role-to-assume: "arn:aws:iam::878654265857:role/sagebase-github-oidc-sage-bionetworks-schematic-infra"
role-session-name: ${{ github.repository_owner }}-${{ github.event.repository.name }}-${{ github.run_id }}
environment: "stage"
secrets-location: "ssm"
146 changes: 39 additions & 107 deletions README.md
@@ -3,7 +3,7 @@

AWS CDK app for deploying [Schematic](schematic.api.sagebionetworks.org).

# Perequisites
# Prerequisites

AWS CDK projects require some bootstrapping before synthesis or deployment.
Please review the [bootstrapping documentation](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_bootstrap)
@@ -83,7 +83,7 @@ execute the validations by running `pre-commit run --all-files`.
Verify CDK to Cloudformation conversion by running [cdk synth]:

```console
cdk synth
ENV=dev cdk synth
```

The Cloudformation output is saved to the `cdk.out` folder
@@ -99,25 +99,21 @@ python -m pytest tests/ -s -v

# Environments

Deployment context is set in the [cdk.json](cdk.json) file. An `ENV` environment variable must be
set to tell the CDK which environment's variables to use when synthesising or deploying the stacks.
An `ENV` environment variable must be set when running the `cdk` command to tell the
CDK which environment's variables to use when synthesising or deploying the stacks.

Set an environment in cdk.json in `context` section of cdk.json:
Set environment variables for each environment in the [app.py](./app.py) file:

```json
"context": {
"dev": {
"VPC_CIDR": "10.254.192.0/24",
"FQDN": "dev.schematic.io"
},
"prod": {
"VPC_CIDR": "10.254.194.0/24",
"FQDN": "prod.schematic.io"
},
}
```python
environment_variables = {
    "VPC_CIDR": "10.254.192.0/24",
    "FQDN": "dev.schematic.io",
    "CERTIFICATE_ARN": "arn:aws:acm:us-east-1:XXXXXXXXXXX:certificate/0e9682f6-3ffa-46fb-9671-b6349f5164d6",
    "TAGS": {"CostCenter": "NO PROGRAM / 000000"},
}
```
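
The selection of per-environment variables described above can be sketched as a dictionary lookup. This is only an illustration, not the code in this PR (app.py uses a `match` statement); the `ENVIRONMENTS` mapping and the `load_environment_variables` helper are hypothetical names, and only two of the settings are shown.

```python
# Hypothetical per-environment settings; values copied from the diff in this PR.
ENVIRONMENTS = {
    "dev": {"VPC_CIDR": "10.254.192.0/24", "FQDN": "dev.schematic.io"},
    "prod": {"VPC_CIDR": "10.254.194.0/24", "FQDN": "prod.schematic.io"},
}


def load_environment_variables(env_name, environments=ENVIRONMENTS):
    """Return the settings for env_name, or exit with a readable message."""
    if env_name not in environments:
        # Joining a dict joins its keys, giving "dev,prod" here.
        valid = ",".join(environments)
        raise SystemExit(f"Must set environment variable `ENV` to one of {valid}")
    return environments[env_name]
```

In an app this would typically be called as `load_environment_variables(os.environ.get("ENV"))`, so that an unset or misspelled `ENV` stops synthesis before any stack is defined.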

For example, using the `prod` environment:
For example, synthesis with the `prod` environment variables:

```console
ENV=prod cdk synth
@@ -127,111 +123,47 @@ ENV=prod cdk synth

Certificates to set up HTTPS connections should be created manually in AWS certificate manager.
This is not automated due to AWS requiring manual verification of the domain ownership.
Once created take the ARN of the certificate and add it to a context in cdk.json.

```json
"context": {
"dev": {
"CERTIFICATE_ARN": "arn:aws:acm:us-east-1:XXXXXXXXX:certificate/76ed5a71-4aa8-4cc1-9db6-aa7a322ec077"
}
}
```
Once created, take the ARN of the certificate and set it in `environment_variables`.

![ACM certificate](docs/acm-certificate.png)

# Secrets

Secrets can be stored in one of the following locations:
* AWS SSM parameter store
* Local context in [cdk.json](cdk.json) file

## Loading directly from cdk.json

Set secrets directly in cdk.json in `context` section of cdk.json:

```text
"context": {
"secrets": {
"MARIADB_PASSWORD": "Dummy",
"MARIADB_ROOT_PASSWORD": "Dummy",
"GIT_HOST_KEY": "Host123",
"GIT_PRIVATE_KEY": "-----BEGIN OPENSSH PRIVATE KEY-----\nDUMMY_GIT_PRIVATE_KEY\n-----END OPENSSH PRIVATE KEY-----",
"AWS_LOADER_S3_ACCESS_KEY_ID": "AccessKey123",
"AWS_LOADER_S3_SECRET_ACCESS_KEY": "SecretAccessKey123",
"SECURITY_KEY": "SecurityKey123"
}
}
```

## Loading from ssm parameter store

Set secrets to the SSM parameter names in `context` section of cdk.json:

```text
"context": {
"secrets": {
"MARIADB_PASSWORD": "/openchallenges/MARIADB_PASSWORD",
"MARIADB_ROOT_PASSWORD": "/openchallenges/MARIADB_ROOT_PASSWORD",
"GIT_HOST_KEY": "/openchallenges/GIT_HOST_KEY",
"GIT_PRIVATE_KEY": "/openchallenges/GIT_PRIVATE_KEY",
"AWS_LOADER_S3_ACCESS_KEY_ID": "/openchallenges/AWS_LOADER_S3_ACCESS_KEY_ID",
"AWS_LOADER_S3_SECRET_ACCESS_KEY": "/openchallenges/AWS_LOADER_S3_SECRET_ACCESS_KEY",
"SECURITY_KEY": "/openchallenges/SECURITY_KEY"
}
}
```
Secrets can be created manually in
[AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html).

where the values of these KVs (e.g. `/openchallenges/MARIADB_PASSWORD`) refer to SSM parameters that
must be created manually.
To pass secrets to a container, set the Secrets Manager `secret name`
when creating a `ServiceProps` object:

![AWS secrets manager](docs/aws-parameter-store.png)

## Specify secret location

Set the `SECRETS` environment variable to specify the location where secrets should be loaded from.

Load secrets directly from cdk.json file:

```console
SECRETS=local cdk synth
```python
app_service_props = ServiceProps(
    "app", 443, 1024, "ghcr.io/sage-bionetworks/app:v1.0", container_env_vars={},
    container_secret_name="app/dev/DATABASE"
)
```

Load secrets from AWS SSM parameter store:

```console
AWS_PROFILE=<your-aws-profile> AWS_DEFAULT_REGION=us-east-1 SECRETS=ssm cdk synth
For example, the KVs for `app/dev/DATABASE` could be:
```json
{
"DATABASE_USER": "maria",
"DATABASE_PASSWORD": "password"
}
```
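
As a rough illustration of the key/value layout above, the sketch below lists the keys stored in a JSON secret value. It assumes, without confirmation from this PR, that each key is what the container ends up seeing as a variable name; `container_secret_keys` is a hypothetical helper and the secret string is the dummy example from the README, not a real secret.

```python
import json


def container_secret_keys(secret_string):
    """Return the sorted key names stored in a JSON secret value."""
    pairs = json.loads(secret_string)
    if not isinstance(pairs, dict):
        raise ValueError("secret value must be a JSON object of key/value pairs")
    return sorted(pairs)


# Dummy value matching the README example for `app/dev/DATABASE`.
example_secret = '{"DATABASE_USER": "maria", "DATABASE_PASSWORD": "password"}'
print(container_secret_keys(example_secret))
```

A check like this can be useful before deployment to confirm that a secret holds a flat JSON object rather than a plain string.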

> [!NOTE]
> Setting `SECRETS=ssm` requires access to an AWS account

## Override secrets from command line

The CDK CLI allows overriding context variables:

To load secrets directly from passed in values:

```console
SECRETS=local cdk --context secrets='{"MARIADB_PASSWORD": "Dummy", "MARIADB_ROOT_PASSWORD": "Dummy", ..}' synth
```

To load secrets from SSM parameter store with overridden SSM parameter names:

```console
SECRETS=ssm cdk --context "secrets"='{"MARIADB_PASSWORD": "/test/mariadb-root-pass", "MARIADB_ROOT_PASSWORD": "/test/mariadb-root-pass", ..}' synth
```
> Retrieving secrets requires access to the AWS Secrets Manager

# Deployment

## Bootstrap

There are a few items that need to be manually bootstrapped before deploying the application.

* Add application [secrets](#Secrets) to either the cdk.json or the AWS System Manager parameter store
* Add secrets to the AWS Secrets Manager
* Create an [ACM certificate for the application](#Certificates) using the AWS Certificates Manager
* Add the Certificate ARN to the cdk.json
* Update `environment_variables` in [app.py](app.py) with the variables specific to each environment.
* Update references to the docker images in [app.py](app.py)
(i.e. `ghcr.io/sage-bionetworks/schematic-xxx:<tag>`)
(i.e. `ghcr.io/sage-bionetworks/app-xxx:<tag>`)
* (Optional) Update the `ServiceProps` objects in [app.py](app.py) with parameters specific to
each container.

@@ -273,7 +205,7 @@ Deployment requires setting up an [AWS profile](https://docs.aws.amazon.com/cli/
then executing the following command:

```console
AWS_PROFILE=itsandbox-dev AWS_DEFAULT_REGION=us-east-1 ENV=dev SECRETS=ssm cdk deploy --all
AWS_PROFILE=itsandbox-dev AWS_DEFAULT_REGION=us-east-1 ENV=dev cdk deploy --all
```

## Force new deployment
@@ -294,9 +226,9 @@ Example to get an interactive shell run into a container:

```console
AWS_PROFILE=itsandbox-dev AWS_DEFAULT_REGION=us-east-1 aws ecs execute-command \
--cluster SchematicEcs-ClusterEB0386A7-BygXkQgSvdjY \
--cluster AppEcs-ClusterEB0386A7-BygXkQgSvdjY \
--task a2916461f65747f390fd3e29f1b387d8 \
--container schematic-mariadb \
--container app-mariadb \
--command "/bin/sh" --interactive
```

@@ -310,8 +242,8 @@ The workflow for continuous integration:
* Create PR from the git dev branch
* PR is reviewed and approved
* PR is merged
* CI deploys changes to the dev environment (dev.schematic.io) in the AWS dev account.
* CI deploys changes to the dev environment (dev.app.io) in the AWS dev account.
* Changes are promoted (or merged) to the git stage branch.
* CI deploys changes to the staging environment (stage.schematic.io) in the AWS prod account.
* CI deploys changes to the staging environment (stage.app.io) in the AWS prod account.
* Changes are promoted (or merged) to the git prod branch.
* CI deploys changes to the prod environment (prod.schematic.io) in the AWS prod account.
* CI deploys changes to the prod environment (prod.app.io) in the AWS prod account.
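
The promotion workflow listed above can be summarised as a small lookup table. This is a sketch for illustration only: `DeployTarget`, `TARGETS`, and `next_branch` are hypothetical names, and the account labels simply restate the README (dev deploys to the AWS dev account, stage and prod to the AWS prod account).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployTarget:
    environment: str  # value passed as ENV to the cdk command
    aws_account: str  # account label from the README, not an account ID


# Branches in the order that changes are promoted.
PROMOTION_ORDER = ["dev", "stage", "prod"]

TARGETS = {
    "dev": DeployTarget("dev", "dev"),
    "stage": DeployTarget("stage", "prod"),
    "prod": DeployTarget("prod", "prod"),
}


def next_branch(branch):
    """Return the branch that changes are promoted to, or None for prod."""
    index = PROMOTION_ORDER.index(branch)
    if index + 1 < len(PROMOTION_ORDER):
        return PROMOTION_ORDER[index + 1]
    return None
```

For example, `next_branch("dev")` gives `"stage"`, and `TARGETS["stage"].aws_account` shows that staging shares the prod account.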
91 changes: 55 additions & 36 deletions app.py
@@ -1,35 +1,64 @@
import aws_cdk as cdk

from os import environ
from src.network_stack import NetworkStack
from src.ecs_stack import EcsStack
from src.service_stack import LoadBalancedServiceStack
from src.load_balancer_stack import LoadBalancerStack
from src.service_props import ServiceProps
import src.utils as utils

cdk_app = cdk.App()
# get the environment and set environment specific variables
VALID_ENVIRONMENTS = ["dev", "stage", "prod"]
environment = environ.get("ENV")
match environment:
    case "prod":
        environment_variables = {
            "VPC_CIDR": "10.254.194.0/24",
            "FQDN": "prod.schematic.io",
            "CERTIFICATE_ARN": "arn:aws:acm:us-east-1:878654265857:certificate/d11fba3c-1957-48ba-9be0-8b1f460ee970",
            "TAGS": {"CostCenter": "NO PROGRAM / 000000"},
        }
    case "stage":
        environment_variables = {
            "VPC_CIDR": "10.254.193.0/24",
            "FQDN": "stage.schematic.io",
            "CERTIFICATE_ARN": "arn:aws:acm:us-east-1:878654265857:certificate/d11fba3c-1957-48ba-9be0-8b1f460ee970",
            "TAGS": {"CostCenter": "NO PROGRAM / 000000"},
        }
    case "dev":
        environment_variables = {
            "VPC_CIDR": "10.254.192.0/24",
            "FQDN": "dev.schematic.io",
            "CERTIFICATE_ARN": "arn:aws:acm:us-east-1:631692904429:certificate/0e9682f6-3ffa-46fb-9671-b6349f5164d6",
            "TAGS": {"CostCenter": "NO PROGRAM / 000000"},
        }
    case _:
        valid_envs_str = ",".join(VALID_ENVIRONMENTS)
        raise SystemExit(
            f"Must set environment variable `ENV` to one of {valid_envs_str}"
        )

# get the environment
environment = utils.get_environment()
stack_name_prefix = f"schematic-{environment}"
image_version = "0.0.11"
environment_tags = environment_variables["TAGS"]

# get VARS from cdk.json
env_vars = cdk_app.node.try_get_context(environment)
fully_qualified_domain_name = env_vars["FQDN"]
subdomain, domain = fully_qualified_domain_name.split(".", 1)
vpc_cidr = env_vars["VPC_CIDR"]
certificate_arn = env_vars["CERTIFICATE_ARN"]
# Define stacks
cdk_app = cdk.App()

# get secrets from cdk.json or aws parameter store
secrets = utils.get_secrets(cdk_app)
# recursively apply tags to all stack resources
if environment_tags:
    for key, value in environment_tags.items():
        cdk.Tags.of(cdk_app).add(key, value)

network_stack = NetworkStack(cdk_app, f"{stack_name_prefix}-network", vpc_cidr)
network_stack = NetworkStack(
    cdk_app, f"{stack_name_prefix}-network", environment_variables["VPC_CIDR"]
)

ecs_stack = EcsStack(
    cdk_app, f"{stack_name_prefix}-ecs", network_stack.vpc, fully_qualified_domain_name
    cdk_app,
    f"{stack_name_prefix}-ecs",
    network_stack.vpc,
    environment_variables["FQDN"],
)
ecs_stack.add_dependency(network_stack)

# From AWS docs https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-connect-concepts-deploy.html
# The public discovery and reachability should be created last by AWS CloudFormation, including the frontend
@@ -39,36 +68,26 @@
    cdk_app, f"{stack_name_prefix}-load-balancer", network_stack.vpc
)

apex_service_props = ServiceProps(
    "schematic-apex",
    8000,
    200,
    f"ghcr.io/sage-bionetworks/schematic-apex:{image_version}",
    {
        "API_DOCS_HOST": "schematic-api-docs",
        "API_DOCS_PORT": "8010",
        "API_GATEWAY_HOST": "schematic-api-gateway",
        "API_GATEWAY_PORT": "8082",
        "APP_HOST": "schematic-app",
        "APP_PORT": "4200",
        "THUMBOR_HOST": "schematic-thumbor",
        "THUMBOR_PORT": "8889",
        "ZIPKIN_HOST": "schematic-zipkin",
        "ZIPKIN_PORT": "9411",
    },
app_service_props = ServiceProps(
    "schematic-app",
    443,
    1024,
    "ghcr.io/sage-bionetworks/schematic:v0.1.90-beta",
    {},
    f"{stack_name_prefix}-DockerFargateStack/{environment}/ecs",
)
Comment on lines +71 to 78

Contributor:

Some questions on the current infra deployed compared to this new logic:

  1. How are containers horizontally scaled in this logic? Currently the schematic API containers scale between 3 and 5 instances depending on 50% CPU/memory scaling.
  2. Currently schematic is running 3 tasks, each with 4 vCPU and 8 GB of memory. There are no requests set on the containers. My concern is that the schematic container has a "Memory hard limit" of 1 GB, while what we pulled up in CloudWatch showed the container regularly reaching close to 40% (~4 GB) of memory. Can the memory limits be bumped up to address this concern? No concerns on the CPU drop.

Contributor Author:

This PR already has a lot going on, so to make reviewing easier I was planning to address scaling parameters in a follow-on PR. I can add it in this PR if you want, though. Anyway, we don't need to set production scale until we deploy to production.

Contributor:

> I was planning to address scaling parameters for on a follow on PR. I can add it in this PR if you want though.

As long as it's not going to change the structure a lot, I'd be fine with this being completed in a separate PR. To note: in parallel I am creating a branch off your changes here so I can get the ball rolling on deploying the OpenTelemetry collector. I can handle large structural changes, but try to keep them to a minimum.

zaro0508#1

> Anyways we don't need to set to production scale until we deploy to production.

Should the configuration of resources be extracted to be environment-specific if this is the case? That way we can set dev to the lower values (for now) and prod/stage to the higher values, and if/when we need to we can always copy the values down to dev. That way, at least, the values are set up ahead of time.

Contributor Author (@zaro0508, Nov 12, 2024):

> Should the configuration of resources be extracted to environment specific if this is the case?

The pattern established for the OC assumes that deployments to each environment should be the same because we use the promotion workflow dev->stage->prod. That workflow assumes that we want each environment to be the same so that when we promote the apps/containers we get the same result. The Schematic deployment can work differently if you prefer. It's really up to you and what workflow you want to establish.

Contributor:

> the pattern established for the OC assumes that deployments to each environment should be the same because we use the promotion workflow dev->stage->prod. That workflow assumes that we want each environment to be the same so that when we promote the apps/container we get the same result.

I am on board with this pattern.

app_service_stack = LoadBalancedServiceStack(
    cdk_app,
    f"{stack_name_prefix}-app",
    network_stack.vpc,
    ecs_stack.cluster,
    apex_service_props,
    app_service_props,
    load_balancer_stack.alb,
    certificate_arn,
    environment_variables["CERTIFICATE_ARN"],
    health_check_path="/health",
    health_check_interval=5,
)
app_service_stack.add_dependency(load_balancer_stack)

# Generate stacks
cdk_app.synth()