Vention app v0.0.1 #1

Open: wants to merge 6 commits into base: main
5 changes: 3 additions & 2 deletions .gitignore
@@ -1,5 +1,6 @@
# Local .terraform directories
**/.terraform/*
**/.terraform.lock.hcl

# .tfstate files
*.tfstate
@@ -13,8 +14,8 @@ crash.*.log
# password, private keys, and other secrets. These should not be part of version
# control as they are data points which are potentially sensitive and subject
# to change depending on the environment.
*.tfvars
*.tfvars.json
#*.tfvars
#*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
60 changes: 60 additions & 0 deletions OBSERVABILITY.md
@@ -0,0 +1,60 @@
# Observability Guide

Observability is key for diagnosing and improving application performance. This guide assumes familiarity with DevOps practices and focuses on tooling and metrics for enhanced observability.

## Observability Objectives

- **Visibility:** Access to real-time internal application states.
- **Traceability:** Tracing transactions across components to identify issues.
- **Debuggability:** Quick issue identification and resolution.
- **Performance Monitoring:** Ensuring application meets performance benchmarks.

## Tools and Practices

- **Logging:** Essential for historical analysis and debugging. **Recommendation:** ELK Stack for aggregation and visualization.

- **Monitoring:** Critical for real-time health checks. **Recommendation:** Use Prometheus for metrics collection and Grafana for dashboarding.

- **Traceability:** Key for microservices architecture. **Recommendation:** Zipkin for distributed tracing.

- **Alerting:** Necessary for proactive issue management. **Integration:** Utilize Prometheus and Grafana's alerting capabilities.
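
As a rough illustration of the alerting recommendation above, a Prometheus alerting rule for the request error rate might look like the sketch below. The metric name `http_requests_total`, its `status` label, the 5% threshold, and the group name are assumptions made for illustration; none of them come from the demo_app.

```yaml
# Sketch of a Prometheus rule file (loaded via `rule_files` in prometheus.yml).
groups:
  - name: demo-app-alerts          # hypothetical group name
    rules:
      - alert: HighErrorRate
        # Share of 5xx responses among all responses over the last 5 minutes.
        # `http_requests_total` and its `status` label are assumed metric names.
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m])) > 0.05
        for: 10m                   # must hold for 10 minutes before firing
        labels:
          severity: page
        annotations:
          summary: "More than 5% of requests are failing"
```

The `for` clause keeps short spikes from paging anyone; the threshold and duration would need tuning against real traffic.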

## Key Performance Indicators

- **Latency:** Time for request processing.
- **Throughput:** Request processing capacity.
- **Error Rate:** Percentage of failed requests.
- **Resource Utilization:** CPU, memory, and disk metrics.
- **User Satisfaction:** Apdex score for user experience.
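
For reference, the Apdex score is conventionally computed against a target response time $T$: requests that complete within $T$ count as satisfied, those between $T$ and $4T$ count as tolerating, and the rest count as frustrated.

$$
\mathrm{Apdex}_T = \frac{N_{\text{satisfied}} + \tfrac{1}{2}\, N_{\text{tolerating}}}{N_{\text{total}}}
$$

As a worked example with made-up numbers: out of 1000 requests, 800 satisfied and 150 tolerating give $(800 + 75) / 1000 = 0.875$.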

Leveraging ELK Stack, Prometheus, Grafana, and Zipkin enables comprehensive observability, essential for maintaining and optimizing application performance.

## Distributed Tracing with Zipkin

Distributed tracing, particularly with Zipkin, offers detailed insights into request flows across microservices, aiding in pinpointing latency and failures.

### Integration Steps

- **Instrumentation:** Add tracing data collection to application services, either manually or via OpenTelemetry-compatible libraries.
- **Propagation:** Ensure trace IDs are forwarded across service calls.
- **Collection & Storage:** Aggregate tracing data in Zipkin for analysis.
- **Analysis & Visualization:** Use Zipkin UI for trace data insights, focusing on request paths and operation durations.
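
One possible way to wire the collection step, assuming services are instrumented with OpenTelemetry and send OTLP data to an OpenTelemetry Collector (contrib distribution, which includes the Zipkin exporter). The hostname `zipkin` and the listening addresses are assumptions for illustration.

```yaml
# Sketch of an OpenTelemetry Collector config that forwards traces to Zipkin.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317     # services send OTLP/gRPC here
      http:
        endpoint: 0.0.0.0:4318     # or OTLP/HTTP here

processors:
  batch: {}                        # batch spans before export

exporters:
  zipkin:
    # Zipkin's v2 span ingestion endpoint; `zipkin` is an assumed hostname.
    endpoint: http://zipkin:9411/api/v2/spans

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [zipkin]
```

Keeping a collector between the services and Zipkin means the backend could later be swapped (for example to AWS X-Ray) without re-instrumenting the application.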

### Proposed stack

A good observability stack should incorporate the following components:

- **Zipkin**
- **ELK**
- **Prometheus**
- **Grafana**

Since this would be implemented on AWS, native AWS tools would probably introduce less friction. Here is a suggestion for replacing the previous stack with AWS services:

- **Amazon ES** (now Amazon OpenSearch Service): Provides functionality equivalent to Elasticsearch and Kibana.
- **Amazon Kinesis Data Firehose**: Replaces the Logstash functionality.
- **CloudWatch**: Provides functionality similar to Prometheus and Grafana combined.

#### Alternatively:

- **Amazon Managed Service for Prometheus**
- **Amazon Managed Grafana**
- **AWS X-Ray**: Equivalent to Zipkin.
118 changes: 118 additions & 0 deletions TO-DO.md
@@ -0,0 +1,118 @@
# IaC - NOTES
<!--
In the project, you will see a `terraform` folder. We want you to create the needed open tofu
code to create the necessary infrastructure to host a small application. The requirements to run
this application are as follows:

- Must run in AWS
- Must have an ECR instance with a docker repository for our `demo_app`
- Must run in an ECS Fargate environment
- The application needs to be reachable from the outside world
- The application must have an auto-scalable policy
- Need to revert to the previous version if the latest version fails to deploy properly
- Must have an aurora-pgsql database
- The database should not be accessible from the outside world
- A list of Vention members must have read access to the AWS project, except IAM and
billing policies. Vention members should only be allowed to view things, not modify or
add new AWS services.
- [email protected]
- [email protected]
-->

Currently, this project can deploy the basic parts to AWS. See the README inside the `terraform/stack` directory for usage.

- ECR is successfully provisioned with Terraform.
- **TODO**: Review the policy to make sure it actually restricts the repository to the account.
- **TODO**: ECS needs a cluster. Fargate is serverless and runs ECS tasks. Still pending.
- **TODO**: ECS Fargate tasks need to be placed behind a load balancer in the public subnets.
- **TODO**: Investigate how we can automate reverting to the previous version of the application.
  - THOUGHTS: Wouldn't this be very much an edge case? The CI/CD pipeline and the review environment should ensure that no bad version makes it to production.
- **TODO**: Finish the module that provisions Aurora PostgreSQL. It is currently half-baked.
- **TODO**: Investigate how to automate creating users, since Terraform and AWS do not allow for automatic user registration. One possible path is a Lambda function that automates the operation, since Terraform is unable to send the email with the credentials (it won't trigger AWS email sending).
- **TODO**: Test that the policies restricting users actually work.
- **TODO**: Change the state to be stored in a centralized location (S3, Vault, etc.).

# CI/CD
<!--
In the project, you will see a folder demo_app. Your task here is to create a CI/CD pipeline to test our demo_app, create a docker image and push that image to an AWS ECR repository.

This is left intentionally vague, so you must understand what is in this demo_app and what is required to have the most efficient CI/CD pipeline.

- Create a feature branch workflow
- Each commit to the feature branch must run the CI pipeline
- Each commit to the feature branch should create a review environment if the CI pipeline succeeds
- Each merge to main should run the CI pipeline and CD pipeline
- CD pipeline should generate a docker image with the latest tag and an associated version tag
-->

- **TODO**: Dockerize the demo_app.
- **TODO**: Write all GitHub Actions workflows for both the application and the Terraform code.
## BASIC OUTLINE

Here's a brief overview of the tasks needed to set up a CI/CD pipeline using GitHub Actions:

1. **Create a feature branch workflow**: This involves setting up a Git workflow where each new feature or bug fix is developed on a separate branch. This can be enforced through a combination of team discipline and GitHub settings (like branch protection rules).

2. **Set up the CI pipeline**: Create a new GitHub Actions workflow file (e.g., `.github/workflows/ci.yml`). This workflow should be triggered on each push to any feature branch. The workflow should include steps to check out the code, set up the necessary environment (like Node.js or Python), and run your tests.

3. **Create a review environment on successful CI**: This could be a new step in your CI workflow that is only run if the previous steps (your tests) succeed. The specifics of this step depend on your application and where you want to host these review environments. For example, you could deploy the application to a platform like Heroku or AWS, or even spin up a new Docker container. In Vention's particular case, it probably does not make sense to provision and tear down a review environment every single time. We could have the actions deploy to the existing review environment instead.

4. **Set up the CD pipeline**: Create another GitHub Actions workflow file (e.g., `.github/workflows/cd.yml`). This workflow should be triggered on each push to the main branch. The workflow should include steps to check out the code, build a Docker image, and push it to a Docker registry. You should tag this image with both `latest` and the current version of your application. I have reservations about the `latest` tag, as it usually causes issues: tags should never be overwritten, and keeping a `latest` tag forces us to run the repository as MUTABLE. I would feel more comfortable setting the repository to IMMUTABLE and ensuring that deployments always reference specific tags.

5. **Versioning**: Implement a system for versioning your application. This could be as simple as manually updating a version number in your package.json file, or you could use a tool like `npm version` or `semantic-release` to automate this process.

6. **Documentation**: Document the workflow and how to use it in your project's README file. This should include instructions on how to create a feature branch, how to trigger the CI pipeline, and what to expect when code is merged into the main branch.
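
To make item 4 more concrete, here is a minimal sketch of what `.github/workflows/app_cd.yml` could look like. It is untested and rests on several assumptions: a `Dockerfile` exists in `demo_app`, the ECR repository is named `vention-dev` (project name plus environment), an IAM role for GitHub OIDC exists, and its ARN is stored in a hypothetical secret called `AWS_ROLE_ARN`. It tags the image with the commit SHA only, in line with the reservation about `latest` above.

```yaml
name: app-cd

on:
  push:
    branches:
      - main

permissions:
  id-token: write    # lets the job request an OIDC token to assume the AWS role
  contents: read

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}   # hypothetical secret name
          aws-region: us-east-1

      - name: Log in to Amazon ECR
        id: ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        env:
          # Registry host comes from the login step; repository name is assumed.
          IMAGE: ${{ steps.ecr.outputs.registry }}/vention-dev
        run: |
          docker build -t "$IMAGE:$GITHUB_SHA" demo_app
          docker push "$IMAGE:$GITHUB_SHA"
```

Pushing an additional `latest` tag would only take one more `docker tag` and `docker push`, but it would require the repository to stay MUTABLE.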

The specifics of these tasks will depend on the programming language and framework you're using. We will probably need separate workflows for Terraform and demo_app (`.github/workflows/terraform_cd.yml` and `.github/workflows/app_cd.yml`).

Each of these steps is very high-level and needs to be explored in more detail to make sense. As an example, a more detailed plan for setting up the Continuous Integration (CI) pipeline for a Node.js application using GitHub Actions would, in general terms, look as follows:
**NOTE**: Please excuse the code examples, as they were taken or adapted from the documentation. They might be complete nonsense, as I have not worked with GitHub Actions before.

1. **Create a new workflow file**: In your repository, create a new file at `.github/workflows/ci.yml`.

2. **Define the trigger for the workflow**: At the top of the `ci.yml` file, specify that this workflow should run on each push to any branch except the main branch. This can be done with the `on` keyword:

```yaml
on:
  push:
    branches-ignore:
      - 'main'
```

3. **Define the jobs for the workflow**: Under the `jobs` keyword, define the steps that make up your CI pipeline. For a Node.js application, this might include:

- **Checkout the code**: Use the `actions/checkout@v2` action to checkout your code onto the runner.

- **Set up Node.js**: Use the `actions/setup-node@v2` action to set up the specified version of Node.js on the runner.

- **Install dependencies**: Run `npm ci` to install your project's dependencies.

- **Run tests**: Run `npm test` to execute your project's test suite.

Here's what these steps might look like in the `ci.yml` file:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Use Node.js
        uses: actions/setup-node@v2
        with:
          node-version: '14'

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test
```

4. **Handle test failures**: By default, if any step in a GitHub Actions job fails, the job is stopped and marked as failed. This is usually what you want for a CI pipeline - if the tests fail, you want to know about it!

5. **Create a review environment on successful CI**: This could be a new step in your CI workflow that is only run if the previous steps (your tests) succeed.
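
As a rough, untested sketch of step 5, a second job can depend on the test job through `needs`, so it only runs when the tests pass. The cluster name `vention-review`, the service name `demo-app-review`, and the `AWS_ROLE_ARN` secret are hypothetical. The sketch also assumes the review service's task definition already points at an image tag the pipeline has pushed; building and pushing the image is left out here.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test

  deploy-review:
    # `needs` makes this job wait for `build` and skip if `build` failed.
    needs: build
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}   # hypothetical secret name
          aws-region: us-east-1

      - name: Redeploy the existing review service
        # Cluster and service names are placeholders for the shared review environment.
        run: |
          aws ecs update-service \
            --cluster vention-review \
            --service demo-app-review \
            --force-new-deployment
```

This reuses one long-lived review environment instead of provisioning and tearing one down per commit, which matches the preference stated in the outline above.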

9 changes: 9 additions & 0 deletions terraform/env/dev/dev.auto.tfvars
@@ -0,0 +1,9 @@
project_name = "vention"
environment  = "dev"
region       = "us-east-1"
vpc_cidr     = "10.0.0.0/16"

tags = {
  Terraform   = "true"
  Environment = "dev"
}
Empty file.
Empty file.
54 changes: 54 additions & 0 deletions terraform/module/ecr/main.tf
@@ -0,0 +1,54 @@
data "aws_caller_identity" "current" {}

resource "aws_ecr_repository" "main" {
  name = "${var.project_name}-${var.environment}"

  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = false
  }


  tags = merge(
    {
      Name = "${var.project_name}-${var.environment}"
    },
    var.tags
  )
}



data "aws_iam_policy_document" "vention-ecr-policy" {
  statement {
    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = [data.aws_caller_identity.current.account_id]
    }

    actions = [
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
      "ecr:BatchCheckLayerAvailability",
      "ecr:PutImage",
      "ecr:InitiateLayerUpload",
      "ecr:UploadLayerPart",
      "ecr:CompleteLayerUpload",
      "ecr:DescribeRepositories",
      "ecr:GetRepositoryPolicy",
      "ecr:ListImages",
      "ecr:DeleteRepository",
      "ecr:BatchDeleteImage",
      "ecr:SetRepositoryPolicy",
      "ecr:DeleteRepositoryPolicy",
    ]
  }
}

resource "aws_ecr_repository_policy" "example" {
  repository = aws_ecr_repository.main.name
  policy     = data.aws_iam_policy_document.vention-ecr-policy.json
}
3 changes: 3 additions & 0 deletions terraform/module/ecr/outputs.tf
@@ -0,0 +1,3 @@
output "repository_url" {
  value = aws_ecr_repository.main.repository_url
}
17 changes: 17 additions & 0 deletions terraform/module/ecr/variables.tf
@@ -0,0 +1,17 @@
variable "project_name" {
  type    = string
  default = "vention"
}

variable "environment" {
  type    = string
  default = "dev"
}

variable "tags" {
  type = map(any)
  default = {
    Terraform   = ""
    Environment = ""
  }
}
40 changes: 40 additions & 0 deletions terraform/module/pgsql/main.tf
@@ -0,0 +1,40 @@


# Create a security group for the Aurora cluster.
# NOTE: aws_vpc.my_vpc is not defined in this module yet; the VPC ID still
# needs to be passed in (for example as a variable).
resource "aws_security_group" "aurora_sg" {
  vpc_id = aws_vpc.my_vpc.id

  # Allow PostgreSQL traffic only from inside the VPC CIDR.
  ingress {
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }

  egress {
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }
}

# Create an Aurora cluster
resource "aws_rds_cluster" "aurora_cluster" {
  storage_encrypted  = true
  cluster_identifier = "my-aurora-cluster"
  engine             = "aurora-postgresql"
  engine_version     = "11.9"
  database_name      = "my_database"
  # Placeholder credentials: replace before real use and keep secrets out of
  # version control.
  master_username         = "admin"
  master_password         = "password"
  backup_retention_period = 7
  vpc_security_group_ids  = [aws_security_group.aurora_sg.id]
  db_subnet_group_name    = aws_db_subnet_group.aurora_subnet_group.name
}

# Create a subnet group for the Aurora cluster.
# NOTE: the private subnets referenced here are not defined in this module yet.
resource "aws_db_subnet_group" "aurora_subnet_group" {
  name       = "aurora-subnet-group"
  subnet_ids = [aws_subnet.private_subnet_1.id, aws_subnet.private_subnet_2.id]
}
Empty file.
17 changes: 17 additions & 0 deletions terraform/module/pgsql/variables.tf
@@ -0,0 +1,17 @@
variable "project_name" {
  type    = string
  default = "vention"
}

variable "environment" {
  type    = string
  default = "dev"
}

variable "tags" {
  type = map(any)
  default = {
    Terraform   = ""
    Environment = ""
  }
}