chaos engineering section in user guides: FIS experiment for devs; FIS experiment for architects; outages for infra team (WIP); FIS on webapp template (pending UI PR)
Showing 9 changed files with 655 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
--- | ||
title: "Chaos Engineering" | ||
linkTitle: "Chaos Engineering" | ||
weight: 11 | ||
description: > | ||
Chaos Engineering with LocalStack enables you to build resilient systems early on in the development phase. | ||
cascade: | ||
type: docs | ||
--- | ||
|
||
## Introduction | ||
|
||
Chaos engineering with LocalStack is a proactive approach to building resilient systems by introducing | ||
controlled disruptions. The practice varies in its application: for software developers, it might | ||
mean testing application behavior and error handling; for architects, ensuring the robustness of the system design; and for | ||
operations teams, examining the reliability of infrastructure provisioning. By integrating chaos experiments early | ||
in the development cycle, teams can uncover and address potential weaknesses, building systems that withstand | ||
turbulent conditions. The subchapters of this section look at some of these scenarios using examples: | ||
|
||
- **Software behavior and error handling** using Fault Injection Simulator experiments. | ||
- **Robust architecture** as a result of Route53 failover tested with FIS experiments. | ||
- **Infrastructure provisioning reliability** when faced with outages and anomalies, as part of automated provisioning processes. |
Binary file added
BIN
+250 KB
content/en/user-guide/chaos-engineering/fis-experiments/fis-experiment-1.png
Binary file added
BIN
+350 KB
content/en/user-guide/chaos-engineering/fis-experiments/fis-experiment-2.png
280 changes: 280 additions & 0 deletions
content/en/user-guide/chaos-engineering/fis-experiments/index.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,280 @@ | ||
--- | ||
title: "Fault Injection Simulator Experiments" | ||
linkTitle: "Fault Injection Simulator Experiments" | ||
weight: 1 | ||
description: Perform controlled experiments on your AWS infrastructure, allowing you to simulate faults and observe their impact to build more resilient applications. | ||
--- | ||
|
||
## Introduction | ||
|
||
AWS Fault Injection Simulator (FIS) is a service that facilitates controlled chaos engineering experiments on AWS | ||
infrastructure to identify weaknesses and enhance system resilience. It provides a framework for injecting failures | ||
and monitoring their effects, enabling developers to proactively prepare for real-world outages. | ||
|
||
## Getting started | ||
|
||
This guide is designed for users new to the Fault Injection Simulator and assumes basic knowledge of the AWS CLI and our | ||
[`awslocal`](https://github.com/localstack/awscli-local) wrapper script. For more details on the FIS service, please | ||
refer to the dedicated [documentation page](/user-guide/aws/fis/). | ||
|
||
|
||
In this example, we use AWS Fault Injection Simulator (FIS) to cause a controlled outage of a DynamoDB database and | ||
test software behavior and error handling. This kind of test helps ensure that the software handles | ||
database downtime gracefully, for example by queuing requests to prevent data loss. Such proactive error | ||
handling lets the system keep operating despite partial failures. You can follow along with the full solution | ||
in this GitHub [repository](https://github.com/localstack-samples/samples-chaos-engineering/tree/main/FIS-experiments). | ||
|
||
Start LocalStack using the `docker-compose.yml` file from the repository and make sure you provide your API key as an environment | ||
variable: | ||
|
||
{{< command >}} | ||
$ export LOCALSTACK_API_KEY=<YOUR_LOCALSTACK_API_KEY> | ||
$ docker compose up | ||
{{< /command >}} | ||
|
||
{{< figure src="fis-experiment-1.png" >}} | ||
|
||
The resources are created when LocalStack starts. | ||
|
||
## Creating an experiment template | ||
|
||
Before creating any FIS experiments, let's make sure our system works as expected by creating an entity and persisting it. | ||
We'll call the API Gateway endpoint with a POST request via cURL: | ||
|
||
```bash | ||
$ curl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \ | ||
--header 'Content-Type: application/json' \ | ||
--data '{ | ||
"id": "prod-2004", | ||
"name": "Ultimate Gadget", | ||
"price": "49.99", | ||
"description": "The Ultimate Gadget is the perfect tool for tech enthusiasts looking for the next level in gadgetry. Compact, powerful, and loaded with features." | ||
} | ||
' | ||
|
||
Product added/updated successfully. | ||
``` | ||
|
||
We create a file called `experiment-ddb.json` that contains the FIS experiment definition. This JSON configuration is used | ||
in the subsequent call to the `CreateExperimentTemplate` API of the FIS service. | ||
|
||
```bash | ||
$ cat experiment-ddb.json | ||
{ | ||
"actions": { | ||
"Test action 1": { | ||
"actionId": "localstack:generic:api-error", | ||
"parameters": { | ||
"service": "dynamodb", | ||
"api": "all", | ||
"percentage": "100", | ||
"exception": "DynamoDbException", | ||
"errorCode": "500" | ||
} | ||
} | ||
}, | ||
"description": "Template for interfering with the DynamoDB service", | ||
"stopConditions": [{ | ||
"source": "none" | ||
}], | ||
"roleArn": "arn:aws:iam:000000000000:role/ExperimentRole" | ||
} | ||
``` | ||
|
||
With this template definition, we target all APIs of the DynamoDB service. Specific operations, such as `PutItem` or `GetItem`, can also | ||
be specified, but in this case we just want to cut off the database completely. This configuration results in a 100% failure rate | ||
for all API calls, each returning a `DynamoDbException` with an HTTP 500 status code. | ||
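To narrow the impact, the same template structure can target a single operation. The following is a minimal sketch that writes a hypothetical `experiment-ddb-putitem.json` failing roughly half of the `PutItem` calls. It assumes the `api` parameter accepts an operation name in place of `all`; the file name, action name, and percentage are illustrative and not part of the sample repository. The role ARN is copied unchanged from the template in this guide.

```bash
# Write a variant of the experiment template that only affects PutItem calls.
# Assumption: "api" accepts a single operation name instead of "all".
cat > experiment-ddb-putitem.json <<'EOF'
{
  "actions": {
    "Fail PutItem": {
      "actionId": "localstack:generic:api-error",
      "parameters": {
        "service": "dynamodb",
        "api": "PutItem",
        "percentage": "50",
        "exception": "DynamoDbException",
        "errorCode": "500"
      }
    }
  },
  "description": "Template for failing half of the DynamoDB PutItem calls",
  "stopConditions": [{
    "source": "none"
  }],
  "roleArn": "arn:aws:iam:000000000000:role/ExperimentRole"
}
EOF
```

A partial failure rate like this is useful for exercising retry logic, since some writes succeed while others fall back to the error path.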
|
||
```bash | ||
$ awslocal fis create-experiment-template --cli-input-json file://experiment-ddb.json | ||
{ | ||
"experimentTemplate": { | ||
"id": "895591e8-11e6-44c4-adc3-86592010562b", | ||
"description": "Template for interfering with the DynamoDB service", | ||
"actions": { | ||
"Test action 1": { | ||
"actionId": "localstack:generic:api-error", | ||
"parameters": { | ||
"service": "dynamodb", | ||
"api": "all", | ||
"percentage": "100", | ||
"exception": "DynamoDbException", | ||
"errorCode": "500" | ||
} | ||
} | ||
}, | ||
"stopConditions": [ | ||
{ | ||
"source": "none" | ||
} | ||
], | ||
"creationTime": 1699308754.415716, | ||
"lastUpdateTime": 1699308754.415716, | ||
"roleArn": "arn:aws:iam:000000000000:role/ExperimentRole" | ||
} | ||
} | ||
``` | ||
|
||
We take note of the template ID for the next command. | ||
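As an alternative to copying the ID by hand, the template ID can be captured in a shell variable when the template is created. This sketch relies on the standard AWS CLI `--query` and `--output` options, which `awslocal` is assumed to pass through unchanged; the function name `create_template` is hypothetical. Note that calling it creates a new template, so it replaces the manual `create-experiment-template` call rather than following it.

```bash
# Create an experiment template from a JSON file and print only its ID.
# --query filters the response with a JMESPath expression,
# --output text strips the surrounding JSON quoting.
create_template() {
  awslocal fis create-experiment-template \
    --cli-input-json "file://$1" \
    --query 'experimentTemplate.id' \
    --output text
}

# Usage (requires a running LocalStack instance):
#   TEMPLATE_ID=$(create_template experiment-ddb.json)
#   echo "$TEMPLATE_ID"
```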
|
||
## Starting the experiment | ||
|
||
Based on the experiment template that was just created, a new experiment can be started, using the template ID. | ||
|
||
```bash | ||
$ awslocal fis start-experiment --experiment-template-id 895591e8-11e6-44c4-adc3-86592010562b | ||
{ | ||
"experiment": { | ||
"id": "1b1238fd-316d-4956-93e7-5ada677a6f69", | ||
"experimentTemplateId": "895591e8-11e6-44c4-adc3-86592010562b", | ||
"roleArn": "arn:aws:iam:000000000000:role/ExperimentRole", | ||
"state": { | ||
"status": "running" | ||
}, | ||
"actions": { | ||
"Test action 1": { | ||
"actionId": "localstack:generic:api-error", | ||
"parameters": { | ||
"service": "dynamodb", | ||
"api": "all", | ||
"percentage": "100", | ||
"exception": "DynamoDbException", | ||
"errorCode": "500" | ||
} | ||
} | ||
}, | ||
"stopConditions": [ | ||
{ | ||
"source": "none" | ||
} | ||
], | ||
"creationTime": 1699308823.74327, | ||
"startTime": 1699308823.74327 | ||
} | ||
} | ||
``` | ||
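For scripted runs, starting the experiment and checking its state can be wrapped in two small helpers. This is a sketch only: the function names are hypothetical, and it assumes LocalStack's FIS emulation implements `GetExperiment` (the `aws fis get-experiment` command) in addition to the calls shown in this guide.

```bash
# Start an experiment from a template ID and print only the experiment ID.
start_experiment() {
  awslocal fis start-experiment \
    --experiment-template-id "$1" \
    --query 'experiment.id' \
    --output text
}

# Print the current status of an experiment, for example "running".
experiment_status() {
  awslocal fis get-experiment \
    --id "$1" \
    --query 'experiment.state.status' \
    --output text
}

# Usage (requires a running LocalStack instance):
#   EXPERIMENT_ID=$(start_experiment 895591e8-11e6-44c4-adc3-86592010562b)
#   experiment_status "$EXPERIMENT_ID"
```

Keeping the experiment ID in a variable also makes it available for the later `stop-experiment` call.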
|
||
## The outage | ||
|
||
Now that the experiment has started, the database is inaccessible, meaning users can neither retrieve nor add | ||
products. The API Gateway returns an Internal Server Error. This is obviously problematic: as anyone who has worked | ||
with enterprise applications can tell you, downtime and data loss are two things that are crucial to avoid. | ||
Luckily, this potential issue has been caught early in the development phase, so the developer can add proper error handling and a mechanism | ||
that prevents data loss during a database outage. This is of course not limited to DynamoDB; an outage can be | ||
simulated for any storage resource. | ||
|
||
## The solution | ||
|
||
![fis-experiment-2](fis-experiment-2.png) | ||
|
||
One potential solution is to deploy an SNS topic, an SQS queue, and a Lambda function that picks up the queued element and retries the | ||
`PutItem` operation on the database. If DynamoDB is still unavailable, the item is re-queued. | ||
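The control flow of this fallback can be illustrated with a short shell sketch. It is not the code from the sample repository: `put_or_queue` is a hypothetical helper, the table name and topic ARN are taken from the outputs and logs in this guide, and the item is expected in DynamoDB attribute-value JSON.

```bash
TABLE_NAME="Products"
TOPIC_ARN="arn:aws:sns:us-east-1:000000000000:ProductEventsTopic"

# Try to persist an item. If DynamoDB rejects the call, publish the item to
# the SNS topic so that the subscribed queue holds it for a later retry.
put_or_queue() {
  item_json="$1"
  if awslocal dynamodb put-item \
      --table-name "$TABLE_NAME" \
      --item "$item_json" >/dev/null 2>&1; then
    echo "Product added/updated successfully."
  elif awslocal sns publish \
      --topic-arn "$TOPIC_ARN" \
      --message "$item_json" >/dev/null; then
    echo "A DynamoDB error occurred. Message sent to queue."
  else
    # Neither the database nor the fallback accepted the item.
    echo "Could not store or queue the item." >&2
    return 1
  fi
}

# Usage (requires a running LocalStack instance):
#   put_or_queue '{"id": {"S": "prod-1003"}, "name": {"S": "Super Widget"}}'
```

The important design point is that the write path never drops the item silently: it is either stored, queued, or reported as a failure to the caller.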
|
||
```bash | ||
$ curl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \ | ||
--header 'Content-Type: application/json' \ | ||
--data '{ | ||
"id": "prod-1003", | ||
"name": "Super Widget", | ||
"price": "29.99", | ||
"description": "A versatile widget that can be used for a variety of purposes. Durable, reliable, and affordable." | ||
} | ||
' | ||
|
||
A DynamoDB error occurred. Message sent to queue. | ||
|
||
``` | ||
|
||
If we check the logs, we can see that the `DynamoDbException` is handled gracefully: | ||
|
||
```bash | ||
2023-11-06T22:21:40.789 DEBUG --- [ asgi_gw_2] l.services.fis.handler : FIS handler called with configs: {'dynamodb': {None: [(100, 'DynamoDbException', '500')]}} | ||
2023-11-06T22:21:40.789 INFO --- [ asgi_gw_2] localstack.request.aws : AWS dynamodb.PutItem => 500 (DynamoDbException) | ||
2023-11-06T22:21:40.834 DEBUG --- [ asgi_gw_4] l.services.sns.publisher : Topic 'arn:aws:sns:us-east-1:000000000000:ProductEventsTopic' publishing '5520d37a-fc21-4a73-b1bf-f9b9afce5908' to subscribed | ||
'arn:aws:sqs:us-east-1:000000000000:ProductEventsQueue' with protocol 'sqs' (subscription 'arn:aws:sns:us-east-1:000000000000:ProductEventsTopic:0a4abf8c-744a-404a-9ff9-f132e25d1b30') | ||
``` | ||
|
||
The element now sits in the queue until the outage is over. | ||
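To see whether anything is waiting, the queue can be inspected while the experiment is running. This sketch assumes the queue is named `ProductEventsQueue`, as suggested by the ARN in the log output; `queue_depth` is a hypothetical helper. Because a consumer may already have picked the message up for a retry, it reports both visible and in-flight counts, and both are approximate.

```bash
QUEUE_NAME="ProductEventsQueue"

# Print the approximate number of visible and in-flight messages as JSON.
queue_depth() {
  queue_url=$(awslocal sqs get-queue-url \
    --queue-name "$QUEUE_NAME" \
    --query 'QueueUrl' \
    --output text)
  awslocal sqs get-queue-attributes \
    --queue-url "$queue_url" \
    --attribute-names ApproximateNumberOfMessages ApproximateNumberOfMessagesNotVisible \
    --query 'Attributes' \
    --output json
}

# Usage (requires a running LocalStack instance):
#   queue_depth
```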
|
||
## Stopping the experiment | ||
|
||
We can stop the experiment by using the following command: | ||
|
||
```bash | ||
$ awslocal fis stop-experiment --id 1b1238fd-316d-4956-93e7-5ada677a6f69 | ||
{ | ||
"experiment": { | ||
"id": "1b1238fd-316d-4956-93e7-5ada677a6f69", | ||
"experimentTemplateId": "895591e8-11e6-44c4-adc3-86592010562b", | ||
"roleArn": "arn:aws:iam:000000000000:role/ExperimentRole", | ||
"state": { | ||
"status": "stopped" | ||
}, | ||
"actions": { | ||
"Test action 1": { | ||
"actionId": "localstack:generic:api-error", | ||
"parameters": { | ||
"service": "dynamodb", | ||
"api": "all", | ||
"percentage": "100", | ||
"exception": "DynamoDbException", | ||
"errorCode": "500" | ||
}, | ||
"startTime": 1699308823.750742, | ||
"endTime": 1699309736.259625 | ||
} | ||
}, | ||
"stopConditions": [ | ||
{ | ||
"source": "none" | ||
} | ||
], | ||
"creationTime": 1699308823.74327, | ||
"startTime": 1699308823.74327, | ||
"endTime": 1699309736.259646 | ||
} | ||
} | ||
``` | ||
|
||
The experiment ID comes from the output of the earlier `start-experiment` command. | ||
Now that the experiment has been stopped, the product that initially did not reach the database has finally reached | ||
its destination. We can verify that by scanning the database: | ||
|
||
```bash | ||
$ awslocal dynamodb scan --table-name Products | ||
{ | ||
"Items": [ | ||
{ | ||
"name": { | ||
"S": "Super Widget" | ||
}, | ||
"description": { | ||
"S": "A versatile widget that can be used for a variety of purposes. Durable, reliable, and affordable." | ||
}, | ||
"id": { | ||
"S": "prod-1003" | ||
}, | ||
"price": { | ||
"N": "29.99" | ||
} | ||
}, | ||
{ | ||
"name": { | ||
"S": "Ultimate Gadget" | ||
}, | ||
"description": { | ||
"S": "The Ultimate Gadget is the perfect tool for tech enthusiasts looking for the next level in gadgetry. Compact, powerful, and loaded with features." | ||
}, | ||
"id": { | ||
"S": "prod-2004" | ||
}, | ||
"price": { | ||
"N": "49.99" | ||
} | ||
} | ||
], | ||
"Count": 2, | ||
"ScannedCount": 2, | ||
"ConsumedCapacity": null | ||
} | ||
``` |
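Since the queued item is written asynchronously, it may take a moment to appear after the experiment stops. In an automated test, a polling helper along these lines can replace the manual scan. It is a sketch under assumptions: `wait_for_product` is a hypothetical name, and `id` is assumed to be the partition key of the `Products` table.

```bash
# Poll the Products table until the given product ID can be read, or give up
# after a number of attempts (default: 10), waiting one second in between.
# Assumption: "id" is the table's partition key.
wait_for_product() {
  product_id="$1"
  attempts="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    found=$(awslocal dynamodb get-item \
      --table-name Products \
      --key "{\"id\": {\"S\": \"${product_id}\"}}" \
      --query 'Item.id.S' \
      --output text 2>/dev/null) || found=""
    if [ "$found" = "$product_id" ]; then
      echo "found ${product_id}"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "gave up waiting for ${product_id}" >&2
  return 1
}

# Usage (requires a running LocalStack instance):
#   wait_for_product prod-1003 30
```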
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
title: "WebApp Fault Injection Simulator" | ||
linkTitle: "WebApp Fault Injection Simulator" | ||
weight: 1 | ||
description: WebApp Fault Injection Simulator | ||
--- | ||
|
||
## Introduction |
14 changes: 14 additions & 0 deletions
content/en/user-guide/chaos-engineering/outages-extension/index.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
--- | ||
title: "Outages Extension" | ||
linkTitle: "Outages Extension" | ||
weight: 1 | ||
description: Outages Extension | ||
--- | ||
|
||
## Introduction | ||
|
||
Outages Extension | ||
|
||
## Getting started | ||
|
||
<Demo coming> |