refactor: DevOps and AWS course task 9 #1736

Merged: 1 commit, Dec 8, 2024
81 changes: 44 additions & 37 deletions devops/modules/4_monitoring-configuration/task_9.md
@@ -2,56 +2,63 @@

## Objective

In this task, you will configure Alertmanager to send alerts for specific events in your Kubernetes (K8s) cluster and verify that the alerts are received.
In this task, you will configure Grafana Alerting to send alerts for specific events in your Kubernetes (K8s) cluster and verify that the alerts are received.

## Steps

1. **Configure Alertmanager**

- Follow the instructions to configure Alertmanager. Refer to the [Alertmanager documentation](https://prometheus.io/docs/alerting/latest/alertmanager/) for more details.
1. **Configure SMTP for Grafana**
- Configure an SMTP server.
  - For a local setup, any SMTP server will do.
  - To send emails from AWS, consider Amazon SES (Simple Email Service).
- Configure the Grafana SMTP settings.
  - A local setup needs only `host:port` and possibly `skipVerify` to bypass TLS verification.
  - AWS SES needs `host:port`, authentication details, and a "from address" that must be verified in SES.
2. **Configure Contact points**
- Add your email address as a contact point. If you are using AWS SES, this address must also be verified.
3. **Configure Alert Rules**
- Configure alerts for the following events:
  - High CPU utilization on any node of the cluster.
  - Lack of CPU core capacity on any node of the cluster.
  - Lack of RAM capacity on any node of the cluster.
- Ensure alerts are delivered to your email address.

2. **Verify Alerts**

- Simulate any failure from the list above and verify that alerts are received.

3. **Store Artifacts in Git**

- Store the configuration files for Alertmanager in a new git repository.

4. **Additional Tasks**
- Document the Alertmanager setup and alert configuration in a README file.
4. **Verify Alerts**
- Simulate CPU and memory stress on a Kubernetes node using tools like `stress` or `sysbench`.
5. **Additional Tasks**
- Document the setup and alert configuration in a README file.
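
One way to generate the load for the verification step is to run `stress` inside the cluster rather than directly on the node. The manifest below is a minimal sketch under stated assumptions: the Pod name `stress-test`, the community image `polinux/stress`, and the worker counts, memory size, and timeout are illustrative choices, not part of the task.

```yaml
# Hypothetical manifest: name, image, and sizes are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: stress-test
spec:
  restartPolicy: Never          # let the Pod finish once stress exits
  containers:
    - name: stress
      image: polinux/stress     # community image that ships the `stress` binary
      command: ["stress"]
      # 2 CPU workers plus 1 memory worker allocating 256 MB; stop after 10 minutes
      args: ["--cpu", "2", "--vm", "1", "--vm-bytes", "256M", "--timeout", "600s"]
```

Without resource limits the container competes for the node's CPU and memory, so size the workers to exceed your alert thresholds. To target a specific node, `spec.nodeName` can be set to that node's name.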

## Submission

- Provide a PR with the configuration files for Alertmanager in a new repository.
- Ensure that alerts are configured and delivered to your email address.
- Provide evidence of received alerts (e.g., screenshots of email notifications).
- Provide a PR with the GHA pipeline code for Alertmanager deployment.
- Provide a README file documenting the Alertmanager setup and alert configuration.
- Provide a PR with the changes in configuration files.
- Include screenshots of the following in the PR (in the description or in the changes):
  - Contact Points.
  - Alert Rules in normal and firing state.
  - Alert Rules configuration.
  - Received emails.
- Provide a README file documenting the setup and alert configuration.

**Note:** Ensure that all personal data, such as email addresses and SMTP credentials, are hidden in screenshots, code, and documentation.
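
A common way to keep SMTP credentials out of the repository is to have Grafana read them from a Kubernetes Secret through its `GF_<SECTION>_<KEY>` environment variable overrides. The fragment below is a sketch, assuming a Secret named `grafana-smtp` with keys `user` and `password` was created outside of git. The Secret name, its keys, the SES region in the host, and the from address are all placeholders.

```yaml
# Fragment of the Grafana container spec, not a complete manifest.
# Secret name, keys, host region, and from address are hypothetical placeholders.
env:
  - name: GF_SMTP_ENABLED
    value: "true"
  - name: GF_SMTP_HOST
    value: "email-smtp.us-east-1.amazonaws.com:587"   # SES SMTP endpoint, STARTTLS port
  - name: GF_SMTP_FROM_ADDRESS
    value: "alerts@example.com"                       # must be verified in SES
  - name: GF_SMTP_USER
    valueFrom:
      secretKeyRef:
        name: grafana-smtp
        key: user
  - name: GF_SMTP_PASSWORD
    valueFrom:
      secretKeyRef:
        name: grafana-smtp
        key: password
```

How these variables are injected depends on the Helm chart or manifests you use. For a local SMTP server without authentication, the two credential entries can be dropped and `GF_SMTP_SKIP_VERIFY` set to `"true"` if the server's certificate cannot be verified.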

## Evaluation Criteria (100 points for covering all criteria)

1. **Alertmanager Configuration (60 points)**

- Alertmanager is configured to send alerts for the following events:
1. **Contact Points created (10 points)**
2. **Alert Rules created (40 points)**
- Alert Rules are configured to send alerts for the following events:
  - High CPU utilization on any node of the cluster.
  - Lack of CPU core capacity on any node of the cluster.
  - Lack of RAM capacity on any node of the cluster.
- Alerts are configured to be delivered to your email address.

2. **Alert Verification (10 points)**

- Simulate any failure from the list above and verify that alerts are received.

3. **Repository Submission (10 points)**

- A repository is created with the configuration files for Alertmanager.

4. **Additional Tasks (20 points)**
3. **Alert Rules are working as expected (20 points)**
- Alert Rules fire when the specified events occur.
4. **Email is received (10 points)**
5. **Additional Tasks (20 points)**
- **Documentation (10 points)**
  - The Alertmanager setup and alert configuration are documented in a README file.
- **Additional Alerting Channels (10 points)**
  - Configure additional alerting channels (e.g., Slack, PagerDuty) for Alertmanager.
- **Configuration is done completely in code (10 points)**
  - Alert Rules, Contact Points, and SMTP settings are configured using YAML files or other code-based methods.
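
For the "configuration in code" criterion, Grafana can load alerting resources from files placed in its `provisioning/alerting` directory. The following is a minimal sketch of an email contact point. The contact point name, the receiver `uid`, and the address are hypothetical placeholders.

```yaml
# Sketch of a provisioned contact point; name, uid, and address are placeholders.
apiVersion: 1
contactPoints:
  - orgId: 1
    name: email-alerts
    receivers:
      - uid: email-alerts-1
        type: email
        settings:
          addresses: you@example.com
```

Note that a contact point receives nothing until a notification policy routes alerts to it, so the routing should be provisioned or checked as well.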

## Additional Resources

- [Grafana Alerting Documentation](https://grafana.com/docs/grafana/latest/alerting/)
- AWS restricts outbound connections on port 25 (SMTP) from EC2 instances by default: [AWS Documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-resource-limits.html#port-25-throttle)
- The domain used with AWS SES can be anything; it does not need to be verified to send emails to verified email addresses.
- Simple SMTP server deployment: [docker-postfix](https://github.com/bokysan/docker-postfix)
- [Tool to impose load](https://linux.die.net/man/1/stress)
- [Helm Chart for Grafana](https://github.com/bitnami/charts/tree/main/bitnami/grafana)
- [Use configuration files to provision alerting resources](https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/file-provisioning/)