This repository provides configurable one-shot deployments of common infrastructure on Kubernetes, including ingress, storage, message brokers, logging, monitoring, and distributed tracing.
- Ingress controller (`Traefik`)
  - With distributed tracing enabled using Jaeger
  - Sends spans
    - To Tempo: `http://otel-agent.tracing:14268/api/traces?format=jaeger.thrift` (default)
    - To Jaeger collector: `http://jaeger-collector.tracing:14268/api/traces?format=jaeger.thrift`
- Logging stack
  - `Loki`: log aggregation system (see the query example after this list)
    - Components
      - Core: Distributor, Ingester, Ruler, Table manager (log retention: 72 hours), Querier, Query frontend
      - Compactor: deduplicates the index on S3 and merges all the files into a single file per table every 5 minutes
  - `Promtail`: DaemonSet that tails logs from stdout and stderr of all pods
- Monitoring stack
  - `Kubelet cAdvisor`: exposes container metrics
  - `Prometheus node exporter`: exposes node-level metrics that are scraped by the Prometheus server
  - `Prometheus blackbox exporter`: allows blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP, and ICMP; it is used here to monitor K8s services
  - `Kube-state-metrics`: server that listens to the Kubernetes API server and generates metrics about the state of Kubernetes components, such as the number of running jobs, available replicas, and running/stopped/terminated pods
  - `Prometheus`: metrics server that collects metrics from cAdvisor, Prometheus node exporter, Kube-state-metrics, and the Kubelet (see the query example after this list)
    - Metrics retention: 3 days
  - `Grafana`: web UI that visualizes collected metrics (version 9.3.6)
  - `Alertmanager`: server that sends alerts to sysadmins when alert conditions are met, based on Prometheus metrics
  - `Thanos components` (optional): a set of components that can be composed into a highly available metric system with unlimited storage capacity
    - Sidecar
      - The main component that runs alongside Prometheus
      - Reads and archives data on the object store
      - Manages Prometheus's configuration and lifecycle
      - Injects external labels into the Prometheus configuration to distinguish each Prometheus instance
      - Listens on the Thanos gRPC protocol and translates queries between gRPC and REST
    - Ruler
      - Basically does the same thing as the Querier, but for Prometheus rules; the only difference is that it can communicate with Thanos components
      - Rule results are written back to disk in the Prometheus 2.0 storage format
      - Ruler nodes also participate in the system as source store nodes, which means that they expose the StoreAPI and upload their generated TSDB blocks to an object store
    - Store
      - Implements the Store API on top of historical data in an object storage bucket
      - Acts primarily as an API gateway and therefore does not need significant amounts of local disk space
      - Joins a Thanos cluster on startup and advertises the data it can access
      - Keeps a small amount of information about all remote blocks on local disk, in sync with the bucket
      - This data is generally safe to delete across restarts at the cost of increased startup times
    - Querier
      - Listens on HTTP and translates queries to the Thanos gRPC format
      - Aggregates and deduplicates query results from different sources
      - Can read data from the Sidecar, Ruler, and Store
    - Compactor
      - Applies the compaction procedure of the Prometheus 2.0 storage engine to block data stored in object storage
      - Generally not concurrency-safe and must be deployed as a singleton against a bucket
      - Responsible for downsampling data: 5-minute downsampling after 40 hours and 1-hour downsampling after 10 days
    - Two buckets on S3: `prometheus-long-term` and `thanos-ruler`
- Tracing components
  - `Jaeger`: distributed tracing system for monitoring and troubleshooting microservices-based distributed systems
    - Collector: tracing span collector
    - Query: web UI
      - To connect to the web UI from `localhost:30188`, run `kubectl -n tracing port-forward deploy/jaeger-query 30188:16686 --address 0.0.0.0`
  - `Elasticsearch`: storage backend of Jaeger
  - `Elasticsearch exporter`: server that exports metrics of the Elasticsearch cluster
  - `Elasticsearch index cleaner`: cronjob that deletes any indices older than 1 day
  - `Tempo v2`: high-volume distributed tracing system that takes advantage of 100% sampling and only requires an object storage backend (see the trace lookup example after this list)
    - Distributor
      - Accepts spans in multiple formats, including Jaeger, OpenTelemetry, and Zipkin
      - Routes spans to ingesters by hashing the traceID and using a distributed consistent hash ring
    - Ingester
      - Batches traces into blocks, creates bloom filters and indexes, and then flushes it all to the backend
    - Querier
      - Responsible for finding the requested trace ID in either the ingesters or the backend storage
      - Depending on parameters, it will query both the ingesters and pull bloom filters/indexes from the backend to search blocks in object storage
    - Query Frontend
      - Queries should be sent to the Query Frontend, which is responsible for sharding the search space of an incoming query
      - Internally, the Query Frontend splits the blockID space into a configurable number of shards and queues these requests; queriers connect to the Query Frontend via a streaming gRPC connection to process the sharded queries
    - Compactor
      - Block retention: 144 hours
      - Streams blocks to and from the backend storage to reduce the total number of blocks
    - Vulture
      - Monitors Grafana Tempo's performance; it pushes traces to and queries Tempo, and reports metrics on 404s and traces with missing spans
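Once the logging and monitoring stacks are up, one quick sanity check is to query Loki and the Thanos querier directly. This is a rough sketch, not part of the repository itself: the service names follow the endpoints used elsewhere in this README, while the `logging`/`monitoring` namespaces, the log label selector, and the PromQL query are assumptions to adjust for your deployment.

# Ask Loki for recent logs (label selector is a placeholder); uses Loki's /loki/api/v1/query_range API.
kubectl -n logging port-forward svc/loki-headless 3100:3100 &
curl -G 'http://localhost:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={namespace="monitoring"}' --data-urlencode 'limit=10'

# Ask the Thanos querier for an instant query via the standard Prometheus HTTP API.
kubectl -n monitoring port-forward svc/thanos-querier 10902:10902 &
curl -G 'http://localhost:10902/api/v1/query' --data-urlencode 'query=up'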
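A Tempo trace can be fetched by ID through the Query Frontend. Another rough sketch: port `3200` matches the Grafana data source URL later in this README, while the `tracing` namespace, the `query-frontend` service name, and `<trace-id>` are assumptions/placeholders.

# Look up a single trace by ID via Tempo's /api/traces endpoint.
kubectl -n tracing port-forward svc/query-frontend 3200:3200 &
curl http://localhost:3200/api/traces/<trace-id>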
In addition, some common application deployment templates are provided:
- Kafka + Zookeeper
  - Log retention: 72 hours
- Kafka KRaft
- Minio
  - 8 replica servers, each with 10Gi storage
- NATS JetStream
- Redis cluster
- Cassandra cluster
- Kibana
  - Can be used to visualize data in `elasticsearch.tracing`
  - To connect to the web UI from `localhost:30100`, run `kubectl -n kibana port-forward deploy/kibana 30100:5601 --address 0.0.0.0`
- A Kubernetes cluster with version `1.18+`
- Slack webhook URL
  Take `monitoring-thanos` for example: you need to replace `slack_api_url` with your webhook URL in `monitoring-thanos/alertmanager-configmap.yaml`. You should also change the Slack channel name to yours. In addition, you can configure email addresses to which Alertmanager sends notifications (see the sketch after this list).
- Dynamic volume provisioner (optional)
  The default storage class in this template is `local-path`. You need to change it to the storage class of your provisioner in `patch.yaml` (also covered in the sketch after this list). However, K3s ships with a local-path provisioner out of the box, so if you are using K3s, no modification is needed.
- Service load balancer (optional)
  The service load balancer creates a DaemonSet whose pods listen on the node ports specified by services of type `LoadBalancer` (e.g. the Traefik service) and proxy external traffic to those services. K3s provides a service load balancer out of the box. The service load balancer is optional, however: you can instead add your external IPs to the Traefik service declaration (in `ingress/traefik.yaml`) so that your Traefik instance can receive external traffic without a service LB.
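As a rough sketch of the two tweaks above: the webhook URL, channel name, and storage class below are placeholders, `slack_api_url` and `local-path` are the names used in this repository, and the `channel:` key follows the standard Alertmanager Slack receiver configuration, so check the actual file before running anything like this.

# Point Alertmanager at your Slack webhook and channel (placeholder values).
sed -i 's|slack_api_url:.*|slack_api_url: "https://hooks.slack.com/services/T000/B000/XXXXXXXX"|' \
  monitoring-thanos/alertmanager-configmap.yaml
sed -i 's|channel:.*|channel: "#my-alerts"|' monitoring-thanos/alertmanager-configmap.yaml

# List the storage classes offered by your provisioner, then swap the chosen name in for local-path.
kubectl get storageclass
grep -rln --include=patch.yaml 'local-path' . | xargs -r sed -i 's/local-path/<your-storageclass>/g'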
You need to disable the Traefik instance that K3s deploys by default.
kubectl -n kube-system delete helmcharts.helm.cattle.io traefik
sudo systemctl stop k3s
sudo vim /etc/systemd/system/k3s.service # add '--no-deploy traefik' to ExecStart
sudo rm /var/lib/rancher/k3s/server/manifests/traefik.yaml
sudo systemctl daemon-reload
sudo systemctl restart k3s
Alternatively, you can start the K3s server without deploying Traefik by adding `--no-deploy traefik`.
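For a fresh install, the same flag can be passed through the official K3s install script. This is only an illustration: `INSTALL_K3S_EXEC` is the standard variable for passing extra server flags, and you may need different flags for your setup.

# Install K3s with the bundled Traefik disabled.
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--no-deploy traefik" sh -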
- (Optional) You can enable or disable the read-only port that the Kubelet serves on without authentication/authorization (set it to `0` to disable). Default: `10255`.
# K3s server
server --kubelet-arg "read-only-port=10255"
# K3s agent
agent --kubelet-arg "read-only-port=10255"
Check whether CoreDNS works:
kubectl apply -f testing/dnstest.sh
./dnstest.sh
Check whether cluster nodes can ping each other:
kubectl apply -f testing/overlaytest.sh
./pingtest.sh
You can adjust the resources of each container, as well as the storage class and mounted volume size, by modifying `patch.yaml`.
kustomize build ingress | kubectl apply -f -
# logging, thanos and tempo need s3 as storage backend
kustomize build app/minio | kubectl apply -f -
# remember to create needed buckets in minio beforehand (loki)
kustomize build logging | kubectl apply -f -
# remember to create needed buckets in minio beforehand (prometheus-long-term, thanos-ruler)
kustomize build monitoring-thanos | kubectl apply -f -
# use tempo as tracing platform
# remember to create needed buckets in minio beforehand (tempo)
kustomize build tracing-tempo-v2 | kubectl apply -f -
# or use jaeger as tracing platform
kustomize build tracing | kubectl apply -f -
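The buckets mentioned in the comments above (`loki`, `prometheus-long-term`, `thanos-ruler`, `tempo`) must exist before the corresponding builds are applied. One way to create them is with the MinIO client; this is a sketch in which the `minio` namespace/service name and the credentials are assumptions to adjust for your deployment.

# Reach MinIO from your workstation (namespace/service name are assumptions).
kubectl -n minio port-forward svc/minio 9000:9000 &

# Register the endpoint with the MinIO client and create the required buckets (credentials are placeholders).
mc alias set local http://localhost:9000 <access-key> <secret-key>
for b in loki prometheus-long-term thanos-ruler tempo; do mc mb "local/$b"; done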
Build apps:
kustomize build app/kafka | kubectl apply -f -
kustomize build app/minio | kubectl apply -f -
kustomize build app/nats-jetstream | kubectl apply -f -
kustomize build app/redis-cluster | kubectl apply -f -
kustomize build app/cassandra | kubectl apply -f -
kustomize build app/kibana | kubectl apply -f -
- Traefik admin web will listen on port `8082`
- Alertmanager web will listen on node port `30615`
- Grafana web will listen on node port `31565`
- Prometheus web will listen on node port `30830`
- Thanos querier web will listen on node port `30831`
- Thanos ruler web will listen on node port `30832`
- Minio web will listen on node port `30120`
Default account: `admin`. Default password: `admin`.
Add the following data sources in Grafana:
- Monitoring
  - Prometheus (Thanos): `http://thanos-querier:10902` (data source name: `prometheus`)
    - Thanos does deduplication and uses external labels for identifying Prometheus replicas.
    - If you use `http://prometheus:9090` as the data source, metrics may be duplicated and external labels will be invisible, since external labels are added to time series or alerts only when communicating with external systems, such as federation, remote storage, and Alertmanager.
- Tracing
  - Jaeger: `http://jaeger-query.tracing:16686`
  - Tempo
    - Grafana version 7.5.x or higher: `http://query-frontend.tracing:3200` (which is the case in this repository)
    - Grafana version 7.4.x or lower: `http://query-frontend.tracing:16686` (needs `tempo-query` as an adapter)
- Logging
  - Loki: `http://loki-headless.logging:3100`
Thanos and Tempo data sources are added to Grafana by default. If you are using different data sources, such as Jaeger for tracing, adjust the Grafana data source configuration accordingly.
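One way to add such a data source without editing configuration files is Grafana's HTTP API. A rough sketch: the node port `31565` and the Jaeger query URL come from this README, while `<node-ip>` and the `admin:admin` credentials are placeholders for your environment.

# Register Jaeger as a Grafana data source via the HTTP API.
curl -X POST "http://admin:admin@<node-ip>:31565/api/datasources" \
  -H 'Content-Type: application/json' \
  -d '{"name": "jaeger", "type": "jaeger", "access": "proxy", "url": "http://jaeger-query.tracing:16686"}'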
To obtain correlation between logs and traces, one can use Loki derived fields to parse traceID from logs and link to the tracing web UI. For example, we can link to Jaeger Explore in Grafana:
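A sketch of that setup through the Grafana HTTP API follows; the traceID regex, credentials, `<node-ip>`, and the Jaeger data source UID are placeholders, while `derivedFields` is the standard field under the Loki data source's `jsonData`.

# Create the Loki data source with a derived field that links a parsed traceID to the Jaeger data source.
curl -X POST "http://admin:admin@<node-ip>:31565/api/datasources" \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "loki",
        "type": "loki",
        "access": "proxy",
        "url": "http://loki-headless.logging:3100",
        "jsonData": {
          "derivedFields": [
            {
              "name": "TraceID",
              "matcherRegex": "traceID=(\\w+)",
              "datasourceUid": "<jaeger-datasource-uid>",
              "url": "${__value.raw}"
            }
          ]
        }
      }'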
Dashboards:
- Elasticsearch Exporter
- MinIO Overview
- Cassandra Metrics
- Kafka Exporter
- NATS Exporter
- Redis Exporter
- Zookeeper Exporter
- Container Metrics
- Prometheus Stats
- Kube-State Metrics
- Node Exporter Full
- Jaeger
- Tempo
Note that the Prometheus data source name should be set to `prometheus` for all templates provided in this repository.