feat(anynines-klutch): initial commit #1070

Draft · wants to merge 3 commits into base: main
25 changes: 25 additions & 0 deletions charts/anynines-klutch/.helmignore
@@ -0,0 +1,25 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
README.md.gotmpl
.helmignore
6 changes: 6 additions & 0 deletions charts/anynines-klutch/Chart.lock
@@ -0,0 +1,6 @@
dependencies:
- name: common
  repository: oci://ghcr.io/teutonet/teutonet-helm-charts
  version: 1.2.0
digest: sha256:62ef92fb03b60b1bf481b96b8b856f3b3156c10cc50a50e3604c8b679ef71497
generated: "2024-07-29T11:41:05.632726065+02:00"
20 changes: 20 additions & 0 deletions charts/anynines-klutch/Chart.yaml
@@ -0,0 +1,20 @@
apiVersion: v2
name: anynines-klutch
type: application
version: 0.1.0
icon: https://docs.k8s.anynines.com/img/favicon.ico
maintainers:
- name: cwrau
  email: [email protected]
- name: marvinWolff
  email: [email protected]
- name: tasches
  email: [email protected]
sources:
- https://docs.k8s.anynines.com/docs/platform-operator/central-management-cluster-setup
home: https://teuto.net
description: Installs the anynines klutch platform
dependencies:
- name: common
  version: 1.2.0
  repository: oci://ghcr.io/teutonet/teutonet-helm-charts
313 changes: 313 additions & 0 deletions charts/anynines-klutch/README.md.gotmpl
@@ -0,0 +1,313 @@
[modeline]: # ( vim: set ft=markdown: )
{{ template "chart.header" . }}
{{ template "chart.deprecationWarning" . }}

{{ template "chart.badgesSection" . }}

{{ template "chart.description" . }}

{{ template "chart.homepageLine" . }}

{{ template "chart.maintainersSection" . }}

## Cluster bootstrap

```sh
# always be git 😁
git init

# create empty cluster HelmRelease;
flux create helmrelease --export base-cluster -n flux-system --source HelmRepository/teuto-net.flux-system --chart base-cluster --chart-version 5.x.x > cluster.yaml

# maybe use the following name for your cluster;
kubectl get node -o jsonpath='{.items[0].metadata.annotations.cluster\.x-k8s\.io/cluster-name}'

# configure according to your needs, at least `.global.clusterName` is needed
# additionally, you should add your git repo to `.flux.gitRepositories`, see [the documentation](https://github.com/teutonet/teutonet-helm-charts/tree/main/charts/base-cluster#81--property-base-cluster-configuration--flux--gitrepositories)
# make sure to use the correct url format, see [the documentation](https://github.com/teutonet/teutonet-helm-charts/tree/main/charts/base-cluster#81112-property-base-cluster-configuration--flux--gitrepositories--additionalproperties--allof--item-0--oneof--item-1)
vi cluster.yaml

# create HelmRelease for flux to manage itself
kubectl create namespace flux-system --dry-run=client -o yaml > flux.yaml
flux create source helm --url https://fluxcd-community.github.io/helm-charts flux -n flux-system --export >> flux.yaml
flux create helmrelease --export flux -n flux-system --source HelmRepository/flux.flux-system --chart flux2 --chart-version 2.x.x >> flux.yaml

# add, commit and push resources
git add cluster.yaml flux.yaml
git commit cluster.yaml flux.yaml
git push

# after this you should be on the KUBECONFIG for the cluster
# we explicitly do not use `flux bootstrap` or `flux install` as this creates kustomization stuff and installs flux manually
kubectl apply --server-side -f flux.yaml # ignore the errors about missing CRDs
helm install -n flux-system flux flux2 --repo https://fluxcd-community.github.io/helm-charts --version 2.x.x --atomic

# manual initial installation of the chart, afterwards the chart takes over
# after the installation finished, follow the on-screen instructions to configure your flux, distribute KUBECONFIGs, ...
helm install -n flux-system base-cluster oci://ghcr.io/teutonet/teutonet-helm-charts/base-cluster --version 4.x.x --atomic --values <(cat cluster.yaml | yq -y .spec.values)

# you can use this command to get the instructions again
# e.g. when adding users, gitRepositories, ...
helm -n flux-system get notes base-cluster
```

> ⚠️ Due to various reasons, it's not possible to cleanly uninstall this
via a normal `kubectl delete`, `helm uninstall` or via flux deletion.
[See the corresponding issue](https://github.com/teutonet/teutonet-helm-charts/issues/28)

## Cluster components

### Component [backup](#backup)

[velero](https://velero.io) takes care of backing up your PVCs.

### Component [cert-manager](#certManager)

[cert-manager](https://cert-manager.io) takes care of creating SSL certificates
for your Ingresses (and [other needs](https://cert-manager.io/docs/usage))

1. set `.certManager.email` to your email for the Let's Encrypt account to enable
certificates

To create wildcard certificates, you need to enable a [DNS Provider](#component-dns)

Then you can just create a [`Certificate`](https://cert-manager.io/docs/usage/certificate)
resource.
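
A minimal sketch (names, namespace and domain are placeholders; the issuer name
matches the annotations listed in the [TLS](#tls) section below):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example # placeholder name
  namespace: default
spec:
  secretName: wildcard-example-tls
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-production # issuer name taken from the TLS section below
  dnsNames:
    - "*.example.com" # wildcard names need an enabled DNS provider, see below
```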

### Component [descheduler](#descheduler)

The [descheduler](https://github.com/kubernetes-sigs/descheduler) runs periodically
and tries to average the load across the nodes by deleting pods on fuller nodes
so the kube-scheduler can, hopefully, schedule them on nodes with more space.

Additionally, the descheduler also tries to reconcile `topologySpreadConstraints`
and affinities.

If the cluster is _semi_ underspecced or the individual applications have imperfect
resource requests, the descheduler might lead to periodic restarts of random pods.

In that case you should disable the descheduler.

### Component [dns](#dns)

The [external-dns](https://github.com/kubernetes-sigs/external-dns) creates, updates,
deletes and syncs DNS records for your Ingresses.

1. set `.dns.provider.<provider>` to your implementation:
   - cloudflare: `.dns.provider.cloudflare.apiToken`
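
A minimal values sketch for cloudflare (the token is a placeholder; ideally manage
it via a secret mechanism such as SOPS):

```yaml
dns:
  provider:
    cloudflare:
      apiToken: <cloudflare-api-token> # placeholder, use a real token
```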

If you need a different provider than cloudflare, please open a ticket for one of
the [supported ones](https://github.com/kubernetes-sigs/external-dns#status-of-providers)
which is also supported by [cert-manager](https://cert-manager.io/docs/configuration/acme/dns01/#supported-dns01-providers)

### Component [ingress](#ingress)

The included [`nginx` ingress-controller](https://docs.nginx.com/nginx-ingress-controller)
only works for Ingresses with `ingressClassName: nginx`.

#### TLS

1. add `kubernetes.io/tls-acme: "true"` to your Ingress's annotations (see the
   sketch below)
   - additionally, although not advised unless you know what you're doing,
     you can explicitly choose the issuer by using one of these annotations:
     - `cert-manager.io/cluster-issuer: letsencrypt-staging`
     - `cert-manager.io/cluster-issuer: letsencrypt-production`
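
A minimal sketch of an Ingress using this annotation (names and the host are
placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app # placeholder
  annotations:
    kubernetes.io/tls-acme: "true"
    # optional, see above:
    # cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com # placeholder
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app # placeholder
                port:
                  number: 80
  tls:
    - hosts:
        - app.example.com
      secretName: my-app-tls # cert-manager stores the certificate here
```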

#### IP Address

If you want to make sure that, in the event of a catastrophic failure, you keep the
same IP address, you should roll this out, get the assigned IP
(`kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress}'`)
and set `.ingress.IP=<ip>` in the values. This makes sure the IP is kept in your
project (may incur cost!), which means you can reuse it later or after recovery.
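
In values form (the address is a placeholder for the one returned by the command
above):

```yaml
ingress:
  IP: 198.51.100.10 # placeholder, use the assigned address from the command above
```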

### Component [flux](#flux)

[Flux](https://fluxcd.io) is used to deploy resources to your cluster and to
keep them in sync.

Flux can also auto-update images and HelmReleases.

You can create any number of gitRepository connections, with SSH (recommended)
or HTTPS checkout, with or without SOPS, and so on.
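
A hypothetical sketch of such a values entry; the repository name is a placeholder
and the exact keys should be checked against the schema documentation linked in
the [bootstrap section](#cluster-bootstrap):

```yaml
flux:
  gitRepositories:
    cluster-config: # hypothetical repository name
      url: ssh://git@github.com/example/cluster-config.git # see the linked docs for the expected URL format
```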

### Component [kyverno](#kyverno)

You can optionally enable [kyverno](https://kyverno.io), which is a policy
system, allowing you to specify in-depth policies to prevent or force certain
things in your cluster.

### Component [monitoring](#monitoring)

#### Sub-Component [prometheus](#monitoring_prometheus)

[Prometheus](https://prometheus.io) takes care of scraping metrics and alerting.

#### Sub-Component [grafana](#monitoring_grafana)

[Grafana](https://grafana.com) is used to create dashboards to visualize your
metrics and the health of your cluster and applications.

#### Sub-Component [loki](#monitoring_loki)

[Loki](https://grafana.com/oss/loki) collects logs from across the cluster and
gives you a centralized, non-CLI view of them, as well as the ability to create
alerts based on them.

#### Sub-Component [metrics-server](#monitoring_metricsServer)

[Metrics Server](https://github.com/kubernetes-sigs/metrics-server) implements
the [kubernetes Metrics API](https://github.com/kubernetes/metrics) to allow
for [Horizontal Autoscaling](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale),
[Vertical Autoscaling](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler)
and to allow `kubectl top` and tools like [k9s](https://k9scli.io) to show
resource usage for your pods and nodes.

#### Sub-Component [securityScanning](#monitoring_securityScanning)

The included [trivy](https://aquasecurity.github.io/trivy-operator) scans the
running workload in your cluster for CVEs and creates
[Custom Resources](https://aquasecurity.github.io/trivy-operator/v0.12.1/docs/crds)
to present the results.

#### Sub-Component [tracing](#monitoring_tracing)

The included [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/)
collects traces via otlp-grpc on every node via the `open-telemetry-collector-opentelemetry-collector.monitoring` service.
These traces are then sent to [Grafana Tempo](https://grafana.com/oss/tempo/),
which is included as a datasource in Grafana by default.

##### Usage Example

In your deployment/statefulset/daemonset/... add the following config:

```yaml
spec:
  template:
    spec:
      containers:
        - env:
            - name: OTEL_HOST # change this to your framework's environment variable
              value: open-telemetry-collector-opentelemetry-collector.monitoring
            - name: OTEL_PORT
              value: "4317"
```

The supported protocols are:

- jaeger
  - grpc: 14250
  - thrift_http: 14268
  - thrift_compact: 6831
- otlp
  - grpc: 4317
  - http: 4318
- zipkin: 9411

### Component [storage](#storage)

The included [NFS Ganesha server and external provisioner](https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner)
provides rudimentary support for RWX (`ReadWriteMany`) volumes if needed.

> ⚠️ This is _not_ highly available, and the software itself _does not_ support
it. You should only use this if there is no other choice, and make sure your
cloud provider knows about this, because a node rotation _will_ result in
downtime for all attached applications!
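
If you do need such a volume, a sketch of an RWX PVC (the `storageClassName` is an
assumption, check `kubectl get storageclass` for the class actually provided here):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data # placeholder name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs # assumption, use the class provided by this component
  resources:
    requests:
      storage: 10Gi
```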

### Component [rbac](#rbac)

This chart gives you the ability to create serviceAccounts, roles, roleBindings,
[namespaces](#miscellaneous) and KUBECONFIG files with a hopefully
easy-to-understand DSL.

After configuring your stuff, you can fetch the KUBECONFIGs with the help of the
output of `helm -n flux-system get notes base-cluster`.

### Miscellaneous

- You can create [`HelmRepository`s](#global); `.global.helmRepositories.<name>.url=<url>`
- You can create [cluster-wide certificates](#global_certificates); `.global.certificates.<name>.dnsNames=[<domain>]`
- You can create [namespaces](#global_namespaces); `.global.namespaces.<name>={}`
- You can create [cluster-wide imageCredentials](#global); `.global.imageCredentials.<name>.{host,username,password}`
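
Put together, a sketch of these values (all names, domains and credentials are
placeholders):

```yaml
global:
  helmRepositories:
    my-repo:
      url: https://charts.example.com
  certificates:
    wildcard:
      dnsNames:
        - "*.example.com"
  namespaces:
    my-namespace: {}
  imageCredentials:
    my-registry:
      host: registry.example.com
      username: puller
      password: <password>
```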

{{ template "chart.sourcesSection" . }}

{{ template "chart.requirementsSection" . }}

This helm chart requires [flux v2 to be installed](https://fluxcd.io/docs/installation),
see [bootstrap](#cluster-bootstrap)

The various components are automatically updated to the latest minor and patch version.

This excludes:

- descheduler, as its version is bound to the k8s version and they have not
  released 1.0.0 yet

## Migration

### 0.x.x -> 1.0.0

- The field `.dns.email` moves to `.certManager.email`.
- The field `.dns.provider.cloudflare.email` is removed, as only `apiToken`s are
supported anyways.
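
In values form, a sketch of the change (email and token are placeholders):

```yaml
# before (0.x.x)
dns:
  email: admin@example.com
  provider:
    cloudflare:
      email: admin@example.com # removed in 1.0.0
      apiToken: <token>

# after (1.0.0)
certManager:
  email: admin@example.com
dns:
  provider:
    cloudflare:
      apiToken: <token>
```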

### 1.x.x -> 2.0.0

⚠️ Skip this migration!

- Flux is now a direct dependency
- You should add the following labels and annotations to all resources of flux (see
  the sketch after this list):
  - `.metadata.labels["app.kubernetes.io/managed-by"]="Helm"`
  - `.metadata.annotations["meta.helm.sh/release-name"]="base-cluster"`
  - `.metadata.annotations["meta.helm.sh/release-namespace"]="flux-system"`
- If you have problems when applying / `helm upgrade`ing the new CRDs, like
  `cannot patch "alerts.notification.toolkit.fluxcd.io" with kind
  CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io
  "alerts.notification.toolkit.fluxcd.io" is invalid: status.storedVersions[1]:
  Invalid value: "v1beta2": must appear in spec.versions`, you can replace
  those CRDs (`kubectl replace --force -f -`).
  - ⚠️ make sure to only replace CRDs you're not actively using! This is
    a destructive operation. If all your resources are in flux, you can also
    try to turn off flux before the replacement; flux _should_ then resync and
    reconcile all resources.
- remove your manually managed flux resources
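
A sketch of how those labels/annotations could be applied, assuming the flux
resources live in `flux-system` (resource kinds and names depend on your
installation):

```sh
# example for one flux resource, repeat for all flux-managed resources
kubectl -n flux-system label deployment source-controller \
  app.kubernetes.io/managed-by=Helm
kubectl -n flux-system annotate deployment source-controller \
  meta.helm.sh/release-name=base-cluster \
  meta.helm.sh/release-namespace=flux-system
```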

### 2.x.x -> 3.0.0

- Flux is removed as a direct dependency

The flux chart is way too unstable, cannot be used for an installation, ...

We are sorry 😥

You're gonna have to install flux yourself again

### 3.x.x -> 4.0.0

The storageClasses are going to be removed from this chart; as preparation, they
are left in the cluster on upgrade.

The new [t8s-cluster](../t8s-cluster) is going to provide these, so the end user
can ignore this change.

### 4.x.x -> 5.0.0

The condition under which velero gets deployed has changed: velero will not be
deployed if you have not configured its backupstoragelocation. This change is
necessary because, in the current version of velero, this value is mandatory.
Please move your existing backupstoragelocation configuration to the base-cluster
chart if you haven't already.

### 5.x.x -> 6.0.0

The kyverno 2.x.x -> 3.x.x upgrade cannot be done without manual intervention, see
https://artifacthub.io/packages/helm/kyverno/kyverno#option-1---uninstallation-and-reinstallation

So you have to back up your resources and delete the kyverno HelmReleases before the
upgrade; they will be recreated in version 6.
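
A sketch of that procedure (the HelmRelease name and namespace are assumptions,
check your cluster for the actual ones):

```sh
# back up your kyverno policies
kubectl get clusterpolicies -o yaml > kyverno-policies-backup.yaml

# delete the kyverno HelmRelease before upgrading, it gets recreated in version 6
flux delete helmrelease -n flux-system kyverno
```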

This also makes kyverno HA, so be aware that kyverno will need more resources in
your cluster.

{{ .Files.Get "values.md" }}
4 changes: 4 additions & 0 deletions charts/anynines-klutch/ci/backupmanager-values.yaml
@@ -0,0 +1,4 @@
backupManager:
  url: https://sb.test.com:3000
  username: admin
  password: admin
20 changes: 20 additions & 0 deletions charts/anynines-klutch/ci/basic-values.yaml
@@ -0,0 +1,20 @@
dataservices:
  postgresql:
    url: https://pg.sb.a9s.cwrau.wtf
    username: admin
    password: fvGeQ8AIclWj1tN8ViiHghtLROQX9c

backupManager:
  url: https://backups.a9s.cwrau.wtf
  username: admin
  password: FkVFcrX8Fow4ELuwkVhjeURKeocX8I

oidc:
  ingress:
    host: oidc.a9s.cwrau.wtf

ingress:
  host: klutch.a9s.cwrau.wtf

kubernetes:
  externalAddress: https://212.15.214.137:6443