Document setting up grafana agent, github runners, split ingress #244

Merged
merged 1 commit into from
Nov 20, 2023
4 changes: 4 additions & 0 deletions pages/deployments/advanced-configuration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
title: Advanced Configuration
description: Fine-tuning your Plural Console to meet your requirements
---
35 changes: 35 additions & 0 deletions pages/deployments/ci-gh-actions.md
@@ -29,6 +29,41 @@ plural cd services update @{cluster-handle}/{service-name} --conf {name}={value}

Feel free to run `plural cd services update --help` for more documentation as well.

## Self-Hosted Runners

Many users will want to host their console in a private network. In that case, a standard GitHub-hosted Actions runner cannot reach the console API, so the `plural cd` commands will fail. The solution is to use GitHub's self-hosted runners, which let you run Actions in an adjacent network while maintaining the security posture of your console. We've added a few add-ons to make this setup easy to handle. You'll want to:

- install the `github-actions-controller` add-on to set up the k8s operator that manages runners in a cluster. You will likely want to install this in your management cluster for network adjacency.
- install the `plrl-github-actions-runner` add-on in that same cluster to create a runner set you can schedule jobs on.

Once both are deployed, you can create your first job. It'll likely look something like this:

```yaml
jobs:
  # some previous jobs...
  update-service:
    needs: [docker-build]
    runs-on: plrl-github-actions-runner
    env:
      PLURAL_CONSOLE_TOKEN: ${{ secrets.PLURAL_CONSOLE_TOKEN }}
      PLURAL_CONSOLE_URL: ${{ secrets.PLURAL_CONSOLE_URL }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: installing plural
        uses: pluralsh/[email protected]
      - name: Using short sha
        run: echo ${GITHUB_SHA::7}
      - name: Update service
        run: plural cd services update @mgmt/marketing --conf tag=sha-${GITHUB_SHA::7}
```

Note that the `runs-on` job attribute is what routes this job to the `plrl-github-actions-runner` runner set. It's also worth looking into the control mechanisms GitHub provides to gate which repositories and workflows can use self-hosted runners, so you can manage the security tradeoffs they pose.

{% callout severity="warning" %}
GitHub recommends that you don't use self-hosted runners on public repositories, due to the complexity required to prevent workflows from being run by pull requests from forked repositories.
{% /callout %}
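
As one hedged illustration of such a control, a job-level `if` condition can skip a job for pull requests opened from forks. This is a sketch rather than a complete safeguard. It reuses the job and runner names from the example above and assumes the workflow is triggered by `push` or `pull_request` events:

```yaml
jobs:
  update-service:
    # Run on pushes, and on pull requests only when the head branch
    # lives in this repository rather than in a fork.
    if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
    runs-on: plrl-github-actions-runner
    steps:
      - name: Checkout
        uses: actions/checkout@v4
```

A condition like this only reduces accidental exposure. The repository and organization settings that restrict which workflows may use a self-hosted runner remain the primary gate.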

## Addendum

Since the Plural CLI is a standalone Go binary, it can easily be used in any CI framework in much the same way: install it, then execute the appropriate CLI command to modify your service once a deployable artifact has been built.
15 changes: 15 additions & 0 deletions pages/deployments/monitoring-addons.md
@@ -8,3 +8,18 @@ description: Set up common monitoring agents
The `datadog` add-on will automatically install Datadog's agent onto a cluster for you. You can also create a global service to automate installing the agent throughout your fleet. It will ask for your Datadog API and app keys and automatically inject them into the agent. We'll also manage future upgrades of the agent so you don't have to.

Once the agent is installed, there are often additional features that need to be enabled to get the full Datadog ecosystem functioning. We recommend visiting their docs [here](https://docs.datadoghq.com/containers/kubernetes/installation/?tab=operator#next-steps).

## Grafana Agent

The `grafana-agent` add-on will deploy an instance of Grafana's metrics agent on a cluster in a self-serviceable way. The agent simplifies the process of configuring remote writes for Prometheus (without needing a full Prometheus database) and also integrates with the standard CoreOS `ServiceMonitor` and `PodMonitor` CRDs.
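
For context, a `ServiceMonitor` is a small manifest that tells a compatible scraper which Services to scrape. A minimal sketch is shown here. The `my-app` names, the label, and the `metrics` port are hypothetical, and whether the agent picks the resource up depends on how its selectors are configured:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app          # hypothetical name
  namespace: my-app     # hypothetical namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app   # must match the labels on your Service
  endpoints:
    - port: metrics     # named port on the Service (assumed)
      path: /metrics
      interval: 30s
```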

Our configuration for the agent will ask you to:

- input hostname and basic auth information for Prometheus
- input hostname and basic auth information for Loki

It will then immediately start shipping metrics and logs to them on your behalf. We'd also recommend leveraging our global service setup to simplify rolling it out to your entire fleet. You'll be able to distinguish metrics and logs by cluster via the `cluster` label in both Loki and Prometheus, which maps to the cluster handle attribute to ensure it's human-readable.

If you haven't set up Loki or Prometheus, and you created your console via Plural, we recommend using our Mimir and Loki setups in the Plural marketplace. They're completely self-serviceable and will properly configure the underlying S3/GCS/Azure Blob Storage needed to persist the metrics data. In addition, our Grafana distribution auto-integrates them as datasources, so there's no additional setup needed there.

If you set up your console via BYOK, then feel free to let us know and we can help you set them up as part of our support packages.
50 changes: 50 additions & 0 deletions pages/deployments/network-configuration.md
@@ -0,0 +1,50 @@
---
title: Network Configuration
description: Modifying ingress controller and setting up public/private endpoints for your console
---

## Overview

There are a few strategies you can take to harden the network security of your console or align it with how you typically secure kubernetes ingresses. We'll note a few of these here.

## Bringing Your Own Ingress

Our Helm chart lets you reconfigure the ingress class for your console. This can be useful if you already have an ingress controller with CIDR ranges and WAF setups built in. The Helm values change is relatively simple:

```yaml
ingress:
  ingressClass: <new-ingress-class>
  # potentially you might also want to add some annotations
  annotations:
    new.ingress.annotations: <value>

kas:
  ingress:
    ingressClass: <new-ingress-class>
```

Both KAS and the console leverage websockets for some portion of their functionality. In the case of the console, the websockets are also far more performant with connection stickiness in place. Some ingress controllers have inconsistent websocket support (or require paid versions to unlock it), which is worth keeping in mind.
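
If your replacement controller happens to be ingress-nginx, connection stickiness and longer websocket timeouts are typically set through annotations. The following is a minimal sketch, assuming ingress-nginx and the `ingress.annotations` key shown above. The cookie name and timeout values are illustrative:

```yaml
ingress:
  ingressClass: <new-ingress-class>
  annotations:
    # ingress-nginx: pin each client to one backend pod via a cookie
    nginx.ingress.kubernetes.io/affinity: cookie
    nginx.ingress.kubernetes.io/session-cookie-name: console-affinity # hypothetical cookie name
    # keep long-lived websocket connections open (values in seconds)
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
```

Other controllers expose similar settings under different annotation names, so check your controller's documentation before copying these.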

We also configure the ingresses with cert-manager by default. Some orgs set a wildcard cert at the ingress controller level, in which case you'd want to disable the ingress-level certs.

## Public/Private Ingress

Another setup we support is splitting the console ingress between public and private. This allows you to host the entirety of the Console's API in a private network, while publicly exposing only the subset of APIs that the deployment agents need to poll. These APIs are minimal. They only provide:

- read access to the services deployable to an agent
- a ping endpoint for a given cluster sending the cluster version and a timestamp
- the ability to update the components created for a service by an agent

This is a relatively easy way to ensure network connectivity to end clusters in a pretty broad network topology, but there are of course other more advanced setups a team can attempt. The basic setup for this is as follows:

```yaml
ingress:
  ingressClass: internal-nginx # or another private ingress controller

externalIngress:
  hostname: console-ext.your.subdomain # or whatever you'd like to rename it
```

This will create a second, limited ingress exposing only the APIs listed above via path routing. In this setup, we'd also recommend placing the KAS service on a similar network to the external ingress.

There are still additional tactics you can use to harden this setup. For instance, restricting the ingresses to the CIDR ranges of the NAT gateways for every network your target clusters reside on can provide robust firewalling.
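
The CIDR restriction could be sketched as follows, assuming the ingress controller is ingress-nginx and using the `ingress.annotations` key from the Helm values shown earlier. The CIDRs are placeholder documentation ranges, not real NAT gateway addresses:

```yaml
ingress:
  annotations:
    # ingress-nginx: only accept traffic from these source ranges
    # (replace with the NAT gateway CIDRs of your cluster networks)
    nginx.ingress.kubernetes.io/whitelist-source-range: "203.0.113.10/32,198.51.100.0/24"
```

Whether the external ingress accepts an equivalent `annotations` key is not confirmed here, so check the chart's values before applying the same restriction to it.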
10 changes: 10 additions & 0 deletions src/NavData.tsx
@@ -80,6 +80,16 @@ const rootNavData: NavMenu = deepFreeze([
      href: '/deployments/existing-cluster',
      title: 'Set Up on your own Cluster',
    },
    {
      href: '/deployments/advanced-configuration',
      title: 'Advanced Configuration',
      sections: [
        {
          title: 'Network Configuration',
          href: '/deployments/network-configuration',
        },
      ],
    },
  ],
},
{
6 changes: 6 additions & 0 deletions src/generated/pages.json
@@ -86,6 +86,9 @@
  {
    "path": "/deployments/addons"
  },
  {
    "path": "/deployments/advanced-configuration"
  },
  {
    "path": "/deployments/architecture"
  },
@@ -152,6 +155,9 @@
  {
    "path": "/deployments/network-addons"
  },
  {
    "path": "/deployments/network-configuration"
  },
  {
    "path": "/deployments/operations"
  },