Skip to content

Latest commit

 

History

History
234 lines (175 loc) · 9.09 KB

scratch.md

File metadata and controls

234 lines (175 loc) · 9.09 KB

"scratch" infrastructure

Overview

Materialize provides employees with on-demand infrastructure in an AWS account called "scratch". This account is intended for quick and dirty tests, benchmarks, short-lived side projects, and so on.

Scratch is not intended for long-term or mission-critical infrastructure, which we manage in the i2 repo. Machines created in scratch are therefore automatically deleted after 36 hours.

Permissions

All members of the engineering org should have full administrative control of the account; if you don't have access and believe you should, check with your onboarding buddy or manager.

If you are a member of a different org but feel that access to AWS resources could be useful to you, ask in #eng-infra on Slack.

To access scratch resources from the AWS console, use our SSO app.

To access scratch resources from the command line, follow the instructions here.

Management with the bin/scratch script

The bin/scratch script is intended to make it easy to launch EC2 instances from the command line. It has three subcommands: create, mine, and destroy.

bin/scratch create

This subcommand expects a series of JSON objects on standard input, each of which describes a machine to launch and configure in EC2. An example follows:

{
    "name": "chbench",
    "launch_script": "bin/mzcompose --find chbench run cloud-load-test",
    "instance_type": "r5a.4xlarge",
    "ami": "ami-0aeb7c931a5a61206",
    "size_gb": 200,
    "tags": {
        "scrape_benchmark_numbers": "true",
        "lt_name": "release-chbench",
        "purpose": "load_test",
        "mzconduct_workflow": "cloud-load-test",
        "test": "chbench",
        "environment": "scratch"
    }
}

See Grafana Integration, below, for some details about the tags.

bin/scratch create takes in configs from stdin, or by passing a name as a positional arg, like:

$ bin/scratch create dev-box

It looks for the name as a json file in misc/scratch, a good starter to just do some plain, personal testing is dev-box.


All of the keys are required (though tags can be an empty dictionary). Their meanings should largely be self-explanatory.

Any number of JSON objects may be specified, one after the other. The script creates a "cluster" of machines identified by a random 6-digit hexadecimal nonce. The materialize repo is pushed to each machine in the cluster, and then the specified launch script is run in the background. After kicking off the launch script (but without waiting for it to complete), the script terminates.

Each cluster's /etc/hosts file is modified to point to the other members of the cluster by name; for example, from another machine the operator could ping the above-described machine with ping chbench.

SSH access

SSH access to the instances is provided by EC2 instance connect. If you specify a custom AMI, you need to make sure it's an AMI that supports EC2 instance connect.

To SSH to an instance:

bin/scratch ssh INSTANCE-ID

If you need to, you can install and use the mssh command provided by the underlying [EC2 connect CLI] directly, but it's usually much easier to go through bin/scratch ssh.

To make remote ports available locally:

bin/scratch forward INSTANCE_ID port1 port2 ...

Visual Studo Code Remote Development

VS Code Remote development allows us to configure common settings and an environment in which materialize can be developed from inside a docker container. To use:

  1. Create a scratch instance: bin/scratch create remote-dev
  2. Log in with bin/scratch ssh INSTANCE-ID
  3. Append a local ssh public key (should be a file like ~/.ssh/id_ed25519.pub on your laptop) to the .ssh/authorized_keys file on the remote instance. If you want to push code to github, you must also install a private key on the instance, in which case it must have a passphrase (so that no one is able to impersonate you should they gain access to the EC2 instance).
  4. From your laptop, ensure ssh works and the fingerprint is correct: ssh ubuntu@INSTANCE-IP-ADDRESS
  5. Open VS Code on your laptop, and open Remote-SSH: Connect to Host... from the command palatte.
  6. Click to add a new ssh host and enter ssh ubuntu@INSTANCE-IP-ADDRESS and add it to your local ssh config.
  7. Click Connect in the Host added! dialog.
  8. Open the materialize folder (which is automatically created by the scratch command), then Reopen in Container from the dialog.

This will take a few minutes to build your dev container. See the guide for more details. Because this is in scratch, it may be deleted after 36 hours. If you need a longer term enviroment, use --max-age-days when creating the scratch instance.

bin/scratch mine

This subcommand lists all the machines that a given user has created with bin/scratch create, along with metadata about them. For example:

brennan@New-Orleans ~ ❯❯❯ ~/code/materialize/bin/scratch mine
+-------------------+---------------------+-------------------+--------------------+-------------------------+---------------------+---------+
|        Name       |     Instance ID     | Public IP Address | Private IP Address |       Launched By       |     Delete After    |  State  |
+-------------------+---------------------+-------------------+--------------------+-------------------------+---------------------+---------+
| f2b1bc7a-btv_test | i-02790a4efb77b06b4 |   18.191.163.58   |    10.1.26.167     | [email protected] | 2021-09-16 12:39:50 | running |
+-------------------+---------------------+-------------------+--------------------+-------------------------+---------------------+---------+

Important options include --all, which lists machines for all users, and --output-format csv, which does what it looks like. To look up machines for someone other than yourself, list their email addresses at the end of the command, like so: scratch mine [email protected].

bin/scratch push

bin/scratch push <instance_id>

push re-pushes your git HEAD to the specified instance. You can override the commit to checkout with --rev

bin/scratch destroy

This subcommand terminates a list of machines given by Instance ID on the command line. For example:

bin/scratch destroy i-02790a4efb77b06b4

As a convenience, you can also destroy all of your instances with the --all-mine option:

bin/scratch destroy --all-mine

Pass --dry-run if you want to see what instances bin/scratch destroy would destroy without actually destroyin them.

Grafana integration

The environmentd process always exposes metrics at its primary port's HTTP server on the prometheus-standard /metrics path, but the scratch instance needs to be configured correctly for our Prometheus server to actually scrape the metrics and thereby expose them to Grafana.

The tl;dr is that you must configure mzcompose with --preserve-ports and the EC2 instance with the "scrape_benchmark_numbers": "true" tag. Thus a bare-minimum Grafana-integrated config looks like:

{
    "launch_script": "bin/mzcompose --preserve-ports ..<remainder of args>"
    // .. snip config ..
    "tags": {
        "scrape_benchmark_numbers": "true"
    }
}

Read on for more details and some other items that can be configured.

Prometheus config

There are three tags on the EC2 instance that configure our Prometheus integration:

  • scrape_benchmark_numbers: must be set to exactly the string "true" in order for Prometheus to observe the instance.
  • purpose: is used as a filter in the Grafana UI. You can use this to group all your instances (e.g. set it to myname-debugging) or set it to load-test or benchmark.
  • test displays as an additional filter inside of the Grafana UI.

So the minimum scratch config to get metrics into Grafana looks like:

{
    // .. snip general config ..
    "tags": {
        "scrape_benchmark_numbers": "true"
    }
}

And a slightly more complete one could be:

{
    // .. snip general config ..
    "tags": {
        "scrape_benchmark_numbers": "true",
        "purpose": "bwm-debugging",
        "test": "chbench"
    }
}

{
    // .. snip general config ..
    "tags": {
        "scrape_benchmark_numbers": "true",
        "purpose": "bwm-debugging",
        "test": "billing"
    }
}

mzcompose config

Prometheus only looks for Materialize metrics on port 6875. The canonical way to ensure that Materialize is available on port 6875 on a host is to pass the --preserve-ports argument to mzcompose. (Without this flag, mzcompose chooses a random host port for Materialize, which will be unknown to Prometheus.)

{
    "launch_script": "bin/mzcompose --preserve-ports ..<remainder of args>"
    // .. snip config ..
}