GitBook: [0.11] 38 pages modified
Vishal Bollu authored and gitbook-bot committed Nov 28, 2019
1 parent e6604eb commit 2e2962a
Showing 25 changed files with 127 additions and 149 deletions.
48 changes: 15 additions & 33 deletions README.md
@@ -2,32 +2,18 @@

Cortex is an open source platform for deploying machine learning models—trained with nearly any framework—as production web services.

-<br>

<!-- Set header Cache-Control=no-cache on the S3 object metadata (see https://help.github.com/en/articles/about-anonymized-image-urls) -->
![Demo](https://d1zqebknpdh033.cloudfront.net/demo/gif/v0.8.gif)

-<br>

## Key features

-- **Autoscaling:** Cortex automatically scales APIs to handle production workloads.
-- **Multi framework:** Cortex supports TensorFlow, PyTorch, scikit-learn, XGBoost, and more.
-- **CPU / GPU support:** Cortex can run inference on CPU or GPU infrastructure.
-- **Spot instances:** Cortex supports EC2 spot instances.
-- **Rolling updates:** Cortex updates deployed APIs without any downtime.
-- **Log streaming:** Cortex streams logs from deployed models to your CLI.
-- **Prediction monitoring:** Cortex monitors network metrics and tracks predictions.
-- **Minimal configuration:** Deployments are defined in a single `cortex.yaml` file.

-<br>
+* **Autoscaling:** Cortex automatically scales APIs to handle production workloads.
+* **Multi framework:** Cortex supports TensorFlow, PyTorch, scikit-learn, XGBoost, and more.
+* **CPU / GPU support:** Cortex can run inference on CPU or GPU infrastructure.
+* **Spot instances:** Cortex supports EC2 spot instances.
+* **Rolling updates:** Cortex updates deployed APIs without any downtime.
+* **Log streaming:** Cortex streams logs from deployed models to your CLI.
+* **Prediction monitoring:** Cortex monitors network metrics and tracks predictions.
+* **Minimal configuration:** Deployments are defined in a single `cortex.yaml` file.

## Usage

@@ -92,19 +78,15 @@

```
positive 8
negative 4
```

-<br>

## How it works

-The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch.
-
-<br>
+The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing \(ELB\), TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service \(EKS\) while logs and metrics are streamed to CloudWatch.

## Examples

-<!-- CORTEX_VERSION_README_MINOR x5 -->
-- [Sentiment analysis](https://github.com/cortexlabs/cortex/tree/0.11/examples/tensorflow/sentiment-analyzer) in TensorFlow with BERT
-- [Image classification](https://github.com/cortexlabs/cortex/tree/0.11/examples/tensorflow/image-classifier) in TensorFlow with Inception
-- [Text generation](https://github.com/cortexlabs/cortex/tree/0.11/examples/pytorch/text-generator) in PyTorch with DistilGPT2
-- [Reading comprehension](https://github.com/cortexlabs/cortex/tree/0.11/examples/pytorch/reading-comprehender) in PyTorch with ELMo-BiDAF
-- [Iris classification](https://github.com/cortexlabs/cortex/tree/0.11/examples/sklearn/iris-classifier) in scikit-learn
+* [Sentiment analysis](https://github.com/cortexlabs/cortex/tree/0.11/examples/tensorflow/sentiment-analyzer) in TensorFlow with BERT
+* [Image classification](https://github.com/cortexlabs/cortex/tree/0.11/examples/tensorflow/image-classifier) in TensorFlow with Inception
+* [Text generation](https://github.com/cortexlabs/cortex/tree/0.11/examples/pytorch/text-generator) in PyTorch with DistilGPT2
+* [Reading comprehension](https://github.com/cortexlabs/cortex/tree/0.11/examples/pytorch/reading-comprehender) in PyTorch with ELMo-BiDAF
+* [Iris classification](https://github.com/cortexlabs/cortex/tree/0.11/examples/sklearn/iris-classifier) in scikit-learn
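
For reference, a minimal `cortex.yaml` for an example like these might look like the sketch below; the deployment name, API name, and model path are hypothetical, not taken from this commit:

```yaml
# cortex.yaml -- hypothetical minimal example
- kind: deployment
  name: iris

- kind: api
  name: classifier
  model: s3://my-bucket/iris-classifier.onnx  # assumed model location
```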

3 changes: 2 additions & 1 deletion docs/cluster/aws.md → docs/cluster-management/aws.md
@@ -2,4 +2,5 @@

As of now, Cortex only runs on AWS. We plan to support other cloud providers in the future. If you don't have an AWS account you can get started with one [here](https://portal.aws.amazon.com/billing/signup#/start).

-Follow this [tutorial](https://aws.amazon.com/premiumsupport/knowledge-center/create-access-key) to create an access key. Enable programmatic access for the IAM user, and attach the built-in `AdministratorAccess` policy to your IAM user (or see [security](security.md) for a minimal access configuration).
+Follow this [tutorial](https://aws.amazon.com/premiumsupport/knowledge-center/create-access-key) to create an access key. Enable programmatic access for the IAM user, and attach the built-in `AdministratorAccess` policy to your IAM user \(or see [security](security.md) for a minimal access configuration\).
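
Once the access key exists, one common way to make the credentials available to the CLI is via the standard AWS environment variables; the values below are placeholders:

```bash
# standard AWS credential environment variables
export AWS_ACCESS_KEY_ID=AKIA***************
export AWS_SECRET_ACCESS_KEY=****************************************
```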

5 changes: 2 additions & 3 deletions docs/cluster/config.md → docs/cluster-management/config.md
@@ -1,8 +1,6 @@
# Cluster configuration

-The Cortex cluster may be configured by providing a configuration file to `cortex cluster up` or `cortex cluster update` via the `--config` flag (e.g. `cortex cluster up --config=cluster.yaml`). Below is the schema for the cluster configuration file, with default values shown (unless otherwise specified):
-
-<!-- CORTEX_VERSION_BRANCH_STABLE -->
+The Cortex cluster may be configured by providing a configuration file to `cortex cluster up` or `cortex cluster update` via the `--config` flag \(e.g. `cortex cluster up --config=cluster.yaml`\). Below is the schema for the cluster configuration file, with default values shown \(unless otherwise specified\):

```yaml
# cluster.yaml
@@ -83,3 +81,4 @@
image_istio_pilot: cortexlabs/istio-pilot:0.11.0
image_istio_citadel: cortexlabs/istio-citadel:0.11.0
image_istio_galley: cortexlabs/istio-galley:0.11.0
```
@@ -8,7 +8,7 @@ If you are not using a sensitive AWS account and do not have a lot of experience

The operator requires read permissions for any S3 bucket containing exported models, read and write permissions for the Cortex S3 bucket, read and write permissions for the Cortex CloudWatch log group, and read and write permissions for CloudWatch metrics. The policy below may be used to restrict the Operator's access:

-```json
+```javascript
{
"Version": "2012-10-17",
"Statement": [
@@ -43,8 +43,9 @@ In order to connect to the operator via the CLI, you must provide valid AWS credentials

## API access

-By default, your Cortex APIs will be accessible to all traffic. You can restrict access using AWS security groups. Specifically, you will need to edit the security group with the description: "Security group for Kubernetes ELB <ELB name> (istio-system/apis-ingressgateway)".
+By default, your Cortex APIs will be accessible to all traffic. You can restrict access using AWS security groups. Specifically, you will need to edit the security group with the description: "Security group for Kubernetes ELB \(istio-system/apis-ingressgateway\)".

## HTTPS

-All APIs are accessible via HTTPS. The certificate is autogenerated during installation using `localhost` as the Common Name (CN). Therefore, clients will need to skip certificate verification (e.g. `curl -k`) when using HTTPS.
+All APIs are accessible via HTTPS. The certificate is autogenerated during installation using `localhost` as the Common Name \(CN\). Therefore, clients will need to skip certificate verification \(e.g. `curl -k`\) when using HTTPS.
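
For example, a request might look like the following sketch; the endpoint shape and payload are placeholders, and only the `-k` flag comes from the text above:

```bash
# -k skips certificate verification, since the autogenerated cert's CN is localhost
curl -k -X POST -H "Content-Type: application/json" \
  -d @sample.json \
  https://<elb-endpoint>/<deployment-name>/<api-name>
```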

@@ -4,7 +4,7 @@

1. [AWS credentials](aws.md)
2. [Docker](https://docs.docker.com/install)
-3. [Cortex CLI](install.md)
+3. [Cortex CLI](../install.md)
4. [AWS CLI](https://aws.amazon.com/cli)

## Uninstalling Cortex
@@ -34,3 +34,4 @@

```bash
aws s3 rb --force s3://<bucket-name>
# delete the log group
aws logs describe-log-groups --log-group-name-prefix=<log_group_name> --query logGroups[*].[logGroupName] --output text | xargs -I {} aws logs delete-log-group --log-group-name {}
```

3 changes: 1 addition & 2 deletions docs/cluster/update.md → docs/cluster-management/update.md
@@ -15,8 +15,6 @@

```bash
cortex cluster update
```

## Upgrading to a newer version of Cortex

-<!-- CORTEX_VERSION_MINOR -->

```bash
# spin down your cluster
cortex cluster down
@@ -30,3 +28,4 @@
cortex version
# spin up your cluster
cortex cluster up
```

27 changes: 14 additions & 13 deletions docs/development.md → docs/contributing/development.md
@@ -1,11 +1,11 @@
-# Development Environment
+# Development

## Prerequisites

-1. Go (>=1.12.9)
-1. Docker
-1. eksctl
-1. kubectl
+1. Go \(&gt;=1.12.9\)
+2. Docker
+3. eksctl
+4. kubectl

## Cortex Dev Environment

@@ -135,23 +135,24 @@

```bash
path/to/cortex/bin/cortex deploy
```
If you're making changes in the operator and want faster iterations, you can run an off-cluster operator.

1. `make operator-stop` to stop the in-cluster operator
-1. `make devstart` to run the off-cluster operator (which rebuilds the CLI and restarts the Operator when files change)
-1. `path/to/cortex/bin/cortex configure` (on a separate terminal) to configure your cortex CLI to use the off-cluster operator. When prompted for operator URL, use `http://localhost:8888`
+2. `make devstart` to run the off-cluster operator \(which rebuilds the CLI and restarts the Operator when files change\)
+3. `path/to/cortex/bin/cortex configure` \(on a separate terminal\) to configure your cortex CLI to use the off-cluster operator. When prompted for operator URL, use `http://localhost:8888`

Note: `make cortex-up-dev` will start Cortex without installing the operator.

If you want to switch back to the in-cluster operator:

1. `<ctrl+C>` to stop your off-cluster operator
-1. `make operator-start` to install the operator in your cluster
-1. `path/to/cortex/bin/cortex configure` to configure your cortex CLI to use the in-cluster operator. When prompted for operator URL, use the URL shown when running `make cortex-info`
+2. `make operator-start` to install the operator in your cluster
+3. `path/to/cortex/bin/cortex configure` to configure your cortex CLI to use the in-cluster operator. When prompted for operator URL, use the URL shown when running `make cortex-info`

## Dev Workflow

1. `make cortex-up-dev`
-1. `make devstart`
-1. Make changes
-1. `make registry-dev`
-1. Test your changes with projects in `examples` or your own
+2. `make devstart`
+3. Make changes
+4. `make registry-dev`
+5. Test your changes with projects in `examples` or your own

See `Makefile` for additional dev commands

@@ -2,7 +2,7 @@

## PyPI packages

-You can install your required PyPI packages and import them in your Python files. Cortex looks for a `requirements.txt` file in the top level Cortex project directory (i.e. the directory which contains `cortex.yaml`):
+You can install your required PyPI packages and import them in your Python files. Cortex looks for a `requirements.txt` file in the top level Cortex project directory \(i.e. the directory which contains `cortex.yaml`\):

```text
./iris-classifier/
@@ -12,7 +12,7 @@
└── requirements.txt
```
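
For instance, a hypothetical `requirements.txt` for such a project might pin a few PyPI packages; the package names and versions below are illustrative only:

```text
# requirements.txt -- hypothetical contents
numpy==1.17.2
scikit-learn==0.21.3
```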

-Note that some packages are pre-installed by default (see [predictor](../deployments/predictor.md) or [request handlers](../deployments/request-handlers.md) depending on which runtime you're using).
+Note that some packages are pre-installed by default \(see [predictor](../deployments/predictor.md) or [request handlers](../deployments/request-handlers.md) depending on which runtime you're using\).

## Private packages on GitHub

@@ -28,7 +28,7 @@ You can generate a personal access token by following [these steps](https://help.

## Project files

-Cortex makes all files in the project directory (i.e. the directory which contains `cortex.yaml`) available to request handlers. Python bytecode files (`*.pyc`, `*.pyo`, `*.pyd`), files or folders that start with `.`, and `cortex.yaml` are excluded.
+Cortex makes all files in the project directory \(i.e. the directory which contains `cortex.yaml`\) available to request handlers. Python bytecode files \(`*.pyc`, `*.pyo`, `*.pyd`\), files or folders that start with `.`, and `cortex.yaml` are excluded.

The contents of the project directory is available in `/mnt/project/` in the API containers. For example, if this is your project directory:

@@ -53,3 +53,4 @@

```python
def pre_inference(sample, signature, metadata):
print(config)
...
```

@@ -1,8 +1,8 @@
# System packages

-Cortex uses Docker images to deploy your models. These images can be replaced with custom images that you can augment with your system packages and libraries. You will need to push your custom images to a container registry that your cluster has access to (e.g. [Docker Hub](https://hub.docker.com/) or [AWS ECR](https://aws.amazon.com/ecr/)).
+Cortex uses Docker images to deploy your models. These images can be replaced with custom images that you can augment with your system packages and libraries. You will need to push your custom images to a container registry that your cluster has access to \(e.g. [Docker Hub](https://hub.docker.com/) or [AWS ECR](https://aws.amazon.com/ecr/)\).

-See the `image paths` section in [cluster configuration](../cluster/config.md) for a complete list of customizable images.
+See the `image paths` section in [cluster configuration](../cluster-management/config.md) for a complete list of customizable images.

## Create a custom image

@@ -14,7 +14,7 @@

```bash
mkdir my-api && cd my-api && touch Dockerfile
```

Specify the base image you want to override followed by your customizations. The sample Dockerfile below inherits from Cortex's Python serving image and installs the `tree` system package.

-```dockerfile
+```text
# Dockerfile
FROM cortexlabs/predictor-serve
@@ -79,3 +79,4 @@

```python
def predict(sample, metadata):
subprocess.run(["tree"])
...
```

3 changes: 2 additions & 1 deletion docs/deployments/autoscaling.md
@@ -8,4 +8,5 @@ Cortex adjusts the number of replicas that are serving predictions by monitoring

## Autoscaling Nodes

-Cortex spins up and down nodes based on the aggregate resource requests of all APIs. The number of nodes will be at least `min_instances` and no more than `max_instances` (configured during installation and modifiable via `cortex cluster update` or the [AWS console](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-manual-scaling.html)).
+Cortex spins up and down nodes based on the aggregate resource requests of all APIs. The number of nodes will be at least `min_instances` and no more than `max_instances` \(configured during installation and modifiable via `cortex cluster update` or the [AWS console](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-manual-scaling.html)\).
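
As a sketch, the corresponding fields in the cluster configuration file (see the `cluster.yaml` schema earlier in this diff) might be set as follows; the values are illustrative:

```yaml
# cluster.yaml (excerpt) -- illustrative values
min_instances: 2   # the cluster will never scale below this
max_instances: 5   # ...or above this
```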

1 change: 1 addition & 0 deletions docs/cluster/cli.md → docs/deployments/cli.md
@@ -186,3 +186,4 @@

```text
Flags:
-h, --help help for completion
```

10 changes: 5 additions & 5 deletions docs/deployments/compute.md
@@ -11,22 +11,22 @@ For example:

```yaml
cpu: 1
gpu: 1
mem: 1G

```
-CPU, GPU, and memory requests in Cortex correspond to compute resource requests in Kubernetes. In the example above, the API will only be scheduled once 1 CPU, 1 GPU, and 1G of memory are available on any instance, and the deployment will be guaranteed to have access to those resources throughout its execution. In some cases, resource requests can be (or may default to) `Null`.
+CPU, GPU, and memory requests in Cortex correspond to compute resource requests in Kubernetes. In the example above, the API will only be scheduled once 1 CPU, 1 GPU, and 1G of memory are available on any instance, and the deployment will be guaranteed to have access to those resources throughout its execution. In some cases, resource requests can be \(or may default to\) `Null`.

## CPU

-One unit of CPU corresponds to one virtual CPU on AWS. Fractional requests are allowed, and can be specified as a floating point number or via the "m" suffix (`0.2` and `200m` are equivalent).
+One unit of CPU corresponds to one virtual CPU on AWS. Fractional requests are allowed, and can be specified as a floating point number or via the "m" suffix \(`0.2` and `200m` are equivalent\).
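
For example, these two requests are equivalent:

```yaml
cpu: 0.2    # floating point form
# cpu: 200m # "m" (milli) form: 200m == 0.2
```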

## Memory

-One unit of memory is one byte. Memory can be expressed as an integer or by using one of these suffixes: `K`, `M`, `G`, `T` (or their power-of two counterparts: `Ki`, `Mi`, `Gi`, `Ti`). For example, the following values represent roughly the same memory: `128974848`, `129e6`, `129M`, `123Mi`.
+One unit of memory is one byte. Memory can be expressed as an integer or by using one of these suffixes: `K`, `M`, `G`, `T` \(or their power-of-two counterparts: `Ki`, `Mi`, `Gi`, `Ti`\). For example, the following values represent roughly the same memory: `128974848`, `129e6`, `129M`, `123Mi`.
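
For example:

```yaml
mem: 128974848   # an integer, in bytes
# mem: 129e6     # roughly the same amount
# mem: 129M      # power-of-ten suffix
# mem: 123Mi     # power-of-two suffix
```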

## GPU

1. Make sure your AWS account is subscribed to the [EKS-optimized AMI with GPU Support](https://aws.amazon.com/marketplace/pp/B07GRHFXGM).
2. You may need to [file an AWS support ticket](https://console.aws.amazon.com/support/cases#/create?issueType=service-limit-increase&limitType=ec2-instances) to increase the limit for your desired instance type.
-3. Set instance type to an AWS GPU instance (e.g. p2.xlarge) when installing Cortex.
+3. Set instance type to an AWS GPU instance \(e.g. p2.xlarge\) when installing Cortex.
4. Note that one unit of GPU corresponds to one virtual GPU on AWS. Fractional requests are not allowed.

1 change: 1 addition & 0 deletions docs/deployments/deployments.md
@@ -15,3 +15,4 @@ Deployments are used to group a set of APIs that are deployed together. It must

```yaml
- kind: deployment
name: my_deployment
```
7 changes: 4 additions & 3 deletions docs/deployments/onnx.md
@@ -26,7 +26,7 @@ Deploy ONNX models as web services.

```yaml
mem: <string> # memory request per replica (default: Null)
```
-See [packaging ONNX models](../packaging/onnx.md) for information about exporting ONNX models.
+See [packaging ONNX models](../packaging-models/onnx.md) for information about exporting ONNX models.
## Example
@@ -45,6 +45,7 @@
You can log information about each request by adding a `?debug=true` parameter to your requests. This will print:

1. The raw sample
-2. The value after running the `pre_inference` function (if provided)
+2. The value after running the `pre_inference` function \(if provided\)
3. The value after running inference
-4. The value after running the `post_inference` function (if provided)
+4. The value after running the `post_inference` function \(if provided\)
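
For example, a debug request might look like this sketch; the endpoint and payload are placeholders:

```bash
# ?debug=true prints the sample at each of the stages listed above
curl -k -X POST -H "Content-Type: application/json" \
  -d @sample.json \
  "https://<elb-endpoint>/<deployment-name>/<api-name>?debug=true"
```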

1 change: 1 addition & 0 deletions docs/deployments/prediction-monitoring.md
@@ -24,3 +24,4 @@ For classification models, the tracker should be configured with `model_type: classification`:

```yaml
tracker:
model_type: classification
```