Commit
Merge branch 'main' into mibe-sagemaker-notebooks
ahsimb authored Nov 2, 2023
2 parents e294347 + d040a56 commit f1307d5
Showing 189 changed files with 2,984 additions and 812 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/check_ci.yaml
@@ -1,12 +1,12 @@
name: Run Unit Tests
name: Run Tests for CI Build

on:
push:
branches-ignore:
- "main"

jobs:
run_unit_tests:
run_ci_tests:
environment: AWS_CI_TESTS
runs-on: ubuntu-latest

@@ -18,7 +18,7 @@ jobs:
uses: ./.github/actions/prepare_poetry_env

- name: Run pytest
run: poetry run pytest test/test_install_dependencies.py
run: poetry run pytest test/ci/test_install_dependencies.py
env: # Set the secret as an env variable
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY_SECRET }}
@@ -29,7 +29,7 @@ jobs:
- name: Setup Python & Poetry Environment
uses: ./.github/actions/prepare_poetry_env
- name: Build Release
run: poetry run python3 -m exasol_script_languages_developer_sandbox.main start-release-build --upload-url "${{ github.event.inputs.upload_url }}" --branch "$GITHUB_REF"
run: poetry run python3 -m exasol.ds.sandbox.main start-release-build --upload-url "${{ github.event.inputs.upload_url }}" --branch "$GITHUB_REF"
env: # Set the secret as an env variable
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY_SECRET }}
4 changes: 2 additions & 2 deletions .github/workflows/test_release.yml
@@ -21,9 +21,9 @@ jobs:
uses: ./.github/actions/prepare_poetry_env

- name: Start test release
run: poetry run python3 -m exasol_script_languages_developer_sandbox.main start-test-release-build --release-title "${{ github.event.inputs.release_title }}" --branch "$GITHUB_REF"
run: poetry run python3 -m exasol.ds.sandbox.main start-test-release-build --release-title "${{ github.event.inputs.release_title }}" --branch "$GITHUB_REF"
env: # Set the secret as an env variable
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY_SECRET }}
AWS_DEFAULT_REGION: ${{ secrets.AWS_REGION }}
GITHUB_TOKEN: ${{ github.token }}
GITHUB_TOKEN: ${{ github.token }}
7 changes: 4 additions & 3 deletions README.md
@@ -1,12 +1,13 @@
# Exasol Data Science Sandbox

# Overview
## Overview

This project provides an automated mechanism to build and export virtual machine images (AWS AMI, VMDK, ...)
which can be used to develop new script-languages container for the Exasol DB.
enabling users to try out data science algorithms in Jupyter notebooks connected to the Exasol database.

## Where to find the VM images

The release process will automatically store the links to the images in the [release notes](https://github.com/exasol/script-languages-developer-sandbox/releases/latest), as there will be a specific AMI per release.
The release process will automatically store the links to the images in the [release notes](https://github.com/exasol/data-science-sandbox/releases/latest), as there will be a specific AMI per release.
Please check the user guide for details about the image.

## Links
6 changes: 3 additions & 3 deletions aws-code-build/ci/buildspec.yaml
@@ -3,7 +3,7 @@ version: 0.2
env:
shell: bash
variables:
RUN_DEVELOPER_SANDBOX_CI_TEST: "true"
DSS_RUN_CI_TEST: "true"
AWS_USER_NAME: "ci_user"

phases:
@@ -20,7 +20,7 @@ phases:

pre_build:
commands:
- echo RUN_DEVELOPER_SANDBOX_CI_TEST is "$RUN_DEVELOPER_SANDBOX_CI_TEST" #supposed to be true
- echo DSS_RUN_CI_TEST is "$DSS_RUN_CI_TEST" #supposed to be true
build:
commands:
- poetry run python3 -m pytest -s test/test_ci*.py
- poetry run python3 -m pytest -s test/ci/test_ci*.py
6 changes: 3 additions & 3 deletions aws-code-build/ci/buildspec_release.yaml
@@ -3,7 +3,7 @@ version: 0.2
env:
shell: bash
variables:
DEFAULT_PASSWORD: "scriptlanguages"
DEFAULT_PASSWORD: "dss"
ASSET_ID: ""
AWS_USER_NAME: "release_user"
MAKE_AMI_PUBLIC_OPTION: "--no-make-ami-public"
@@ -27,5 +27,5 @@ phases:
- echo MAKE_AMI_PUBLIC_OPTION is "$MAKE_AMI_PUBLIC_OPTION"
build:
commands:
- poetry run python3 -m exasol_script_languages_developer_sandbox.main create-vm --default-password "$DEFAULT_PASSWORD" --asset-id "$ASSET_ID" $MAKE_AMI_PUBLIC_OPTION
- poetry run python3 -m exasol_script_languages_developer_sandbox.main update-release --release-id "$RELEASE_ID" --asset-id "$ASSET_ID"
- poetry run python3 -m exasol.ds.sandbox.main create-vm --default-password "$DEFAULT_PASSWORD" --asset-id "$ASSET_ID" $MAKE_AMI_PUBLIC_OPTION
- poetry run python3 -m exasol.ds.sandbox.main update-release --release-id "$RELEASE_ID" --asset-id "$ASSET_ID"
1 change: 0 additions & 1 deletion doc/changes/changelog.md
@@ -1,4 +1,3 @@
# Changes

* [0.2.0](changes_0.2.0.md)
* [0.1.0](changes_0.1.0.md)
47 changes: 12 additions & 35 deletions doc/changes/changes_0.1.0.md
@@ -1,47 +1,24 @@
# script-languages-developer-sandbox 0.1.0, released 2022-10-06
# data-science-sandbox 0.1.0, released 2023-11-02

Code name: Initial release

## Summary

Initial release of the script-languages-developer-sandbox. It provides the creation of a developer sandbox AMI and virtual machine images for a specific version of the script-languages-release project.
Initial release of the data-science-sandbox. It provides the creation of an Amazon Machine Image (AMI) and virtual machine images for a specific version of the data-science-sandbox-release project.

## Script-Languages-Release
## Data-Science-Sandbox-Release

Version: 5.0.0
Version: 0.1.0

## Features

- #11: Created a notebook to show training with scikit-learn in the notebook
- #15: Installed exasol-notebook-connector via ansible

## Bug Fixes

- #18: Fixed network connection
- #51: Fixed network connection
- #57: Fixed release build
- #60: Fixed release build command
- #64: Fixed Netplan file name

## Features / Enhancements

- #2: Implemented launch of an EC2 instance
- #3: Installed SLC dependencies via Ansible
- #4: Implemented deployment and access of S3 Bucket for VM's
- #5: Implemented export of VM's
- #24: Move CI test to AWS Codebuild
- #25: Implemented motd message about Jupyter password change
- #8: Implemented a release workflow
- #36: Added make-ami-public option
- #43: Added CDN to the S3 VM Bucket
- #45: Protected cloudfront access
- #47: Renamed virtual images
- #38: Included tutorial Jupyterlab notebook

## Documentation

- #19: Added user guide and developer guide
- #49: Added tutorial about how to start the VM/AMI

- #1: Fixed CI build

## Refactoring

- #22: Improved logging
- #26: Implemented search for latest AMI
- #12: Updated the script-languages-release tag with the correct version
- #21: Minor refactoring tasks
- #41: Renamed cloudformation stack
- #5: Renamed all occurrences of "script language developer" to "data science"
32 changes: 0 additions & 32 deletions doc/changes/changes_0.2.0.md

This file was deleted.

99 changes: 75 additions & 24 deletions doc/developer_guide/developer_guide.md
@@ -1,4 +1,4 @@
# Script-Languages-Developer-Sandbox Developer Guide
# Data Science Sandbox Developer Guide

## Overview

@@ -25,32 +25,32 @@ bash install.sh

## Design Goals

script-languages-developer-sandbox uses AWS as backend, because it provides the possibility to run the whole workflow during a ci-test.
The Data Science Sandbox (DSS) uses AWS as backend, because it provides the possibility to run the whole workflow during a ci-test.

This project uses
This project uses
- `boto3` to interact with AWS
- `pygithub` to interact with the Github releases
- `ansible-runner` to interact with Ansible.
Proxy classes to those projects are injected at the CLI layer. This allows to inject mock classes in the unit tests.
- `ansible-runner` to interact with Ansible.
Proxy classes for those packages are injected at the CLI layer, which makes it possible to inject mock classes in the unit tests.
A CLI command normally has a corresponding function in the `lib` submodule. Hence, the CLI layer should not contain any logic but only invoke the corresponding library function. Likewise, the proxy classes that abstract the dependent packages should contain as little logic as possible. Ideally, each proxy method invokes only one function of the respective package.
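
The layering described above can be illustrated with a minimal sketch. All names here (`AwsAccessLike`, `FakeAwsAccess`, `find_latest_ami`, `cli_find_latest_ami`) are hypothetical and chosen for illustration only; they are not taken from the project's code base. The sketch assumes a thin proxy with a single method and sorts AMI names lexicographically, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class AwsAccessLike(Protocol):
    """Hypothetical interface of a thin proxy around boto3."""

    def list_ami_names(self) -> List[str]: ...


@dataclass
class FakeAwsAccess:
    """Mock proxy for unit tests; it never contacts AWS."""

    names: List[str] = field(default_factory=list)

    def list_ami_names(self) -> List[str]:
        return list(self.names)


def find_latest_ami(aws: AwsAccessLike, prefix: str) -> str:
    """Library function holding the logic; the proxy is passed in."""
    # Assumption: names sort lexicographically in release order.
    candidates = sorted(n for n in aws.list_ami_names() if n.startswith(prefix))
    if not candidates:
        raise LookupError(f"no AMI found with prefix {prefix!r}")
    return candidates[-1]


def cli_find_latest_ami(prefix: str, aws: AwsAccessLike) -> None:
    """CLI layer: no logic of its own, it only delegates to the library."""
    print(find_latest_ami(aws, prefix))
```

In a unit test, the CLI or library function would receive `FakeAwsAccess(names=[...])`, while the real command line entry point would pass a proxy backed by `boto3`.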


## Commands

There are generally three types of commands:

| Type | Explanation |
| Type | Explanation |
| ----- | --------- |
| Release Commands | used during the release |
| Deployment Commands | used to deploy infrastructure onto AWS cloud |
| Development Commands | used to identify problems or for testing |
| Release Commands | used during the release |
| Deployment Commands | used to deploy infrastructure onto AWS cloud |
| Development Commands | used to identify problems or for testing |

### Release commands

The following commands are used during the release AWS Codebuild job:
- `create-vm` - creates a new AMI and VM images
- `update-release` - updates release notes of an existing Github release
- `start-release-build` - starts the release on AWS codebuild
- `start-release-build` - starts the release on AWS codebuild

### Developer commands

@@ -62,19 +62,19 @@ All other commands provide a subset of the features of the release commands, and
- `setup-ec2-and-install-dependencies` - starts a new EC2 instance and install dependencies via Ansible
- `show-aws-assets` - shows AWS entities associated with a specific keyword (called __asset-id__)
- `start-test-release` - starts a Test Release flow
- `make-ami-public` - Changes permissions of an existing AMI such that it becomes public
- `make-ami-public` - Changes permissions of an existing AMI such that it becomes public

### Deployment commands

The following commands can be used to deploy the infrastructure onto a given AWS account:
- `setup-ci-codebuild` - deploys the AWS Codebuild cloudformation stack which will run the ci-test
- `setup-vm-bucket` - deploys the AWS Bucket cloudformation stack which will be used to deploy the VM images
- `setup-release-codebuild` - deploys the AWS Codebuild cloudformation stack which will be used for the release-build
- `setup-vm-bucket-waf` - deploys the AWS Codebuild cloudformation stack which contains the WAF Acl configuration for the Cloudfront distribution of the VM Bucket
- `setup-vm-bucket-waf` - deploys the AWS Codebuild cloudformation stack which contains the WAF Acl configuration for the Cloudfront distribution of the VM Bucket

## Flow

The following diagram shows the high-level steps to generate the images:
The following diagram shows the high-level steps to generate the images:
![image info](./img/create-vm-overview.drawio.png)

### Setup EC2
@@ -94,6 +94,49 @@ Installs all dependencies via Ansible:
Finally, the default password will be set, and also the password will be marked as expired, such that the user will be forced to enter a new password during initial login.
Also, the ssh password authentication will be enabled, and for security reasons the folder "~/.ssh" will be removed.

### Tests

DSS comes with a number of tests in directory `test`.
Subdirectories group tests with a common scope and common prerequisites, e.g. external resources.

| Directory | Content |
|---------------------|---------|
| `test/unit` | Simple unit tests requiring no additional setup or external resources. |
| `test/integration` | Integration tests with longer runtime and some requiring additional resources. |
| `test/aws` | Tests involving AWS resources. In order to execute these tests you need an AWS account, a user with permissions in this account, and an access key. |

To run the tests in file `test/integration/test_ci.py` please use
```shell
export DSS_RUN_CI_TEST=true
poetry run pytest test/integration/test_ci.py
```
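
How the tests evaluate `DSS_RUN_CI_TEST` is not shown in this guide. The following is a minimal sketch of one possible check, assuming that only the value `true` (case-insensitive) enables the CI test. The helper name `ci_test_enabled` is hypothetical.

```python
import os
from typing import Mapping, Optional


def ci_test_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only if DSS_RUN_CI_TEST is set to "true" (case-insensitive)."""
    # Assumption: any other value, or an unset variable, disables the test.
    env = os.environ if env is None else env
    return env.get("DSS_RUN_CI_TEST", "").strip().lower() == "true"
```

With pytest, such a helper could guard a test via `pytest.mark.skipif(not ci_test_enabled(), reason="CI test disabled")`.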

#### Executing tests involving AWS resources

In the IAM section of the AWS web interface, create an access key for CLI usage and save or download the *access key id* and the *secret access key*.

In file `~/.aws/config` add lines
```
[profile dss_aws_tests]
region = eu-central-1
```

In file `~/.aws/credentials` add
```
[dss_aws_tests]
aws_access_key_id=...
aws_secret_access_key=...
```

If you are using MFA authentication, please allocate a temporary token.

After that you can set an environment variable and execute the tests involving AWS resources:

```shell
export AWS_PROFILE=dss_aws_tests_mfa
poetry run pytest test/test_deploy_codebuild.py
```
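
Because the tests pick up the profile from `AWS_PROFILE`, a misspelled or missing profile name is easy to overlook. The following is a minimal sketch, using only the Python standard library, of how to check that a named profile is defined in the text of `~/.aws/config`. The helper name `has_profile` is hypothetical; the sketch relies only on the fact that named profiles in that file use the section header `[profile NAME]`.

```python
import configparser


def has_profile(config_text: str, profile: str) -> bool:
    """Check whether the given AWS config text defines the named profile."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In ~/.aws/config, named profiles use the section header "[profile NAME]".
    return parser.has_section(f"profile {profile}")
```

For example, with the configuration shown above, `has_profile(text, "dss_aws_tests")` is true, while any profile name that has not been added to the file yields false.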

### Export

The export creates an AMI based on the running EC2 instance and exports the AMI as VM image in the default formats to a S3 bucket.
@@ -107,10 +150,10 @@ The release is executed in a AWS Codebuild job, the following diagram shows the

The bucket has private access. In order to control access, the Bucket cloudformation stack also contains a Cloudfront distribution. Public HTTPS access is only possible through Cloudfront. Another stack contains a Web application firewall (WAF), which will be used by the Cloudfront distribution. Due to restrictions in AWS, the WAF stack needs to be deployed in region "us-east-1". The WAF stack provides two rules which aim to minimize a possible bot attack:

| Name | Explanation | Priority |
| Name | Explanation | Priority |
|----------------------|-----------------------------------------------------------------------------------------|----------|
| VMBucketRateLimit | Declares the minimum possible rate limit for access: 100 requests in a 5 min interval. | 0 |
| CAPTCHA | Forces a captcha action for any IP which does not match a predefined set of IP-addresses | 1 |
| VMBucketRateLimit | Declares the minimum possible rate limit for access: 100 requests in a 5 min interval. | 0 |
| CAPTCHA | Forces a captcha action for any IP which does not match a predefined set of IP-addresses | 1 |



@@ -119,18 +162,26 @@ The bucket has private access. In order to control access, the Bucket cloudforma
The following diagram shows the involved cloudformation stacks:
![image info](./img/cloudformation-stacks.drawio.png)

"DEVELOPER-SANDBOX-VM-SLC-BUCKET", "DEVELOPER-SANDBOX-CI-TEST-CODEBUILD" & "DEVELOPER-SANDBOX-RELEASE-CODEBUILD" are permanent and need to be deployed using the "deploy" commands (see [commands](#deployment-commands)).
The EC2-stack lives only during the creation of a new developer sandbox image.
The following resources are permanent and need to be deployed using the "deploy" [commands](#deployment-commands):
* `DATA-SCIENCE-SANDBOX-VM-Bucket`
* `DATA-SCIENCE-SANDBOX-CI-TEST-CODEBUILD`
* `DATA-SCIENCE-SANDBOX-RELEASE-CODEBUILD`

The EC2-stack lives only during the creation of a new sandbox image.

## Tagging

Each of the involved resources might cause costs: cloudformation stacks, AMI, EC2 key-pairs.
To enable you to keep track of all these resources, the implementation tags them after creation with a specific keyword (called __asset-id__).
The S3 objects are identified by the prefix in the S3 bucket. The command tags only the dynamically created entities with the asset-id but not the permanent cloudformation stacks.
You can use the command `show-aws-assets` to get a list of all assets which were created during the execution.
This is very useful if an error occurred.
If the creation of a sandbox finished normally it is expected to have only the AMI, images (S3 objects) and the export tasks (one for each image) listed.

To enable keeping track of all these resources, the implementation tags them after creation with a specific keyword (called __asset-id__).

The S3 objects are identified by the prefix in the S3 bucket.

The command tags only the dynamically created entities with the *asset-id* but not the permanent cloudformation stacks.

The command `show-aws-assets` lists all assets which were created during the execution.
* This is very useful if an error occurred.
* If the creation of a sandbox finished normally, the list should contain only the AMI, images (S3 objects) and the export tasks (one for each image).
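
The idea of tracking resources by tag can be sketched as follows. This is an illustration only: the tag key `asset_id` and the function `filter_by_asset_id` are hypothetical, and the project may use a different key and implementation. The resource dictionaries follow the `Tags` list-of-key/value shape that AWS APIs commonly return.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical tag key, chosen for illustration only.
ASSET_TAG_KEY = "asset_id"


def filter_by_asset_id(
    resources: Iterable[Dict[str, Any]], asset_id: str
) -> List[Dict[str, Any]]:
    """Keep only resources carrying the asset-id tag with the given value."""

    def matches(resource: Dict[str, Any]) -> bool:
        # Resources without a "Tags" entry are treated as untagged.
        tags = resource.get("Tags") or []
        return any(
            t.get("Key") == ASSET_TAG_KEY and t.get("Value") == asset_id
            for t in tags
        )

    return [r for r in resources if matches(r)]
```

Permanent stacks, which are not tagged with the asset-id, would simply drop out of such a filter.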

## How to contribute

@@ -139,4 +190,4 @@ The project has two types of CI tests:
- A system test which runs on an AWS Codebuild

Both ci tests need to pass before the approval of a Github PR.
The Github workflow will run on each push to a branch in the Github repository. However, the AWS Codebuild will only run after you push a commit containing the string "[CodeBuild]" in the commit message.
The Github workflow will run on each push to a branch in the Github repository. However, the AWS Codebuild will only run after you push a commit containing the string "[CodeBuild]" in the commit message.