Renamed all occurrences of script-languages-developer-sandbox to data-science-sandbox #13

Merged Oct 23, 2023 (21 commits)

Changes from 1 commit
4 changes: 2 additions & 2 deletions .github/workflows/check_ci.yaml
@@ -1,4 +1,4 @@
-name: Run Unit Tests
+name: Run Tests for CI Build

on:
push:
@@ -18,7 +18,7 @@ jobs:
uses: ./.github/actions/prepare_poetry_env

- name: Run pytest
-run: poetry run pytest test/test_install_dependencies.py
+run: poetry run pytest test/integration/test_install_dependencies.py
env: # Set the secret as an env variable
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY_SECRET }}
99 changes: 78 additions & 21 deletions doc/developer_guide/developer_guide.md
@@ -27,30 +27,30 @@ bash install.sh

The Data Science Sandbox (DSS) uses AWS as its backend because it makes it possible to run the whole workflow during a CI test.

This project uses
- `boto3` to interact with AWS
- `pygithub` to interact with the Github releases
- `ansible-runner` to interact with Ansible.

Proxy classes for these packages are injected at the CLI layer, which makes it possible to substitute mock classes in the unit tests.
Each CLI command normally has a corresponding function in the `lib` submodule. Hence, the CLI layer should not contain any logic, but only invoke the respective library function. Likewise, the proxy classes which abstract the dependent packages should not contain much logic; ideally, each of them invokes only a single function of the respective package.
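The layering described above can be sketched as follows. This is a hypothetical illustration of the pattern, not the actual DSS API; all class and function names are made up.

```python
# Hypothetical sketch of the DSS layering; names are illustrative only.

class AwsProxy:
    """Thin proxy around an AWS client -- ideally one call per method."""

    def __init__(self, ec2_client):
        self._ec2 = ec2_client

    def start_instance(self, instance_id: str) -> dict:
        # single call into the wrapped package
        return self._ec2.start_instances(InstanceIds=[instance_id])


def run_start_sandbox(aws: AwsProxy, instance_id: str) -> dict:
    """Library function in the `lib` submodule: contains the actual logic."""
    return aws.start_instance(instance_id)


def cli_start_sandbox(aws: AwsProxy, instance_id: str) -> dict:
    """CLI layer: no logic, only delegates to the library function.

    In unit tests, the proxy can be replaced by a mock.
    """
    return run_start_sandbox(aws, instance_id)
```

Because the proxy is passed in rather than constructed inside the library function, unit tests can inject a fake client without touching AWS.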


## Commands

There are generally three types of commands:

| Type | Explanation |
| ----- | --------- |
| Release Commands | used during the release |
| Deployment Commands | used to deploy infrastructure onto AWS cloud |
| Development Commands | used to identify problems or for testing |

### Release commands

The following commands are used during the release AWS Codebuild job:
- `create-vm` - creates a new AMI and VM images
- `update-release` - updates release notes of an existing Github release
- `start-release-build` - starts the release on AWS codebuild

### Developer commands

@@ -62,19 +62,19 @@ All other commands provide a subset of the features of the release commands, and
- `setup-ec2-and-install-dependencies` - starts a new EC2 instance and installs dependencies via Ansible
- `show-aws-assets` - shows AWS entities associated with a specific keyword (called __asset-id__)
- `start-test-release` - starts a Test Release flow
- `make-ami-public` - Changes permissions of an existing AMI such that it becomes public

### Deployment commands

The following commands can be used to deploy the infrastructure onto a given AWS account:
- `setup-ci-codebuild` - deploys the AWS Codebuild cloudformation stack which will run the ci-test
- `setup-vm-bucket` - deploys the AWS Bucket cloudformation stack which will be used to deploy the VM images
- `setup-release-codebuild` - deploys the AWS Codebuild cloudformation stack which will be used for the release-build
- `setup-vm-bucket-waf` - deploys the AWS Codebuild cloudformation stack which contains the WAF Acl configuration for the Cloudfront distribution of the VM Bucket

## Flow

The following diagram shows the high-level steps to generate the images:
![image info](./img/create-vm-overview.drawio.png)

### Setup EC2
@@ -94,6 +94,55 @@ Installs all dependencies via Ansible:
Finally, the default password is set and marked as expired, so that the user is forced to choose a new password during the initial login.
Also, ssh password authentication is enabled and, for security reasons, the folder `~/.ssh` is removed.

### Tests

DSS comes with a number of tests in the directory `test`.
Subdirectories cluster tests with a common scope and common prerequisites, e.g. external resources.

| Directory | Content |
|---------------------|---------|
| `test/unit` | Simple unit tests requiring no additional setup or external resources. |
| `test/integration` | Integration tests with longer runtime and some requiring additional resources. |
| `test/aws` | Tests involving AWS resources. In order to execute these tests you need an AWS account, a user with permissions in this account, and an access key. |

To run the tests in file `test/integration/test_ci.py` please use
```shell
export DSS_RUN_CI_TEST=true
poetry run pytest test/integration/test_ci.py
```
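As a minimal sketch, such an environment-variable switch can gate tests like this; the actual test file may implement the gating differently, e.g. directly via `pytest.mark.skipif`:

```python
import os


def ci_tests_enabled(env=None) -> bool:
    """Return True when CI tests are switched on via DSS_RUN_CI_TEST."""
    env = os.environ if env is None else env
    return env.get("DSS_RUN_CI_TEST") == "true"

# in a test module one could then write, for example:
# pytestmark = pytest.mark.skipif(
#     not ci_tests_enabled(), reason="set DSS_RUN_CI_TEST=true to run")
```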

#### Executing tests involving AWS resources

In the AWS web interface, under IAM, create an access key for CLI usage and save or download the *access key id* and the *secret access key*.

In file `~/.aws/config` add lines
```
[profile dss_aws_tests]
region = eu-central-1
```

In file `~/.aws/credentials` add
```
[dss_aws_tests]
aws_access_key_id=...
aws_secret_access_key=...
```

From [product-integration-tool-chest](https://github.com/exasol/product-integration-tool-chest) call
```shell
aws-store-session-token dss_aws_tests
```

This will ask you for your MFA code and add or update profile `[dss_aws_tests_mfa]` in file `~/.aws/credentials`.

Now you can set an environment variable and execute the tests involving AWS resources:

```shell
export AWS_PROFILE=dss_aws_tests_mfa
poetry run pytest test/test_deploy_codebuild.py
```
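For illustration, a test helper might resolve the profile like this; this is a sketch under the assumption that `AWS_PROFILE` takes precedence over a default, and the helper name is made up:

```python
import os


def resolve_aws_profile(default: str = "dss_aws_tests_mfa") -> str:
    """Profile used by the AWS tests; AWS_PROFILE takes precedence."""
    return os.environ.get("AWS_PROFILE", default)

# with boto3, a session honoring the profile could then be created via:
# session = boto3.Session(profile_name=resolve_aws_profile())
```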


### Export

The export creates an AMI based on the running EC2 instance and exports the AMI as VM images in the default formats to an S3 bucket.
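A hedged sketch of this step using the boto3 EC2 client: `create_image` and `export_image` are real boto3 operations, but the helper itself and all parameter values are illustrative, not the actual DSS implementation.

```python
# Illustrative helper: create an AMI from a running instance and start an
# export task writing the VM image to an S3 bucket. The client is passed in
# (e.g. boto3.client("ec2")) so that tests can substitute a fake.

def export_vm_image(ec2, instance_id: str, name: str, bucket: str,
                    disk_format: str = "VMDK") -> tuple:
    ami = ec2.create_image(InstanceId=instance_id, Name=name)
    task = ec2.export_image(
        ImageId=ami["ImageId"],
        DiskImageFormat=disk_format,
        S3ExportLocation={"S3Bucket": bucket, "S3Prefix": f"{name}/"},
    )
    return ami["ImageId"], task["ExportImageTaskId"]
```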
@@ -107,10 +156,10 @@ The release is executed in an AWS Codebuild job; the following diagram shows the

The bucket has private access. In order to control access, the Bucket cloudformation stack also contains a Cloudfront distribution. Public HTTPS access is only possible through Cloudfront. Another stack contains a Web Application Firewall (WAF), which is used by the Cloudfront distribution. Due to restrictions in AWS, the WAF stack needs to be deployed in the region "us-east-1". The WAF stack provides two rules which aim to mitigate possible bot attacks:

| Name | Explanation | Priority |
|----------------------|-----------------------------------------------------------------------------------------|----------|
| VMBucketRateLimit | Declares the minimum possible rate limit for access: 100 requests in a 5 min interval. | 0 |
| CAPTCHA | Forces a captcha action for any IP which does not match a predefined set of IP addresses. | 1 |
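For illustration, a rate-based rule with these properties might look roughly like the following dictionary, in the shape accepted by boto3's `wafv2` client. The overall structure follows the WAFv2 API (which evaluates rate limits over a 5-minute window), but details such as the metric name and visibility settings are assumptions, not the actual stack template.

```python
def vm_bucket_rate_limit_rule(limit: int = 100, priority: int = 0) -> dict:
    """Sketch of a WAFv2 rate-based rule blocking IPs that exceed
    `limit` requests per 5-minute window."""
    return {
        "Name": "VMBucketRateLimit",
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {"Limit": limit, "AggregateKeyType": "IP"},
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "VMBucketRateLimit",
        },
    }
```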



@@ -119,18 +168,26 @@
The following diagram shows the involved cloudformation stacks:
![image info](./img/cloudformation-stacks.drawio.png)

The following resources are permanent and need to be deployed using the "deploy" [commands](#deployment-commands):
* `DATA-SCIENCE-SANDBOX-VM-Bucket`
* `DATA-SCIENCE-SANDBOX-CI-TEST-CODEBUILD`
* `DATA-SCIENCE-SANDBOX-RELEASE-CODEBUILD`

The EC2-stack lives only during the creation of a new sandbox image.

## Tagging

Each of the involved resources might cause costs: cloudformation stacks, AMI, EC2 key-pairs.
To enable you to keep track of all these resources, the implementation tags them after creation with a specific keyword (called __asset-id__).

The S3 objects are identified by their prefix in the S3 bucket.

The command tags only the dynamically created entities with the *asset-id*, but not the permanent cloudformation stacks.

The command `show-aws-assets` lists all assets which were created during the execution.
* This is very useful if an error occurred.
* If the creation of a sandbox finished normally, the list should contain only the AMI, the images (S3 objects) and the export tasks (one for each image).
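A minimal sketch of tagging dynamically created EC2 resources with the asset-id. The tag key below is a placeholder; the actual tags are built in the project's `exasol.ds.sandbox.lib.tags` module (e.g. via `create_default_asset_tag`).

```python
def tag_assets(ec2, resource_ids, asset_id: str) -> None:
    """Tag dynamically created resources (AMI, key pairs, ...) so that
    `show-aws-assets` can find them later. The tag key is illustrative."""
    ec2.create_tags(
        Resources=list(resource_ids),
        Tags=[{"Key": "exasol:dss:asset-id", "Value": asset_id}],
    )
```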

## How to contribute

@@ -139,4 +196,4 @@ The project has two types of CI tests:
- A system test which runs on AWS Codebuild

Both CI tests need to pass before the approval of a Github PR.
The Github workflow will run on each push to a branch in the Github repository. However, the AWS Codebuild will only run after you push a commit containing the string "[CodeBuild]" in the commit message.
2 changes: 1 addition & 1 deletion exasol/ds/sandbox/lib/vm_bucket/vm_dss_bucket_waf.py
@@ -3,7 +3,7 @@
from exasol.ds.sandbox.lib.logging import get_status_logger, LogType
from exasol.ds.sandbox.lib.render_template import render_template

-STACK_NAME = "DATA-SCIENCE-SANDBOX-VM-SLC-Bucket-WAF"
+STACK_NAME = "DATA-SCIENCE-SANDBOX-VM-Bucket-WAF"

LOG = get_status_logger(LogType.VM_BUCKET)

4 changes: 2 additions & 2 deletions exasol/ds/sandbox/templates/ec2_cloudformation.jinja.yaml
@@ -14,7 +14,7 @@ Resources:
ToPort: 8888
Tags:
- Key: "exasol:project"
-Value: "ScriptLanguages"
+Value: "DataScienceSandbox"
- Key: "exasol:owner"
Value: {{user_name}}
- Key: {{trace_tag}}
@@ -35,7 +35,7 @@ Resources:
VolumeSize: 100
Tags:
- Key: "exasol:project"
-Value: "ScriptLanguages"
+Value: "DataScienceSandbox"
- Key: "exasol:owner"
Value: {{user_name}}
- Key: {{trace_tag}}
5 changes: 5 additions & 0 deletions test/aws/__init__.py
@@ -0,0 +1,5 @@
"""
This package contains tests involving AWS resources. In order to execute
these tests you need an AWS account, a user with permissions in this account and
an access key.
"""
File renamed without changes.
@@ -2,7 +2,7 @@

from exasol.ds.sandbox.lib.aws_access.aws_access import AwsAccess
from exasol.ds.sandbox.lib.render_template import render_template
-from test.cloudformation_validation import validate_using_cfn_lint
+from test.aws.cloudformation_validation import validate_using_cfn_lint


codebuild_cloudformation_templates = [
2 changes: 1 addition & 1 deletion test/test_deploy_ec2.py → test/aws/test_deploy_ec2.py
@@ -4,7 +4,7 @@
from exasol.ds.sandbox.lib.setup_ec2.cf_stack import CloudformationStack, \
CloudformationStackContextManager
from exasol.ds.sandbox.lib.tags import create_default_asset_tag
-from test.cloudformation_validation import validate_using_cfn_lint
+from test.aws.cloudformation_validation import validate_using_cfn_lint


def test_deploy_ec2_upload_invoked(ec2_cloudformation_yml, default_asset_id, test_dummy_ami_id):
@@ -6,9 +6,9 @@
from exasol.ds.sandbox.lib.aws_access.aws_access import AwsAccess
from exasol.ds.sandbox.lib.vm_bucket.vm_dss_bucket import run_setup_vm_bucket, find_vm_bucket, \
create_vm_bucket_cf_template
-from test.aws_mock_data import TEST_BUCKET_ID, get_waf_cloudformation_mock_data, TEST_ACL_ARN, \
+from test.aws.aws_mock_data import TEST_BUCKET_ID, get_waf_cloudformation_mock_data, TEST_ACL_ARN, \
get_s3_cloudformation_mock_data
-from test.cloudformation_validation import validate_using_cfn_lint
+from test.aws.cloudformation_validation import validate_using_cfn_lint
from test.mock_cast import mock_cast
from exasol.ds.sandbox.lib.vm_bucket.vm_dss_bucket import STACK_NAME as VM_STACK_NAME

@@ -6,8 +6,8 @@
from exasol.ds.sandbox.lib.aws_access.aws_access import AwsAccess
from exasol.ds.sandbox.lib.vm_bucket.vm_dss_bucket_waf import run_setup_vm_bucket_waf, \
find_acl_arn, get_cloudformation_template
-from test.aws_mock_data import get_waf_cloudformation_mock_data, TEST_ACL_ARN
-from test.cloudformation_validation import validate_using_cfn_lint
+from test.aws.aws_mock_data import get_waf_cloudformation_mock_data, TEST_ACL_ARN
+from test.aws.cloudformation_validation import validate_using_cfn_lint
from test.mock_cast import mock_cast

TEST_IP = "1.1.1.1"
2 changes: 1 addition & 1 deletion test/test_export_vm.py → test/aws/test_export_vm.py
@@ -10,7 +10,7 @@
build_image_destination
from exasol.ds.sandbox.lib.export_vm.run_export_vm import export_vm
from exasol.ds.sandbox.lib.export_vm.vm_disk_image_format import VmDiskImageFormat
-from test.aws_mock_data import get_ami_image_mock_data, TEST_AMI_ID, TEST_ROLE_ID, TEST_BUCKET_ID, INSTANCE_ID, \
+from test.aws.aws_mock_data import get_ami_image_mock_data, TEST_AMI_ID, TEST_ROLE_ID, TEST_BUCKET_ID, INSTANCE_ID, \
get_export_image_task_mock_data, get_s3_cloudformation_mock_data, get_waf_cloudformation_mock_data
from test.mock_cast import mock_cast

4 changes: 4 additions & 0 deletions test/integration/__init__.py
@@ -0,0 +1,4 @@
"""
This package contains integration tests with longer runtime and maybe
requiring additional resources.
"""
@@ -5,7 +5,7 @@
CloudformationStackContextManager
from exasol.ds.sandbox.lib.setup_ec2.run_setup_ec2 import run_lifecycle_for_ec2
from exasol.ds.sandbox.lib.tags import create_default_asset_tag
-from test.aws_local_stack_access import AwsLocalStackAccess
+from test.integration.aws_local_stack_access import AwsLocalStackAccess


def test_ec2_lifecycle_with_local_stack(local_stack, default_asset_id, test_dummy_ami_id):
@@ -71,7 +71,7 @@ def test_validate_cloudformation_template_fails_with_local_stack(local_stack):
ToPort: 22
Tags:
- Key: "exasol:project"
-Value: "ScriptLanguages"
+Value: "DataScienceSandbox"
- Key: "exasol:owner"
Value: "test user"

@@ -90,7 +90,7 @@ def test_validate_cloudformation_template_fails_with_local_stack(local_stack):
VolumeSize: 100
Tags:
- Key: "exasol:project"
-Value: "ScriptLanguages"
+Value: "DataScienceSandbox"
- Key: "exasol:owner"
Value: "test"

File renamed without changes.
4 changes: 4 additions & 0 deletions test/unit/__init__.py
@@ -0,0 +1,4 @@
"""
This package contains unit tests requiring no additional setup or external
resources.
"""
4 changes: 2 additions & 2 deletions test/test_ansible.py → test/unit/test_ansible.py
@@ -12,7 +12,7 @@
from exasol.ds.sandbox.lib.setup_ec2.run_install_dependencies import run_install_dependencies

import test.ansible
-import test.ansible_conflict
+import test.unit.ansible_conflict


class AnsibleTestAccess:
@@ -129,7 +129,7 @@ def test_run_ansible_check_multiple_repositories_with_same_content_causes_except
"""
Test that multiple repositories containing the same files raise a runtime exception.
"""
-test_repositories = default_repositories + (AnsibleResourceRepository(test.ansible_conflict),)
+test_repositories = default_repositories + (AnsibleResourceRepository(test.unit.ansible_conflict),)
with pytest.raises(RuntimeError):
run_install_dependencies(AnsibleTestAccess(), test_config, host_infos=tuple(),
ansible_run_context=default_ansible_run_context,
File renamed without changes.
File renamed without changes.
@@ -14,7 +14,7 @@ def motd_file(tmp_path):
jupyter_server_config_file = tmp_path / "jupyter_server_config.json"
python_file = tmp_path / "999_jupyter.py"

-src_path = Path(__file__).parent.parent / "exasol.ds.sandbox" / "runtime" / \
+src_path = Path(__file__).parent.parent / "exasol" / "ds" / "sandbox" / "runtime" / \
"ansible" / "roles" / "jupyter" / "templates" / "etc" /"update-motd.d" / "999-jupyter"
with open(src_path, "r") as f:
python_code_template = f.read()
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.