#144 updated the user guide, prepared the release #145

Merged 2 commits on Oct 15, 2024
5 changes: 3 additions & 2 deletions doc/changes/changes_0.11.0.md
@@ -1,12 +1,13 @@
 # SageMaker Extension 0.11.0, released T.B.D.

-Code name: T.B.D.
+Simplified installation.

 ## Summary

-T.B.D.
+Using a single installation command. Using the pytest plugins for testing.

 ### Refactoring

 * #140: Used the pytest plugins for testing.
 * #142: Made a unified deployment CLI command.
+* #144: Updated the installation section of the user guide.
109 changes: 31 additions & 78 deletions doc/user_guide/user_guide.md
@@ -4,10 +4,9 @@ Exasol Sagemaker Extension provides a Python library together with Exasol Script
 and UDFs that train Machine Learning Models on data stored in Exasol using AWS SageMaker
 Autopilot service.

-The extension basically exports a given Exasol table into AWS S3, and then triggers
-Machine Learning training using the AWS Autopilot service with the specified parameters.
-In addition, the training status can be polled using the auxiliary scripts provided
-within the scope of the project.
+The extension exports a given Exasol table into AWS S3, and then triggers Machine Learning training
+using the AWS Autopilot service with the specified parameters. The training status can be polled using
+the auxiliary scripts provided within the scope of the project.

 ## Table of Contents

@@ -25,82 +24,38 @@ within the scope of the project.


 ## Installation
-### Install The Built Archive
-- Install the packaged sagemaker-extension project as follows (Please check [the latest release](https://github.com/exasol/sagemaker-extension/releases/latest)):
-```buildoutcfg
-pip install exasol_sagemaker_extension.whl
-```
-### The Pre-built Language Container

-This extension requires the installation of a Language Container in the Exasol Database.
-The Script Language Container is a way to install the required programming language and
-necessary dependencies in the Exasol Database so the UDF scripts can be executed.

-The Language Container is downloaded and installed by executing the
-deployment script below. Please make sure that the version of the Language Container matches the
-installed version of the Sagemaker Extension Package. See [the latest release](https://github.com/exasol/sagemaker-extension/releases) on Github.

-```buildoutcfg
-python -m exasol_sagemaker_extension.deploy language-container <options>
-```

-Please refer to the [Language Container Deployment Guide](https://github.com/exasol/python-extension-common/blob/main/doc/user_guide/user-guide.md#language-container-deployer) for details about this command.

-### Scripts Deployment
-Deploy all necessary scripts to the specified ```SCHEMA``` in Exasol using the following python cli command:
+### Install the Python Package
+The Sagemaker Extension package can be installed using pip:
+```shell
+pip install exasol-sagemaker-extension
+```

-```buildoutcfg
-python -m exasol_sagemaker_extension.deployment.deploy_cli <options>
+### Deploy the Extension to the Database
+The Sagemaker Extension must be deployed to the database using the following command:
+```shell
+python -m exasol_sagemaker_extension.deploy <options>
 ```

-The choice of options is primarily determined by the storage backend being used - On-Prem or SaaS.

-### List of options

-The table below lists all available options. It shows which ones are applicable for On-Prem and for SaaS backends.
-Unless stated otherwise in the comments column, the option is required for either or both backends.

-Some of the values, like passwords, are considered confidential. For security reasons, it is recommended to store
-those values in environment variables instead of providing them in the command line. The names of the environment
-variables are given in the comments column, where applicable. Alternatively, it is possible to put just the name of
-an option in the command line, without providing its value. In this case, the command will prompt to enter the value
-interactively. For long values, such as the SaaS account id, it is more practical to copy/paste the value from
-another source.

-| Option name | On-Prem | SaaS | Comment |
-|:-----------------------------|:-------:|:----:|:-------------------------------------------------------|
-| dsn | [x] | | i.e. <db_host:db_port> |
-| db-user | [x] | | |
-| db-pass | [x] | | Env. [DB_PASSWORD] |
-| saas-url | | [x] | Optional, Env. [SAAS_HOST] |
-| saas-account-id | | [x] | Env. [SAAS_ACCOUNT_ID] |
-| saas-database-id | | [x] | Optional, Env. [SAAS_DATABASE_ID] |
-| saas-database-name | | [x] | Optional, provide if the database_id is unknown |
-| saas-token | | [x] | Env. [SAAS_TOKEN] |
-| schema | [x] | [x] | DB schema to deploy the scripts in |
-| ssl-cert-path | [x] | [x] | Optional |
-| [no_]use-ssl-cert-validation | [x] | [x] | Optional boolean, defaults to True |
-| ssl-client-cert-path | [x] | | Optional |
-| ssl-client-private-key | [x] | | Optional |
-| develop | [x] | [x] | Optional, if True, causes re-generation of the scripts |
-| verbose | [x] | [x] | Optional, if True produces verbose output |

-### TLS/SSL options

-The `--ssl-cert-path` is needed if the TLS/SSL certificate is not in the OS truststore.
-Generally speaking, this certificate is a list of trusted CA. It is needed for the server's certificate
-validation by the client.
-The option `--use-ssl-cert-validation`is the default, it can be disabled with `--no-use-ssl-cert-validation`.
-One needs to exercise caution when turning the certificate validation off as it potentially lowers the security of the
-Database connection.
-The "server" certificate described above shall not be confused with the client's own certificate.
-In some cases, this certificate may be requested by a server. The client certificate may or may not include
-the private key. In the latter case, the key may be provided as a separate file.

-### AWS Connection Object
-- Create an Exasol connection object with AWS credentials that has
-AWS Sagemaker Execution permission. The connection will encapsulate the address of the AWS S3 bucket where the exported data will be stored.
+The deployment includes the installation of the Script Language Container (SLC) and several
+scripts. The SLC is a way to install the required programming language and necessary dependencies
+in the Exasol Database so that UDF scripts can be executed. The version of the installed SLC must
+match the version of the Sagemaker Extension Package. See [the latest release](https://github.com/exasol/sagemaker-extension/releases) on Github.

+For information about the available options common to all Exasol extensions please refer to the
+[documentation](https://github.com/exasol/python-extension-common/blob/0.8.0/doc/user_guide/user-guide.md)
+in the Exasol Python Extension Common package.
+In addition, this extension provides the following installation options:

+| Option name | Default | Comment |
+|:--------------------|:-------:|:-----------------------------------------------|
+| [no-]deploy-slc | True | Install SLC as part of the deployment |
+| [no-]deploy-scripts | True | Install scripts as part of the deployment |
+| [no-]to-print | False | Print SQL statements instead of executing them |

+### Create AWS Connection Object
+Create an Exasol connection object with AWS credentials that have AWS Sagemaker Execution permission.
+The connection will encapsulate the address of the AWS S3 bucket where the exported data will be stored.
 For more information please check the [Create Connection in Exasol](https://docs.exasol.com/sql/create_connection.htm?Highlight=connection) document.
 Below is a template of the query that will create the required connection object.
 ```buildoutcfg
@@ -110,8 +65,6 @@ Below is a template of the query that will create the required connection object
IDENTIFIED BY '<AWS_SECRET_ACCESS_KEY>'
```



## Execution of Training
### Execute Autopilot Training
- Example usage of the AWS Sagemaker Autopilot service in Exasol is as follows.
4 changes: 0 additions & 4 deletions exasol_sagemaker_extension/deploy.py
@@ -29,10 +29,6 @@
 def deploy(deploy_slc: bool, deploy_scripts: bool, **kwargs):

     if deploy_slc:
-        # Workaround for the issue#78 in PEC
-        if StdParams.path_in_bucket.name in kwargs and kwargs[StdParams.path_in_bucket.name] is None:
-            kwargs[StdParams.path_in_bucket.name] = ''
-
         slc_deployer = LanguageContainerDeployerCli(
             container_url_arg=CONTAINER_URL_ARG,
             container_name_arg=CONTAINER_NAME_ARG)
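
The `deploy_slc` and `deploy_scripts` parameters correspond to the `[no-]deploy-slc` and `[no-]deploy-scripts` options listed in the user guide, alongside `[no-]to-print`. As a rough illustration of how such paired on/off flags behave, here is a minimal sketch. It uses the standard library's `argparse` as a stand-in for the project's click-based CLI, and `build_parser` and `planned_steps` are hypothetical names that are not part of the extension. Only the option names and defaults are taken from the user guide table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the three extension-specific options."""
    parser = argparse.ArgumentParser(prog="deploy-sketch")
    # BooleanOptionalAction (Python 3.9+) registers both --flag and --no-flag.
    parser.add_argument("--deploy-slc",
                        action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--deploy-scripts",
                        action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--to-print",
                        action=argparse.BooleanOptionalAction, default=False)
    return parser


def planned_steps(argv: list[str]) -> tuple[list[str], bool]:
    """Return the deployment steps that would run and the to-print setting."""
    opts = build_parser().parse_args(argv)
    steps = []
    if opts.deploy_slc:
        steps.append("slc")
    if opts.deploy_scripts:
        steps.append("scripts")
    return steps, opts.to_print
```

With no arguments both steps are selected, while passing `--no-deploy-slc` leaves only the scripts step. The real command accepts many more options, which are documented in the Exasol Python Extension Common package.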
105 changes: 53 additions & 52 deletions poetry.lock
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -29,7 +29,7 @@ sagemaker = "^2.59.1"
 pyexasol = ">=0.26.0,<1"
 importlib-resources = "^6.4.0"
 click = "^8.0.3"
-exasol-python-extension-common = ">=0.7.0,<1"
+exasol-python-extension-common = ">=0.8.0,<1"

 [tool.poetry.dev-dependencies]
 pytest = "^7.1"
11 changes: 3 additions & 8 deletions tests/integration_tests/deployment/test_deploy_cli.py
@@ -1,5 +1,5 @@
 from click.testing import CliRunner
-from exasol.python_extension_common.cli.std_options import StdParams
+from exasol.python_extension_common.cli.std_options import StdParams, get_cli_arg

 from exasol_sagemaker_extension.deploy import deploy_command
 from exasol_sagemaker_extension.deployment.language_container import export_slc
@@ -40,14 +40,9 @@ def test_deploy_cli(pyexasol_connection, cli_args):

     pyexasol_connection.execute(f'CREATE SCHEMA IF NOT EXISTS "{DB_SCHEMA}"')

-    def std_param_to_opt(std_param: StdParams) -> str:
-        # This function should have been implemented in the StdParams
-        return f'--{std_param.name.replace("_", "-")}'
-
     with export_slc() as container_file:
-        args_string = (f'{cli_args} '
-                       f'{std_param_to_opt(StdParams.schema)} "{DB_SCHEMA}" '
-                       f'{std_param_to_opt(StdParams.container_file)} "{container_file}"')
+        args_string = ' '.join([cli_args, get_cli_arg(StdParams.schema, DB_SCHEMA),
+                                get_cli_arg(StdParams.container_file, container_file)])
         runner = CliRunner()
         result = runner.invoke(deploy_command, args=args_string, catch_exceptions=False)
         assert result.exit_code == 0
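
The removed `std_param_to_opt` helper shows the convention the test relied on: a parameter name becomes a CLI option by swapping underscores for hyphens and prefixing `--`. The sketch below is a self-contained approximation of that convention. `SketchParam`, `cli_arg_sketch` and `join_args` are hypothetical stand-ins, not the real `StdParams` or `get_cli_arg` from exasol-python-extension-common, whose implementation does not appear in this diff. The double-quoting of values is copied from the removed test code and may differ from what the library does.

```python
from enum import Enum, auto


class SketchParam(Enum):
    """Hypothetical stand-in for a small subset of StdParams."""
    schema = auto()
    container_file = auto()


def cli_arg_sketch(param: SketchParam, value: str) -> str:
    """Build '--option-name "value"' from a parameter and its value."""
    # Same name mapping as the removed std_param_to_opt helper.
    option = "--" + param.name.replace("_", "-")
    # Quoting follows the removed test code; this is an assumption.
    return f'{option} "{value}"'


def join_args(*parts: str) -> str:
    """Join argument fragments with single spaces, skipping empty ones."""
    return " ".join(part for part in parts if part)
```

For example, `cli_arg_sketch(SketchParam.container_file, "/tmp/slc.tar.gz")` yields `--container-file "/tmp/slc.tar.gz"`, which can then be joined with any backend-specific arguments before being passed to the CLI runner.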