SNOW-1527866: [3/x] Improvements to the Polaris CLI (apache#21)
This PR covers various improvements to the [recently
merged](https://github.com/snowflakedb/managed-pull/7) Polaris
CLI. Changes here include:

* Added the `namespaces` command for creating, listing, and dropping
namespaces with the CLI
* Refactored error handling to reduce the proliferation of string
literals
* Added support for the `FILE` storage type
* Added end-to-end regression tests for the CLI
* Made various usability improvements and bug fixes
* Added support for the `CLIENT_ID` and `CLIENT_SECRET` environment variables
* Added CLI documentation
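
To illustrate the `CLIENT_ID` and `CLIENT_SECRET` item above, here is a minimal sketch of how a CLI might fall back to environment variables when the `--client-id` and `--client-secret` flags are omitted. The helper name `resolve_credentials` and the precedence rule (explicit flag first, then environment) are assumptions for illustration and are not taken from this PR's code.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    client_id: Optional[str],
    client_secret: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Hypothetical helper: prefer explicit flag values, else use the environment."""
    # Assumed precedence: a value passed on the command line wins over the environment.
    resolved_id = client_id or env.get("CLIENT_ID")
    resolved_secret = client_secret or env.get("CLIENT_SECRET")
    if not resolved_id or not resolved_secret:
        raise ValueError(
            "Missing credentials: pass --client-id/--client-secret "
            "or set CLIENT_ID/CLIENT_SECRET"
        )
    return resolved_id, resolved_secret
```

Passing the environment as a parameter keeps the helper easy to exercise in tests without mutating `os.environ`.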

## Pre-review checklist
- [ ] I attest that this change meets the bar for low risk without
security requirements as defined in the [Accelerated Risk Assessment
Criteria](https://developer-handbook.m1.us-west-2.aws.app.snowflake.com/docs/reference/security-review/accelerated-risk-assessment/#eligibility)
and I have taken the [Risk Assessment Training in
Workday](https://wd5.myworkday.com/snowflake/learning/course/6c613806284a1001f111fedf3e4e0000).
  - Checking this checkbox is mandatory if using the [Accelerated Risk
    Assessment](https://developer-handbook.m1.us-west-2.aws.app.snowflake.com/docs/reference/security-review/accelerated-risk-assessment/)
    to risk assess the changes in this Pull Request.
  - If this change does not meet the bar for low risk without security
    requirements (as confirmed by the peer reviewers of this pull request)
    then a [formal Risk
    Assessment](https://developer-handbook.m1.us-west-2.aws.app.snowflake.com/docs/reference/security-review/risk-assessment/)
    must be completed. Please note that a formal Risk Assessment will
    require you to spend extra time performing a security review for this
    change. Please account for this extra time earlier rather than later to
    avoid unnecessary delays in the release process.
- [ ] This change has code coverage for the new code added
eric-maynard committed Jul 31, 2024
1 parent bb54423 commit cfc71e8
Showing 19 changed files with 1,844 additions and 157 deletions.
1,067 changes: 1,067 additions & 0 deletions docs/command-line-interface.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/entities.md
@@ -32,7 +32,7 @@ For details on how to use Storage Types in the REST API, see [the API docs](../r

A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_.

In Polaris, namespaces can be nested up to 16 levels. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on.
In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on.

For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the API docs](../regtests/client/python/docs/CreateNamespaceRequest.md).

30 changes: 13 additions & 17 deletions docs/quickstart.md
@@ -16,7 +16,7 @@

# Quick Start

This guide serves as a introduction to several key entities that can be managed with Polaris, describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Spark and Trino.
This guide serves as a introduction to several key entities that can be managed with Polaris, describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark.

## Prerequisites

@@ -39,23 +39,19 @@ git clone https://github.com/polaris-catalog/polaris.git

#### With Docker

If you plan to deploy Polaris inside [Docker](https://www.docker.com/)], you'll need to install docker itself. For can be done using [homebrew](https://brew.sh/):
If you plan to deploy Polaris inside [Docker](https://www.docker.com/), you'll need to install docker itself. For example, this can be done using [homebrew](https://brew.sh/):

```
brew install docker
brew install --cask docker
```

Once installed, make sure Docker is running. This can be done on macOS with:

```
open -a Docker
```
Once installed, make sure Docker is running.

#### From Source

If you plan to build Polaris from source yourself, you will need to satisfy a few prerequisites first.

Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebre]w(https://brew.sh/) and configure it with jenv:
Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv:

```
cd ~/polaris
@@ -77,13 +73,13 @@ If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/)
brew install git
```

Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5.0](https://spark.apache.org/releases/spark-release-3-5-0.html).
Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html).

```
cd ~
git clone https://github.com/apache/spark.git
cd ~/spark
git checkout branch-3.5.0
git checkout branch-3.5
```

## Deploying Polaris
@@ -128,7 +124,7 @@ For this tutorial, we'll launch an instance of Polaris that stores entities only
When Polaris is launched using in-memory mode the root `CLIENT_ID` and `CLIENT_SECRET` can be found in stdout on initial startup. For example:

```
Bootstrapped with credentials: {"client-id": "XXXX", "client-secret": "YYYY"}
realm: default-realm root principal credentials: XXXX:YYYY
```

Be sure to take note of these credentials, as we'll be using them below.
@@ -230,10 +226,10 @@ In order to give this principal the ability to interact with the catalog, we mus
--client-id ${CLIENT_ID} \
--client-secret ${CLIENT_SECRET} \
privileges \
--catalog quickstart_catalog \
--catalog-role quickstart_catalog_role \
catalog \
grant \
--catalog quickstart_catalog \
--catalog-role quickstart_catalog_role \
CATALOG_MANAGE_CONTENT
```

@@ -251,7 +247,7 @@ At this point, we’ve created a principal and granted it the ability to manage

To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API.

This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). With a local Spark clone, we on the `branch-3.5` branch we can run the following:
This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). From a local Spark clone on the `branch-3.5` branch we can run the following:

_Note: the credentials provided here are those for our principal, not the root credentials._

@@ -311,10 +307,10 @@ If at any time access is revoked...
--client-id ${CLIENT_ID} \
--client-secret ${CLIENT_SECRET} \
privileges \
--catalog quickstart_catalog \
--catalog-role quickstart_catalog_role \
catalog \
revoke \
--catalog quickstart_catalog \
--catalog-role quickstart_catalog_role \
CATALOG_MANAGE_CONTENT
```

6 changes: 6 additions & 0 deletions polaris
@@ -31,5 +31,11 @@ fi

pushd $SCRIPT_DIR > /dev/null
PYTHONPATH=regtests/client/python ${SCRIPT_DIR}/polaris-venv/bin/python3 regtests/client/python/cli/polaris_cli.py "$@"
status=$?
popd > /dev/null

if [ $status -ne 0 ]; then
exit 1
fi

exit 0
6 changes: 5 additions & 1 deletion regtests/Dockerfile
@@ -34,8 +34,12 @@ WORKDIR /home/spark/regtests
COPY ./setup.sh /home/spark/regtests/setup.sh
COPY ./pyspark-setup.sh /home/spark/regtests/pyspark-setup.sh
COPY ./client/python /home/spark/regtests/client/python
COPY ./polaris /home/spark

RUN ./setup.sh
RUN python3 -m venv /home/spark/polaris-venv && \
. /home/spark/polaris-venv/bin/activate && \
pip install poetry==1.5.0 && \
deactivate

COPY --chown=spark . /home/spark/regtests

13 changes: 12 additions & 1 deletion regtests/client/python/cli/command/__init__.py
@@ -93,12 +93,23 @@ def options_get(key, f=lambda x: x):
action=options_get(f'{subcommand}_subcommand'),
catalog_name=options_get(Arguments.CATALOG),
catalog_role_name=options_get(Arguments.CATALOG_ROLE),
namespace=options_get(Arguments.NAMESPACE, lambda s: s.split('.')),
namespace=options_get(Arguments.NAMESPACE, lambda s: s.split('.') if s else None),
view=options_get(Arguments.VIEW),
table=options_get(Arguments.TABLE),
privilege=options_get(Arguments.PRIVILEGE),
cascade=options_get(Arguments.CASCADE)
)
elif options.command == Commands.NAMESPACES:
from cli.command.namespaces import NamespacesCommand
subcommand = options_get(f'{Commands.NAMESPACES}_subcommand')
command = NamespacesCommand(
subcommand,
catalog=options_get(Arguments.CATALOG),
namespace=options_get(Arguments.NAMESPACE, lambda s: s.split('.')),
parent=options_get(Arguments.PARENT, lambda s: s.split('.') if s else None),
location=options_get(Arguments.LOCATION),
properties=properties
)

if command is not None:
command.validate()
9 changes: 5 additions & 4 deletions regtests/client/python/cli/command/catalog_roles.py
@@ -19,7 +19,8 @@
from pydantic import StrictStr

from cli.command import Command
from cli.constants import Subcommands
from cli.constants import Subcommands, Arguments
from cli.options.option_tree import Argument
from polaris.management import PolarisDefaultApi, CreateCatalogRoleRequest, CatalogRole, UpdateCatalogRoleRequest, \
GrantCatalogRoleRequest

@@ -45,10 +46,10 @@ class CatalogRolesCommand(Command):

def validate(self):
if not self.catalog_name:
raise Exception("Missing required argument: --catalog")
raise Exception(f'Missing required argument: {Argument.to_flag_name(Arguments.CATALOG)}')
if self.catalog_roles_subcommand in {Subcommands.GRANT, Subcommands.REVOKE}:
if not self.principal_role_name:
raise Exception("Missing required argument: --principal")
raise Exception(f'Missing required argument: {Argument.to_flag_name(Arguments.PRINCIPAL_ROLE)}')

def execute(self, api: PolarisDefaultApi) -> None:
if self.catalog_roles_subcommand == Subcommands.CREATE:
@@ -90,4 +91,4 @@ def execute(self, api: PolarisDefaultApi) -> None:
api.revoke_catalog_role_from_principal_role(
self.principal_role_name, self.catalog_name, self.catalog_role_name)
else:
raise Exception(f"{self.catalog_roles_subcommand} is not supported in the CLI")
raise Exception(f'{self.catalog_roles_subcommand} is not supported in the CLI')
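
The error-handling refactor above replaces hard-coded flag strings with `Argument.to_flag_name(...)`. The implementation of that method is not shown in this diff, so the following is only a sketch of what such a conversion could look like. It assumes the `Arguments` constants are snake_case strings such as `storage_type`; both function names below are hypothetical stand-ins.

```python
def to_flag_name(argument_name: str) -> str:
    """Hypothetical stand-in for Argument.to_flag_name: 'storage_type' -> '--storage-type'."""
    # Assumption: argument constants are snake_case and flags are kebab-case.
    return "--" + argument_name.replace("_", "-")


def missing_argument_message(argument_name: str) -> str:
    """Build the error text from the constant instead of repeating a string literal."""
    return f"Missing required argument: {to_flag_name(argument_name)}"
```

Deriving the flag from a single constant means renaming an argument only needs to happen in one place, which appears to be the motivation for the refactor.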
67 changes: 40 additions & 27 deletions regtests/client/python/cli/command/catalogs.py
@@ -19,7 +19,8 @@
from pydantic import StrictStr

from cli.command import Command
from cli.constants import StorageType, CatalogType, Subcommands
from cli.constants import StorageType, CatalogType, Subcommands, Arguments
from cli.options.option_tree import Argument
from polaris.management import PolarisDefaultApi, Catalog, CreateCatalogRequest, UpdateCatalogRequest, \
StorageConfigInfo, ExternalCatalog, AwsStorageConfigInfo, AzureStorageConfigInfo, GcpStorageConfigInfo, \
PolarisCatalog, CatalogProperties
@@ -57,35 +58,42 @@ class CatalogsCommand(Command):
def validate(self):
if self.catalogs_subcommand == Subcommands.CREATE:
if not self.storage_type:
raise Exception(f"Missing required argument:"
f" --storage-type")
raise Exception(f'Missing required argument:'
f' {Argument.to_flag_name(Arguments.STORAGE_TYPE)}')
if not self.default_base_location:
raise Exception(f"Missing required argument:"
f" --default-base-location")
if self.catalog_type == CatalogType.EXTERNAL.value:
if not self.remote_url:
raise Exception(f"Missing required argument for {CatalogType.EXTERNAL.value} catalog:"
f" --remote-url")
raise Exception(f'Missing required argument:'
f' {Argument.to_flag_name(Arguments.DEFAULT_BASE_LOCATION)}')
if self.catalogs_subcommand == Subcommands.UPDATE:
if self.allowed_locations:
if not self.storage_type:
raise Exception(f"Missing required argument when updating allowed locations for a catalog:"
f" --storage-type")
raise Exception(f'Missing required argument when updating allowed locations for a catalog:'
f' {Argument.to_flag_name(Arguments.STORAGE_TYPE)}')

if self.storage_type == StorageType.S3.value:
if not self.role_arn:
raise Exception("Missing required argument for storage type 's3': --role-arn")
raise Exception(f"Missing required argument for storage type 's3':"
f" {Argument.to_flag_name(Arguments.ROLE_ARN)}")
if self._has_azure_storage_info() or self._has_gcs_storage_info():
raise Exception("Storage type 's3' supports the storage configurations --role-arn, "
"--external-id, and --user-arn")
raise Exception(f"Storage type 's3' supports the storage credentials"
f" {Argument.to_flag_name(Arguments.ROLE_ARN)},"
f" {Argument.to_flag_name(Arguments.EXTERNAL_ID)}, and"
f" {Argument.to_flag_name(Arguments.USER_ARN)}")
elif self.storage_type == StorageType.AZURE.value:
if not self.tenant_id:
raise Exception("Missing required argument for storage type 'azure': --tenant-id")
raise Exception("Missing required argument for storage type 'azure': "
f" {Argument.to_flag_name(Arguments.TENANT_ID)}")
if self._has_aws_storage_info() or self._has_gcs_storage_info():
raise Exception("Storage type 'azure' supports the storage configurations --tenant-id, "
"--multi-tenant-app-name, and --consent-url")
elif self._has_aws_storage_info() or self._has_azure_storage_info():
raise Exception("Storage type 'gcs' supports the storage configuration: --service-account")
raise Exception("Storage type 'azure' supports the storage credentials"
f" {Argument.to_flag_name(Arguments.TENANT_ID)},"
f" {Argument.to_flag_name(Arguments.MULTI_TENANT_APP_NAME)}, and"
f" {Argument.to_flag_name(Arguments.CONSENT_URL)}")
elif self.storage_type == StorageType.GCS.value:
if self._has_aws_storage_info() or self._has_azure_storage_info():
raise Exception("Storage type 'gcs' supports the storage credential"
f" {Argument.to_flag_name(Arguments.SERVICE_ACCOUNT)}")
elif self.storage_type == StorageType.FILE.value:
if self._has_aws_storage_info() or self._has_azure_storage_info() or self._has_gcs_storage_info():
raise Exception("Storage type 'file' does not support any storage credentials")

def _has_aws_storage_info(self):
return self.role_arn or self.external_id or self.user_arn
@@ -121,6 +129,11 @@ def _build_storage_config_info(self):
tenant_id=self.tenant_id,
multi_tenant_app_name=self.multi_tenant_app_name
)
elif self.storage_type == StorageType.FILE.value:
config = StorageConfigInfo(
storage_type=self.storage_type.upper(),
allowed_locations=self.allowed_locations
)
return config

def execute(self, api: PolarisDefaultApi) -> None:
@@ -161,17 +174,17 @@ def execute(self, api: PolarisDefaultApi) -> None:
print(catalog.to_json())
elif self.catalogs_subcommand == Subcommands.UPDATE:
catalog = api.get_catalog(self.catalog_name)
default_base_location_properties = {}
if self.default_base_location:
default_base_location_properties = {'default-base-location': self.default_base_location}
catalog.properties = {**default_base_location_properties, **self.properties}

if self.default_base_location or self.properties:
catalog.properties = CatalogProperties(
default_base_location=self.default_base_location,
additional_properties=self.properties
)
request = UpdateCatalogRequest(
current_entity_version=catalog.entity_version,
catalog=catalog
)
if (self.allowed_locations or self._has_aws_storage_info() or self._has_azure_storage_info() or
self._has_gcs_storage_info()):
if (self._has_aws_storage_info() or self._has_azure_storage_info() or self._has_gcs_storage_info() or
self.allowed_locations or self.default_base_location):
request = UpdateCatalogRequest(
current_entity_version=catalog.entity_version,
catalog=catalog,
@@ -180,5 +193,5 @@ def execute(self, api: PolarisDefaultApi) -> None:

api.update_catalog(self.catalog_name, request)
else:
raise Exception(f"{self.catalogs_subcommand} is not supported in the CLI")
raise Exception(f'{self.catalogs_subcommand} is not supported in the CLI')
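
The `validate` changes above check, per storage type, that only that type's credential flags were supplied, and the new `file` type accepts none. One way to express the same rule is a lookup table. This is a sketch for illustration rather than the code in this commit: `ALLOWED_FLAGS` and `unsupported_flags` are hypothetical names, while the flag and storage-type strings are the ones that appear in the error messages above.

```python
from typing import Dict, Set

# Credential flags each storage type accepts, as listed in the error messages above.
ALLOWED_FLAGS: Dict[str, Set[str]] = {
    "s3": {"--role-arn", "--external-id", "--user-arn"},
    "azure": {"--tenant-id", "--multi-tenant-app-name", "--consent-url"},
    "gcs": {"--service-account"},
    "file": set(),  # the FILE storage type takes no storage credentials
}


def unsupported_flags(storage_type: str, provided: Set[str]) -> Set[str]:
    """Return the provided credential flags that the storage type does not accept."""
    return provided - ALLOWED_FLAGS[storage_type]
```

A non-empty result would correspond to the cases where the CLI raises a "supports the storage credentials ..." error.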

79 changes: 79 additions & 0 deletions regtests/client/python/cli/command/namespaces.py
@@ -0,0 +1,79 @@
import json
import re
from dataclasses import dataclass
from typing import Dict, Optional, List

from pydantic import StrictStr

from cli.command import Command
from cli.constants import Subcommands, Arguments, UNIT_SEPARATOR
from cli.options.option_tree import Argument
from polaris.catalog import IcebergCatalogAPI, CreateNamespaceRequest, ApiClient, Configuration
from polaris.catalog.exceptions import NotFoundException
from polaris.management import PolarisDefaultApi


@dataclass
class NamespacesCommand(Command):
    """
    A Command implementation to represent `polaris namespaces`. The instance attributes correspond to parameters
    that can be provided to various subcommands.
    Example commands:
        * ./polaris namespaces create --catalog my_catalog my_namespace
        * ./polaris namespaces list --catalog my_catalog
        * ./polaris namespaces delete --catalog my_catalog my_namespace.inner
    """

    namespaces_subcommand: str
    catalog: str
    namespace: List[StrictStr]
    parent: List[StrictStr]
    location: str
    properties: Optional[Dict[str, StrictStr]]

    def validate(self):
        if not self.catalog:
            raise Exception(f'Missing required argument:'
                            f' {Argument.to_flag_name(Arguments.CATALOG)}')

    def _get_catalog_api(self, api: PolarisDefaultApi):
        """
        Convert a management API to a catalog API
        """
        # Keep only the scheme and host; accept https as well as http
        catalog_host = re.match(r'(https?://[^/]+)', api.api_client.configuration.host).group(1)
        configuration = Configuration(
            host=f'{catalog_host}/api/catalog',
            username=api.api_client.configuration.username,
            password=api.api_client.configuration.password,
            access_token=api.api_client.configuration.access_token,
        )
        return IcebergCatalogAPI(ApiClient(configuration))

    def execute(self, api: PolarisDefaultApi) -> None:
        catalog_api = self._get_catalog_api(api)
        if self.namespaces_subcommand == Subcommands.CREATE:
            properties = self.properties or {}
            if self.location:
                properties = {**properties, Arguments.LOCATION: self.location}
            request = CreateNamespaceRequest(
                namespace=self.namespace,
                # Pass the merged dict so that the location is not silently dropped
                properties=properties
            )
            catalog_api.create_namespace(
                prefix=self.catalog,
                create_namespace_request=request)
        elif self.namespaces_subcommand == Subcommands.LIST:
            if self.parent is not None:
                result = catalog_api.list_namespaces(prefix=self.catalog, parent=UNIT_SEPARATOR.join(self.parent))
            else:
                result = catalog_api.list_namespaces(prefix=self.catalog)
            for namespace in result.namespaces:
                print(json.dumps({"namespace": '.'.join(namespace)}))
        elif self.namespaces_subcommand == Subcommands.DELETE:
            catalog_api.drop_namespace(prefix=self.catalog, namespace=UNIT_SEPARATOR.join(self.namespace))
        elif self.namespaces_subcommand == Subcommands.GET:
            catalog_api.namespace_exists(prefix=self.catalog, namespace=UNIT_SEPARATOR.join(self.namespace))
            print(json.dumps({"namespace": '.'.join(self.namespace)}))
        else:
            raise Exception(f"{self.namespaces_subcommand} is not supported in the CLI")
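
The namespaces command accepts a dotted name such as `a.b.c` on the command line, splits it into levels, and joins the levels with `UNIT_SEPARATOR` when calling the catalog API. The sketch below shows that round trip in isolation. The value of `UNIT_SEPARATOR` is not visible in this diff; `"\x1f"` is assumed here because the Iceberg REST specification uses the 0x1F unit separator between the parts of a multipart namespace. The two function names are hypothetical.

```python
from typing import List, Optional

# Assumed value: the diff imports UNIT_SEPARATOR from cli.constants but does not show it.
UNIT_SEPARATOR = "\x1f"


def parse_namespace(dotted: Optional[str]) -> Optional[List[str]]:
    """Split a dotted CLI argument into levels; a missing or empty value becomes None."""
    return dotted.split(".") if dotted else None


def encode_namespace(levels: List[str]) -> str:
    """Join namespace levels into the single string sent to the catalog API."""
    return UNIT_SEPARATOR.join(levels)
```

Returning `None` for a missing value mirrors the `s.split('.') if s else None` guard added in `command/__init__.py`, which avoids calling `split` on an argument that was never provided.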