Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: improve Glue catalog using Boto3 types #8304

Merged
merged 4 commits into from
Sep 22, 2023

Conversation

rustyconover
Copy link
Contributor

Improve the type checking of the Glue catalog module by using mypy_boto3_glue type definitions for the AWS API calls. Eliminate the use of many literal types.

Update poetry.lock with the new dependency.

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small nits on the string concatenation, is this intentional?

python/pyiceberg/catalog/glue.py Outdated Show resolved Hide resolved
python/pyiceberg/catalog/glue.py Outdated Show resolved Hide resolved
python/pyiceberg/catalog/glue.py Outdated Show resolved Hide resolved
python/pyproject.toml Outdated Show resolved Hide resolved
Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rustyconover Thanks for the contribution!

Comment on lines -64 to -71
PROP_GLUE_TABLE = "Table"
PROP_GLUE_TABLE_TYPE = "TableType"
PROP_GLUE_TABLE_DESCRIPTION = "Description"
PROP_GLUE_TABLE_PARAMETERS = "Parameters"
PROP_GLUE_TABLE_DATABASE_NAME = "DatabaseName"
PROP_GLUE_TABLE_NAME = "Name"
PROP_GLUE_TABLE_OWNER = "Owner"
PROP_GLUE_TABLE_STORAGE_DESCRIPTOR = "StorageDescriptor"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that with the incorporation of Boto3 types, we now have an enhanced verification mechanism for key names when defining table inputs and parameters. However, in my perspective, it's still beneficial to retain the use of string constants, both to minimize the chance of typos and to ensure consistency throughout our codebase. For context, we've employed this approach with string constants in both hive.py and rest.py:

PROP_EXTERNAL = "EXTERNAL"
PROP_TABLE_TYPE = "table_type"
PROP_METADATA_LOCATION = "metadata_location"
PROP_PREVIOUS_METADATA_LOCATION = "previous_metadata_location"

AUTHORIZATION_HEADER = "Authorization"
BEARER_PREFIX = "Bearer"
CATALOG_SCOPE = "catalog"
CLIENT_ID = "client_id"
PREFIX = "prefix"
CLIENT_SECRET = "client_secret"
CLIENT_CREDENTIALS = "client_credentials"
CREDENTIAL = "credential"
GRANT_TYPE = "grant_type"

If the concern revolves around the clarity of the variable names, perhaps we can consider simplifying them? For example:

TABLE = "Table"
TABLE_TYPE = "TableType"
...

I'd truly appreciate your perspective on this. Do you think this approach would be beneficial, or do you have alternative suggestions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think mypy sufficiently protects against typos. Having variables with literal strings doesn't seem very useful to me, I prefer just to rely on the types defined by mypy_boto3_glue.

python/pyiceberg/catalog/glue.py Show resolved Hide resolved
Comment on lines 136 to 138
properties: Properties = glue_table["Parameters"]

def _convert_glue_to_iceberg(self, glue_table: Dict[str, Any]) -> Table:
properties: Properties = glue_table.get(PROP_GLUE_TABLE_PARAMETERS, {})
database_name = glue_table["DatabaseName"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add null check for "Parameters" and "DatabaseName" as they are specified as NotRequired according to https://youtype.github.io/boto3_stubs_docs/mypy_boto3_glue/type_defs/#tabletypedef ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I've added assertions for those properties.

python/pyiceberg/catalog/glue.py Show resolved Hide resolved
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I wasn't aware of the types package. Thanks for adding this

@@ -441,12 +441,13 @@ def test_update_namespace_properties_overlap_update_removal(

@mock_glue
def test_passing_profile_name() -> None:
session_properties = {
session_properties: Dict[str, Any] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
session_properties: Dict[str, Any] = {
session_properties: Dict[str, Optional[str]] = {

@Fokko Fokko merged commit 4cc609a into apache:master Sep 22, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants