-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: improve Glue catalog using Boto3 types #8304
Python: improve Glue catalog using Boto3 types #8304
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small nits on the string concatenation, is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rustyconover Thanks for the contribution!
PROP_GLUE_TABLE = "Table" | ||
PROP_GLUE_TABLE_TYPE = "TableType" | ||
PROP_GLUE_TABLE_DESCRIPTION = "Description" | ||
PROP_GLUE_TABLE_PARAMETERS = "Parameters" | ||
PROP_GLUE_TABLE_DATABASE_NAME = "DatabaseName" | ||
PROP_GLUE_TABLE_NAME = "Name" | ||
PROP_GLUE_TABLE_OWNER = "Owner" | ||
PROP_GLUE_TABLE_STORAGE_DESCRIPTOR = "StorageDescriptor" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that with the incorporation of Boto3 types, we now have an enhanced verification mechanism for key names when defining table inputs and parameters. However, in my perspective, it's still beneficial to retain the use of string constants, both to minimize the chance of typos and to ensure consistency throughout our codebase. For context, we've employed this approach with string constants in both hive.py and rest.py:
iceberg/python/pyiceberg/catalog/hive.py
Lines 149 to 152 in cf358a4
PROP_EXTERNAL = "EXTERNAL" | |
PROP_TABLE_TYPE = "table_type" | |
PROP_METADATA_LOCATION = "metadata_location" | |
PROP_PREVIOUS_METADATA_LOCATION = "previous_metadata_location" |
iceberg/python/pyiceberg/catalog/rest.py
Lines 90 to 98 in cf358a4
AUTHORIZATION_HEADER = "Authorization" | |
BEARER_PREFIX = "Bearer" | |
CATALOG_SCOPE = "catalog" | |
CLIENT_ID = "client_id" | |
PREFIX = "prefix" | |
CLIENT_SECRET = "client_secret" | |
CLIENT_CREDENTIALS = "client_credentials" | |
CREDENTIAL = "credential" | |
GRANT_TYPE = "grant_type" |
If the concern revolves around the clarity of the variable names, perhaps we can consider simplifying them? For example:
TABLE = "Table"
TABLE_TYPE = "TableType"
...
I'd truly appreciate your perspective on this. Do you think this approach would be beneficial, or do you have alternative suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mypy sufficiently protects against typos. Having variables with literal strings doesn't seem very useful to me, I prefer just to rely on the types defined by mypy_boto3_glue.
python/pyiceberg/catalog/glue.py
Outdated
properties: Properties = glue_table["Parameters"] | ||
|
||
def _convert_glue_to_iceberg(self, glue_table: Dict[str, Any]) -> Table: | ||
properties: Properties = glue_table.get(PROP_GLUE_TABLE_PARAMETERS, {}) | ||
database_name = glue_table["DatabaseName"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add null check for "Parameters" and "DatabaseName" as they are specified as NotRequired
according to https://youtype.github.io/boto3_stubs_docs/mypy_boto3_glue/type_defs/#tabletypedef ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I've added assertions for those properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I wasn't aware of the types package. Thanks for adding this
@@ -441,12 +441,13 @@ def test_update_namespace_properties_overlap_update_removal( | |||
|
|||
@mock_glue | |||
def test_passing_profile_name() -> None: | |||
session_properties = { | |||
session_properties: Dict[str, Any] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
session_properties: Dict[str, Any] = { | |
session_properties: Dict[str, Optional[str]] = { |
Improve the type checking of the Glue catalog module by using mypy_boto3_glue type definitions for the AWS API calls. Eliminate the use of many literal types.
Update poetry.lock with the new dependency.