-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert information
into dict
#752
base: main
Are you sure you want to change the base?
Conversation
17ad88f
to
af5ea1f
Compare
I'll fix the UT's later today |
457b311
to
723729c
Compare
bc48118
to
27aef99
Compare
61d68fa
to
06cfeda
Compare
@@ -139,13 +144,54 @@ def add_schema_to_cache(self, schema) -> str: | |||
def _get_relation_information(self, row: agate.Row) -> RelationInfo: | |||
"""relation info was fetched with SHOW TABLES EXTENDED""" | |||
try: | |||
_schema, name, _, information = row | |||
table_properties = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this by directly separating the columns and properties when we parse the output. I think that makes more sense than splitting this later on because the SHOW TABLES EXTENDED
output is different from the DESCRIBE TABLE EXTENDED
output.
06cfeda
to
f97c161
Compare
@mikealfare would you have time to look into this one? :) |
This one appears to be a bit more involved and I see that @Fleid labelled this as "pr_tracked". That means it has its own item in our (internal) backlog (= we're aware of it, but it hasn't necessarily been slotted for a sprint yet). I would look to him to get capacity for this issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I very much like the goal of this PR: converting the information
string to a columns
and properties
dict!
But ... the code changes are lot. If we can test it well and with the suggested changes, I think it is the good enough to move forward to the features that benefit from the columns
and properties
dict.
@@ -320,10 +322,15 @@ def test_parse_relation(self): | |||
input_cols = [Row(keys=["col_name", "data_type"], values=r) for r in plain_rows] | |||
|
|||
config = self._get_target_http(self.project_cfg) | |||
rows = SparkAdapter(config).parse_describe_extended(relation, input_cols) | |||
self.assertEqual(len(rows), 4) | |||
adapter = SparkAdapter(config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fokko : Could you rewrite this code to have a similar flow as before, i.e. (re)move the three build-up lines and have one call to get the to-be-tested object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've condensed the code quite a bit. Let me know what you think!
@@ -33,8 +33,8 @@ class SparkRelation(BaseRelation): | |||
is_delta: Optional[bool] = None | |||
is_hudi: Optional[bool] = None | |||
is_iceberg: Optional[bool] = None | |||
# TODO: make this a dict everywhere | |||
information: Optional[str] = None | |||
columns: List[Tuple[str, str]] = field(default_factory=list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this! Much better than the information
string ❤️
@@ -138,13 +143,54 @@ def quote(self, identifier: str) -> str: # type: ignore | |||
def _get_relation_information(self, row: agate.Row) -> RelationInfo: | |||
"""relation info was fetched with SHOW TABLES EXTENDED""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fokko : Could you extend the docstring to explain that the SHOW TABLES EXTENDED
is preferred because fetching multiple tables at once is faster than fetching tables one by one. And, that we except the downside of parsing the |---
string given the performance gains?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly. I've added a docstring. Let me know what you think
self, table_results: agate.Table | ||
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]: | ||
# Wrap it in an iter, so we continue reading the properties from where we stopped reading columns | ||
table_results_itr = iter(table_results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh oh, the comment was still pending
@@ -320,10 +322,15 @@ def test_parse_relation(self): | |||
input_cols = [Row(keys=["col_name", "data_type"], values=r) for r in plain_rows] | |||
|
|||
config = self._get_target_http(self.project_cfg) | |||
rows = SparkAdapter(config).parse_describe_extended(relation, input_cols) | |||
self.assertEqual(len(rows), 4) | |||
adapter = SparkAdapter(config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've condensed the code quite a bit. Let me know what you think!
@@ -138,13 +143,54 @@ def quote(self, identifier: str) -> str: # type: ignore | |||
def _get_relation_information(self, row: agate.Row) -> RelationInfo: | |||
"""relation info was fetched with SHOW TABLES EXTENDED""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly. I've added a docstring. Let me know what you think
This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days. |
This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days. |
resolves #751
Description
Checklist
changie new
to create a changelog entry