-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add $partitions
table to Delta Connector
#19722
Add $partitions
table to Delta Connector
#19722
Conversation
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
this.partitionColumns = requireNonNull(partitionColumns, "partitionColumns is null"); | ||
|
||
ImmutableList.Builder<ColumnMetadata> columns = ImmutableList.builder(); | ||
partitionColumns.forEach(column -> columns.add(new ColumnMetadata(column.getBaseColumnName(), VARCHAR))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't follow the reasoning behind this constraint - why not using the actual type of the partition column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially tried to do that, but there were issues with casting back to the original type as AddFileEntry.getCanonicalPartitionValues()
returns a string represenation of the value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be the original type. Also, we need to discuss the result format (Hive or Iceberg)
-- Hive
select * from "test$partitions";
b
---
1
-- Iceberg
select * from "test$partitions";
partition | record_count | file_count | total_size | data
-----------+--------------+------------+------------+--------------------------------------------------
{b=1} | 1 | 1 | 317 | {a={min=1, max=1, null_count=0, nan_count=NULL}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to see if I can get the types correct 👍
Good point on format, I've only used the hive
and delta
connectors in practice so went with the original Hive format. The Iceberg format has a lot more info, but I'm not sure how best to materialize that from the transaction log. I'll defer to one of you on which format we want
Thanks for the review as well, I'll take a look at the other comments and fix this PR up. Appreciate the patience for a first time contributor 🙏
This commit adds the `DeltaLakePartitionsTable` and `DeltaLakePartitionsTableProvider` classes to support the `$partitions` system table in the Delta Connector. As the partition values casted as string from internal calls, the resulting table will always return VARCHAR, so called that out in the docs.
601089c
to
66626ae
Compare
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
@@ -0,0 +1,18 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove an unrelated change.
@@ -704,6 +704,49 @@ delta.feature.columnMapping | supported | | |||
|
|||
(delta-lake-special-columns)= | |||
|
|||
#### `$partitions` table | |||
|
|||
The `$partitions` table provides access to a specified Delta Lake Table's partitions keys and values. Each table row is a unique partition value, with the columns being the partition keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line is too long. Please wrap at 80 characters.
name | comment | regionkey | | ||
----------------------------+-----------------+---------------------+ | ||
foo | foo | east | | ||
bar | bar | west | | ||
baz | baz | west | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This table looks too wide. Please narrow it. Same for the following $partitions
result.
} | ||
|
||
private Optional<SystemTable> getRawSystemTable(SchemaTableName systemTableName) | ||
private Optional<SystemTable> getRawSystemTable(SchemaTableName systemTableName, ConnectorSession session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace the order between SchemaTableName & ConnectorSession.
assertUpdate("INSERT INTO " + tableName + " VALUES (0, CAST('2019-09-08' AS DATE)), (1, CAST('2019-09-09' AS DATE)), (2, CAST('2019-09-09' AS DATE))", 3); | ||
assertUpdate("INSERT INTO " + tableName + " VALUES (3, CAST('2019-09-09' AS DATE)), (4, CAST('2019-09-10' AS DATE)), (5, CAST('2019-09-10' AS DATE))", 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please test with NULL partitions.
@@ -0,0 +1,18 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase on master instead of merge and then fix logical conflicts.
Error: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.11.0:compile (default-compile) on project trino-delta-lake: Compilation failure: Compilation failure:
Error: /home/runner/work/trino/trino/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakePartitionsTable.java:[95,49] cannot find symbol
Error: symbol: method getSnapshot(io.trino.spi.connector.ConnectorSession,io.trino.spi.connector.SchemaTableName,java.lang.String,java.util.Optional<java.lang.Object>)
Error: location: variable transactionLogAccess of type io.trino.plugin.deltalake.transactionlog.TransactionLogAccess
Error: /home/runner/work/trino/trino/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakePartitionsTableProvider.java:[57,63] cannot find symbol
Error: symbol: method getSnapshot(io.trino.spi.connector.ConnectorSession,io.trino.spi.connector.SchemaTableName,java.lang.String,java.util.Optional<java.lang.Object>)
Error: location: variable transactionLogAccess of type io.trino.plugin.deltalake.transactionlog.TransactionLogAccess
Error: -> [Help 1]
@Test | ||
public void testPartitionsTable() | ||
{ | ||
String tableName = "test_simple_partitions_table"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All 3 tests have _simple_
in the table name. It doesn't provide any benefits unless we have complex tests. I would remove them.
assertUpdate("INSERT INTO " + tableName + " VALUES (3, CAST('2019-09-09' AS DATE)), (4, CAST('2019-09-10' AS DATE)), (5, CAST('2019-09-10' AS DATE))", 3); | ||
assertUpdate("UPDATE " + tableName + " SET _bigint = 50 WHERE _bigint = BIGINT '5'", 1); | ||
assertUpdate("DELETE FROM " + tableName + " WHERE _date = DATE '2019-09-08'", 1); | ||
assertQuerySucceeds("ALTER TABLE " + tableName + " EXECUTE OPTIMIZE"); | ||
assertQuery("SELECT count(*) FROM " + tableName, "VALUES 5"); | ||
assertQuery("SELECT count(*) FROM \"" + tableName + "$partitions\"", "VALUES 0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove these statements. Single INSERT is sufficient for verifying $partitons
with unpartitioned table.
@@ -704,6 +704,49 @@ delta.feature.columnMapping | supported | | |||
|
|||
(delta-lake-special-columns)= | |||
|
|||
#### `$partitions` table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current place (after (delta-lake-special-columns)=
) is wrong.
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove a redundant empty line.
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
👋 @whitleykeith - this PR has become inactive. If you're still interested in working on it, please let us know, and we can try to get reviewers to help with that. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
@findinpath @ebyhr .. is this still useful/valid or should we close? |
@mosabua I'm the original author of the PR. Sorry for not chiming in here, I chatted with @findinpath last month in slack about this. We use Trino a lot, especially the Delta connector, and I would love to be able to get this in a good state. I ran out of time when I first started development on this but I'm hoping I can block off some time in the next few weeks to resolve comments + make necessary changes |
Thats great news @whitleykeith .. feel free to continue to work with @findinpath and others and let me know if you need any other help. |
It was easier to make a brand new PR since there was a lot of changes (including a java version upgrade), new PR is here: #20735 |
Description
This PR implements the
$partitions
system table for the Delta Lake connector, and behaves similarly to the Hive analog. It additionally adds a.devcontainer.json
to support running Trino in a devcontainer/codespace. I did this primarily for my own workflow and can remove it if it's not desired.An implementation wrinkle: Due to how the
AddFileEntry.getCanonicalPartitionValues
returns strings for partition values and some issues with casting back to their original types, the columns in the result table are always VARCHAR, even if the underlying column is a different type. I called this out in the related doc updates.Additional context and related issues
Closes #18590
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: