Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Iceberg materialized view storage tables from metastores #18853

Merged
merged 5 commits into from
Nov 10, 2023

Conversation

alexjo2144
Copy link
Member

Description

Storage tables clutter the namespace, and do not need a separate metastore entry. The pointer to the current metadata file is instead stored in the materialized view properties.

This applies only to newly created materialized views, existing ones are not effected.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Iceberg
* Newly created Materialized Views will not create a separate entry in the metastore for a storage table.

@github-actions github-actions bot added tests:hive iceberg Iceberg connector hive Hive connector labels Aug 29, 2023
@cla-bot cla-bot bot added the cla-signed label Aug 29, 2023
@alexjo2144 alexjo2144 requested a review from findepi August 29, 2023 20:57
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(skimming)

@alexjo2144 alexjo2144 force-pushed the iceberg/mv-storage-table branch 5 times, most recently from 988550d to f984097 Compare September 6, 2023 21:07
@alexjo2144
Copy link
Member Author

@findepi comments applied, PTAL when you get a chance.

@alexjo2144
Copy link
Member Author

A test with secrets would also be great

@findepi
Copy link
Member

findepi commented Sep 11, 2023

/test-with-secrets sha=f9840979650b7774cbe5383eb8c31e9e4aba8a58

@github-actions
Copy link

github-actions bot commented Sep 11, 2023

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/6143712396

@findepi
Copy link
Member

findepi commented Sep 12, 2023

failure looks related

@alexjo2144
Copy link
Member Author

failure looks related

Indeed

@alexjo2144 alexjo2144 force-pushed the iceberg/mv-storage-table branch from ed9a0b0 to b441b7f Compare September 13, 2023 15:16
@findepi
Copy link
Member

findepi commented Sep 13, 2023

/test-with-secrets sha=b441b7fc458c58754ab0b8607dd488a537325f7d

@github-actions
Copy link

github-actions bot commented Sep 13, 2023

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/6175817369

@alexjo2144
Copy link
Member Author

Failure on the run with secrets was in the databricks delta lake suite, unrelated.

return hideMaterializedViewStorageTableEnabled;
}

@Config("iceberg.materialized-views.hide-storage-tables")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test using iceberg.materialized-views.hide-storage-tables=false.
We need to ensure that Trino can still read/refresh this kind of materialized views.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

talked offline that this can be a follow-up.

@findepi
Copy link
Member

findepi commented Oct 30, 2023

(rebased)

@sopel39 sopel39 removed their request for review October 31, 2023 13:00
@findepi findepi force-pushed the iceberg/mv-storage-table branch 2 times, most recently from 110cf6e to 8030954 Compare November 7, 2023 15:01
@findepi findepi force-pushed the iceberg/mv-storage-table branch 2 times, most recently from 69c5447 to 9759483 Compare November 8, 2023 10:24
@@ -6701,30 +6701,6 @@ public void testSnapshotSummariesHaveTrinoQueryIdFormatV2()
"WHEN MATCHED THEN UPDATE SET b = t.b * 50", tableName, sourceTableName)));
}

@Test
public void testMaterializedViewSnapshotSummariesHaveTrinoQueryId()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

@findepi findepi force-pushed the iceberg/mv-storage-table branch 2 times, most recently from 01f0c4f to 28892f7 Compare November 8, 2023 16:37
@findepi
Copy link
Member

findepi commented Nov 8, 2023

/test-with-secrets sha=28892f73b45825172bf0fa552a832753466a602e

Copy link

github-actions bot commented Nov 8, 2023

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/6801273645

The FILE_FORMAT_PROPERTY table property has a default value, so no need
for explicit default in the code.
The helper method always returns a value
@findepi findepi force-pushed the iceberg/mv-storage-table branch from 28892f7 to 1457de6 Compare November 9, 2023 12:16
Storage tables clutter the namespace, and do not need a separate metastore entry.
The pointer to the current metadata file is instead stored in the materialized
view properties.

This applies only to newly created materialized views, existing ones are not
effected.
@findepi findepi force-pushed the iceberg/mv-storage-table branch from 1457de6 to 5293ddb Compare November 9, 2023 12:16
@findepi findepi merged commit f937d31 into trinodb:master Nov 10, 2023
50 of 52 checks passed
@github-actions github-actions bot added this to the 433 milestone Nov 10, 2023
@mosabua
Copy link
Member

mosabua commented Nov 10, 2023

Modified the release notes entry and added.

@alexjo2144 alexjo2144 deleted the iceberg/mv-storage-table branch November 27, 2023 16:44
@wobrycki
Copy link

Hi @alexjo2144,
are the materialized view storage tables still accessible as iceberg tables as it was before?
Previously we had a case, when storage table (with name starting from st_) was referenced from spark, connected to the same iceberg configuration as trino.
For some reason we choose Spark for certain operations on iceberg tables and we would like to continue doing it.

Now, system.metadata.materiazed_views is saying the storage table name is view_name$materialized_view_storage and spark ends with error: pyspark.sql.utils.AnalysisException: Table or view not found when trying to access the storage table.

@alexjo2144
Copy link
Member Author

Hey @wobrycki

With the new behavior, no you won't be able to do that. I didn't think reading MV storage tables from Spark would be something anyone was doing. How do you figure out which storage table belongs to each MV? Or do you just list tables and then call a procedure like rewrite_data_files in Spark for each one?

For now, you can return to the previous behavior by setting iceberg.materialized-views.hide-storage-table=false in your Iceberg config.

On the other hand, if it's file compaction (rewrite_data_files) you are worried about, it was likely not doing much for MV storage tables in the first place. Since MVs are fully refreshed each time they are updated, the data layout should be pretty compact already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed hive Hive connector iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

5 participants