Add ConnectorMetadata#isMaterializedView #18933

ksobolew · 2023-09-05T13:44:14Z

Description

The current implementation of MetadataManager#isMaterializedView calls ConnectorMetadata#getMaterializedView under the covers. This means that we have to load and deserialize the materialized view metadata just to see if it exists, which is unnecessary most of the time. This may also cause issues when the metadata is corrupted in some way (see the last commit for relevant test cases, where we add support for this method to the Iceberg plugin). In such a case, we still want to be able to DROP the corrupted materialized view - which is currently impossible, because the DropMaterializedViewTask will call getMaterializedView, which will fail when it tries to load the corrupted metadata. With this change, such a failure is less likely.
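To make the difference concrete, here is a minimal, self-contained sketch of the idea. It does not use the real Trino SPI: `Metadata`, `ToyCatalog`, `ViewDefinition`, and the `sql:` prefix convention are hypothetical stand-ins, assumed only for illustration. The default `isMaterializedView` mirrors the old behavior (derive existence from a full load), while the override answers from an existence check alone, so a drop can proceed even when the stored metadata cannot be parsed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-ins; these are NOT the real Trino SPI types.
public class IsMaterializedViewSketch
{
    static final class ViewDefinition
    {
        final String sql;

        ViewDefinition(String sql)
        {
            this.sql = sql;
        }
    }

    interface Metadata
    {
        // Loads and parses the full definition; may throw on corrupted metadata.
        Optional<ViewDefinition> getMaterializedView(String name);

        // Default mirrors the old behavior: existence is derived from a full load.
        default boolean isMaterializedView(String name)
        {
            return getMaterializedView(name).isPresent();
        }
    }

    // Toy catalog mapping a view name to its raw, serialized metadata.
    static class ToyCatalog
            implements Metadata
    {
        final Map<String, String> rawMetadata = new HashMap<>();

        @Override
        public Optional<ViewDefinition> getMaterializedView(String name)
        {
            String raw = rawMetadata.get(name);
            if (raw == null) {
                return Optional.empty();
            }
            // Assumed convention for this sketch: valid metadata starts with "sql:".
            if (!raw.startsWith("sql:")) {
                throw new IllegalStateException("Corrupted metadata for " + name);
            }
            return Optional.of(new ViewDefinition(raw.substring("sql:".length())));
        }

        @Override
        public boolean isMaterializedView(String name)
        {
            // Existence check only; the raw metadata is never parsed.
            return rawMetadata.containsKey(name);
        }

        void dropMaterializedView(String name)
        {
            if (!isMaterializedView(name)) {
                throw new IllegalArgumentException("Not a materialized view: " + name);
            }
            rawMetadata.remove(name);
        }
    }

    public static void main(String[] args)
    {
        ToyCatalog catalog = new ToyCatalog();
        catalog.rawMetadata.put("broken_mv", "garbage");

        // Succeeds because the drop path never parses the corrupted metadata.
        catalog.dropMaterializedView("broken_mv");
        System.out.println(catalog.isMaterializedView("broken_mv")); // prints "false"
    }
}
```

If `ToyCatalog` did not override `isMaterializedView`, the drop in `main` would fail with the `IllegalStateException` thrown by `getMaterializedView`, which is the failure mode described above.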

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Fix failure in `DROP MATERIALIZED VIEW` in Iceberg when the materialized view metadata is missing or corrupted.

ksobolew · 2023-09-06T13:26:59Z

CI #16437

findepi · 2023-09-13T15:21:58Z

And that would prevent it from being DROPped, because the DropMaterializedViewTask first wants to check if the entity to be dropped is a materialized view. If it called getMaterializedView, as it used to, dropping such a view would fail too; but with isMaterializedView that is not a problem.

please add a test for that

This may not be just an optimization,

is this also an optimization? if so, there should be a test change (eg counting file system or metastore access) that shows the improvement.

findepi · 2023-09-13T15:22:24Z

(x) This is not user-visible or is docs only, and no release notes are required.

The description contradicts that.

core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java

ksobolew · 2023-09-13T15:29:52Z

is this also an optimization? if so, there should be a test change (eg counting file system or metastore access) that shows the improvement.

There are already changes to the TestTrinoDatabaseMetaData that show changes in the call patterns. It's not exactly fewer calls, but fewer to getMaterializedView.

findepi · 2023-09-15T08:09:21Z

TestTrinoDatabaseMetaData is quite tied to JDBC driver
can we have a test for Iceberg connector in the Iceberg module as well?
I think it would make sense, given that Iceberg is the only connector that has MV support

see eg

trino/plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergMetastoreAccessOperations.java

Lines 328 to 372 in 8f25ce5

    
    @Test(dataProvider = "metadataQueriesTestTableCountDataProvider")
    public void testInformationSchemaColumns(int tables)
    {
        String schemaName = "test_i_s_columns_schema" + randomNameSuffix();
        assertUpdate("CREATE SCHEMA " + schemaName);
        Session session = Session.builder(getSession())
                .setSchema(schemaName)
                .build();

        for (int i = 0; i < tables; i++) {
            assertUpdate(session, "CREATE TABLE test_select_i_s_columns" + i + "(id varchar, age integer)");
            // Produce multiple snapshots and metadata files
            assertUpdate(session, "INSERT INTO test_select_i_s_columns" + i + " VALUES ('abc', 11)", 1);
            assertUpdate(session, "INSERT INTO test_select_i_s_columns" + i + " VALUES ('xyz', 12)", 1);

            assertUpdate(session, "CREATE TABLE test_other_select_i_s_columns" + i + "(id varchar, age integer)"); // won't match the filter
        }

        // Bulk retrieval
        assertMetastoreInvocations(session, "SELECT * FROM information_schema.columns WHERE table_schema = CURRENT_SCHEMA AND table_name LIKE 'test_select_i_s_columns%'",
                ImmutableMultiset.builder()
                        .add(GET_ALL_TABLES_FROM_DATABASE)
                        .addCopies(GET_TABLE, tables * 2)
                        .addCopies(GET_TABLES_WITH_PARAMETER, 2)
                        .build());

        // Pointed lookup
        assertMetastoreInvocations(session, "SELECT * FROM information_schema.columns WHERE table_schema = CURRENT_SCHEMA AND table_name = 'test_select_i_s_columns0'",
                ImmutableMultiset.builder()
                        .add(GET_TABLE)
                        .build());

        // Pointed lookup via DESCRIBE (which does some additional things before delegating to information_schema.columns)
        assertMetastoreInvocations(session, "DESCRIBE test_select_i_s_columns0",
                ImmutableMultiset.builder()
                        .add(GET_DATABASE)
                        .add(GET_TABLE)
                        .build());

        for (int i = 0; i < tables; i++) {
            assertUpdate(session, "DROP TABLE test_select_i_s_columns" + i);
            assertUpdate(session, "DROP TABLE test_other_select_i_s_columns" + i);
        }
    }

ksobolew · 2023-09-15T08:12:59Z

Sure, I'm working on tests right now (though demonstrating how corrupted metadata can prevent DROP MV from executing is kinda challenging)

The current implementation of `MetadataManager#isMaterializedView` calls `ConnectorMetadata#getMaterializedView` under the covers. This means that we have to load and deserialize the materialized view metadata just to see if it exists, which is unnecessary most of the time. This may also cause issues when the metadata is corrupted in some way (see subsequent commits for relevant test cases, where we add support for this method to the Iceberg plugin). In such a case, we still want to be able to `DROP` the corrupted materialized view - which is currently impossible, because the `DropMaterializedViewTask` will call `getMaterializedView`, which will fail when it tries to load the corrupted metadata. With this change, such a failure is less likely.

Here it's mostly just an optimization, which allows us to avoid loading the materialized view metadata unnecessarily in some situations.

ksobolew · 2023-09-28T11:21:28Z

@findepi Addressed comments, PTAL.

ksobolew · 2023-09-28T11:21:42Z

TestTrinoDatabaseMetaData is quite tied to JDBC driver
can we have a test for Iceberg connector in the Iceberg module as well?
I think it would make sense, given that Iceberg is the only connector that has MV support

Added such a test, but it's not very meaningful, IMO. It just says that we call getTable on the metastore one fewer time when we DROP MATERIALIZED VIEW.
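For readers unfamiliar with this style of test, the following is a rough sketch of how counting metastore invocations can work. It is not Trino's test infrastructure: `Metastore`, `InMemoryMetastore`, `CountingMetastore`, and the `dropMaterializedView` helper are hypothetical names assumed for illustration, and the call pattern shown (one `getTable`, one `dropTable`) is made up for the sketch rather than measured from the real code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for illustration; NOT Trino's metastore API or test utilities.
public class CountingMetastoreSketch
{
    interface Metastore
    {
        Optional<String> getTable(String name);

        void dropTable(String name);
    }

    static class InMemoryMetastore
            implements Metastore
    {
        private final Map<String, String> tables = new HashMap<>();

        void put(String name, String definition)
        {
            tables.put(name, definition);
        }

        @Override
        public Optional<String> getTable(String name)
        {
            return Optional.ofNullable(tables.get(name));
        }

        @Override
        public void dropTable(String name)
        {
            tables.remove(name);
        }
    }

    // Decorator that records how many times each metastore method is invoked.
    static class CountingMetastore
            implements Metastore
    {
        private final Metastore delegate;
        final Map<String, Integer> invocations = new HashMap<>();

        CountingMetastore(Metastore delegate)
        {
            this.delegate = delegate;
        }

        @Override
        public Optional<String> getTable(String name)
        {
            invocations.merge("GET_TABLE", 1, Integer::sum);
            return delegate.getTable(name);
        }

        @Override
        public void dropTable(String name)
        {
            invocations.merge("DROP_TABLE", 1, Integer::sum);
            delegate.dropTable(name);
        }
    }

    // Assumed drop path for this sketch: one lookup, then the drop itself.
    static void dropMaterializedView(Metastore metastore, String name)
    {
        if (metastore.getTable(name).isEmpty()) {
            throw new IllegalArgumentException("Not found: " + name);
        }
        metastore.dropTable(name);
    }

    public static void main(String[] args)
    {
        InMemoryMetastore backing = new InMemoryMetastore();
        backing.put("mv", "definition");
        CountingMetastore counting = new CountingMetastore(backing);

        dropMaterializedView(counting, "mv");

        // An additional definition load before the drop would show up as GET_TABLE=2.
        System.out.println(counting.invocations.equals(Map.of("GET_TABLE", 1, "DROP_TABLE", 1)));
    }
}
```

A test written this way pins the exact call pattern, so any extra metadata load introduced later changes the recorded counts and fails the assertion.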

ksobolew · 2023-09-28T11:22:00Z

(x) This is not user-visible or is docs only, and no release notes are required.

The description contradicts that.

Added a release note.

ksobolew · 2023-09-28T11:23:18Z

And that would prevent it from being DROPped, because the DropMaterializedViewTask first wants to check if the entity to be dropped is a materialized view. If it called getMaterializedView, as it used to, dropping such a view would fail too; but with isMaterializedView that is not a problem.

please add a test for that

Finally found a way to do that; test added.

This may not be just an optimization,

is this also an optimization? if so, there should be a test change (eg counting file system or metastore access) that shows the improvement.

I rewrote the commit message (and the PR description) to phrase it as primarily a correctness issue, with some efficiency improvements added as a bonus.

ksobolew · 2023-09-28T12:17:37Z

See #19177 for an example of a failure that this PR can mitigate.

This commit adds, among other things, test cases where we `DROP` a materialized view with corrupted metadata, which would otherwise fail. For Hive catalogs we can bypass the metadata cache entirely; for Glue catalogs the benefits are less pronounced, but we still avoid some potentially expensive operations and/or ones that could cause failures.

findepi · 2023-09-29T12:57:03Z

/test-with-secrets sha=f5d848b3678109c4fe1e67b18b798d5c0e1ee04e

findepi · 2023-09-29T12:57:19Z

I plan to review this (later)

github-actions · 2023-09-29T13:24:02Z

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/6352010690

findepi · 2023-09-29T20:25:21Z

This means that we have to load and deserialize the materialized view metadata just to see if it exists, which is unnecessary most of the time

This is true, but it is probably a much smaller problem than it sounds.

This may also cause issues when the metadata is corrupted in some way (see the last commit for relevant test cases, where we add support for this method to the Iceberg plugin). In such a case, we want to still be able to DROP the corrupted materialized view

The ConnectorMaterializedViewDefinition is the "materialized view definition", it shouldn't need to load the storage table for anything.
I think the fact that in Iceberg case it loads properties from the storage table is sub-optimal design and we should revisit it.

Alex's PR "Remove Iceberg materialized view storage tables from metastores" (#18853) will squash the MV and its storage table within the metastore.
Plus, we should preserve user-visible MV properties in the metastore-level object as well.
- We can do so safely, since the storage table is "owned" by the MV, so it must not change behind the scenes in any way.

This is important not only for DROP corrupted_mv case.
It is also important for all metadata queries. Querying system.metadata.materialized_views should ideally not need to go to S3 to obtain "materialized view definitions", especially not O(number of MVs) times.

ksobolew · 2023-10-02T11:53:23Z

This is true, but it is probably a much smaller problem than it sounds.

It is a problem, at least, for the use case I'm trying to fix.

This is important not only for DROP corrupted_mv case.
It is also important for all metadata queries. Querying system.metadata.materialized_views should ideally not need to go to S3 to obtain "materialized view definitions", especially not O(number of MVs) times.

That's true as well.

I appreciate the feedback and I agree that we don't want to add more code which may be considered a workaround for limitations of the current architecture, but I also don't want this to become a case of "sure, but first please spend 6 months refactoring this complex piece of code". What would be the recommended course of action then?

github-actions · 2024-01-11T17:34:22Z

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

mosabua · 2024-01-11T21:39:26Z

Reminder to continue work on this @ksobolew @findepi or close it.

ksobolew · 2024-01-12T10:08:05Z

This work is currently on hold and it's not clear this is the way forward. I'll reopen if it crystalizes.

cla-bot bot added the cla-signed label Sep 5, 2023

ksobolew marked this pull request as ready for review September 5, 2023 13:49

github-actions bot added tests:hive hive Hive connector labels Sep 5, 2023

ksobolew force-pushed the kudi/is-mv branch from 7d01173 to 447d84c Compare September 6, 2023 08:17

github-actions bot added the jdbc Relates to Trino JDBC driver label Sep 6, 2023

ksobolew force-pushed the kudi/is-mv branch from 447d84c to 3c8b2f2 Compare September 6, 2023 08:57

ksobolew force-pushed the kudi/is-mv branch from 3c8b2f2 to 20f0aeb Compare September 7, 2023 14:01

github-actions bot added the iceberg Iceberg connector label Sep 7, 2023

ksobolew requested review from findepi and hashhar September 12, 2023 14:25

findepi reviewed Sep 13, 2023

core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java (outdated)

Fix deprecation warnings

c3906ce

ksobolew force-pushed the kudi/is-mv branch from 20f0aeb to 64cafd6 Compare September 28, 2023 09:59

ksobolew added 2 commits September 28, 2023 13:05

Add support for isMaterializedView to Hive

6aeb62e

Here it's mostly just an optimization, which allows us to avoid loading the materialized view metadata unnecessarily in some situations.

ksobolew force-pushed the kudi/is-mv branch from 64cafd6 to 3e5d566 Compare September 28, 2023 11:15

ksobolew force-pushed the kudi/is-mv branch from 3e5d566 to 632751a Compare September 28, 2023 12:19

Add a test case for DROP MV to Iceberg Metastore tests

fc4dd6f

ksobolew force-pushed the kudi/is-mv branch from 632751a to f5d848b Compare September 28, 2023 15:04

github-actions bot added the stale label Jan 11, 2024

ksobolew closed this Jan 12, 2024

ksobolew deleted the kudi/is-mv branch May 20, 2024 12:10
