Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hive: Add View support for HIVE catalog #8907

Closed
wants to merge 1 commit into from
Closed

Conversation

nk1506
Copy link
Contributor

@nk1506 nk1506 commented Oct 23, 2023

Issues #8698

@nk1506 nk1506 marked this pull request as draft October 23, 2023 13:12
@nk1506 nk1506 changed the title Core: Add View support for HIVE catalog Hive: Add View support for HIVE catalog Oct 25, 2023
@nk1506 nk1506 marked this pull request as ready for review October 27, 2023 01:13
@nk1506 nk1506 force-pushed the hive_view branch 2 times, most recently from dc5d0c7 to b3c8edc Compare October 28, 2023 18:41
@ajantha-bhat ajantha-bhat requested a review from pvary October 30, 2023 05:42
@nk1506 nk1506 force-pushed the hive_view branch 3 times, most recently from 8625c9d to aaaa9a4 Compare November 2, 2023 09:45
table = metaClients.run(client -> client.getTable(database, tableName));
HiveOperationsBase.validateTableOrViewIsIceberg(table, fullName);

if (table.getTableType().equalsIgnoreCase(TableType.VIRTUAL_VIEW.name())) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Hive Table is not iceberg table new metadataLocation should always be null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the decision was to do not use the HMS tableType for this. Shouldn't we use BaseMetastoreTableOperations.TABLE_TYPE_PROP property instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have segregated Iceberg table and Iceberg View check here.

Comment on lines 85 to 87
if (!table.getTableType().equalsIgnoreCase(TableType.VIRTUAL_VIEW.name())) {
throw new NoSuchObjectException();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not throw the final exception immediately?

@ajantha-bhat
Copy link
Member

@nk1506, @pvary: I have added this PR to upcoming 1.5.0 release.
I will also do one round of review tomorrow.

@nk1506: Please consolidate what comments are pending as now for this PR.
We can discuss what can be fixed in this PR and what can be done as a follow up.

@nk1506
Copy link
Contributor Author

nk1506 commented Jan 11, 2024

@ajantha-bhat / @pvary I think only major comment left here is Should we use lock before committing view table or not ? If yes should we do with this PR ?

@ajantha-bhat
Copy link
Member

@nk1506: Not just major, I want to conclude on all the comments that are open. So, we can move forward.

@ajantha-bhat
Copy link
Member

I agree that we do need these improvements for views

  • purge support
  • Common interfaces for TableOperations + ViewOperations.
  • Common interfaces for TableMetadata + ViewMetadata.

But I don't think we should drag the view support for Hive catalog since REST, Nessie, JDBC already supports it.
Can we handle them in a follow up? I did see that we have opened the discussions on mailing list.

@pvary, @rdblue, @jbonofre, @nastra : What is your opinion on this?

@nk1506
Copy link
Contributor Author

nk1506 commented Jan 18, 2024

I agree that we do need these improvements for views

  • purge support
  • Common interfaces for TableOperations + ViewOperations.
  • Common interfaces for TableMetadata + ViewMetadata.

But I don't think we should drag the view support for Hive catalog since REST, Nessie, JDBC already supports it. Can we handle them in a follow up? I did see that we have opened the discussions on mailing list.

@pvary, @rdblue, @jbonofre, @nastra : What is your opinion on this?

I have created an issue for abstracting out ViewMetadata and TableMetadata.

@jbonofre
Copy link
Member

It makes sense to me to split in two parts:

  1. First the view support on Hive catalog "as it is", to be consistent with other catalogs
  2. Prepare a followup with the proposed improvements that could be leverage by all catalogs.

@pvary
Copy link
Contributor

pvary commented Jan 19, 2024

But I don't think we should drag the view support for Hive catalog since REST, Nessie, JDBC already supports it.

I am not sure that I would like to support a feature in production where manual cleanup is needed after deleting an object.
To be honest, I prefer not support something, until we have a technically robust implementation. Implementing views without locks would cause view history corruption, copy-pasting code will cause maintainability issues in the medium-long term. Just consider the decision, that we did not create a common ancestor for the metadata....
This kind of technical debt could be crippling for an OSS project, as everyone would wait for others to fix the accumulated issues.

@jbonofre
Copy link
Member

@pvary I agree. Maybe I wasn't clear, my point was:

  • abstracting behaviors common to table and view makes sense but not a blocker for this PR imho
  • having something consistent about delete specifically to Hive (like locks, etc) is a requirement. I agree with you.

So, if you propose the later, I agree and it should be part of this PR. For the first point, not blocker for this PR and other PRs related to views.

@pvary
Copy link
Contributor

pvary commented Jan 19, 2024

  • abstracting behaviors common to table and view makes sense but not a blocker for this PR imho

If the community decides against abstracting out the common part of the Metadata, then we can decide how to handle the situation. On the other hand, until the discussion is ongoing, I would not run ahead with a hopefully temporary solution.

.filter(
table ->
table.getParameters() != null
&& BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that this is changing behavior for the catalog. There was no filter on external table types previously and I don't believe that was actually required.


// If we try to create the view but the metadata location is already set, then we had a
// concurrent commit
if (newView
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check doesn't look correct. If this is a newView there shouldn't be an existing view definition. What if this is an existing hive view that does not have a location set? It seems that this should fail if a view is found but you're attempting to create a new one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, I think this is borrowed from the HiveTableOperations::doCommit() code. There it makes a bit more sense as we get a lock, do a check, then fail if there's an existing table (before we write our table).

It seems the change to abstract out common logic is in https://github.com/apache/iceberg/pull/9461/files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to follow the table behaviour . How concurrent commits will be different in case of View ? Am i missing something here?

LOG.debug("Committing existing view: {}", fullName);
} else {
view =
newHmsTable(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When creating the table/view definition in hms, we should be setting something in the original and expanded text fields to indicate that this is an Iceberg view. This can help if something non-iceberg attempts to load the view definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we read this from metadata properties or add a constant ?

Copy link
Collaborator

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look forward to this pr. Looks like some of this is pending #9461, so will sense for me to look at that first

@@ -284,7 +284,7 @@ private Map<String, String> tableOverrideProperties() {
}
}

protected static String fullTableName(String catalogName, TableIdentifier identifier) {
public static String fullTableName(String catalogName, TableIdentifier identifier) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a bit strange to have a util class in Base class, can we move it to some util (for example, CatalogUtil?)


// If we try to create the view but the metadata location is already set, then we had a
// concurrent commit
if (newView
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, I think this is borrowed from the HiveTableOperations::doCommit() code. There it makes a bit more sense as we get a lock, do a check, then fail if there's an existing table (before we write our table).

It seems the change to abstract out common logic is in https://github.com/apache/iceberg/pull/9461/files

if (currentMetadataLocation() != null && !currentMetadataLocation().isEmpty()) {
parameters.put(PREVIOUS_METADATA_LOCATION_PROP, currentMetadataLocation());
}
parameters.put(TABLE_TYPE_PROP, ICEBERG_TABLE_TYPE_VALUE.toUpperCase(Locale.ENGLISH));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this should be view property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in case of Hive-Table is should be always ICEBERG_TABLE_TYPE_VALUE = "iceberg";

@nk1506
Copy link
Contributor Author

nk1506 commented Apr 4, 2024

Closing this PR as same has been addressed with ongoing PR .

@nk1506 nk1506 closed this Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants