Spark: Throw exception on ALTER VIEW <viewName> AS <query>
#9510
Conversation
@@ -36,6 +38,9 @@ object CheckViews extends (LogicalPlan => Unit) {
      verifyColumnCount(ident, columnAliases, query)
      SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, SQLConf.get.resolver)

    case AlterViewAs(ResolvedV2View(_, _, _), _, query) =>
      SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, SQLConf.get.resolver)
we don't have to verify column counts here because ALTER doesn't support specifying column aliases
This worries me. Are the aliases from the original create expected to be applied to the new view version? I don't think that the expected behavior is for the column comments and aliases to go away.
For example, if I ran these two statements, what should happen?
-- original create command
CREATE VIEW daily_signups (
signup_count COMMENT 'Number of signups',
day COMMENT 'Signup date') AS
SELECT count(1), day FROM signups GROUP BY day
-- updated view with DISTINCT to deduplicate
ALTER VIEW daily_signups AS
SELECT count(distinct email), day FROM signups GROUP BY day
To me, it seems like this should work and result in the same view schema, with `signup_count` as the first column and the column docs preserved. That probably means that we need to store the column names and comments separately after all, or at least flag that they were set somewhere (view properties?).
Also, it looks like the indentation is inconsistent between this and other cases.
I agree that this is a weird case and I also had a different expectation when initially running into this. That's why I added the `alterViewWithUpdatedQueryColumns()` test to make this behavior explicit.
I also looked into how V1 views behave here, and they do the same thing (i.e. they lose the column comments from the original CREATE):
spark-sql (default)> create view iceberg1.v10 (x COMMENT 'comment 1', y COMMENT 'comment 2') AS SELECT count(id), zip from iceberg1.foo group by zip;
spark-sql (default)> show create table iceberg1.v10;
CREATE VIEW iceberg1.v10 (
x COMMENT 'comment 1',
y COMMENT 'comment 2')
TBLPROPERTIES (
'transient_lastDdlTime' = '1706538575')
AS SELECT count(id) AS cnt, zip from iceberg1.foo group by zip
Time taken: 0.055 seconds, Fetched 1 row(s)
spark-sql (default)> describe extended iceberg1.v10;
x bigint comment 1
y int comment 2
# Detailed Table Information
Catalog spark_catalog
Database iceberg1
Table v10
Owner nastra
Created Time Mon Jan 29 15:29:35 CET 2024
Last Access UNKNOWN
Created By Spark 3.5.0
Type VIEW
View Text SELECT count(id) AS cnt, zip from iceberg1.foo group by zip
View Original Text SELECT count(id) AS cnt, zip from iceberg1.foo group by zip
View Catalog and Namespace spark_catalog.default
View Query Output Columns [cnt, zip]
Table Properties [transient_lastDdlTime=1706538575]
Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Storage Properties [serialization.format=1]
Time taken: 0.063 seconds, Fetched 21 row(s)
spark-sql (default)> alter view iceberg1.v10 AS SELECT count(zip) AS cnt, zip from iceberg1.foo group by zip;
Time taken: 0.213 seconds
spark-sql (default)> show create table iceberg1.v10;
CREATE VIEW iceberg1.v10 (
cnt,
zip)
TBLPROPERTIES (
'transient_lastDdlTime' = '1706538634')
AS SELECT count(zip) AS cnt, zip from iceberg1.foo group by zip
Time taken: 0.029 seconds, Fetched 1 row(s)
spark-sql (default)> describe extended iceberg1.v10;
cnt bigint
zip int
# Detailed Table Information
Catalog spark_catalog
Database iceberg1
Table v10
Owner nastra
Created Time Mon Jan 29 15:29:35 CET 2024
Last Access UNKNOWN
Created By Spark 3.5.0
Type VIEW
View Text SELECT count(zip) AS cnt, zip from iceberg1.foo group by zip
View Original Text SELECT count(zip) AS cnt, zip from iceberg1.foo group by zip
View Catalog and Namespace spark_catalog.default
View Query Output Columns [cnt, zip]
Table Properties [transient_lastDdlTime=1706538634]
Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Storage Properties [serialization.format=1]
Time taken: 0.06 seconds, Fetched 21 row(s)
To summarize this topic: we discussed this offline and decided that we should not support `ALTER VIEW ... AS` for now and rather throw an exception (indicating to use `CREATE OR REPLACE VIEW` instead), as it leads to the surprising behavior that column aliases/comments are lost after updating the underlying SQL.
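To illustrate the recommended path, here is a hedged sketch that reuses the hypothetical `daily_signups`/`signups` names from the earlier example (not code from this PR): recreating the view restates the aliases and comments explicitly, so nothing is silently dropped.

```scala
// Instead of ALTER VIEW ... AS, recreate the view and restate the column
// aliases/comments so they cannot be lost (illustrative names only).
spark.sql("""
  CREATE OR REPLACE VIEW daily_signups (
    signup_count COMMENT 'Number of signups',
    day COMMENT 'Signup date') AS
  SELECT count(DISTINCT email), day FROM signups GROUP BY day
""")
```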
    assertThatThrownBy(() -> sql("ALTER VIEW %s AS %s", viewName, sql))
        .isInstanceOf(AnalysisException.class)
        .hasMessageContaining("Cannot create the persistent object")
the error message in Spark unfortunately doesn't distinguish between create/alter cases and throws that error for V1 views as well
@@ -123,4 +132,17 @@ case class RewriteViewCommands(spark: SparkSession) extends Rule[LogicalPlan] wi

      collectTempViews(child)
  }

  private object IsIcebergView {
The `Is` for this name doesn't make sense because it doesn't produce a boolean. It produces a loaded v2 view. I don't even think that the v2 view needs to be an Iceberg view.

I think this should probably be renamed to `ResolvedView` and the other should be named `ResolvedIdentifier` or something. I also don't like the duplication across these two helpers. It seems like this is doing what the other rule does, but should instead be delegating to other rules.
Also, could this `unapply` method be added to `ResolvedView`?
I like those new names better. I've updated the existing one to `ResolvedIdent`, as otherwise it clashes with the existing `ResolvedIdentifier`. Unfortunately, both `unapply` methods can't be combined into `ResolvedView`.
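As a rough illustration of the shape being discussed (a sketch only, not the code in this PR; `lookupView` is a hypothetical stand-in for the catalog lookup), an extractor whose `unapply` yields the loaded view rather than a boolean could look like this:

```scala
// Sketch: the extractor returns the loaded v2 view (Option[ResolvedV2View]),
// which is why an `Is...` name would be misleading.
private object ResolvedView {
  def unapply(plan: LogicalPlan): Option[ResolvedV2View] = plan match {
    case UnresolvedView(ResolvedIdent(resolved), _, _, _) =>
      lookupView(resolved) // hypothetical helper: Some(view) if the catalog can load it
    case _ =>
      None
  }
}
```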
@@ -60,17 +63,23 @@ case class RewriteViewCommands(spark: SparkSession) extends Rule[LogicalPlan] wi
        properties = properties,
        allowExisting = allowExisting,
        replace = replace)

    case a@AlterViewAs(ResolvedV2View(_, ident, _), _, query) =>
I don't like that this depends on the next case running before this one. I think that I would combine the two.
I think this should be:
case a @ AlterViewAs(ResolvedView(resolved), _, query) =>
  val q = CTESubstitution.apply(query)
  verifyTemporaryObjectsDontExist(ident, q)
  a.copy(child = resolved, query = q)
If we want to go this route, then we'd also need to add the same handling for `SetViewProperties(UnresolvedView(...))` / `UnsetViewProperties(UnresolvedView(...))`, hence it seemed better to me to just handle `case u@UnresolvedView(ResolvedView(resolved), _, _, _)` separately.

I'm fine either way, so let me know if you'd still like me to switch to the approach you proposed.
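For readers following along, a hedged sketch of the "handle `UnresolvedView` separately" alternative described above (illustrative rule fragments only; the bodies are not the exact PR code):

```scala
// Resolve the view child in one shared case, so AlterViewAs, SetViewProperties
// and UnsetViewProperties all match on an already-resolved ResolvedV2View.
case UnresolvedView(ResolvedView(resolved), _, _, _) =>
  resolved // replace the unresolved child with the loaded ResolvedV2View

// The command case then no longer needs to re-resolve its child.
case a @ AlterViewAs(ResolvedV2View(_, ident, _), _, query) =>
  val q = CTESubstitution.apply(query)
  verifyTemporaryObjectsDontExist(ident, q)
  a.copy(query = q)
```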
This looks pretty good for the properties cases, but I think we have a major question around the ALTER VIEW ... AS case. @nastra, can you separate those two implementations? There's no reason to mix them together and we can move forward with the view property updates quickly.
@@ -60,6 +62,9 @@ case class RewriteViewCommands(spark: SparkSession) extends Rule[LogicalPlan] wi
        properties = properties,
        allowExisting = allowExisting,
        replace = replace)

    case AlterViewAs(ResolvedV2View(_, _), _, _) =>
      throw new AnalysisException("ALTER VIEW <viewName> AS is not supported. Use CREATE OR REPLACE VIEW instead")
Does this need to happen here instead of in `CheckViews`, so that Spark doesn't throw a different analysis exception?
I threw the error here mainly because when we supported `AlterViewAs`, most of the things had to happen in `RewriteViewCommands`, but I've moved this now to `CheckViews`.
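For orientation, a rough sketch of what the check looks like once it lives in `CheckViews` (structure assumed from the earlier diff of that object; the exact code in the PR may differ):

```scala
// CheckViews validates the analyzed plan and throws, rather than rewriting it.
object CheckViews extends (LogicalPlan => Unit) {
  override def apply(plan: LogicalPlan): Unit = {
    plan foreach {
      case AlterViewAs(ResolvedV2View(_, _), _, _) =>
        throw new AnalysisException(
          "ALTER VIEW <viewName> AS is not supported. Use CREATE OR REPLACE VIEW instead")

      case _ => // other view checks elided in this sketch
    }
  }
}
```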
Looks good to me.

Throwing an exception in `RewriteViewCommands` is a little suspicious, but I think that's to avoid Spark throwing exceptions in rules rather than in the validation phase, so it's okay.
`ALTER VIEW <viewName> AS <query>` doesn't allow preserving column aliases and column comments when the underlying query is changed, which can lead to unexpected behavior. For now it's better to use `CREATE OR REPLACE VIEW`, as that is more explicit when the schema of the view is defined with column aliases/comments.
PS: Tests will pass once the `RewriteViewCommands` changes from #9582 are merged.