Spark: Test reading default values in Spark #11832
Conversation
-        .isInstanceOf(IllegalArgumentException.class)
-        .hasMessage("Missing required field: missing_str");
+        .hasRootCauseInstanceOf(IllegalArgumentException.class)
+        .hasMessageContaining("Missing required field: missing_str");
This was needed to validate the reader failure in testMissingRequiredWithoutDefault in Spark scans: the failure happens on executors, so it is wrapped in a SparkException by the time it is rethrown on the driver.
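For illustration, a hedged sketch of the resulting assertion, assuming AssertJ's assertThatThrownBy wraps the scan (the surrounding writeAndValidate call is the one this suite uses; placing it here is an assumption):

// The read fails on an executor and surfaces on the driver wrapped in a
// SparkException, so the test asserts on the root cause and a message substring.
assertThatThrownBy(() -> writeAndValidate(writeSchema, readSchema))
    .hasRootCauseInstanceOf(IllegalArgumentException.class)
    .hasMessageContaining("Missing required field: missing_str");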
@@ -542,44 +539,4 @@ public void testPrimitiveTypeDefaultValues(Type.PrimitiveType type, Object defau
     writeAndValidate(writeSchema, readSchema);
   }

-  protected void withSQLConf(Map<String, String> conf, Action action) throws IOException {
This was unused.
@@ -96,11 +96,20 @@ public static void assertEqualsSafe(Types.StructType struct, List<Record> recs,

   public static void assertEqualsSafe(Types.StructType struct, Record rec, Row row) {
     List<Types.NestedField> fields = struct.fields();
-    for (int i = 0; i < fields.size(); i += 1) {
-      Type fieldType = fields.get(i).type();
+    for (int readPos = 0; readPos < fields.size(); readPos += 1) {
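The rename from i to readPos reflects that the loop now walks the read schema, whose fields may be missing from the written record when they carry defaults. A minimal sketch of the assumed loop body (the Avro field lookup and the per-type comparison helper are assumptions):

for (int readPos = 0; readPos < fields.size(); readPos += 1) {
  Types.NestedField field = fields.get(readPos);
  // a field absent from the written record must come back as its initial default
  boolean written = rec.getSchema().getField(field.name()) != null;
  Object expected = written ? rec.get(field.name()) : field.initialDefault();
  assertEqualsSafe(field.type(), expected, row.get(readPos)); // helper assumed
}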
 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.api.io.TempDir;

+public abstract class DataFrameWriteTestBase extends ScanTestBase {
New base suite for tests of data frame writes, which replaces TestDataFrameWrites and ParameterizedAvroDataTest.
 import org.junit.jupiter.api.io.TempDir;

+/** An AvroDataScan test that validates data by reading through Spark */
+public abstract class ScanTestBase extends AvroDataTest {
New base class for scan tests (TestAvroScan, TestParquetScan, TestParquetVectorizedScan). A sketch of the resulting hierarchy follows.
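Taken together with DataFrameWriteTestBase above, a hedged sketch of the assumed suite hierarchy (class bodies are illustrative; only the names and the writeAndValidate hook come from this PR):

// Shared schema cases; each subclass supplies writeAndValidate for its code path.
abstract class AvroDataTest {
  protected abstract void writeAndValidate(Schema writeSchema, Schema readSchema)
      throws IOException;
}

// Writes with a generic writer and validates by reading back through Spark scans.
abstract class ScanTestBase extends AvroDataTest {}

// Routes the write through the Spark DataFrame API, then validates like a scan test.
abstract class DataFrameWriteTestBase extends ScanTestBase {}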
-  @Parameters(name = "format = {0}")
-  public static Collection<String> parameters() {
-    return Arrays.asList("parquet", "avro", "orc");
This was broken into DataFrameWriteTestBase and subclasses for each format: TestAvroDataFrameWrite, TestParquetDataFrameWrite, and TestORCDataFrameWrite. For example, see the sketch below.
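A minimal sketch of one per-format subclass; the format() hook is an assumption, only the class names come from the comment above:

class TestParquetDataFrameWrite extends DataFrameWriteTestBase {
  @Override
  protected FileFormat format() {
    return FileFormat.PARQUET; // stands in for the old "parquet" parameter
  }
}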
   }

-  @TestTemplate
-  public void testWriteWithCustomDataLocation() throws IOException {
Replaced by DataFrameWriteTestBase#testAlternateLocation.
   }

-  @TestTemplate
-  public void testNullableWithWriteOption() throws IOException {
Assumes Spark 2.x, so this is no longer needed.
   }

-  @TestTemplate
-  public void testNullableWithSparkSqlOption() throws IOException {
Assumes Spark 2.x, so this is no longer needed.
   }

-  @TestTemplate
-  public void testFaultToleranceOnWrite() throws IOException {
I dropped this test because it tests basic Spark behavior and doesn't belong in scan and write tests for specific schemas. I didn't move it anywhere because I don't think it is a valuable test: Spark stage failures throw exceptions and don't commit. I think it was originally trying to check for side effects, but that isn't necessary in Iceberg.
Thanks for the reviews, @Fokko!
This updates Spark's tests for scans and data frame writes to validate default values.
This fixes problems found in testing:
- ReassignIds was dropping defaults
- SchemaParser did not support either initial-default or write-default
- SchemaParser did not have a test suite

This also refactors the data frame writer tests and removes ParameterizedAvroDataTest, which was an unnecessary copy of AvroDataTest. To avoid needing the duplicate test suite, this updates the tests to inherit from a base class like the scan tests. Last, a few unnecessary tests were removed: one was testing basic Spark behavior (no commit if an action fails) and the others were only valid for Spark 2.x.
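For reference, a hedged sketch of the JSON for one field carrying both defaults, in the shape SchemaParser now needs to round-trip (the field is modeled on missing_str from the test above; the default value is illustrative):

// Only the initial-default and write-default keys come from this PR description.
String fieldJson =
    "{ \"id\": 2, \"name\": \"missing_str\", \"required\": true, \"type\": \"string\","
        + " \"initial-default\": \"orange\", \"write-default\": \"orange\" }";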