Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: use ZSTD compressed parquet by default #8158

Closed
wants to merge 9 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -291,7 +291,7 @@ public void testSelectNestedValues() throws Exception {
public void testNestedValues() throws Exception {
createNestedTable();

Row leafDoubleCol = Row.of(53L, 3L, 1L, 1L, 0.0D, 0.0D);
Row leafDoubleCol = Row.of(46L, 3L, 1L, 1L, 0.0D, 0.0D);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szehon-ho @RussellSpitzer do you know what leafDoubleCol is? Is it some kind of metrics that could be changed if the compression codec is changed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I think it include the size, sorry for lack of comment, lets disable zstd in :

    Table table =
        catalog.createTable(
            TableIdentifier.of(Namespace.of(database()), tableName()),
            PRIMITIVE_SCHEMA,
            PartitionSpec.unpartitioned(),
            ImmutableMap.of());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can confirm it's the metric of size. Because of changing to zstd, the sizes are reduced overall. I updated it to the new metrics. We can have a followup PR to change those metric related test with uncompressed parquet.

Copy link
Collaborator

@szehon-ho szehon-ho Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbtsai , i looked into it. I think we have to put it where we write data files, but its a bigger change. There's a test helper we use called FileHelpers.writeDataFile() and we need to change the code to take in map of properties.

Then,

  public static DataFile writeDataFile(Table table, OutputFile out, List<Record> rows, Map<String, String> properties)
      throws IOException {
    FileFormat format = defaultFormat(table.properties());
    GenericAppenderFactory factory = new GenericAppenderFactory(table.schema());
    properties.forEach(factory::set);

but its ok to do in separate pr if you want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s do it in a followup PR to reduce the changes

Copy link
Collaborator

@szehon-ho szehon-ho Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chat offline with @dbtsai , will make follow up to make expected metrics take from DataFile.fileSizeInBytes()

Row leafLongCol = Row.of(54L, 3L, 1L, null, 0L, 1L);
Row metrics = Row.of(Row.of(leafDoubleCol, leafLongCol));

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -291,7 +291,7 @@ public void testSelectNestedValues() throws Exception {
public void testNestedValues() throws Exception {
createNestedTable();

Row leafDoubleCol = Row.of(53L, 3L, 1L, 1L, 0.0D, 0.0D);
Row leafDoubleCol = Row.of(46L, 3L, 1L, 1L, 0.0D, 0.0D);
Row leafLongCol = Row.of(54L, 3L, 1L, null, 0L, 1L);
Row metrics = Row.of(Row.of(leafDoubleCol, leafLongCol));

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -291,7 +291,7 @@ public void testSelectNestedValues() throws Exception {
public void testNestedValues() throws Exception {
createNestedTable();

Row leafDoubleCol = Row.of(53L, 3L, 1L, 1L, 0.0D, 0.0D);
Row leafDoubleCol = Row.of(46L, 3L, 1L, 1L, 0.0D, 0.0D);
Row leafLongCol = Row.of(54L, 3L, 1L, null, 0L, 1L);
Row metrics = Row.of(Row.of(leafDoubleCol, leafLongCol));

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -326,7 +326,7 @@ public void testSelectNestedValues() throws Exception {
public void testNestedValues() throws Exception {
createNestedTable();

Object[] leafDoubleCol = row(53L, 3L, 1L, 1L, 0.0D, 0.0D);
Object[] leafDoubleCol = row(46L, 3L, 1L, 1L, 0.0D, 0.0D);
Object[] leafLongCol = row(54L, 3L, 1L, null, 0L, 1L);
Object[] metrics = row(leafDoubleCol, leafLongCol);

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -353,7 +353,7 @@ public void testSelectNestedValues() throws Exception {
public void testNestedValues() throws Exception {
createNestedTable();

Object[] leafDoubleCol = row(53L, 3L, 1L, 1L, 0.0D, 0.0D);
Object[] leafDoubleCol = row(46L, 3L, 1L, 1L, 0.0D, 0.0D);
Object[] leafLongCol = row(54L, 3L, 1L, null, 0L, 1L);
Object[] metrics = row(leafDoubleCol, leafLongCol);

Expand Down