
Core: use ZSTD compressed parquet by default #8158

Closed · wants to merge 9 commits

Conversation

@dbtsai (Member) commented Jul 26, 2023

In memory of @kbendick, who was working to make zstd Parquet the default at Apple. He conducted extensive benchmarking and internal testing, and his valuable findings recommended adopting zstd Parquet as the default.

This PR modifies the default Iceberg parquet compression codec from gzip to zstd.

Currently, Iceberg uses gzip compression as the default option. However, based on our benchmark results, we have found that zstd-compressed Parquet files consistently exhibit faster compression and decompression speeds than gzip Parquet files. Additionally, zstd Parquet files are generally slightly smaller than their gzip counterparts. Consistent with these findings, Trino already switched from gzip to zstd as its Iceberg Parquet codec in trinodb/trino#10045.
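In effect, the change boils down to flipping the default value of TableProperties.PARQUET_COMPRESSION_DEFAULT. A minimal sketch of the intent (simplified; the surrounding constants in TableProperties may differ from this excerpt):

    // Sketch only: the table property key and its new default.
    public static final String PARQUET_COMPRESSION = "write.parquet.compression-codec";
    // Previously: public static final String PARQUET_COMPRESSION_DEFAULT = "gzip";
    public static final String PARQUET_COMPRESSION_DEFAULT = "zstd";

Tables that explicitly set write.parquet.compression-codec keep whatever codec they configured; only the fallback used when the property is absent changes.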

@RussellSpitzer (Member) left a comment

I think this is a good idea; we do recommend this internally.

@dbtsai (Member, Author) commented Jul 27, 2023

It looks like there are issues when using zstd with nested data in both Flink and Spark. This can be reproduced with

./gradlew -DsparkVersions=3.4 -DscalaVersion=2.13 -DhiveVersions= -DflinkVersions= \
    :iceberg-spark:iceberg-spark-3.4_2.13:test \
    --tests "org.apache.iceberg.spark.source.TestMetadataTableReadableMetrics" \
    -Pquick=true -x javadoc

for Spark, and

./gradlew -DsparkVersions= -DhiveVersions= -DflinkVersions=1.15 \
    :iceberg-flink:iceberg-flink-1.15:test \
    --tests "org.apache.iceberg.flink.source.TestMetadataTableReadableMetrics" \
    -Pquick=true -x javadoc

for Flink.

Will look into the issue soon.

@@ -291,7 +291,7 @@ public void testSelectNestedValues() throws Exception {
   public void testNestedValues() throws Exception {
     createNestedTable();
 
-    Row leafDoubleCol = Row.of(53L, 3L, 1L, 1L, 0.0D, 0.0D);
+    Row leafDoubleCol = Row.of(46L, 3L, 1L, 1L, 0.0D, 0.0D);
@dbtsai (Member, Author):

@szehon-ho @RussellSpitzer do you know what leafDoubleCol is? Is it some kind of metric that could change if the compression codec is changed?

@szehon-ho (Collaborator):

Yeah, I think it includes the size; sorry for the lack of a comment. Let's disable zstd in:

    Table table =
        catalog.createTable(
            TableIdentifier.of(Namespace.of(database()), tableName()),
            PRIMITIVE_SCHEMA,
            PartitionSpec.unpartitioned(),
            ImmutableMap.of());
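
A rough sketch of that suggestion, assuming the test pins the codec through the table properties map (using the TableProperties.PARQUET_COMPRESSION key, i.e. write.parquet.compression-codec):

    // Hypothetical sketch: pin the codec for this test so the expected size
    // metrics do not depend on the table-level default.
    Table table =
        catalog.createTable(
            TableIdentifier.of(Namespace.of(database()), tableName()),
            PRIMITIVE_SCHEMA,
            PartitionSpec.unpartitioned(),
            ImmutableMap.of(TableProperties.PARQUET_COMPRESSION, "uncompressed"));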

@dbtsai (Member, Author):

Yeah, I can confirm it's a size metric. Because of the change to zstd, the sizes are reduced overall. I updated the test to the new values. We can have a follow-up PR to change those metric-related tests to use uncompressed Parquet.

@szehon-ho (Collaborator) commented Jul 28, 2023

@dbtsai, I looked into it. I think we have to put it where we write the data files, but it's a bigger change. There's a test helper we use called FileHelpers.writeDataFile(), and we need to change the code to take in a map of properties.

Then,

  public static DataFile writeDataFile(Table table, OutputFile out, List<Record> rows, Map<String, String> properties)
      throws IOException {
    FileFormat format = defaultFormat(table.properties());
    GenericAppenderFactory factory = new GenericAppenderFactory(table.schema());
    properties.forEach(factory::set);

but it's OK to do it in a separate PR if you want.
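
A hypothetical usage of the extended helper, assuming a test that wants stable file sizes regardless of the default codec (temp, table, and records are placeholders for the test's existing fixtures):

    // Sketch only: pass an explicit codec so expected metrics stay stable.
    DataFile dataFile =
        FileHelpers.writeDataFile(
            table,
            Files.localOutput(temp.newFile()),
            records,
            ImmutableMap.of(TableProperties.PARQUET_COMPRESSION, "uncompressed"));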

@dbtsai (Member, Author):

Let's do it in a follow-up PR to keep the changes small.

@szehon-ho (Collaborator) commented Jul 28, 2023

Chatted offline with @dbtsai; I will make a follow-up so the expected metrics are taken from DataFile.fileSizeInBytes().

@RussellSpitzer (Member):

Since this is a change to defaults, I do want to give other folks a chance to chime in, but I think this is a very safe change for new tables and won't affect older ones, so I'm inclined to get this in for the next minor Iceberg release.

@szehon-ho (Collaborator):

I also mentioned the same thing, to wait a day or two for any other comments.

this is a very safe change for new tables and won't affect older ones

Wouldn't this affect all existing tables where this property is not explicitly set to GZIP/UNCOMPRESSED?

@RussellSpitzer (Member):

I also mentioned the same thing, to wait a day or two for any other comments.

this is a very safe change for new tables and won't affect older ones

Wouldn't this affect all existing tables where this property is not explicitly set to GZIP/UNCOMPRESSED?

There was a bug where default values were not getting persisted, but they are supposed to be.


  public static final String PARQUET_COMPRESSION_LEVEL = "write.parquet.compression-level";
  public static final String DELETE_PARQUET_COMPRESSION_LEVEL =
      "write.delete.parquet.compression-level";
- public static final String PARQUET_COMPRESSION_LEVEL_DEFAULT = null;
+ public static final String PARQUET_COMPRESSION_LEVEL_DEFAULT =
+     null; // For zstd, it is default to "3"
A reviewer (Contributor) commented:

How valuable is this comment, given that it is specific to the underlying codec and may change in the future?

Another reviewer (Contributor) commented:

Agreed; for example, we don't mention that gzip defaults to "6".

@dbtsai (Member, Author):

Removed.

@aokolnychyi (Contributor):

I generally support switching to zstd by default, but I think @szehon-ho is right that it will affect not only new tables but also existing tables where the codec value was not set explicitly. Is that a problem? Maybe. We could consider doing that only for new tables, but I am not sure it is worth the extra complexity. If someone did not configure the codec, most likely it does not matter to them, so they won't even notice this change?

Thoughts, @rdblue @jackye1995 @stevenzwu @nastra @danielcweeks?

@stevenzwu (Contributor):

it will affect not only new tables but also existing tables where the codec value was not set explicitly. Is that a problem?

My main question is about the dependency: do all runtime environments have zstd available? We know gzip is.

@manuzhang (Contributor) commented Aug 1, 2023

If someone did not configure the codec, most likely it does not matter for them so they won't even notice this change?

End users won't notice, even if we only switch the default for new tables. Platform admins need to be aware of the change and handle any dependency issues or side effects for end users. IMO, it's more about how we communicate it to users.

@dbtsai (Member, Author) commented Aug 1, 2023

To avoid surprising our users, I agree with the above comments. We should address the following two items in follow-up PRs.

  1. Persist the default compression codec value in write.parquet.compression-codec for newly created tables.
  2. For existing tables that don't set write.parquet.compression-codec, default to gzip and set it explicitly (a rough sketch follows below).

@stevenzwu I believe most modern runtimes support zstd Parquet. Trino switched to zstd Parquet as the default for Iceberg tables almost two years ago, and we have never heard of a compatibility issue.
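
For item 2, a rough sketch of what pinning existing tables could look like through the table API (assuming table is an org.apache.iceberg.Table handle; this is not part of this PR):

    // Hypothetical follow-up sketch: if an existing table never set a codec,
    // pin it to the old default so the new zstd default only applies to new tables.
    if (!table.properties().containsKey(TableProperties.PARQUET_COMPRESSION)) {
      table
          .updateProperties()
          .set(TableProperties.PARQUET_COMPRESSION, "gzip")
          .commit();
    }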

@rustyconover (Contributor):

I'd suggest that the compression level of ZSTD is important to consider.

ZSTD has a wide range of compression levels, and the choice of the default level is important for write speed (and file size). Decompression speed isn't as sensitive to the choice of compression level.

@pan3793 (Member) commented Aug 2, 2023

I'd suggest that the compression level of ZSTD is important to consider.

ZSTD has a wide range of compression levels, and the choice of the default level is important for write speed (and file size). Decompression speed isn't as sensitive to the choice of compression level.

+1, I think using 3 as the default compression level is not a good idea.

Based on my experience, levels 0 to 3 are suitable for transient data, e.g. Spark shuffle data. For persisted data, e.g. a warehouse, a level around 9 is typically recommended.

@dbtsai (Member, Author) commented Aug 2, 2023

It's a very nice property of zstd that decompression time is roughly constant across compression levels, so users can pay a one-time cost for archival data. The current default level of 3 already produces smaller files than gzip with a significant speedup. We can evaluate whether to raise the default compression level in a follow-up PR with more experiments.
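
For users who do want a higher ratio on specific tables, the level can already be set per table. A hypothetical example (the table name, schema, and catalog handle are placeholders), using the write.parquet.compression-level property mentioned above:

    // Hypothetical: create a table that opts into a higher zstd level up front,
    // trading slower writes for smaller files on archival data.
    Table archive =
        catalog.createTable(
            TableIdentifier.of("db", "events_archive"),
            schema,
            PartitionSpec.unpartitioned(),
            ImmutableMap.of(
                TableProperties.PARQUET_COMPRESSION, "zstd",
                TableProperties.PARQUET_COMPRESSION_LEVEL, "9"));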

  - code: "java.field.constantValueChanged"
    old: "field org.apache.iceberg.TableProperties.PARQUET_COMPRESSION_DEFAULT"
    new: "field org.apache.iceberg.TableProperties.PARQUET_COMPRESSION_DEFAULT"
    justification: "{Changing the default compression codec from gzip to zstd}"
A reviewer (Contributor) commented:

Suggested change:
-    justification: "{Changing the default compression codec from gzip to zstd}"
+    justification: "Changing the default compression codec from gzip to zstd"

@dbtsai (Member, Author):

Addressed. Thanks.

@dramaticlly (Contributor) left a comment

Thank you, @dbtsai.

I did some tests in the past; the size reduction and write speedup are an obvious win, but reads might need a bit of tuning on the compression level. I am wondering if there are any quantitative metrics/benchmarks you can share on how large the improvement was?

@dbtsai (Member, Author) commented Aug 6, 2023

With some of our datasets, we saw that zstd Parquet is around 5% smaller than gzip Parquet, about 1.5x faster for writes, and 1.13x faster for reads. This was done a while ago using the Spark Parquet reader/writer instead of the Iceberg Parquet reader/writer, but I believe we will see similar gains with Iceberg as well.

@manuzhang (Contributor):

@dbtsai did both use the default compression levels? Uber also did some benchmarking on their data, but that was two years ago.

@dbtsai (Member, Author) commented Aug 7, 2023

@manuzhang we use the default zstd compression level, 3.

@aokolnychyi (Contributor):

Let's discuss this change during the community sync tomorrow.

@Fokko (Contributor) commented Sep 28, 2023

This has been added by @aokolnychyi in #8593 🥳

@Fokko closed this on Sep 28, 2023.