Add support for creating and dropping tables using Iceberg object store #20555
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
Re-created from #20516.
Branch force-pushed from cda8e04 to 4f83e8a.
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Any thoughts, @amogh-jahagirdar?
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.
I added the stale-ignore label so the PR stays open. Please rebase as a next step.
Branch force-pushed from 6ed669a to a7454d1.
Hi @mosabua, I pushed the rebase. It looks like the failed check timed out, but I'm not able to restart it.
Please ask for help in the #iceberg or #core-dev channel.
Branch force-pushed from ed46a94 to e452e6d.
    // TODO: support path override in Iceberg table creation: https://github.com/trinodb/trino/issues/8861
    if (table.properties().containsKey(OBJECT_STORE_PATH) ||
            table.properties().containsKey("write.folder-storage.path") || // Removed from Iceberg as of 0.14.0, but preserved for backward compatibility
            table.properties().containsKey(WRITE_METADATA_LOCATION) ||
            // (diff excerpt; the condition continues in the source)
Why was WRITE_METADATA_LOCATION removed?
Not sure I totally follow. I'm not removing the property, just its usage here, since with this change Trino will be able to drop tables created with Iceberg's object storage file layout.
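A minimal, hypothetical sketch of what the guard in the excerpt does: it reports whether any key from a list of path-override properties is set on a table. The class, method, and list names are invented for illustration, and the string keys assume the values of Iceberg's `TableProperties.OBJECT_STORE_PATH` (`write.object-storage.path`) and `TableProperties.WRITE_METADATA_LOCATION` (`write.metadata.path`). It illustrates the point in the reply: removing a key from the guarded list stops that property from blocking the operation, while the property itself stays on the table.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the guard in the diff excerpt; not Trino's actual code.
final class PathOverrideGuardSketch {
    // Keys visible in the excerpt (string values assumed from Iceberg's TableProperties).
    static final List<String> GUARDED_BEFORE = List.of(
            "write.object-storage.path", // TableProperties.OBJECT_STORE_PATH
            "write.folder-storage.path", // removed from Iceberg as of 0.14.0
            "write.metadata.path");      // TableProperties.WRITE_METADATA_LOCATION

    // Illustrative list with write.metadata.path dropped, as discussed in the review thread.
    static final List<String> GUARDED_WITHOUT_METADATA = List.of(
            "write.object-storage.path",
            "write.folder-storage.path");

    private PathOverrideGuardSketch() {}

    // Returns true if the table sets any of the guarded property keys.
    static boolean hasGuardedProperty(Map<String, String> tableProperties, List<String> guardedKeys) {
        return guardedKeys.stream().anyMatch(tableProperties::containsKey);
    }

    public static void main(String[] args) {
        // The table carries write.metadata.path in both cases; only the check changes.
        Map<String, String> props = Map.of("write.metadata.path", "s3://mybucket/metadata");
        System.out.println(hasGuardedProperty(props, GUARDED_BEFORE));           // true
        System.out.println(hasGuardedProperty(props, GUARDED_WITHOUT_METADATA)); // false
    }
}
```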
Adding support for updating this table property would be useful (see …). It is common to create a table and only later run into object storage rate limiting, and not allowing updates would make fixing that rate limiting a pain. Not sure whether updates should go in this PR or another one, @ebyhr, but heads up.
Branch force-pushed from 38fe6eb to 78224d6.
That's a good callout. I'll add it in this PR.
@jakelong95 I already pushed the code to support …
Yeah, I saw that you added object_store_enabled. I'm adding support for modifying data_location, since the two properties are related.
Branch force-pushed from af9410b to 510f032.
@ebyhr Just pushed up the following changes:
Let me know if you have any more questions or concerns!
Resolved review threads (all outdated):
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergConfig.java
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java (3 threads)
- ...berg/src/test/java/io/trino/plugin/iceberg/TestIcebergTableWithTablePropertyObjectStore.java
- ...roduct-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java (3 threads)
Branch force-pushed from c0690b8 to 5bab5a9.
Branch force-pushed from 0a268a3 to 7f55756.
Branch force-pushed from 0c7e918 to 98c26d0.
@ebyhr I fixed the failing unit test that I had added. There is now a new failure in one of the existing unit tests.
Branch force-pushed from 1b827b6 to baafef2.
Branch force-pushed from baafef2 to 9f42c18.
Description
Currently, Trino can read from and write to tables that use Iceberg's `ObjectStoreLocationProvider`, but it cannot create or drop tables that use this location provider. This PR enables Trino to create tables with Iceberg's object storage layout by adding the following properties:
- `iceberg.object-store.enabled`: corresponds to the `write.object-storage.enabled` table property used by Spark, which enables Iceberg's `ObjectStoreLocationProvider`
- `iceberg.data-location`: corresponds to the `write.data.path` table property used by Spark, which sets where Iceberg writes data files

Enabling the object store property and setting the data path causes Iceberg to prefix each data file location, under the location specified by `write.data.path`, with a deterministic hash generated from the file name. Because cloud storage systems like S3 apply request-rate limits per prefix, spreading files evenly across many prefixes helps reduce throttling. Iceberg's own documentation on this feature has more information.

For example, without the object store enabled, all data files land under the same prefix, `iceberg-tables/myschema/mytable/data`:

    s3://mybucket/iceberg-tables/myschema/mytable/data/file1.parquet
    s3://mybucket/iceberg-tables/myschema/mytable/data/file2.parquet
    s3://mybucket/iceberg-tables/myschema/mytable/data/file3.parquet
If you enable the object store instead, each file gets its own prefix:

    s3://mybucket/iceberg-tables/myschema/mytable/data/<file1 hash>/file1.parquet
    s3://mybucket/iceberg-tables/myschema/mytable/data/<file2 hash>/file2.parquet
    s3://mybucket/iceberg-tables/myschema/mytable/data/<file3 hash>/file3.parquet
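The layout difference in these examples can be sketched in a few lines of Java. This is a simplified, hypothetical illustration, not Iceberg's `ObjectStoreLocationProvider`: the class and method names are invented, the JDK's CRC32 stands in for the hash Iceberg actually uses, and the 8-hex-digit prefix format is an assumption. It does show the key property: the prefix is a deterministic function of the file name, so different files scatter across prefixes while the same name always maps to the same location.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of plain vs. hash-prefixed data file locations.
final class HashedLocationSketch {
    private HashedLocationSketch() {}

    // Deterministic 8-hex-digit prefix derived from the file name (CRC32 is a stand-in hash).
    static String hashPrefix(String fileName) {
        CRC32 crc = new CRC32();
        crc.update(fileName.getBytes(StandardCharsets.UTF_8));
        return String.format("%08x", crc.getValue());
    }

    // Plain layout: every file lands directly under the data path.
    static String plainLocation(String dataPath, String fileName) {
        return dataPath + "/" + fileName;
    }

    // Object-store layout: a hash prefix sits between the data path and the file name.
    static String objectStoreLocation(String dataPath, String fileName) {
        return dataPath + "/" + hashPrefix(fileName) + "/" + fileName;
    }

    public static void main(String[] args) {
        String dataPath = "s3://mybucket/iceberg-tables/myschema/mytable/data";
        for (String file : new String[] {"file1.parquet", "file2.parquet", "file3.parquet"}) {
            System.out.println(plainLocation(dataPath, file));
            System.out.println(objectStoreLocation(dataPath, file));
        }
    }
}
```

Running `main` prints each file's plain location followed by its hash-prefixed location.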
Additional context and related issues
This PR maintains compatibility with Spark by using the table properties `write.object-storage.enabled` and `write.data.path`, which were previously set up in #8573 to allow Trino to write to Iceberg tables using the `ObjectStoreLocationProvider`.

Fixes #8861
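The compatibility point above amounts to a property mapping. This is a minimal sketch, assuming the Trino-side table property names `object_store_enabled` and `data_location` mentioned in the review thread; the class and method are hypothetical and not the PR's code. Only the Iceberg keys `write.object-storage.enabled` and `write.data.path` come from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mapping from Trino-side table properties (object_store_enabled,
// data_location) to the Iceberg table properties that Spark also reads.
final class ObjectStorePropertiesSketch {
    static final String OBJECT_STORE_ENABLED = "write.object-storage.enabled";
    static final String WRITE_DATA_LOCATION = "write.data.path";

    private ObjectStorePropertiesSketch() {}

    // Builds the Iceberg properties; keys are omitted when the Trino-side value is unset.
    static Map<String, String> toIcebergProperties(boolean objectStoreEnabled, String dataLocation) {
        Map<String, String> properties = new LinkedHashMap<>();
        if (objectStoreEnabled) {
            properties.put(OBJECT_STORE_ENABLED, "true");
        }
        if (dataLocation != null) {
            properties.put(WRITE_DATA_LOCATION, dataLocation);
        }
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(toIcebergProperties(true, "s3://mybucket/iceberg-tables/myschema/mytable/data"));
        // prints {write.object-storage.enabled=true, write.data.path=s3://mybucket/iceberg-tables/myschema/mytable/data}
    }
}
```

Leaving a key out of the map, rather than writing `false`, keeps Iceberg's default behavior, in which the object storage layout is disabled.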
Release notes
(x) Release notes are required, with the following suggested text: