[docs] Replace examples of Hadoop catalog with JDBC catalog #11845
base: main
@@ -26,7 +26,11 @@ highlight some powerful features. You can learn more about Iceberg's Spark runti
 - [Writing Data to a Table](#writing-data-to-a-table)
 - [Reading Data from a Table](#reading-data-from-a-table)
 - [Adding A Catalog](#adding-a-catalog)
-- [Next Steps](#next-steps)
+- [Configuring JDBC Catalog](#configuring-jdbc-catalog)
+- [Configuring REST Catalog](#configuring-rest-catalog)
+- [Next steps](#next-steps)
+- [Adding Iceberg to Spark](#adding-iceberg-to-spark)
+- [Learn More](#learn-more)
 
 ### Docker-Compose
 
@@ -269,42 +273,104 @@ To read a table, simply use the Iceberg table's name.
 
 ### Adding A Catalog
 
-Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue.
-Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`. In this guide,
-we use JDBC, but you can follow these instructions to configure other catalog types. To learn more, check out
-the [Catalog](docs/latest/spark-configuration.md#catalogs) page in the Spark section.
+Apache Iceberg provides several catalog implementations to manage tables and enable SQL operations.
+Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`.
+You can configure different catalog types, such as JDBC, Hive Metastore, Glue, and REST, to manage Iceberg tables in Spark.

Review comment: weird that the guide already mentions JDBC here, but the example is still Hadoop.
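The naming scheme is the same for every implementation; the sketch below shows the general shape of these properties, using a hypothetical catalog name `my_catalog` and illustrative values (the concrete JDBC and REST examples follow in this guide):

```sh
# spark.sql.catalog.(catalog_name) selects the catalog implementation class;
# nested keys under the same prefix configure that catalog.
spark.sql.catalog.my_catalog            org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.my_catalog.type       jdbc
spark.sql.catalog.my_catalog.warehouse  /path/to/warehouse
# implementation-specific keys also nest under the prefix, e.g. a JDBC URI:
# spark.sql.catalog.my_catalog.uri      jdbc:sqlite:catalog.sqlite
```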
 
-This configuration creates a path-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
+This guide covers the configuration of two popular catalog types:
+
+* JDBC Catalog
+* REST Catalog
+
+To learn more, check out the [Catalog](docs/latest/spark-configuration.md#catalogs) page in the Spark section.
+
+#### Configuring JDBC Catalog
+
+The JDBC catalog stores Iceberg table metadata in a relational database.
+
+This configuration creates a JDBC-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
+The JDBC catalog uses a file-based SQLite database as the backend.
 
=== "CLI" | ||
|
||
```sh | ||
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\ | ||
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. taking on this extra dep since i dont see any iceberg specific package i can use. there is a |
||
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ | ||
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \ | ||
--conf spark.sql.catalog.spark_catalog.type=hive \ | ||
--conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \ | ||
--conf spark.sql.catalog.local.type=hadoop \ | ||
--conf spark.sql.catalog.local.type=jdbc \ | ||
--conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \ | ||
--conf spark.sql.catalog.local.warehouse=$PWD/warehouse \ | ||
--conf spark.sql.defaultCatalog=local | ||
``` | ||
|
||
=== "spark-defaults.conf" | ||
|
||
```sh | ||
spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }} | ||
spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 | ||
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions | ||
spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog | ||
spark.sql.catalog.spark_catalog.type hive | ||
spark.sql.catalog.local org.apache.iceberg.spark.SparkCatalog | ||
spark.sql.catalog.local.type hadoop | ||
spark.sql.catalog.local.type jdbc | ||
spark.sql.catalog.local.uri jdbc:sqlite:iceberg_catalog_db.sqlite | ||
spark.sql.catalog.local.warehouse $PWD/warehouse | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
spark.sql.defaultCatalog local | ||
``` | ||
|
||
!!! note | ||
If your Iceberg catalog is not set as the default catalog, you will have to switch to it by executing `USE local;` | ||
|
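As a quick sanity check of the configuration above, you can look inside the SQLite file after creating a table. This is a sketch, assuming the `spark-defaults.conf` settings above are in place; `nyc.taxis` is only an example table name, and `iceberg_tables` is the internal table Iceberg's JDBC catalog maintains, so treat this as a debugging aid rather than a documented interface:

```sh
# Create an example table through the JDBC catalog configured above.
spark-sql -e "CREATE TABLE IF NOT EXISTS local.nyc.taxis (vendor_id bigint, trip_id bigint) USING iceberg;"

# Inspect the catalog's SQLite backing file directly; iceberg_tables maps
# each table to its current metadata file location.
sqlite3 "$PWD/iceberg_catalog_db.sqlite" \
  "SELECT table_namespace, table_name, metadata_location FROM iceberg_tables;"
```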
 
+#### Configuring REST Catalog
+
+The REST catalog provides a language-agnostic way to manage Iceberg tables through a RESTful service.
+
+This configuration creates a REST-based catalog named `rest` for tables under `s3://warehouse/` and adds support for Iceberg tables to Spark's built-in catalog.
+The REST catalog uses the `apache/iceberg-rest-fixture` Docker container from the `docker-compose.yml` above as the backend service, with MinIO providing S3-compatible storage.
 
=== "CLI" | ||
|
||
```sh | ||
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \ | ||
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ | ||
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \ | ||
--conf spark.sql.catalog.spark_catalog.type=hive \ | ||
--conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \ | ||
--conf spark.sql.catalog.rest.type=rest \ | ||
--conf spark.sql.catalog.rest.uri=http://localhost:8181 \ | ||
--conf spark.sql.catalog.rest.warehouse=s3://warehouse/ \ | ||
--conf spark.sql.catalog.rest.io-impl=org.apache.iceberg.aws.s3.S3FileIO \ | ||
--conf spark.sql.catalog.rest.s3.endpoint=http://localhost:9000 \ | ||
--conf spark.sql.catalog.rest.s3.path-style-access=true \ | ||
--conf spark.sql.catalog.rest.s3.access-key-id=admin \ | ||
--conf spark.sql.catalog.rest.s3.secret-access-key=password \ | ||
--conf spark.sql.catalog.rest.s3.region=us-east-1 \ | ||
--conf spark.sql.defaultCatalog=rest | ||
``` | ||
|
||
=== "spark-defaults.conf" | ||
|
||
```sh | ||
spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }} | ||
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions | ||
spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog | ||
spark.sql.catalog.spark_catalog.type hive | ||
spark.sql.catalog.rest org.apache.iceberg.spark.SparkCatalog | ||
spark.sql.catalog.rest.type rest | ||
spark.sql.catalog.rest.uri http://localhost:8181 | ||
spark.sql.catalog.rest.warehouse s3://warehouse/ | ||
spark.sql.catalog.rest.io-impl org.apache.iceberg.aws.s3.S3FileIO | ||
spark.sql.catalog.rest.s3.endpoint http://localhost:9000 | ||
spark.sql.catalog.rest.s3.access-key-id admin | ||
spark.sql.catalog.rest.s3.secret-access-key password | ||
spark.sql.catalog.rest.s3.path-style-access true | ||
spark.sql.catalog.rest.s3.region us-east-1 | ||
spark.sql.defaultCatalog rest | ||
``` | ||
|
||
!!! note | ||
If your Iceberg catalog is not set as the default catalog, you will have to switch to it by executing `USE rest;` | ||
|
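With the docker-compose stack from earlier in this guide running, the REST catalog can also be sanity-checked outside of Spark. The `GET /v1/config` endpoint is part of the Iceberg REST catalog specification; the port and warehouse value below are assumed from this guide's setup:

```sh
# Assumes the apache/iceberg-rest-fixture container is listening on localhost:8181.
curl -s "http://localhost:8181/v1/config?warehouse=s3://warehouse/"
# A healthy service responds with a JSON document of "defaults" and "overrides"
# that Spark's REST catalog merges into its own configuration.
```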
 
 ### Next steps
 
 #### Adding Iceberg to Spark

Review comment: note, there are two "getting started" docs: this one and site/docs/spark-quickstart.md