[docs] Replace examples of Hadoop catalog with JDBC catalog #11845
base: main
Conversation
Force-pushed: 00ca569 to 6fe50e1, then 496e51e to 63c9a1a.
Note: there are two "getting started" docs, this one and site/docs/spark-quickstart.md.
@@ -269,42 +273,104 @@ To read a table, simply use the Iceberg table's name.

### Adding A Catalog

Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue.
Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`. In this guide,
we use JDBC, but you can follow these instructions to configure other catalog types. To learn more, check out
Weird that the guide already mentions JDBC here, but the example is still Hadoop.
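For reference, a JDBC version of this catalog configuration can be sketched as follows. This is a hedged sketch, not the exact PR text: the catalog name `local` and the SQLite path are illustrative, following the general `spark.sql.catalog.(catalog_name)` property pattern described above.

```shell
# a sketch: configure a JDBC catalog named "local" backed by a local
# SQLite file (catalog name and paths are illustrative)
spark-sql --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=jdbc \
    --conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \
    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse
```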
- [Configuring JDBC Catalog](#configuring-jdbc-catalog)
- [Configuring REST Catalog](#configuring-rest-catalog)
- [Next steps](#next-steps)
- [Adding Iceberg to Spark](#adding-iceberg-to-spark)
- [Learn More](#learn-more)
--conf spark.sql.catalog.local.warehouse=$PWD/warehouse
```

For an example of configuring a REST-based catalog, see [Configuring REST Catalog](/spark-quickstart#configuring-rest-catalog)
Instead of repeating the REST catalog configuration here, just link to site/docs/spark-quickstart.md. I double-checked the link locally.
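For context, the REST catalog configuration being linked to follows the same property pattern. A hedged sketch (the catalog name `rest` and the server URI are illustrative, not taken from the PR, and a REST catalog server is assumed to already be running):

```shell
# a sketch: a REST-based catalog pointing at an illustrative local
# catalog server URI
spark-sql --conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.rest.type=rest \
    --conf spark.sql.catalog.rest.uri=http://localhost:8181
```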
--conf spark.sql.catalog.local.type=jdbc \
--conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \
--conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
--conf spark.sql.defaultCatalog=local
Added `defaultCatalog` to match other pages.
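Setting `spark.sql.defaultCatalog` lets unqualified table names resolve against that catalog. A hedged usage sketch (the namespace and table names are illustrative, and an existing Iceberg table is assumed):

```shell
# with spark.sql.defaultCatalog=local, these two queries address the
# same table (names are illustrative)
spark-sql -e "SELECT * FROM db.sample"        # resolved via the default catalog
spark-sql -e "SELECT * FROM local.db.sample"  # fully qualified, same table
```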
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type hive
spark.sql.catalog.local org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type hadoop
spark.sql.catalog.local.warehouse $PWD/warehouse
`$PWD` does not expand in spark-defaults.conf; keeping this here will create a folder literally named `$PWD`.
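The non-expansion point can be demonstrated without Spark at all: a shell expands `$PWD` on a command line, but Spark reads spark-defaults.conf as a plain properties-style file with no shell involved, so the literal string survives. A minimal sketch (the file path is illustrative):

```shell
# on a command line, the shell expands $PWD before the program sees it
echo "warehouse=$PWD/warehouse"

# a .conf file is read literally; no shell expansion happens, so Spark
# would receive the four characters "$PWD" verbatim
printf 'spark.sql.catalog.local.warehouse $PWD/warehouse\n' > /tmp/demo-spark-defaults.conf
cat /tmp/demo-spark-defaults.conf
```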
=== "CLI"

```sh
-spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }} \
+spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \
```
Taking on this extra dependency since I don't see any Iceberg-specific package I can use; there is a hive-jdbc package.
Closes #11284 (devlist discussion).

This PR replaces examples of the Hadoop catalog with examples of the JDBC catalog and adds examples of setting up a REST catalog.
Testing

`spark-quickstart.md` using JDBC catalog:
- Using `spark-sql` CLI config
- Using `spark-defaults.conf` file

`spark-quickstart.md` using REST catalog:
- With `spark-sql` CLI config
- With `spark-defaults.conf` file

Rendered Docs
- `site/docs/spark-quickstart.md` (http://127.0.0.1:8000/spark-quickstart/#adding-catalogs)
- `docs/docs/spark-getting-started.md` (http://127.0.0.1:8000/docs/nightly/spark-getting-started/#adding-catalogs)
- `site/docs/how-to-release.md` (http://127.0.0.1:8000/how-to-release/#verifying-with-spark)
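The `127.0.0.1:8000` URLs above are the default address of a local `mkdocs serve`. Assuming the Iceberg site is built with mkdocs and its requirements are already installed (an assumption; the PR only shows the rendered URLs), the rendered-docs check could be reproduced along these lines:

```shell
# a sketch: serve the docs site locally from the repository's site
# directory, then browse the pages listed above
mkdocs serve
# serves at http://127.0.0.1:8000 by default
```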