[DOCS] Add dedicated Iceberg page (#1830)

Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
Eventual-Inc · Feb 1, 2024 · 4734862
1 parent 59af587

Showing 5 changed files with 42 additions and 50 deletions.
diff --git a/docs/source/user_guide/cheatsheet.rst b/docs/source/user_guide/cheatsheet.rst
diff --git a/docs/source/user_guide/index.rst b/docs/source/user_guide/index.rst
@@ -5,7 +5,6 @@ Daft User Guide
     :hidden:
     :maxdepth: 1
 
-    cheatsheet
     basic_concepts
     daft_in_depth
     poweruser
@@ -39,7 +38,7 @@ aims to help Daft users master the usage of the Daft *DataFrame* for all your da
     code you may wish to take a look at these resources:
 
     1. :doc:`../10-min`: Itching to run some Daft code? Hit the ground running with our 10 minute quickstart notebook.
-    2. :doc:`cheatsheet`: Quick reference to commonly-used Daft APIs and usage patterns - useful to keep next to your laptop as you code!
+    2. (Coming soon!) Cheatsheet: Quick reference to commonly-used Daft APIs and usage patterns - useful to keep next to your laptop as you code!
     3. :doc:`../api_docs/index`: Searchable documentation and reference material to Daft's public Python API.
 
 Table of Contents

diff --git a/docs/source/user_guide/integrations.rst b/docs/source/user_guide/integrations.rst
@@ -3,4 +3,4 @@ Integrations
 
 .. toctree::
 
-    integrations/data_catalogs
+    integrations/iceberg
diff --git a/docs/source/user_guide/integrations/data_catalogs.rst b/docs/source/user_guide/integrations/data_catalogs.rst
diff --git a/docs/source/user_guide/integrations/iceberg.rst b/docs/source/user_guide/integrations/iceberg.rst
@@ -0,0 +1,40 @@
+Apache Iceberg
+==============
+
+`Apache Iceberg <https://iceberg.apache.org/>`_ is an open-source table format, originally developed at Netflix, for large-scale analytical datasets.
+
+To read from the Apache Iceberg table format, use the :func:`daft.read_iceberg` function.
+
+We integrate closely with `PyIceberg <https://py.iceberg.apache.org/>`_ (the official Python implementation of Apache Iceberg), so you can easily read a Daft DataFrame from a PyIceberg ``Table`` object.
+
+.. code:: python
+
+    # Load a PyIceberg table as usual
+    from pyiceberg.catalog import load_catalog
+
+    catalog = load_catalog("my_iceberg_catalog")
+    table = catalog.load_table("my_namespace.my_table")
+
+    # Create a Daft DataFrame from the PyIceberg table
+    import daft
+
+    df = daft.read_iceberg(table)
+
+Daft currently natively supports:
+
+1. **Distributed Reads:** Daft fully distributes read I/O across your compute resources (whether on Ray or via multithreading on the local PyRunner).
+2. **Skipping filtered data:** Daft uses the predicates in your ``df.where(...)`` calls to read only the data that matches them.
+3. **All Catalogs From PyIceberg:** Daft is natively integrated with PyIceberg, and supports all the catalogs that PyIceberg does!
+
+Selecting a Table
+*****************
+
+Daft currently leverages PyIceberg for catalog and table discovery. Please consult the `PyIceberg documentation <https://py.iceberg.apache.org/api/#load-a-table>`_ for more details on how to load a table!
+
+Roadmap
+*******
+
+The following Iceberg features are works in progress:
+
+1. Iceberg V2 merge-on-read features
+2. Writing back to an Iceberg table (appends, overwrites, upserts)
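The "Distributed Reads" and "Skipping filtered data" features on the new Iceberg page can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Daft's or PyIceberg's implementation. It assumes each data file carries min/max statistics for a ``price`` column (Iceberg manifests do record per-file column lower and upper bounds). It prunes files whose statistics rule out ``price > threshold``, then reads the surviving files concurrently on a thread pool (loosely analogous to multithreaded reads on the local PyRunner), and finally applies the row-level filter. ``DataFile``, ``scan_where_price_gt``, and the file names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file plus its manifest statistics."""

    path: str
    price_min: int  # lower bound of the ``price`` column in this file
    price_max: int  # upper bound of the ``price`` column in this file
    rows: List[Dict[str, int]]  # file contents, held in memory for the demo


def may_match_price_gt(f: DataFile, threshold: int) -> bool:
    # For ``price > threshold``, a file can only contain matches if its
    # maximum price exceeds the threshold; otherwise it is safe to skip.
    return f.price_max > threshold


def read_and_filter(f: DataFile, threshold: int) -> List[Dict[str, int]]:
    # Simulated I/O for one file. The row-level filter is still required,
    # because a file that survives pruning can hold non-matching rows too.
    return [row for row in f.rows if row["price"] > threshold]


def scan_where_price_gt(
    files: List[DataFile], threshold: int, max_workers: int = 4
) -> Tuple[List[str], List[Dict[str, int]]]:
    # 1. Prune: drop files whose statistics prove they cannot match.
    to_read = [f for f in files if may_match_price_gt(f, threshold)]
    # 2. Read the surviving files concurrently on a thread pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(lambda f: read_and_filter(f, threshold), to_read))
    # 3. Concatenate; ``pool.map`` yields results in input order.
    rows = [row for chunk in chunks for row in chunk]
    return [f.path for f in to_read], rows


files = [
    DataFile("a.parquet", 1, 9, [{"price": 1}, {"price": 9}]),
    DataFile("b.parquet", 5, 20, [{"price": 5}, {"price": 20}]),
    DataFile("c.parquet", 30, 40, [{"price": 30}, {"price": 40}]),
]
read_paths, matching_rows = scan_where_price_gt(files, threshold=10)
print(read_paths)     # ['b.parquet', 'c.parquet']
print(matching_rows)  # [{'price': 20}, {'price': 30}, {'price': 40}]
```

Here ``a.parquet`` is never read, because its maximum price (9) cannot satisfy ``price > 10``. Real scan planning is more involved than this toy (for example, Iceberg can also prune on partition values), so treat it only as an intuition for why filters reduce I/O.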