[DOCS] Add dedicated Iceberg page (#1830)
Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
jaychia and Jay Chia authored Feb 1, 2024
1 parent 59af587 commit 4734862
Showing 5 changed files with 42 additions and 50 deletions.
4 changes: 0 additions & 4 deletions docs/source/user_guide/cheatsheet.rst

This file was deleted.

3 changes: 1 addition & 2 deletions docs/source/user_guide/index.rst
@@ -5,7 +5,6 @@ Daft User Guide
:hidden:
:maxdepth: 1

cheatsheet
basic_concepts
daft_in_depth
poweruser
@@ -39,7 +38,7 @@ aims to help Daft users master the usage of the Daft *DataFrame* for all your da
code you may wish to take a look at these resources:

1. :doc:`../10-min`: Itching to run some Daft code? Hit the ground running with our 10 minute quickstart notebook.
2. :doc:`cheatsheet`: Quick reference to commonly-used Daft APIs and usage patterns - useful to keep next to your laptop as you code!
2. (Coming soon!) Cheatsheet: Quick reference to commonly-used Daft APIs and usage patterns - useful to keep next to your laptop as you code!
3. :doc:`../api_docs/index`: Searchable documentation and reference material to Daft's public Python API.

Table of Contents
2 changes: 1 addition & 1 deletion docs/source/user_guide/integrations.rst
@@ -3,4 +3,4 @@ Integrations

.. toctree::

integrations/data_catalogs
integrations/iceberg
43 changes: 0 additions & 43 deletions docs/source/user_guide/integrations/data_catalogs.rst

This file was deleted.

40 changes: 40 additions & 0 deletions docs/source/user_guide/integrations/iceberg.rst
@@ -0,0 +1,40 @@
Apache Iceberg
==============

`Apache Iceberg <https://iceberg.apache.org/>`_ is an open-source table format, originally developed at Netflix, for large-scale analytical datasets.

To read from the Apache Iceberg table format, use the :func:`daft.read_iceberg` function.

Daft integrates closely with `PyIceberg <https://py.iceberg.apache.org/>`_ (the official Python implementation for Apache Iceberg) and makes it easy to create Daft DataFrames from PyIceberg's Table objects.

.. code:: python

    # Access a PyIceberg table as per normal
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("my_iceberg_catalog")
    table = catalog.load_table("my_namespace.my_table")

    # Create a Daft Dataframe
    import daft

    df = daft.read_iceberg(table)

Daft currently natively supports:

1. **Distributed Reads:** Daft will fully distribute the I/O of reads over your compute resources (whether on Ray or with multithreading on the local PyRunner)
2. **Skipping filtered data:** Daft uses ``df.where(...)`` filter calls to only read data that matches your predicates
3. **All Catalogs From PyIceberg:** Daft is natively integrated with PyIceberg, and supports all the catalogs that PyIceberg does!
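
The idea behind skipping filtered data can be illustrated with a small, self-contained sketch. This is not Daft's implementation: the ``DataFile`` class, the file names, and the statistics below are hypothetical, and the sketch assumes each data file carries minimum and maximum values for the filtered column. Given a predicate such as ``value > 150``, any file whose maximum is at or below 150 cannot contain matching rows and never needs to be read.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file plus its column statistics."""

    path: str
    min_value: int  # smallest value of the filtered column in this file
    max_value: int  # largest value of the filtered column in this file


def files_to_read(files, lower_bound):
    """Keep only files that could contain rows where column > lower_bound."""
    return [f for f in files if f.max_value > lower_bound]


# Made-up file listing; a real table's metadata would supply these values.
files = [
    DataFile("part-0.parquet", min_value=0, max_value=99),
    DataFile("part-1.parquet", min_value=100, max_value=199),
    DataFile("part-2.parquet", min_value=200, max_value=299),
]

# A predicate like `value > 150` can only match rows in part-1 and part-2,
# so part-0 is skipped without being opened.
selected = [f.path for f in files_to_read(files, lower_bound=150)]
print(selected)
```

In Daft itself the filter would be expressed on the DataFrame, for example ``daft.read_iceberg(table).where(daft.col("value") > 150)``, where the column name ``value`` is an assumption made for illustration.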

Selecting a Table
*****************

Daft currently leverages PyIceberg for catalog/table discovery. Please consult `PyIceberg documentation <https://py.iceberg.apache.org/api/#load-a-table>`_ for more details on how to load a table!

Roadmap
*******

The following Iceberg features are works in progress:

1. Iceberg V2 merge-on-read features
2. Writing back to an Iceberg table (appends, overwrites, upserts)
