Skip to content

Commit

Permalink
Restructure
Browse files Browse the repository at this point in the history
  • Loading branch information
Jay Chia committed Sep 27, 2024
1 parent fe3bdba commit 2305df0
Show file tree
Hide file tree
Showing 5 changed files with 25 additions and 15 deletions.
4 changes: 2 additions & 2 deletions docs/source/10-min.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -569,7 +569,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"See: [Expressions](user_guide/basic_concepts/expressions.rst)\n",
"See: [Expressions](user_guide/expressions.rst)\n",
"\n",
"Expressions are an API for defining computation that needs to happen over your columns.\n",
"\n",
Expand Down Expand Up @@ -1516,7 +1516,7 @@
"source": [
"### User-Defined Functions\n",
"\n",
"See: [UDF User Guide](user_guide/daft_in_depth/udf)"
"See: [UDF User Guide](user_guide/udf)"
]
},
{
Expand Down
8 changes: 5 additions & 3 deletions docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -87,11 +87,13 @@
"learn/user_guides/remote_cluster_execution": "distributed-computing.html",
"learn/quickstart": "learn/10-min.html",
"learn/10-min": "../10-min.html",
"user_guide/basic_concepts/*": "user_guide/basic_concepts.html",
"user_guide/basic_concepts/expressions": "user_guide/expressions",
"user_guide/basic_concepts/dataframe_introduction": "user_guide/basic_concepts",
"user_guide/basic_concepts/introduction": "user_guide/basic_concepts",
"user_guide/daft_in_depth/aggregations": "user_guide/aggregations",
"user_guide/daft_in_depth/dataframe-operations": "user_guide/dataframe-operations",
"user_guide/daft_in_depth/datatypes": "user_guide/datatypes",
"user_guide/daft_in_depth/udf": "user_guide/udf",
"user_guide/datatypes": "user_guide/datatypes",
"user_guide/udf": "user_guide/udf",
}

# Resolving code links to github
Expand Down
6 changes: 3 additions & 3 deletions docs/source/migration_guides/coming_from_dask.rst
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ Daft does not use an index

Dask aims for as much feature-parity with pandas as possible, including maintaining the presence of an Index in the DataFrame. But keeping an Index is difficult when moving to a distributed computing environment. Dask doesn’t support row-based positional indexing (with .iloc) because it does not track the length of its partitions. It also does not support pandas MultiIndex. The argument for keeping the Index is that it makes some operations against the sorted index column very fast. In reality, resetting the Index forces a data shuffle and is an expensive operation.

Daft drops the need for an Index to make queries more readable and consistent. How you write a query should not change because of the state of an index or a reset_index call. In our opinion, eliminating the index makes things simpler, more explicit, more readable and therefore less error-prone. Daft achieves this by using the [Expressions API](../user_guide/basic_concepts/expressions).
Daft drops the need for an Index to make queries more readable and consistent. How you write a query should not change because of the state of an index or a reset_index call. In our opinion, eliminating the index makes things simpler, more explicit, more readable and therefore less error-prone. Daft achieves this by using the [Expressions API](../user_guide/expressions).

In Dask you would index your DataFrame to return row ``b`` as follows:

Expand Down Expand Up @@ -80,7 +80,7 @@ For example:
res = ddf.map_partitions(my_function, **kwargs)
Daft implements two APIs for mapping computations over the data in your DataFrame in parallel: :doc:`Expressions <../user_guide/basic_concepts/expressions>` and :doc:`UDFs <../user_guide/daft_in_depth/udf>`. Expressions are most useful when you need to define computation over your columns.
Daft implements two APIs for mapping computations over the data in your DataFrame in parallel: :doc:`Expressions <../user_guide/expressions>` and :doc:`UDFs <../user_guide/udf>`. Expressions are most useful when you need to define computation over your columns.

.. code:: python
Expand Down Expand Up @@ -113,7 +113,7 @@ Daft is built as a DataFrame API for distributed Machine learning. You can use D
Daft supports Multimodal Data Types
-----------------------------------

Dask supports the same data types as pandas. Daft is built to support many more data types, including Images, nested JSON, tensors, etc. See :doc:`the documentation <../user_guide/daft_in_depth/datatypes>` for a list of all supported data types.
Dask supports the same data types as pandas. Daft is built to support many more data types, including Images, nested JSON, tensors, etc. See :doc:`the documentation <../user_guide/datatypes>` for a list of all supported data types.

Distributed Computing and Remote Clusters
-----------------------------------------
Expand Down
4 changes: 2 additions & 2 deletions docs/source/user_guide/fotw/fotw-001-images.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -447,7 +447,7 @@
"metadata": {},
"source": [
"### Create Thumbnails\n",
"[Expressions](../basic_concepts/expressions) are a Daft API for defining computation that needs to happen over your columns. There are dedicated `image.(...)` Expressions for working with images.\n",
"[Expressions](../expressions) are a Daft API for defining computation that needs to happen over your columns. There are dedicated `image.(...)` Expressions for working with images.\n",
"\n",
"You can use the `image.resize` Expression to create a thumbnail of each image:"
]
Expand Down Expand Up @@ -527,7 +527,7 @@
"\n",
"We'll define a function that uses a pre-trained PyTorch model [ResNet50](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html) to classify the dog pictures. We'll then pass the `image` column to this PyTorch model and send the classification predictions to a new column `classify_breed`. \n",
"\n",
"You will use Daft [User-Defined Functions (UDFs)](../daft_in_depth/udf) to do this. Daft UDFs which are the best way to run computations over multiple rows or columns.\n",
"You will use Daft [User-Defined Functions (UDFs)](../udf) to do this. Daft UDFs which are the best way to run computations over multiple rows or columns.\n",
"\n",
"#### Setting up PyTorch\n",
"\n",
Expand Down
18 changes: 13 additions & 5 deletions docs/source/user_guide/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ Daft User Guide
dataframe-operations
sql
aggregations
udf
poweruser
integrations
tutorials
Expand Down Expand Up @@ -41,16 +42,23 @@ The Daft User Guide is laid out as follows:

High-level overview of Daft interfaces and usage to give you a better understanding of how Daft will fit into your day-to-day workflow.

Daft in Depth
*************

Core Daft concepts all Daft users will find useful to understand deeply.

* :doc:`read-and-write`
* :doc:`expressions`
* :doc:`datatypes`
* :doc:`dataframe-operations`
* :doc:`aggregations`
* :doc:`udf`

:doc:`Structured Query Language (SQL) <sql>`
********************************************

A look into Daft's SQL interface and how it complements Daft's Pythonic DataFrame APIs.

:doc:`Daft in Depth <daft_in_depth>`
************************************

Core Daft concepts all Daft users will find useful to understand deeply.

:doc:`The Daft Poweruser <poweruser>`
*************************************

Expand Down

0 comments on commit 2305df0

Please sign in to comment.