Commit

fix end of line
ccmao1130 committed Dec 19, 2024
1 parent: be595a0 · commit: 54bea1d
Showing 19 changed files with 23 additions and 20 deletions.
3 changes: 2 additions & 1 deletion docs-v2/advanced/distributed.md
@@ -71,4 +71,5 @@ You can take the IP address and port and pass it to Daft:
╰───────╯

(Showing first 2 of 2 rows)
-```
+```
+

2 changes: 1 addition & 1 deletion docs-v2/advanced/memory.md
@@ -63,4 +63,4 @@ There are some options available to you.

5. Increase the number of partitions in your dataframe (hence making each partition smaller) using something like: `df.into_partitions(df.num_partitions() * 2)`

-If your workload continues to experience OOM issues, perhaps Daft could be better estimating the required memory to run certain steps in your workload. Please contact Daft developers on our forums!
+If your workload continues to experience OOM issues, perhaps Daft could be better estimating the required memory to run certain steps in your workload. Please contact Daft developers on our forums!

2 changes: 1 addition & 1 deletion docs-v2/advanced/partitioning.md
@@ -110,4 +110,4 @@ Note that many of these methods will change both the *number of partitions* as w
| Estimated Scan Bytes = 72000000
| Clustering spec = { Num partitions = 3 }
| ...
-```
+```

2 changes: 1 addition & 1 deletion docs-v2/core_concepts.md
@@ -2332,4 +2332,4 @@ Let’s turn the bytes into human-readable images using [`image.decode()`](https
- [:fontawesome-solid-equals: **Partitioning**](advanced/partitioning.md)
- [:material-distribute-vertical-center: **Distributed Computing**](advanced/distributed.md)

-</div>
+</div>

2 changes: 1 addition & 1 deletion docs-v2/install.md
@@ -44,4 +44,4 @@ pip install -U getdaft --pre --extra-index-url https://pypi.anaconda.org/daft-ni
pip install -U https://github.com/Eventual-Inc/Daft/archive/refs/heads/main.zip
```

-Please note that Daft requires the Rust toolchain in order to build from source.
+Please note that Daft requires the Rust toolchain in order to build from source.

3 changes: 2 additions & 1 deletion docs-v2/integrations/aws.md
@@ -52,4 +52,5 @@ pass a different [`daft.io.S3Config`](https://www.getdaft.io/projects/docs/en/st
```python
# Perform some I/O operation but override the IOConfig
df2 = daft.read_csv("s3://my_bucket/my_other_path/**/*", io_config=io_config)
-```
+```
+

3 changes: 2 additions & 1 deletion docs-v2/integrations/azure.md
@@ -78,4 +78,5 @@ If you are connecting to storage in OneLake or another Microsoft Fabric service,
)

df = daft.read_deltalake('abfss://[WORKSPACE]@onelake.dfs.fabric.microsoft.com/[LAKEHOUSE].Lakehouse/Tables/[TABLE]', io_config=io_config)
-```
+```
+

2 changes: 1 addition & 1 deletion docs-v2/integrations/delta_lake.md
@@ -124,4 +124,4 @@ Here are Delta Lake features that are on our roadmap. Please let us know if you
3. Writing new Delta Lake tables ([issue](https://github.com/Eventual-Inc/Daft/issues/1967)).
<!-- ^ this needs an update, issue has been closed -->

-4. Writing back to an existing table with appends, overwrites, upserts, or deletes ([issue](https://github.com/Eventual-Inc/Daft/issues/1968)).
+4. Writing back to an existing table with appends, overwrites, upserts, or deletes ([issue](https://github.com/Eventual-Inc/Daft/issues/1968)).

2 changes: 1 addition & 1 deletion docs-v2/integrations/hudi.md
@@ -73,4 +73,4 @@ Support for more Hudi features are tracked as below:
1. Support incremental query for Copy-on-Write tables [issue](https://github.com/Eventual-Inc/Daft/issues/2153)).
2. Read support for 1.0 table format ([issue](https://github.com/Eventual-Inc/Daft/issues/2152)).
3. Read support (snapshot) for Merge-on-Read tables ([issue](https://github.com/Eventual-Inc/Daft/issues/2154)).
-4. Write support ([issue](https://github.com/Eventual-Inc/Daft/issues/2155)).
+4. Write support ([issue](https://github.com/Eventual-Inc/Daft/issues/2155)).

3 changes: 2 additions & 1 deletion docs-v2/integrations/huggingface.md
@@ -66,4 +66,5 @@ to get around this, you can read all files using a glob pattern *(assuming they

```python
df = daft.read_parquet("hf://datasets/username/my_private_dataset/**/*.parquet", io_config=io_config) # Works
-```
+```
+

2 changes: 1 addition & 1 deletion docs-v2/integrations/iceberg.md
@@ -107,4 +107,4 @@ Here are some features of Iceberg that are works-in-progress:
2. More extensive usage of Iceberg-provided statistics to further optimize queries
3. Copy-on-write and merge-on-read writes

-A more detailed Iceberg roadmap for Daft can be found on [our Github Issues page](https://github.com/Eventual-Inc/Daft/issues/2458).
+A more detailed Iceberg roadmap for Daft can be found on [our Github Issues page](https://github.com/Eventual-Inc/Daft/issues/2458).

2 changes: 1 addition & 1 deletion docs-v2/integrations/ray.md
@@ -87,4 +87,4 @@ ray job submit \

The runtime env parameter specifies that Daft should be installed on the Ray workers. Alternative methods of including Daft in the worker dependencies can be found [here](https://docs.ray.io/en/latest/ray-core/handling-dependencies.html).

-For more information about Ray jobs, see [Ray docs -> Ray Jobs Overview](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html).
+For more information about Ray jobs, see [Ray docs -> Ray Jobs Overview](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html).

2 changes: 1 addition & 1 deletion docs-v2/integrations/sql.md
@@ -157,4 +157,4 @@ You could modify the SQL query to add the filters and projections yourself, but
Here are the SQL features that are on our roadmap. Please let us know if you would like to see support for any of these features!

1. Write support into SQL databases.
-2. Reads via [ADBC (Arrow Database Connectivity)](https://arrow.apache.org/docs/format/ADBC.html).
+2. Reads via [ADBC (Arrow Database Connectivity)](https://arrow.apache.org/docs/format/ADBC.html).

2 changes: 1 addition & 1 deletion docs-v2/integrations/unity_catalog.md
@@ -66,4 +66,4 @@ See also [Delta Lake](delta_lake.md) for more information about how to work with

2. Unity Iceberg integration for reading tables using the Iceberg interface instead of the Delta Lake interface

-Please make issues on the [Daft repository](https://github.com/Eventual-Inc/Daft) if you have any use-cases that Daft does not currently cover!
+Please make issues on the [Daft repository](https://github.com/Eventual-Inc/Daft) if you have any use-cases that Daft does not currently cover!

2 changes: 1 addition & 1 deletion docs-v2/migration/dask_migration.md
@@ -129,4 +129,4 @@ Daft provides a [`read_sql()`](https://www.getdaft.io/projects/docs/en/stable/ap

## Daft combines Python with Rust and Pyarrow for optimal performance

-Daft combines Python with Rust and Pyarrow for optimal performance (see [Benchmarks](../resources/benchmarks/tpch.md)). Under the hood, Table and Series are implemented in Rust on top of the Apache Arrow specification (using the Rust arrow2 library). This architecture means that all the computationally expensive operations on Table and Series are performed in Rust, and can be heavily optimized for raw speed. Python is most useful as a user-facing API layer for ease of use and an interactive data science user experience (see [Architecture](../resources/architecture.md)).
+Daft combines Python with Rust and Pyarrow for optimal performance (see [Benchmarks](../resources/benchmarks/tpch.md)). Under the hood, Table and Series are implemented in Rust on top of the Apache Arrow specification (using the Rust arrow2 library). This architecture means that all the computationally expensive operations on Table and Series are performed in Rust, and can be heavily optimized for raw speed. Python is most useful as a user-facing API layer for ease of use and an interactive data science user experience (see [Architecture](../resources/architecture.md)).

1 change: 0 additions & 1 deletion docs-v2/resources/architecture.md
@@ -90,4 +90,3 @@ Each Partition of a DataFrame is represented as a Table object, which is in turn
Under the hood, Table and Series are implemented in Rust on top of the Apache Arrow specification (using the Rust arrow2 library). We expose Python API bindings for Table using PyO3, which allows our PhysicalPlan to define operations that should be run on each Table.

This architecture means that all the computationally expensive operations on Table and Series are performed in Rust, and can be heavily optimized for raw speed. Python is most useful as a user-facing API layer for ease of use and an interactive data science user experience.
-

2 changes: 1 addition & 1 deletion docs-v2/resources/benchmarks/tpch.md
@@ -150,4 +150,4 @@ For benchmarking Spark we used AWS EMR, the official managed Spark solution prov
| Dask (failed, multiple retries)| 1000 | 4 | 1. s3://daft-public-data/benchmarking/logs/dask.2023_5_0.1tb.4-i32xlarge.q126.log |
| Dask (multiple retries) | 100 | 4 | 1. s3://daft-public-data/benchmarking/logs/dask.2023_5_0.100gb.4-i32xlarge.0.log <br> 2. s3://daft-public-data/benchmarking/logs/dask.2023_5_0.100gb.4-i32xlarge.0.log <br> 3. s3://daft-public-data/benchmarking/logs/dask.2023_5_0.100gb.4-i32xlarge.1.log |
| Modin (failed, multiple retries) | 1000 | 16 | 1. s3://daft-public-data/benchmarking/logs/modin.0_20_1.1tb.16-i32xlarge.0.log <br> 2. s3://daft-public-data/benchmarking/logs/modin.0_20_1.1tb.16-i32xlarge.1.log |
-| Modin (failed, multiple retries) | 100 | 4 | 1. s3://daft-public-data/benchmarking/logs/modin.0_20_1.100gb.4-i32xlarge.log |
+| Modin (failed, multiple retries) | 100 | 4 | 1. s3://daft-public-data/benchmarking/logs/modin.0_20_1.100gb.4-i32xlarge.log |

2 changes: 1 addition & 1 deletion docs-v2/resources/dataframe_comparison.md
@@ -73,4 +73,4 @@ Ray Datasets make it easy to feed data really efficiently into Ray's model train

However, Ray Datasets are not a fully-fledged Dataframe abstraction (and [it is explicit in not being an ETL framework for data science](https://docs.ray.io/en/latest/data/overview.html#ray-data-overview)) which means that it lacks key features in data querying, visualization and aggregations.

-Instead, Ray Data is a perfect destination for processed data from DaFt Dataframes to be sent to with a simple [`df.to_ray_dataset()`](https://www.getdaft.io/projects/docs/en/stable/api_docs/doc_gen/dataframe_methods/daft.DataFrame.to_ray_dataset.html#daft.DataFrame.to_ray_dataset) call. This is useful as an entrypoint into your model training and inference ecosystem!
+Instead, Ray Data is a perfect destination for processed data from DaFt Dataframes to be sent to with a simple [`df.to_ray_dataset()`](https://www.getdaft.io/projects/docs/en/stable/api_docs/doc_gen/dataframe_methods/daft.DataFrame.to_ray_dataset.html#daft.DataFrame.to_ray_dataset) call. This is useful as an entrypoint into your model training and inference ecosystem!

4 changes: 2 additions & 2 deletions docs-v2/resources/telemetry.md
@@ -14,9 +14,9 @@ We **do not** sell or buy any of the data that is collected in telemetry.

## What data do we collect?

-To audit what data is collected, please see the implementation of ``AnalyticsClient`` in the ``daft.analytics`` module.
+To audit what data is collected, please see the implementation of `AnalyticsClient` in the `daft.analytics` module.

In short, we collect the following:

1. On import, we track system information such as the runner being used, version of Daft, OS, Python version, etc.
-2. On calls of public methods on the DataFrame object, we track metadata about the execution: the name of the method, the walltime for execution and the class of error raised (if any). Function parameters and stacktraces are not logged, ensuring that user data remains private.
+2. On calls of public methods on the DataFrame object, we track metadata about the execution: the name of the method, the walltime for execution and the class of error raised (if any). Function parameters and stacktraces are not logged, ensuring that user data remains private.
