updates
StanChan03 committed Dec 15, 2024
1 parent 1cbba52 commit 0eb4b6c
Showing 6 changed files with 51 additions and 3 deletions.
Binary file removed docs/Logo-scaled.png
Binary file not shown.
4 changes: 2 additions & 2 deletions docs/index.rst
@@ -3,9 +3,9 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
.. image:: Logo-scaled.png
.. image:: logo_with_text.png
:width: 300px
:height: 175px
:height: 170px
:align: center

LOTUS Makes LLM-Powered Data Processing Fast and Easy
14 changes: 13 additions & 1 deletion docs/installation.rst
@@ -18,4 +18,16 @@ You can install Lotus using pip:
$ conda create -n lotus python=3.10 -y
$ conda activate lotus
$ pip install lotus-ai
$ pip install lotus-ai
After that, you must install Faiss via conda:

.. code-block:: console

    # CPU-only version
    $ conda install -c pytorch faiss-cpu=1.9.0

    # GPU(+CPU) version
    $ conda install -c pytorch -c nvidia faiss-gpu=1.9.0

For more details, see `Installing FAISS via Conda <https://github.com/facebookresearch/faiss/blob/main/INSTALL.md#installing-faiss-via-conda>`_.
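
To confirm that both installs landed in the active environment, a quick check such as the following may help. It is only a sketch: the helper name ``is_installed`` is made up for this example, and the import names ``lotus`` and ``faiss`` are assumed from the package names above.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed import names for the packages installed above.
for name in ("lotus", "faiss"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either module is reported as missing, the usual cause is that the ``lotus`` conda environment is not the one currently activated.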
Binary file added docs/logo_with_text.png
16 changes: 16 additions & 0 deletions docs/sem_index.rst
@@ -5,6 +5,17 @@ sem_index
:members:
:show-inheritance:

Overview
---------
The sem_index operator in LOTUS creates a semantic index for a given column in a DataFrame.
This index enables efficient retrieval and ranking of records based on semantic similarity, making
it easier to query and analyze large datasets with natural language or contextual search criteria.

Motivation
-----------
Traditional search techniques struggle with context-dependent queries or subtle semantic nuances.
The sem_index operator addresses this by leveraging language models to create an index that supports semantic search.

Example
----------
.. code-block:: python
@@ -76,3 +87,8 @@ Output:
| 13 | Introduction to Computer Networks |
+----+---------------------------------------------+


Required Parameters
--------------------
- **col_name** : The column name to index.
- **index_dir** : The directory to save the index.
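
To make the roles of the two parameters concrete, here is a toy sketch of what "index a column and save it to a directory" can look like. It is not the LOTUS implementation: the functions ``embed``, ``build_index`` and ``search`` are hypothetical, and a bag-of-words vector stands in for a real embedding model so the example runs with the standard library only.

```python
import json
import math
import tempfile
from collections import Counter
from pathlib import Path


def embed(text: str) -> dict[str, float]:
    """Toy embedding: L2-normalised word counts (a stand-in for a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def build_index(rows: list[dict], col_name: str, index_dir: str) -> Path:
    """Embed every value of `col_name` and persist the vectors under `index_dir`."""
    out = Path(index_dir)
    out.mkdir(parents=True, exist_ok=True)
    vectors = [embed(row[col_name]) for row in rows]
    path = out / "vectors.json"
    path.write_text(json.dumps(vectors))
    return path


def search(query: str, index_dir: str, k: int = 2) -> list[int]:
    """Return the positions of the k rows most similar to `query` (cosine similarity)."""
    vectors = json.loads((Path(index_dir) / "vectors.json").read_text())
    q = embed(query)
    scores = [sum(w * vec.get(tok, 0.0) for tok, w in q.items()) for vec in vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


rows = [
    {"Course Name": "Introduction to Computer Networks"},
    {"Course Name": "Probability and Random Processes"},
    {"Course Name": "Advanced Computer Architecture"},
]
index_dir = tempfile.mkdtemp()  # plays the role of index_dir
build_index(rows, "Course Name", index_dir)  # "Course Name" plays the role of col_name
top = search("computer networks", index_dir, k=1)
print(rows[top[0]]["Course Name"])
```

The point of persisting to ``index_dir`` is that the embedding cost is paid once; later searches only load the saved vectors.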
20 changes: 20 additions & 0 deletions docs/sem_partition.rst
@@ -5,6 +5,20 @@ sem_partition_by
:members:
:show-inheritance:

Overview
---------
The sem_partition_by operator in LOTUS enables semantic partitioning of data based on contextual similarities.
It divides a DataFrame into subsets, which can then be independently analyzed or aggregated. This operator works
seamlessly with other LOTUS components, like sem_index for creating embeddings and sem_agg for performing
aggregations on clustered subsets, to build scalable and efficient workflows.

Motivation
----------
Real-world data often needs to be grouped by meaning rather than by exact matches, which traditional methods
such as GROUP BY cannot do. The sem_partition_by operator solves this by clustering data semantically, allowing for
meaningful partitioning of natural language or context-dependent entries.


Example
----------
.. code-block:: python
@@ -33,3 +47,9 @@ Example
out = df.sem_agg("Summarize all {Course Name}")._output[0]
print(out)
Required Parameters
--------------------
- **partition_fn** : The partitioning function.
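
As a rough illustration of what a partitioning function does, the sketch below assigns one partition id per row and then groups rows by that id. It is an assumption-laden toy, not the LOTUS API: ``keyword_partition`` and ``partition_by`` are hypothetical names, rows are plain dictionaries rather than a DataFrame, and a keyword rule stands in for semantic clustering.

```python
from collections import defaultdict
from typing import Callable

Row = dict[str, str]


def keyword_partition(rows: list[Row]) -> list[int]:
    """Hypothetical partition function: returns one partition id per row.

    A real pipeline would cluster embeddings; a keyword rule keeps this runnable.
    """
    return [0 if "computer" in row["Course Name"].lower() else 1 for row in rows]


def partition_by(
    rows: list[Row], partition_fn: Callable[[list[Row]], list[int]]
) -> dict[int, list[Row]]:
    """Group rows by the ids that `partition_fn` assigns to them."""
    ids = partition_fn(rows)
    if len(ids) != len(rows):
        raise ValueError("partition_fn must return exactly one id per row")
    groups: dict[int, list[Row]] = defaultdict(list)
    for row, pid in zip(rows, ids):
        groups[pid].append(row)
    return dict(groups)


rows = [
    {"Course Name": "Introduction to Computer Networks"},
    {"Course Name": "Probability and Random Processes"},
    {"Course Name": "Advanced Computer Architecture"},
]
groups = partition_by(rows, keyword_partition)
for pid, members in sorted(groups.items()):
    print(pid, [m["Course Name"] for m in members])
```

Each group can then be summarised or aggregated independently, which is the workflow the Overview describes.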

