diff --git a/README.md b/README.md
index 6fffa434..d17dec61 100644
--- a/README.md
+++ b/README.md
@@ -39,30 +39,14 @@ For example, imagine we have the following table.
BlendSQL allows us to ask the following questions by injecting "ingredients", which are callable functions denoted by double curly brackets (`{{`, `}}`).
The below examples work out of the box, but you are able to design your own ingredients as well!
-*Show me some gifts appropriate for a child.*
+*What Italian restaurants have I been to in California?*
```sql
SELECT DISTINCT description, merchant FROM transactions WHERE
- {{LLMMap('would this gift be appropriate for a child?', 'transactions::description')}} = TRUE
- AND child_category = 'Gifts'
-```
-
-*Ok, show me some gifts not appropriate for a child I bought in q2 this year*
-```sql
-SELECT DISTINCT description, merchant FROM transactions WHERE
- {{LLMMap('would this gift be appropriate for a child?', 'transactions::description')}} = TRUE
- AND {{DT('transactions::date', start='q2')}}
- AND child_category = 'Gifts'
-```
-
-*Forget gifts, I'm hungry. What Italian restaurants have I been to in California?*
-```sql
-SELECT DISTINCT description, merchant FROM transactions WHERE
- {{LLMMap('is this an italian restaurant?', 'transactions::merchant')}} = TRUE
+  {{LLMMap('Is this an Italian restaurant?', 'transactions::merchant')}} = TRUE
AND {{
LLMMap(
- 'what state is this transaction from? Choose -1 when N.A.',
- 'transactions::description',
- example_outputs='TX;CA;MA;-1;',
+ 'What state is this transaction from?',
+ 'transactions::description'
)
}} = 'CA'
AND child_category = 'Restaurants & Dining'
@@ -72,7 +56,7 @@ SELECT DISTINCT description, merchant FROM transactions WHERE
```sql
SELECT merchant FROM transactions
WHERE {{
- LLMQA('most likely to sell burgers?', 'transactions::merchant', options='transactions::merchant')
+ LLMQA('Most likely to sell burgers?', 'transactions::merchant', options='transactions::merchant')
}}
```
@@ -80,17 +64,138 @@ SELECT merchant FROM transactions
```sql
{{
LLMQA(
- 'Summarize my coffee spending in 10 words.',
- (SELECT * FROM transactions WHERE child_category = "Coffee")
+ 'Summarize my coffee spending.',
+ (SELECT * FROM transactions WHERE child_category = 'Coffee')
+ )
+}}
+```
+
+### More Examples
+
+<details>
+<summary>HybridQA</summary>
+
+For this setting, our database contains 2 tables: a table from Wikipedia `w`, and a collection of unstructured Wikipedia articles in the table `docs`.
+
+*What is the state flower of the smallest state by area?*
+```sql
+SELECT "common name" AS 'State Flower' FROM w
+WHERE state = {{
+ LLMQA(
+ 'Which is the smallest state by area?',
+ (SELECT title, content FROM docs),
+ options='w::state'
)
}}
```
+*Who were the builders of the mosque in Herat with fire temples?*
+```sql
+{{
+ LLMQA(
+ 'Name of the builders?',
+ (
+ SELECT title AS 'Building', content FROM docs
+ WHERE title = {{
+ LLMQA(
+ 'Align the name to the correct title.',
+ (SELECT name FROM w WHERE city = 'herat' AND remarks LIKE '%fire temple%'),
+ options='docs::title'
+ )
+ }}
+ )
+ )
+}}
+```
+
+*What is the capacity of the venue that was named in honor of Juan Antonio Samaranch in 2010 after his death?*
+```sql
+SELECT capacity FROM w WHERE venue = {{
+ LLMQA(
+ 'Which venue is named in honor of Juan Antonio Samaranch?',
+ (SELECT title AS 'Venue', content FROM docs),
+ options='w::venue'
+ )
+}}
+```
+
+</details>
+
+<details>
+<summary>FEVEROUS</summary>
+
+Here, we deal not with questions, but with truth claims, given a context of unstructured and structured data.
+
+These claims should be judged as "SUPPORTS" or "REFUTES". Using BlendSQL, we can formulate this determination of truth as a function over facts.
+
+*Oyedaea is part of the family Asteraceae in the order Asterales.*
+```sql
+SELECT EXISTS (
+  SELECT * FROM w0 WHERE attribute = 'family:' AND value = 'asteraceae'
+) AND EXISTS (
+  SELECT * FROM w0 WHERE attribute = 'order:' AND value = 'asterales'
+)
+```
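This first claim compiles to pure SQL with no LLM calls at all, so its boolean skeleton can be sanity-checked with vanilla `sqlite3`. The rows below are hypothetical stand-ins for the FEVEROUS evidence table `w0`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE w0 (attribute TEXT, value TEXT)")
con.executemany(
    "INSERT INTO w0 VALUES (?, ?)",
    [("family:", "asteraceae"), ("order:", "asterales")],
)

# Same EXISTS ... AND EXISTS ... structure as the query above.
supported = con.execute(
    "SELECT EXISTS ("
    "  SELECT * FROM w0 WHERE attribute = 'family:' AND value = 'asteraceae'"
    ") AND EXISTS ("
    "  SELECT * FROM w0 WHERE attribute = 'order:' AND value = 'asterales'"
    ")"
).fetchone()[0]
print("SUPPORTS" if supported else "REFUTES")
```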
+
+*The 2006-07 San Jose Sharks season, the 14th season of operation (13th season of play) for the National Hockey League (NHL) franchise, scored the most points in the Pacific Division.*
+```sql
+SELECT (
+ SELECT (
+ {{
+ LLMQA('Is the Sharks 2006-07 season the 14th season (13th season of play)?', 'docs::content', options='t;f')
+ }}
+ ) = 't'
+)
+AND (
+ SELECT
+ (
+ SELECT filledcolumnname FROM w0 ORDER BY pts DESC LIMIT 1
+ ) = 'san jose sharks'
+)
+```
+
+*Lindfield railway station has 3 bus routes, in which the first platform services routes to Emu plains via Central and Richmond and Hornbys via Strathfield.*
+```sql
+SELECT EXISTS (
+ SELECT * FROM w0 WHERE platform = 1
+ AND {{
+ LLMMap(
+      'Does this route service Emu Plains via Central and Richmond?',
+ 'w0::stopping pattern'
+ )
+ }} = TRUE
+ ) AND EXISTS (
+ SELECT * FROM w0 WHERE platform = 1
+ AND {{
+ LLMMap(
+      'Does this route service Hornsby via Strathfield?',
+ 'w0::stopping pattern'
+ )
+ }} = TRUE
+ ) AND EXISTS (
+ SELECT * FROM docs
+ WHERE {{
+ LLMMap(
+      'How many bus routes are operated by Transdev?',
+ 'docs::content'
+ )
+ }} = 3
+ )
+```
+
+</details>
+
### Features
- Smart parsing optimizes what is passed to external functions 🧠
- Traverses AST to minimize external function calls
-- Accelerated LLM calls and caching 🚀
- - Enabled with [gptcache](https://github.com/zilliztech/GPTCache) via [guidance](https://github.com/guidance-ai/guidance)
+- Accelerated LLM calls, caching, and constrained decoding 🚀
+ - Enabled via [guidance](https://github.com/guidance-ai/guidance)
- Easy logging of execution environment with `smoothie.save_recipe()` 🖥️
- Enables reproducibility across machines
@@ -107,36 +212,25 @@ The below benchmarks were done on my local M1 Macbook Pro. by running the script
For a technical walkthrough of how a BlendSQL query is executed, check out [technical_walkthrough.md](./technical_walkthrough.md).
-## Setup
-### Prep Env
-To set up a `blendsql` conda environment, run the following command.
+## Install
```
-conda env create && conda activate blendsql && pre-commit install
+pip install blendsql
```
## Open Command Line BlendSQL Interpreter
```
-./blend {db_path}
+blendsql {db_path} {secrets_path}
```
![blend-cli](./img/blend_cli.png)
-
-### Run Examples
-`python -m examples.example`
-
-### Run Line Profiling
-First uncomment `@profile` above `blend()` in `grammar.py`.
-Make sure you've run `pip install line_profiler` first. This installs the tool here: https://github.com/pyutils/line_profiler
-
-`PYTHONPATH=$PWD:$PYTHONPATH kernprof -lv examples/benchmarks/with_blendsql.py`
-
## Example Usage
```python
-from blendsql import blend, SQLiteDBConnector, init_secrets
+from blendsql import blend, init_secrets
+from blendsql.db import SQLiteDBConnector
# Import our pre-built ingredients
from blendsql.ingredients.builtin import LLMMap, LLMQA, DT
@@ -153,7 +247,7 @@ SELECT merchant FROM transactions WHERE
smoothie = blend(
query=blendsql,
db=db,
- ingredients={LLMMap, DT},
+ ingredients={LLMMap, LLMQA, DT},
verbose=True
)
@@ -169,6 +263,10 @@ Ingredients are at the core of a BlendSQL script.
They are callable functions that perform one of the task paradigms defined in [ingredient.py](./blendsql/ingredients/ingredient.py).
+At their core, these are not a new concept. [User-defined functions (UDFs)](https://docs.databricks.com/en/udf/index.html), or [Application-Defined Functions in SQLite](https://www.sqlite.org/appfunc.html) have existed for quite some time.
+
+However, the ingredients in BlendSQL are optimized for LLM-based functions: they define an order of operations for traversing the AST so that the minimal amount of data is passed to your expensive GPT-4/Llama 2/Mistral 7b/etc. prompt.
+
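As a point of reference, here is what a classical application-defined function looks like in SQLite's Python driver. An ingredient plays the same syntactic role inside a query, but is backed by an LLM and scheduled to see as few rows as possible. The table contents and the `is_coffee` predicate below are hypothetical stand-ins:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (merchant TEXT)")
con.executemany("INSERT INTO transactions VALUES (?)",
                [("Starbucks",), ("Uber",)])

# A deterministic stand-in for what an LLMMap-style ingredient might decide.
con.create_function("is_coffee", 1, lambda m: m == "Starbucks")

rows = con.execute(
    "SELECT merchant FROM transactions WHERE is_coffee(merchant)"
).fetchall()
print(rows)  # [('Starbucks',)]
```

Unlike this toy predicate, an LLM call is slow and costly per row, which is why BlendSQL prunes the input to each ingredient before invoking it.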
Ingredient calls are denoted by wrapping them in double curly brackets, `{{ingredient}}`.
The following ingredient types are valid.
@@ -199,10 +297,19 @@ Handles the logic of ambiguous, non-intuitive `JOIN` clauses between tables.
For example:
```sql
SELECT Capitals.name, State.name FROM Capitals
- {{
- LLMJoin('Align state to capital', 'States::name', options='Capitals::name')
- }}
+ JOIN {{
+ LLMJoin(
+ 'Align state to capital',
+ left_on='States::name',
+ right_on='Capitals::name'
+ )
+ }}
```
+The above example hints at a database schema that would make [E. F. Codd](https://en.wikipedia.org/wiki/Edgar_F._Codd) very angry: why do we have two separate tables `States` and `Capitals` with no foreign key to join the two?
+
+However, BlendSQL was built to interact with tables "in-the-wild", and many (such as those on Wikipedia) do not have these convenient properties of well-designed relational models.
+
+For this reason, we can leverage the internal knowledge of a pre-trained LLM to do the `JOIN` operation for us.
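+
Conceptually, `LLMJoin` materializes the missing link as a two-column mapping that the rest of the query joins through. The sketch below uses hypothetical data and a hard-coded alignment in place of the LLM's inferred one:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE States (name TEXT)")
con.execute("CREATE TABLE Capitals (name TEXT)")
con.executemany("INSERT INTO States VALUES (?)", [("Texas",), ("Ohio",)])
con.executemany("INSERT INTO Capitals VALUES (?)", [("Austin",), ("Columbus",)])

# Stand-in for the alignment an LLM would infer: state -> capital.
con.execute("CREATE TABLE mapping (state TEXT, capital TEXT)")
con.executemany("INSERT INTO mapping VALUES (?, ?)",
                [("Texas", "Austin"), ("Ohio", "Columbus")])

rows = con.execute("""
    SELECT Capitals.name, States.name FROM Capitals
    JOIN mapping ON mapping.capital = Capitals.name
    JOIN States ON States.name = mapping.state
""").fetchall()
print(sorted(rows))  # [('Austin', 'Texas'), ('Columbus', 'Ohio')]
```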
### `QAIngredient`
Sometimes, simply selecting data from a given database is not enough to sufficiently answer a user's question.
@@ -291,19 +398,19 @@ Perhaps we want the answer to the above question in a different format. We can c
Running the above BlendSQL query, we get the output `two consecutive days!`.
This `options` argument can also be a reference to a given column.
-These options will be restricted to only the values exposed via the subquery (2nd arg in `LLMQA`).
-> [!WARNING]
-> This was changed to accommodate the HybridQA dataset.
-> For example:
-> ```sql
-> SELECT capacity FROM w WHERE venue = {{
-> LLMQA(
-> 'Which venue is named in honor of Juan Antonio Samaranch?',
-> (SELECT title, content FROM docs WHERE content LIKE '%venue%'),
-> options='w::venue'
-> )
->}}
+For example (from the [HybridQA dataset](https://hybridqa.github.io/)):
+```sql
+SELECT capacity FROM w WHERE venue = {{
+  LLMQA(
+    'Which venue is named in honor of Juan Antonio Samaranch?',
+    (SELECT title, content FROM docs WHERE content LIKE '%venue%'),
+    options='w::venue'
+  )
+}}
+```
+
+Or, from our running example:
```sql
{{
LLMQA(
@@ -368,115 +475,11 @@ def blend(*args, **kwargs) -> Smoothie:
...
```
-## Example Appendix
-
-### HybridQA
-For this setting, we database containing 2 tables: a table from Wikipedia `w`, and a collection of unstructured Wikipedia articles in the table `docs`.
-
-#### 'What is the state flower of the smallest state by area ?'
-```sql
-SELECT "common name" AS 'State Flower' FROM w
-WHERE state = {{
- LLMQA(
- 'Which is the smallest state by area?',
- (SELECT title, content FROM docs),
- options='w::state'
- )
-}}
-```
-
-#### 'Who were the builders of the mosque in Herat with fire temples ?'
-```sql
-{{
- LLMQA(
- 'Name of the builders?',
- (
- SELECT title AS 'Building', content FROM docs
- WHERE title = {{
- LLMQA(
- 'Align the name to the correct title.',
- (SELECT name FROM w WHERE city = 'herat' AND remarks LIKE '%fire temple%'),
- options='docs::title'
- )
- }}
- )
- )
-}}
-```
-
-#### 'What is the capacity of the venue that was named in honor of Juan Antonio Samaranch in 2010 after his death ?'
-```sql
-SELECT capacity FROM w WHERE venue = {{
- LLMQA(
- 'Which venue is named in honor of Juan Antonio Samaranch?',
- (SELECT title AS 'Venue', content FROM docs),
- options='w::venue'
- )
-}}
-```
-
-### FEVEROUS
-Here, we deal not with questions, but truth claims given a context of unstructured and structured data.
-These claims should be judged as "SUPPORTS" or "REFUTES". Using BlendSQL, we can formulate this determination of truth as a function over facts.
+### Appendix
-#### 'Oyedaea is part of the family Asteraceae in the order Asterales.'
-```sql
-SELECT CASE WHEN
- EXISTS (
- SELECT * FROM w0 WHERE attribute = 'family:' and value = 'asteraceae'
- ) AND EXISTS (
- SELECT * FROM w0 WHERE attribute = 'order:' and value = 'asterales'
- )
-THEN 'SUPPORTS' ELSE 'REFUTES' END
-```
-
-#### 'The 2006-07 San Jose Sharks season, the 14th season of operation (13th season of play) for the National Hockey League (NHL) franchise, scored the most points in the Pacific Division.'
-```sql
-SELECT CASE WHEN
- (
- SELECT (
- {{
- LLMQA('Is the Sharks 2006-07 season the 14th season (13th season of play)?', 'docs::content', options='t;f')
- }}
- ) = 't'
- )
- AND (
- SELECT
- (
- SELECT filledcolumnname FROM w0 ORDER BY pts DESC LIMIT 1
- ) = 'san jose sharks'
- )
-THEN 'SUPPORTS' ELSE 'REFUTES' END
-```
+#### Run Line Profiling
+First, make sure you've installed [`line_profiler`](https://github.com/pyutils/line_profiler) by running `pip install line_profiler`.
+Then, uncomment `@profile` above `blend()` in `grammar.py`.
-#### 'Lindfield railway station has 3 bus routes, in which the first platform services routes to Emu plains via Central and Richmond and Hornbys via Strathfield.'
-```sql
-SELECT CASE WHEN
- EXISTS (
- SELECT * FROM w0 WHERE platform = 1
- AND {{
- LLMMap(
- 'Does this service to Emu plains via Central and Richmond?',
- 'w0::stopping pattern'
- )
- }} = TRUE
- ) AND EXISTS (
- SELECT * FROM w0 WHERE platform = 1
- AND {{
- LLMMap(
- 'Does this service to Hornbys via Strathfield?',
- 'w0::stopping pattern'
- )
- }} = TRUE
- ) AND EXISTS (
- SELECT * FROM docs
- WHERE {{
- LLMMap(
- 'How many bus routes operated by Transdev?',
- 'docs::content'
- )
- }} = 3
- )
-THEN 'SUPPORTS' ELSE 'REFUTES' END
-```
\ No newline at end of file
+`PYTHONPATH=$PWD:$PYTHONPATH kernprof -lv examples/benchmarks/with_blendsql.py`