diff --git a/README.md b/README.md index da213e45..64ca6ad1 100644 --- a/README.md +++ b/README.md @@ -25,6 +25,7 @@ pip install blendsql ``` ### ✨ News +- (10/26/24) New tutorial! [blendsql-by-example.ipynb](examples/blendsql-by-example.ipynb) - (10/18/24) Concurrent async requests in 0.0.29! OpenAI and Anthropic `LLMMap` calls are speedy now. - Customize max concurrent async calls via `blendsql.config.set_async_limit(10)` - (10/15/24) As of version 0.0.27, there is a new pattern for defining + retrieving few-shot prompts; check out [Few-Shot Prompting](#few-shot-prompting) in the README for more info diff --git a/docs/reference/examples/blendsql-by-example.ipynb b/docs/reference/examples/blendsql-by-example.ipynb new file mode 100644 index 00000000..00ee4ffa --- /dev/null +++ b/docs/reference/examples/blendsql-by-example.ipynb @@ -0,0 +1,1232 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 2, + "id": "initial_id", + "metadata": { + "collapsed": true, + "ExecuteTime": { + "end_time": "2024-10-26T20:02:30.404790Z", + "start_time": "2024-10-26T20:02:25.210780Z" + } + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "from functools import partial\n", + "import nest_asyncio\n", + "nest_asyncio.apply()\n", + "\n", + "from blendsql.db import Pandas\n", + "from blendsql.utils import tabulate\n", + "import blendsql" + ] + }, + { + "cell_type": "markdown", + "source": [ + "# BlendSQL by Example\n", + "\n", + "This notebook introduces BlendSQL and some of the use cases it can support. \n", + "\n", + "Importantly, the novelty of BlendSQL isn't in the ability to constrain language models according to some regular expression or context-free grammar. We can credit projects like [guidance](https://github.com/guidance-ai/guidance) and [outlines](https://github.com/dottxt-ai/outlines) for that. Instead, the novelty of BlendSQL is its ability to **infer these constraints according to the surrounding SQL syntax** and **closely align generation to the structure of the database**.\n", + "\n", + "SQL, as a grammar, has a lot of rules. Just take [these SQLite syntax diagrams](https://www.sqlite.org/syntaxdiagrams.html) for example. These rules include things like: an `IN` statement should be followed by a list of items; `<` and `>` should compare numerics, but `=` could compare any datatype; etc. We can use these rules to inform language-model functions, which we call 'ingredients' and denote with double curly brackets (`{{` and `}}`)." + ], + "metadata": { + "collapsed": false + }, + "id": "21e929ec2204780a" + }, + { + "cell_type": "markdown", + "source": [ + "## A Note on Models\n", + "This demo, unless noted otherwise, uses the amazing Azure AI with server-side Guidance integration, described [here](https://github.com/guidance-ai/guidance?tab=readme-ov-file#azure-ai). It allows us to access a Phi-3.5-mini model on Azure, and utilize it in a constrained setting (e.g. have it follow a regular expression pattern, interleave text with generation calls, etc.).\n", + "\n", + "If you don't have an Azure access key, you can swap out the model below for [any of the other model integrations that BlendSQL supports](https://parkervg.github.io/blendsql/reference/models/models/)." + ], + "metadata": { + "collapsed": false + }, + "id": "619f4d7f620a1e6" + }, + { + "cell_type": "markdown", + "source": [ + "To begin, let's set up a local database using `from blendsql.db import Pandas`. 
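Here, `Pandas` wraps in-memory DataFrames, where each dict key becomes a table name. For a file-backed database, the `SQLite` connector used later in this notebook follows the same pattern. A minimal sketch (the DataFrame contents and the .db path below are illustrative):

```python
import pandas as pd
import blendsql
from blendsql.db import Pandas

# In-memory: each key of the dict becomes a queryable table name
db = Pandas({"People": pd.DataFrame({"Name": ["George Washington"]})})

# File-backed alternative: the SQLite connector used later in this notebook
# (the path here is illustrative)
# db = blendsql.db.SQLite("path/to/local.db")
```
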
" + ], + "metadata": { + "collapsed": false + }, + "id": "c34ef7a567ec3d81" + }, + { + "cell_type": "code", + "execution_count": 3, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Eras\n", + "┌───────────┐\n", + "│ Years │\n", + "├───────────┤\n", + "│ 1800-1900 │\n", + "│ 1900-2000 │\n", + "│ 2000-Now │\n", + "└───────────┘\n", + "People\n", + "┌────────────────────┬──────────────────────────────────────────────────────┐\n", + "│ Name │ Known_For │\n", + "├────────────────────┼──────────────────────────────────────────────────────┤\n", + "│ George Washington │ Established federal government, First U.S. President │\n", + "│ John Quincy Adams │ XYZ Affair, Alien and Sedition Acts │\n", + "│ Thomas Jefferson │ Louisiana Purchase, Declaration of Independence │\n", + "│ James Madison │ War of 1812, Constitution │\n", + "│ James Monroe │ Monroe Doctrine, Missouri Compromise │\n", + "│ Alexander Hamilton │ Created national bank, Federalist Papers │\n", + "│ Sabrina Carpenter │ Nonsense, Emails I Cant Send, Mean Girls musical │\n", + "│ Charli XCX │ Crash, How Im Feeling Now, Boom Clap │\n", + "│ Elon Musk │ Tesla, SpaceX, Twitter/X acquisition │\n", + "│ Michelle Obama │ Lets Move campaign, Becoming memoir │\n", + "│ Elvis Presley │ 14 Grammys, King of Rock n Roll │\n", + "└────────────────────┴──────────────────────────────────────────────────────┘\n" + ] + } + ], + "source": [ + "people_db = Pandas(\n", + " {\n", + " \"People\": pd.DataFrame(\n", + " {\n", + " 'Name': [\n", + " 'George Washington', \n", + " 'John Quincy Adams', \n", + " 'Thomas Jefferson', \n", + " 'James Madison', \n", + " 'James Monroe', \n", + " 'Alexander Hamilton',\n", + " 'Sabrina Carpenter',\n", + " 'Charli XCX',\n", + " 'Elon Musk',\n", + " 'Michelle Obama',\n", + " 'Elvis Presley',\n", + " ],\n", + " 'Known_For': [\n", + " 'Established federal government, First U.S. 
President',\n", + " 'XYZ Affair, Alien and Sedition Acts',\n", + " 'Louisiana Purchase, Declaration of Independence',\n", + " 'War of 1812, Constitution',\n", + " 'Monroe Doctrine, Missouri Compromise',\n", + " 'Created national bank, Federalist Papers',\n", + " 'Nonsense, Emails I Cant Send, Mean Girls musical',\n", + " 'Crash, How Im Feeling Now, Boom Clap',\n", + " 'Tesla, SpaceX, Twitter/X acquisition',\n", + " 'Lets Move campaign, Becoming memoir',\n", + " '14 Grammys, King of Rock n Roll'\n", + " ]\n", + " }\n", + " ),\n", + " \"Eras\": pd.DataFrame(\n", + " {\n", + " 'Years': [\n", + " '1800-1900',\n", + " '1900-2000',\n", + " '2000-Now'\n", + " ]\n", + " }\n", + " )\n", + " }\n", + ")\n", + "# Print the tables in our database\n", + "for tablename in people_db.tables():\n", + " print(tablename)\n", + " print(tabulate(people_db.execute_to_df(f\"SELECT * FROM {tablename};\")))" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T20:02:30.601727Z", + "start_time": "2024-10-26T20:02:30.406602Z" + } + }, + "id": "ce784e49b71969a8" + }, + { + "cell_type": "code", + "execution_count": 4, + "outputs": [], + "source": [ + "# Define a utility function to make query execution easier\n", + "blend = lambda query, *args, **kwargs: blendsql.blend(\n", + " query,\n", + " db=kwargs.get(\"db\", people_db),\n", + " ingredients={blendsql.LLMQA, blendsql.RAGQA, blendsql.LLMMap, blendsql.LLMJoin},\n", + " # This model can be changed, according to what your personal setup is\n", + " default_model=kwargs.get(\"model\", blendsql.models.AzurePhiModel(env=\"..\", caching=False)),\n", + " verbose=True\n", + ")" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T20:02:30.606212Z", + "start_time": "2024-10-26T20:02:30.602658Z" + } + }, + "id": "b9bcd7005a97ebff" + }, + { + "cell_type": "markdown", + "source": [ + "## The Elephant in the Room - Aren't LLM Functions in SQL Super Slow?\n", + "Short answer - compared to nearly all native SQL operations, yes. \n", + "\n", + "However, when using remote APIs like OpenAI or Anthropic, we can dramatically speed up processing times by batching async requests. Below demonstrates that, for a table with 17,686 rows and 1,165 unique values in the column we process, *it takes only about 6.5 seconds to run our query with gpt-4o-mini* (or about 0.005 seconds per value).\n", + "\n", + "By default, we allow 10 concurrent async requests. Depending on your own quotas set by the API provider, you may be able to increase this number using:\n", + "\n", + "```python\n", + "import blendsql\n", + "\n", + "# Set the limit for max async calls at a given time below\n", + "blendsql.config.set_async_limit(20)\n", + "```\n", + "\n", + "#### A Note on Query Optimizations\n", + "Because LLM functions are relatively slow compared to other SQL functions, when we perform query optimizations behind the scenes, we make sure to execute all native SQL functions *before* any LLM-based functions. 
This ensures the language model only receives the smallest set of data it needs to faithfully evaluate a given SQL expression" + ], + "metadata": { + "collapsed": false + }, + "id": "19868215586a223f" + }, + { + "cell_type": "code", + "execution_count": 56, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "17686 total rows in the table\n", + "1165 total unique values in the 'City' column\n" + ] + } + ], + "source": [ + "db = blendsql.db.SQLite(blendsql.utils.fetch_from_hub(\"california_schools.db\"))\n", + "print(\"{} total rows in the table\".format(db.execute_to_list(\"SELECT COUNT(*) FROM schools LIMIT 10;\")[0]))\n", + "print(\"{} total unique values in the 'City' column\".format(db.execute_to_list(\"SELECT COUNT(DISTINCT City) FROM schools LIMIT 10;\")[0]))" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T19:38:27.197637Z", + "start_time": "2024-10-26T19:38:27.064507Z" + } + }, + "id": "58fa4778e5d3716f" + }, + { + "cell_type": "code", + "execution_count": 63, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m`SELECT * FROM schools` \u001B[36mand setting to `32d0_schools_0`...\u001B[39m\n", + "\u001B[90mCREATE TEMP TABLE \"32d0_schools_0\" (\n", + "\t\"CDSCode\" TEXT, \n", + "\t\"NCESDist\" TEXT, \n", + "\t\"NCESSchool\" TEXT, \n", + "\t\"StatusType\" TEXT, \n", + "\t\"County\" TEXT, \n", + "\t\"District\" TEXT, \n", + "\t\"School\" TEXT, \n", + "\t\"Street\" TEXT, \n", + "\t\"StreetAbr\" TEXT, \n", + "\t\"City\" TEXT, \n", + "\t\"Zip\" TEXT, \n", + "\t\"State\" TEXT, \n", + "\t\"MailStreet\" TEXT, \n", + "\t\"MailStrAbr\" TEXT, \n", + "\t\"MailCity\" TEXT, \n", + "\t\"MailZip\" TEXT, \n", + "\t\"MailState\" TEXT, \n", + "\t\"Phone\" TEXT, \n", + "\t\"Ext\" TEXT, \n", + "\t\"Website\" TEXT, \n", + "\t\"OpenDate\" TEXT, \n", + "\t\"ClosedDate\" TEXT, \n", + "\t\"Charter\" FLOAT, \n", + "\t\"CharterNum\" TEXT, \n", + "\t\"FundingType\" TEXT, \n", + "\t\"DOC\" TEXT, \n", + "\t\"DOCType\" TEXT, \n", + "\t\"SOC\" TEXT, \n", + "\t\"SOCType\" TEXT, \n", + "\t\"EdOpsCode\" TEXT, \n", + "\t\"EdOpsName\" TEXT, \n", + "\t\"EILCode\" TEXT, \n", + "\t\"EILName\" TEXT, \n", + "\t\"GSoffered\" TEXT, \n", + "\t\"GSserved\" TEXT, \n", + "\t\"Virtual\" TEXT, \n", + "\t\"Magnet\" FLOAT, \n", + "\t\"Latitude\" FLOAT, \n", + "\t\"Longitude\" FLOAT, \n", + "\t\"AdmFName1\" TEXT, \n", + "\t\"AdmLName1\" TEXT, \n", + "\t\"AdmEmail1\" TEXT, \n", + "\t\"AdmFName2\" TEXT, \n", + "\t\"AdmLName2\" TEXT, \n", + "\t\"AdmEmail2\" TEXT, \n", + "\t\"AdmFName3\" TEXT, \n", + "\t\"AdmLName3\" TEXT, \n", + "\t\"AdmEmail3\" TEXT, \n", + "\t\"LastUpdate\" TEXT\n", + ")\u001B[39m\n", + "\u001B[36mExecuting \u001B[96m `{{LLMMap('Is this in the Bay Area?', 'schools::City', options='t;f')}}`...\u001B[39m\n", + "\u001B[90mUsing options '['t', 'f']'\u001B[39m\n", + "Making calls to Model with batch_size 5: |\u001B[36m \u001B[39m| 234/? 
[00:00<00:00, 30475.61it/s]\n", + "\u001B[31mLLMMap with OpenaiLLM(gpt-4o-mini) only returned 1165 out of 1166 values\n", + "\u001B[33mFinished LLMMap with values:\n", + "{\n", + " \"Hayward\": true,\n", + " \"Newark\": true,\n", + " \"Oakland\": true,\n", + " \"Berkeley\": true,\n", + " \"San Leandro\": true,\n", + " \"-\": false,\n", + " \"Dublin\": true,\n", + " \"Fremont\": false,\n", + " \"Sacramento\": true,\n", + " \"Alameda\": null\n", + "}\u001B[39m\n", + "\u001B[36mCombining 1 outputs for table `schools`\u001B[39m\n", + "\u001B[90mCREATE TEMP TABLE \"32d0_schools\" (\n", + "\t\"CDSCode\" TEXT, \n", + "\t\"NCESDist\" TEXT, \n", + "\t\"NCESSchool\" TEXT, \n", + "\t\"StatusType\" TEXT, \n", + "\t\"County\" TEXT, \n", + "\t\"District\" TEXT, \n", + "\t\"School\" TEXT, \n", + "\t\"Street\" TEXT, \n", + "\t\"StreetAbr\" TEXT, \n", + "\t\"City\" TEXT, \n", + "\t\"Zip\" TEXT, \n", + "\t\"State\" TEXT, \n", + "\t\"MailStreet\" TEXT, \n", + "\t\"MailStrAbr\" TEXT, \n", + "\t\"MailCity\" TEXT, \n", + "\t\"MailZip\" TEXT, \n", + "\t\"MailState\" TEXT, \n", + "\t\"Phone\" TEXT, \n", + "\t\"Ext\" TEXT, \n", + "\t\"Website\" TEXT, \n", + "\t\"OpenDate\" TEXT, \n", + "\t\"ClosedDate\" TEXT, \n", + "\t\"Charter\" FLOAT, \n", + "\t\"CharterNum\" TEXT, \n", + "\t\"FundingType\" TEXT, \n", + "\t\"DOC\" TEXT, \n", + "\t\"DOCType\" TEXT, \n", + "\t\"SOC\" TEXT, \n", + "\t\"SOCType\" TEXT, \n", + "\t\"EdOpsCode\" TEXT, \n", + "\t\"EdOpsName\" TEXT, \n", + "\t\"EILCode\" TEXT, \n", + "\t\"EILName\" TEXT, \n", + "\t\"GSoffered\" TEXT, \n", + "\t\"GSserved\" TEXT, \n", + "\t\"Virtual\" TEXT, \n", + "\t\"Magnet\" FLOAT, \n", + "\t\"Latitude\" FLOAT, \n", + "\t\"Longitude\" FLOAT, \n", + "\t\"AdmFName1\" TEXT, \n", + "\t\"AdmLName1\" TEXT, \n", + "\t\"AdmEmail1\" TEXT, \n", + "\t\"AdmFName2\" TEXT, \n", + "\t\"AdmLName2\" TEXT, \n", + "\t\"AdmEmail2\" TEXT, \n", + "\t\"AdmFName3\" TEXT, \n", + "\t\"AdmLName3\" TEXT, \n", + "\t\"AdmEmail3\" TEXT, \n", + "\t\"LastUpdate\" TEXT, \n", + "\t\"Is this in the Bay Area?\" BOOLEAN\n", + ")\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT \"32d0_schools\".City AS City, \"32d0_schools\".\"Is this in the Bay Area?\" AS \"In Bay Area?\" FROM \"32d0_schools\"\u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finished in 6.575707912445068 seconds\n", + "┌─────────────┬────────────────┐\n", + "│ City │ In Bay Area? │\n", + "├─────────────┼────────────────┤\n", + "│ Hayward │ 1 │\n", + "│ Newark │ 1 │\n", + "│ Oakland │ 1 │\n", + "│ Berkeley │ 1 │\n", + "│ Oakland │ 1 │\n", + "│ Oakland │ 1 │\n", + "│ Oakland │ 1 │\n", + "│ Hayward │ 1 │\n", + "│ San Leandro │ 1 │\n", + "│ Hayward │ 1 │\n", + "└─────────────┴────────────────┘\n" + ] + } + ], + "source": [ + "smoothie = blend(\n", + " \"\"\"\n", + " SELECT City, {{LLMMap('Is this in the Bay Area?', 'schools::City', options='t;f')}} AS 'In Bay Area?' 
FROM schools;\n", + " \"\"\",\n", + " # Override the default database and model arguments\n", + " db=db,\n", + " model=blendsql.models.OpenaiLLM('gpt-4o-mini', caching=False, env='..')\n", + ")\n", + "print(f\"Finished in {smoothie.meta.process_time_seconds} seconds\")\n", + "print(tabulate(smoothie.df.head(10)))" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T19:42:46.743452Z", + "start_time": "2024-10-26T19:42:39.991516Z" + } + }, + "id": "b204aea592252c9c" + }, + { + "cell_type": "markdown", + "source": [ + "## Classification with 'LLMMap' and GROUP BY', Constrained by a Column's Values\n", + "Below, we set up a BlendSQL query leveraging the `LLMMap` ingredient. This is a unary function similar to the `LENGTH` or `ABS` functions in standard SQLite. It takes a single argument (a value from a column) and returns a transformed output, which is then assigned to a new column.\n", + "\n", + "Below, we set up a language-model function which takes in the values from the `Name` column of the `People` table, and outputs a value *exclusively* selected from the `Eras::Years` column." + ], + "metadata": { + "collapsed": false + }, + "id": "e451e810f6f0cf14" + }, + { + "cell_type": "code", + "execution_count": 66, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m `{{LLMMap('In which time period did the person live?', 'People::Name', options='Eras::Years')}}`...\u001B[39m\n", + "\u001B[90mUsing options '['2000-Now', '1900-2000', '1800-1900']'\u001B[39m\n", + "Making calls to Model with batch_size 5: |\u001B[36m \u001B[39m| 3/? [00:01<00:00, 1.85it/s] \n", + "\u001B[33mFinished LLMMap with values:\n", + "{\n", + " \"Elvis Presley\": \"1900-2000\",\n", + " \"John Quincy Adams\": \"1800-1900\",\n", + " \"James Monroe\": \"1800-1900\",\n", + " \"Elon Musk\": \"2000-Now\",\n", + " \"George Washington\": \"1800-1900\",\n", + " \"Alexander Hamilton\": \"1800-1900\",\n", + " \"James Madison\": \"1800-1900\",\n", + " \"Sabrina Carpenter\": \"2000-Now\",\n", + " \"Thomas Jefferson\": \"1800-1900\",\n", + " \"Charli XCX\": \"2000-Now\"\n", + "}\u001B[39m\n", + "\u001B[36mCombining 1 outputs for table `People`\u001B[39m\n", + "\u001B[36mCreated temp table e88a_People\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT GROUP_CONCAT(Name, ', ') AS \"Names\", \"e88a_People\".\"In which time period did the person live?\" AS \"Lived During Classification\" FROM \"e88a_People\" GROUP BY \"Lived During Classification\"\u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───────────────────────────────────────────────────────┬───────────────────────────────┐\n", + "│ Names │ Lived During Classification │\n", + "├───────────────────────────────────────────────────────┼───────────────────────────────┤\n", + "│ George Washington, John Quincy Adams, Thomas Jeffe... 
│ 1800-1900 │\n", + "│ Sabrina Carpenter, Charli XCX, Elon Musk │ 2000-Now │\n", + "│ Michelle Obama, Elvis Presley │ 1900-2000 │\n", + "└───────────────────────────────────────────────────────┴───────────────────────────────┘\n" + ] + } + ], + "source": [ + "smoothie = blend(\"\"\"\n", + "SELECT GROUP_CONCAT(Name, ', ') AS 'Names',\n", + "{{LLMMap('In which time period did the person live?', 'People::Name', options='Eras::Years')}} AS \"Lived During Classification\"\n", + "FROM People\n", + "GROUP BY \"Lived During Classification\"\n", + "\"\"\")\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T19:51:21.753413Z", + "start_time": "2024-10-26T19:51:19.683975Z" + } + }, + "id": "4d53d9c2de48693f" + }, + { + "cell_type": "markdown", + "source": [ + "## Constrained Decoding - The Presidents Challenge\n", + "Why does constrained decoding matter? Imagine we want to select all the information we have in our table about the first 3 presidents of the U.S. \n", + "In the absence of relevant data stored in our database, we turn to our language model. But one thing thwarts our plans - the language model doesn't know that we've stored the 2nd president's name in our database as `'John Quincy Adams'`, not `'John Adams'`." + ], + "metadata": { + "collapsed": false + }, + "id": "a1be8d0ae27266c3" + }, + { + "cell_type": "code", + "execution_count": 23, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m `{{LLMQA('First 3 presidents of the U.S?', output_type='List[str]')}}`...\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT * FROM People WHERE People.Name IN ('George Washington','John Adams','Thomas Jefferson') \u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───────────────────┬───────────────────────────────────────────────────────┐\n", + "│ Name │ Known_For │\n", + "├───────────────────┼───────────────────────────────────────────────────────┤\n", + "│ George Washington │ Established federal government, First U.S. Preside... │\n", + "│ Thomas Jefferson │ Louisiana Purchase, Declaration of Independence │\n", + "└───────────────────┴───────────────────────────────────────────────────────┘\n" + ] + } + ], + "source": [ + "# Setting `infer_gen_constraints=False` - otherwise, this counter-example would work\n", + "smoothie = blend(\"\"\"\n", + "SELECT * FROM People\n", + "WHERE People.Name IN {{LLMQA('First 3 presidents of the U.S?', output_type='List[str]')}}\n", + "\"\"\", infer_gen_constraints=False)\n", + "# The final query 'SELECT * FROM People WHERE Name IN ('George Washington','John Adams','Thomas Jefferson')' only yields 2 rows\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T19:22:47.765258Z", + "start_time": "2024-10-26T19:22:47.113298Z" + } + }, + "id": "ea77a9ee21a00efc" + }, + { + "cell_type": "markdown", + "source": [ + "Constrained decoding comes to our rescue. By specifying `infer_gen_constraints=True` (which is the default), BlendSQL infers from the surrounding SQL syntax that we expect a value from `People.Name`, and we force the generation to only select from values present in the `Name` column - which leads to the expected response." 
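If you'd rather not rely on inference, the same constraint can also be spelled out explicitly through the `options` argument, using the `tablename::columnname` syntax covered later in this notebook. A sketch, using the `blend` helper defined above:

```python
# A sketch: spell out the constraint explicitly with `options`,
# rather than relying on infer_gen_constraints
smoothie = blend("""
SELECT * FROM People
WHERE People.Name IN {{LLMQA('First 3 presidents of the U.S?', options='People::Name')}}
""")
print(smoothie.df)
```
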
+ ], + "metadata": { + "collapsed": false + }, + "id": "ed4e14cb4135c095" + }, + { + "cell_type": "code", + "execution_count": 38, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m `{{LLMQA('First 3 presidents of the U.S?')}}`...\u001B[39m\n", + "\u001B[90mUsing options '{'George Washington', 'James Monroe', 'Thomas Jefferson', 'James Madison', 'John Quincy Adams', 'Michelle Obama', 'Elon Musk', 'Charli XCX', 'Elvis Presley', 'Alexander Hamilton', 'Sabrina Carpenter'}'\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT * FROM People WHERE People.Name IN ('George Washington','John Quincy Adams','James Monroe') \u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───────────────────┬───────────────────────────────────────────────────────┐\n", + "│ Name              │ Known_For                                             │\n", + "├───────────────────┼───────────────────────────────────────────────────────┤\n", + "│ George Washington │ Established federal government, First U.S. Preside... │\n", + "│ John Quincy Adams │ XYZ Affair, Alien and Sedition Acts                   │\n", + "│ James Monroe      │ Monroe Doctrine, Missouri Compromise                  │\n", + "└───────────────────┴───────────────────────────────────────────────────────┘\n" + ] + } + ], + "source": [ + "smoothie = blend(\"\"\"\n", + "SELECT * FROM People\n", + "WHERE People.Name IN {{LLMQA('First 3 presidents of the U.S?')}}\n", + "\"\"\", infer_gen_constraints=True)\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T19:28:12.632772Z", + "start_time": "2024-10-26T19:28:11.881862Z" + } + }, + "id": "b59038996327dc7" + }, + { + "cell_type": "markdown", + "source": [ + "## Constrained Decoding - The Alphabet Challenge\n", + "\n", + "In BlendSQL, we can utilize the power of constrained decoding to guide a language model's generation towards the structure we expect. In other words, rather than taking a \"prompt-and-pray\" approach in which we meticulously craft a natural language prompt which (hopefully) generates a list of 3 strings, we can interact with the logit space to ensure this is the case.\n", + "\n", + "> [!NOTE] \n", + "> These guarantees are only made possible with open models, i.e. where we can access the underlying logits. For closed models like OpenAI and Anthropic, we rely on prompting (e.g. 'Datatype: List[str]') and make predictions \"optimistically\".\n", + "\n", + "To demonstrate this, we can use the `LLMQA` ingredient. This ingredient optionally takes in a table subset as context, and returns either a scalar value or a list of scalars. \n", + "\n", + "Since BlendSQL can infer the shape of a valid generation according to the surrounding SQL syntax, when we use the `LLMQA` ingredient in a `VALUES` or `IN` clause, it will generate a list by default."
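You can also state the expected shape up front, as in the earlier presidents example. A sketch combining a `VALUES` clause with an explicit `output_type`, using the `blend` helper defined above:

```python
# A sketch: state the expected shape directly instead of inferring it
smoothie = blend("""
SELECT * FROM ( VALUES {{LLMQA('What are the first letters of the alphabet?', output_type='List[str]')}} )
""")
print(smoothie.df)
```
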
+ ], + "metadata": { + "collapsed": false + }, + "id": "8563f8913cefc01e" + }, + { + "cell_type": "code", + "execution_count": 26, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m `{{LLMQA('What are the first letters of the alphabet?')}}`...\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT * FROM (VALUES ( 'A' ))\u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌────────┐\n", + "│ col0 │\n", + "├────────┤\n", + "│ A │\n", + "└────────┘\n" + ] + } + ], + "source": [ + "smoothie = blend(\"\"\"\n", + "SELECT * FROM ( VALUES {{LLMQA('What are the first letters of the alphabet?')}} )\n", + "\"\"\")\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T19:23:12.364987Z", + "start_time": "2024-10-26T19:23:11.956900Z" + } + }, + "id": "b6e9868188f0c338" + }, + { + "cell_type": "markdown", + "source": [ + "Ok, so we were able to generate the first letter of the alphabet... what if we want more? \n", + "\n", + "Rather than modify the prompt itself (which can be quite finicky), we can leverage the regex-inspired `modifier` argument. This will take either the strings `'*'` (zero-or-more) or `'+'` (one-or-more), in addition to tighter bounds of `'{3}'` (exactly 3) or `'{1,6}'` (between 1 and 6)." + ], + "metadata": { + "collapsed": false + }, + "id": "ea1ac65d394cf1e0" + }, + { + "cell_type": "code", + "execution_count": 27, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m `{{LLMQA('What are the first letters of the alphabet?', modifier='{3}')}}`...\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT * FROM (VALUES ( 'A','B','C' ))\u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌────────┬────────┬────────┐\n", + "│ col0 │ col1 │ col2 │\n", + "├────────┼────────┼────────┤\n", + "│ A │ B │ C │\n", + "└────────┴────────┴────────┘\n" + ] + } + ], + "source": [ + "smoothie = blend(\"\"\"\n", + "SELECT * FROM ( VALUES {{LLMQA('What are the first letters of the alphabet?', modifier='{3}')}} )\n", + "\"\"\")\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T19:23:16.653773Z", + "start_time": "2024-10-26T19:23:16.147895Z" + } + }, + "id": "5d186b8588d96ac1" + }, + { + "cell_type": "markdown", + "source": [ + "What if we want to generate the letters of a different alphabet? We can use the `options` argument for this, which takes either a reference to another column in the form `'tablename::columnname'`, or a set of semicolon-separated strings." 
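The next cell demonstrates the semicolon-separated form; the column-reference form works the same way. A sketch reusing the `Eras` table from earlier (the question string is illustrative):

```python
# A sketch of the column-reference form: constrain generation to Eras.Years
smoothie = blend("""
SELECT {{LLMQA('In which time period did Elvis Presley live?', options='Eras::Years')}}
""")
print(smoothie.df)
```
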
+ ], + "metadata": { + "collapsed": false + }, + "id": "55e0405c455ecb9" + }, + { + "cell_type": "code", + "execution_count": 28, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m `{{LLMQA('What are the first letters of the alphabet?', options='α;β;γ;δ', modifier='{3}')}}`...\u001B[39m\n", + "\u001B[90mUsing options '{'γ', 'δ', 'β', 'α'}'\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT * FROM (VALUES ( 'α','β','γ' ))\u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌────────┬────────┬────────┐\n", + "│ col0 │ col1 │ col2 │\n", + "├────────┼────────┼────────┤\n", + "│ α │ β │ γ │\n", + "└────────┴────────┴────────┘\n" + ] + } + ], + "source": [ + "smoothie = blend(\"\"\"\n", + "SELECT * FROM ( VALUES {{LLMQA('What are the first letters of the alphabet?', options='α;β;γ;δ', modifier='{3}')}} )\n", + "\"\"\")\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T19:23:20.332521Z", + "start_time": "2024-10-26T19:23:19.812117Z" + } + }, + "id": "7ffc171b4f97bf75" + }, + { + "cell_type": "markdown", + "source": [ + "### Agent-Based Inference with CTE Expressions\n", + "The above example opens up the opportunity to rewrite the query as more of an agent-based flow. SQL is a bit odd in that it's executed bottom-up, i.e. to execute the following query:\n", + "```sql\n", + "SELECT the_answer FROM final_table WHERE final_table.x IN \n", + " (SELECT some_field FROM initial_table)\n", + "```\n", + "...We first gather `some_field` from `initial_table`, and *then* go and fetch `the_answer`, despite the author (human or AI) having written the second step, first. This is similar to the point made by [Google in the pipe-syntax paper](https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/) about how SQL syntactic clause order doesn't match semantic evaluation order.\n", + "\n", + "At the end of the day, we have two agents performing the following tasks - \n", + "1) Brainstorm some greek letters\n", + "2) Using the output of the previous task, select only the first 3 \n", + "\n", + "With BlendSQL, we can use common table expressions (CTEs) to more closely mimic this order of 'agents'." 
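The general shape is one CTE per 'agent', with each downstream agent consuming the upstream output through `options`. A sketch, where the task strings are placeholders rather than real prompts:

```python
# A sketch of the general pattern, one CTE per 'agent';
# '<brainstorming task>' and '<selection task>' are placeholders
smoothie = blend("""
WITH agent_one_output AS (
    SELECT * FROM (VALUES {{LLMQA('<brainstorming task>')}})
)
SELECT {{
    LLMQA(
        '<selection task>',
        options=(SELECT * FROM agent_one_output)
    )}}
""")
```

The executed cell below is a concrete instance of this pattern.
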
+ ], + "metadata": { + "collapsed": false + }, + "id": "1e9841576ad74398" + }, + { + "cell_type": "code", + "execution_count": 29, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m `{{ LLMQA( 'What is the first letter of the alphabet?', options=(SELECT * FROM letter_agent_output) )}}`...\u001B[39m\n", + "\u001B[36mExecuting `SELECT * FROM (VALUES ({{LLMQA('List some greek letters')}}))` and setting to `letter_agent_output`\u001B[39m\n", + "\u001B[36mExecuting \u001B[96m `{{LLMQA('List some greek letters')}}`...\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT * FROM (VALUES ( 'alpha','beta','gamma','delta','epsilon','zeta','eta','theta','iota','kappa','lambda','mu','nu','xi','omicron','pi','rho','sigma','tau','upsilon','phi','chi','psi','omega' ))\u001B[39m\n", + "\u001B[36mCreated temp table letter_agent_output\u001B[39m\n", + "\u001B[33mNo BlendSQL ingredients found in query:\u001B[39m\n", + "\u001B[93mSELECT * FROM letter_agent_output\u001B[39m\n", + "\u001B[33mExecuting as vanilla SQL...\u001B[39m\n", + "\u001B[90mUsing options '{'mu', 'sigma', 'kappa', 'lambda', 'eta', 'rho', 'gamma', 'theta', 'nu', 'phi', 'iota', 'zeta', 'psi', 'omega', 'upsilon', 'beta', 'xi', 'tau', 'pi', 'delta', 'chi', 'alpha', 'omicron', 'epsilon'}'\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT 'alpha' \u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───────────┐\n", + "│ 'alpha'   │\n", + "├───────────┤\n", + "│ alpha     │\n", + "└───────────┘\n" + ] + } + ], + "source": [ + "smoothie = blend(\"\"\"\n", + "WITH letter_agent_output AS (\n", + "    SELECT * FROM (VALUES {{LLMQA('List some greek letters')}})\n", + ") SELECT {{\n", + "    LLMQA(\n", + "        'What is the first letter of the alphabet?', \n", + "        options=(SELECT * FROM letter_agent_output)\n", + "    )}}\n", + "\"\"\")\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T19:23:25.695364Z", + "start_time": "2024-10-26T19:23:23.162246Z" + } + }, + "id": "ef5ffc11ff592e30" + }, + { + "cell_type": "markdown", + "source": [ + "## Using `output_type` to Influence Generation\n", + "BlendSQL does its best to infer datatypes given the surrounding syntax. Sometimes, though, the user may want to override those assumptions, or inject new ones that couldn't be inferred.\n", + "\n", + "The `output_type` argument takes a Python-style type annotation like `int`, `str`, `bool` or `float`. Below we use that to guide the generation towards one or more integers. 
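The same pattern extends to the other annotations. For instance, a sketch with `float`, where the question string is illustrative and the regex BlendSQL compiles per type is an assumption here:

```python
# A sketch with a different annotation; analogous to the `int` case below
smoothie = blend("""
SELECT * FROM ( VALUES {{LLMQA('List some famous constants', output_type='float', modifier='{2}')}} )
""")
```
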
" + ], + "metadata": { + "collapsed": false + }, + "id": "449a7d3d01dea38b" + }, + { + "cell_type": "code", + "execution_count": 30, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m `{{LLMQA('Count up, starting from 1', output_type='int', modifier='+')}}`...\u001B[39m\n", + "\u001B[90mUsing regex '(\\d{1,18})'\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT * FROM (VALUES ( '1','2' ))\u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌────────┬────────┐\n", + "│ col0 │ col1 │\n", + "├────────┼────────┤\n", + "│ 1 │ 2 │\n", + "└────────┴────────┘\n" + ] + } + ], + "source": [ + "smoothie = blend(\"\"\"\n", + "SELECT * FROM ( VALUES {{LLMQA('Count up, starting from 1', output_type='int', modifier='+')}} )\n", + "\"\"\")\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T19:23:28.601366Z", + "start_time": "2024-10-26T19:23:28.200234Z" + } + }, + "id": "658ce9487a5b4485" + }, + { + "cell_type": "markdown", + "source": [ + "## RAG for Unstructured Reasoning\n", + "In addition to using the `LLMQA` ingredient as a method for generating with tight syntax-aware constraints, we can also relax a bit and let the model give us an unstructured generation for things like summarization.\n", + "\n", + "Also, we can use the `context` argument to provide relevant table context. This allows us to condition generation on a curated set of data (and do cool stuff with nested reasoning)." + ], + "metadata": { + "collapsed": false + }, + "id": "75a215007dbff91b" + }, + { + "cell_type": "code", + "execution_count": 5, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m `{{ LLMQA( 'Give me a very short summary of this person', context=( SELECT * FROM People WHERE People.Name = {{LLMQA('Who has a musical by Lin-Manuel Miranda written about them?', options='People::Name')}} ) ) }}`...\u001B[39m\n", + "\u001B[36mExecuting \u001B[96m `{{LLMQA ( 'Who has a musical by Lin-Manuel Miranda written about them?' , options= 'People::Name' ) }}`...\u001B[39m\n", + "\u001B[90mUsing options '{'Elon Musk', 'James Madison', 'Elvis Presley', 'Thomas Jefferson', 'Sabrina Carpenter', 'Michelle Obama', 'John Quincy Adams', 'Alexander Hamilton', 'James Monroe', 'Charli XCX', 'George Washington'}'\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT * FROM People WHERE People.Name = 'Alexander Hamilton' \u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning: can't backtrack over \" Question‧:\"; this may confuse the model\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[92mFinal Query:\n", + "SELECT 'Founder of national bank, author of Federalist Papers ' AS \"Summary\"\u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───────────────────────────────────────────────────────┐\n", + "│ Summary │\n", + "├───────────────────────────────────────────────────────┤\n", + "│ Founder of national bank, author of Federalist Pap... 
│\n", + "└───────────────────────────────────────────────────────┘\n" + ] + } + ], + "source": [ + "# Give a short summary of the person who had a musical by Lin-Manuel Miranda written about them\n", + "smoothie = blend(\"\"\"\n", + "SELECT {{\n", + " LLMQA(\n", + " 'Give me a very short summary of this person', \n", + " context=(\n", + " SELECT * FROM People \n", + " WHERE People.Name = {{LLMQA('Who has a musical by Lin-Manuel Miranda written about them?', options='People::Name')}}\n", + " )\n", + " )\n", + "}} AS \"Summary\"\n", + "\"\"\")\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T20:02:38.732210Z", + "start_time": "2024-10-26T20:02:37.215387Z" + } + }, + "id": "74b6331ba74b95d8" + }, + { + "cell_type": "code", + "execution_count": 35, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m `{{LLMQA('Who wrote the song Espresso?')}}`...\u001B[39m\n", + "\u001B[36mExecuting `SELECT Name AS Name FROM People WHERE {{LLMMap('Is a singer?', 'People::Name')}} = TRUE` and setting to `Musicians`\u001B[39m\n", + "\u001B[36mExecuting \u001B[96m `{{LLMMap('Is a singer?', 'People::Name')}}`...\u001B[39m\n", + "When inferring `options` in infer_gen_kwargs, encountered a column node with no table specified!\n", + "Should probably mark `schema_qualify` arg as True\n", + "\u001B[90mUsing regex '(t|f)'\u001B[39m\n", + "Making calls to Model with batch_size 5: |\u001B[36m \u001B[39m| 3/? [00:01<00:00, 2.69it/s] \n", + "\u001B[33mFinished LLMMap with values:\n", + "{\n", + " \"Elvis Presley\": true,\n", + " \"Michelle Obama\": false,\n", + " \"George Washington\": false,\n", + " \"Alexander Hamilton\": false,\n", + " \"John Quincy Adams\": false,\n", + " \"James Monroe\": false,\n", + " \"Elon Musk\": false,\n", + " \"James Madison\": false,\n", + " \"Sabrina Carpenter\": true,\n", + " \"Thomas Jefferson\": false\n", + "}\u001B[39m\n", + "\u001B[36mCombining 1 outputs for table `People`\u001B[39m\n", + "\u001B[36mCreated temp table 9f31_People\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT Name AS Name FROM \"9f31_People\" WHERE \"9f31_People\".\"Is a singer?\" = TRUE\u001B[39m\n", + "\u001B[36mCreated temp table Musicians\u001B[39m\n", + "\u001B[90mUsing options '{'Sabrina Carpenter', 'Charli XCX', 'Elvis Presley'}'\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT Musicians.Name AS \"Espresso Singer\" FROM Musicians WHERE Musicians.Name = 'Sabrina Carpenter' \u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───────────────────┐\n", + "│ Espresso Singer │\n", + "├───────────────────┤\n", + "│ Sabrina Carpenter │\n", + "└───────────────────┘\n" + ] + } + ], + "source": [ + "# A two-step reasoning problem:\n", + "# 1) Identify who, out of the table, is a singer using `LLMMap`\n", + "# 2) Where the previous step yields `TRUE`, select the one that wrote the song Espresso.\n", + "smoothie = blend(\"\"\"\n", + "WITH Musicians AS \n", + "(SELECT Name FROM People WHERE {{LLMMap('Is a singer?', 'People::Name')}} = TRUE)\n", + "SELECT Name AS \"Espresso Singer\" FROM Musicians WHERE Musicians.Name = {{LLMQA('Who wrote the song Espresso?')}}\n", + "\"\"\")\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T19:24:56.413033Z", + "start_time": "2024-10-26T19:24:54.959564Z" + } + }, + "id": "1a089867c40ebc41" + }, + { + "cell_type": "markdown", + "source": [ + "## 
Internet-Connected RAG \n", + "So we know how to use a table subset as a context, by writing subqueries. But what if the knowledge we need to answer a question isn't present in the universe of our table?\n", + "\n", + "For this, we have the `RAGQA` ingredient (retrieval-augmented generation question-answering). Currently it only supports Bing via Azure as a source, but the idea is that in the future, it will support more forms of unstructured retrieval. " + ], + "metadata": { + "collapsed": false + }, + "id": "14c549c0bc8e5563" + }, + { + "cell_type": "markdown", + "source": [ + "Let's ask a question that requires a bit more world-knowledge to answer." + ], + "metadata": { + "collapsed": false + }, + "id": "b18b91327f1a9874" + }, + { + "cell_type": "code", + "execution_count": 91, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mExecuting \u001B[96m `{{LLMQA(\"Who's birthday is June 28, 1971?\")}}`...\u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning: can't backtrack over \"\\n\"; this may confuse the model\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[92mFinal Query:\n", + "SELECT 'Not specified in the context' AS \"Answer\"\u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌──────────────────────────────┐\n", + "│ Answer │\n", + "├──────────────────────────────┤\n", + "│ Not specified in the context │\n", + "└──────────────────────────────┘\n" + ] + } + ], + "source": [ + "smoothie = blend(\"\"\"\n", + "SELECT {{LLMQA(\"Who's birthday is June 28, 1971?\")}} AS \"Answer\"\n", + "\"\"\")\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T18:22:41.144450Z", + "start_time": "2024-10-26T18:22:40.678444Z" + } + }, + "id": "f1a798ba024c1154" + }, + { + "cell_type": "markdown", + "source": [ + "Ok, that's fair.\n", + "\n", + "Now let's try again, using constrained decoding via `options` and using the `RAGQA` ingredient to fetch relevant context via a Bing web search first." + ], + "metadata": { + "collapsed": false + }, + "id": "5b006708b577b4b3" + }, + { + "cell_type": "code", + "execution_count": 92, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001B[36mUnpacked alias `\u001B[96m{{RAGQA(\"Who's birthday is June 28, 1971?\", source='bing', options='People::Name')}}\u001B[36m` to `\u001B[96m\n", + "{{\n", + " LLMQA(\n", + " \"Who's birthday is June 28, 1971?\", \n", + " (\n", + " SELECT {{\n", + " BingWebSearch(\"Who's birthday is June 28, 1971?\")\n", + " }} AS \"Search Results\"\n", + " ), options='People::Name'\n", + " )\n", + "}}\n", + "`\u001B[39m\n", + "\u001B[36mExecuting \u001B[96m `{{RAGQA(\"Who's birthday is June 28, 1971?\", source='bing', options='People::Name')}}`...\u001B[39m\n", + "When inferring `options` in infer_gen_kwargs, encountered a column node with no table specified!\n", + "Should probably mark `schema_qualify` arg as True\n", + "\u001B[36mExecuting \u001B[96m `{{ BingWebSearch ( \"Who's birthday is June 28, 1971?\" ) }}`...\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT '## DOCUMENT 1\n", + "\n", + "Elon Reeve Musk was born on June 28, 1971, in Pretoria, South Africa''s administrative capital. 
[7] [8] He is of British and Pennsylvania Dutch ancestry.[9] [10] His mother, Maye (née Haldeman), is a model and dietitian born in Saskatchewan, Canada, and raised in South Africa.[11] [12] [13] His father, Errol Musk, is a South African electromechanical engineer, pilot, sailor, consultant ...\n", + "\n", + "## DOCUMENT 2\n", + "\n", + "Weekday: June 28th, 1971 was a Monday. People born on June 28th, 1971 turned 53 this year (2024). Birthdays of famous people, actors, celebrities and stars on June 28th. With 365 days 1971 is a normal year and no leap year.' AS \"Search Results\"\u001B[39m\n", + "\u001B[90mUsing options '{'Thomas Jefferson', 'Charli XCX', 'James Madison', 'Sabrina Carpenter', 'Michelle Obama', 'John Quincy Adams', 'James Monroe', 'George Washington', 'Elvis Presley', 'Elon Musk', 'Alexander Hamilton'}'\u001B[39m\n", + "\u001B[92mFinal Query:\n", + "SELECT * FROM People WHERE Name = 'Elon Musk' \u001B[39m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "┌───────────┬──────────────────────────────────────┐\n", + "│ Name │ Known_For │\n", + "├───────────┼──────────────────────────────────────┤\n", + "│ Elon Musk │ Tesla, SpaceX, Twitter/X acquisition │\n", + "└───────────┴──────────────────────────────────────┘\n" + ] + } + ], + "source": [ + "smoothie = blend(\"\"\"\n", + "SELECT * FROM People WHERE Name = {{RAGQA(\"Who's birthday is June 28, 1971?\", source='bing', options='People::Name')}}\n", + "\"\"\")\n", + "print(smoothie.df)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-10-26T18:23:15.641730Z", + "start_time": "2024-10-26T18:23:14.521040Z" + } + }, + "id": "882c37cb31e10085" + }, + { + "cell_type": "markdown", + "source": [ + "Nice! Elon Musk was indeed born on June 28th, 1971. You can check out the BlendSQL logs above to validate this given the web context." + ], + "metadata": { + "collapsed": false + }, + "id": "7eb6f2202332c596" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [], + "metadata": { + "collapsed": false + }, + "id": "a98dc2315cbd7cd4" + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}