Releases: parkervg/blendsql
v0.0.31
v0.0.30
🧠 Smarter LLMQA with `modifier` Arg
As described in blendsql-by-example.ipynb, LLMQA can now generate constrained lists. This means the following query is valid:
```sql
SELECT * FROM People
WHERE People.Name IN {{LLMQA('First 3 presidents of the U.S?')}}
```
Or, even pseudo-agent-based processing like this (the `modifier='{3}'` argument constrains the generated list to exactly 3 items):
```sql
WITH letter_agent_output AS (
    SELECT * FROM (VALUES {{LLMQA('List some greek letters', modifier='{3}')}})
) SELECT {{
    LLMQA(
        'What is the first letter of the alphabet?',
        options=(SELECT * FROM letter_agent_output)
    )
}}
```
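Such a query runs through blend() like any other; below is a minimal sketch, where the SQLite database path and OpenAI model name are illustrative placeholders:
```python
from blendsql import blend, LLMQA
from blendsql.db import SQLite
from blendsql.models import OpenaiLLM

# Illustrative: swap in your own database and model.
smoothie = blend(
    query="""
    SELECT * FROM People
    WHERE People.Name IN {{LLMQA('First 3 presidents of the U.S?')}}
    """,
    db=SQLite("people.db"),
    ingredients={LLMQA},
    default_model=OpenaiLLM("gpt-4o-mini"),
)
print(smoothie.df)
```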
Additionally, the AzurePhi model allows for easy constrained decoding with a larger model, powered by guidance's server-side Azure AI integration: https://github.com/guidance-ai/guidance?tab=readme-ov-file#azure-ai
Full Changelog: v0.0.29...v0.0.30
v0.0.29
Added the ability to configure the maximum number of concurrent async OpenAI/Anthropic calls via:
```python
import blendsql

# Optionally set how many async calls to allow concurrently.
# This depends on your OpenAI/Anthropic/etc. rate limits.
blendsql.config.set_async_limit(10)
```
The default is 10.
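For intuition, a cap like this is typically enforced with an asyncio.Semaphore; the sketch below shows the general pattern, not BlendSQL's actual internals:
```python
import asyncio

# Illustrative pattern only, not BlendSQL's internals.
semaphore = asyncio.Semaphore(10)  # mirrors set_async_limit(10)

async def call_llm(prompt: str) -> str:
    async with semaphore:  # at most 10 requests in flight at once
        await asyncio.sleep(0.1)  # stand-in for the real API request
        return f"response to: {prompt}"

async def run_all(prompts: list[str]) -> list[str]:
    return await asyncio.gather(*(call_llm(p) for p in prompts))
```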
Full Changelog: v0.0.28...v0.0.29
v0.0.28
⚡ Async Batch Calls for LLMMap
This release adds async batch processing by default for the LLMMap ingredient. Currently, this means that usage of the OpenaiLLM and AnthropicLLM classes in an LLMMap call will be much quicker, especially when the database context is large or our batch_size is small.
For example, take this query from the README:
```sql
SELECT "Name",
    {{ImageCaption('parks::Image')}} as "Image Description",
    {{
        LLMMap(
            question='Size in km2?',
            context='parks::Area'
        )
    }} as "Size in km" FROM parks
WHERE "Location" = 'Alaska'
ORDER BY "Size in km" DESC LIMIT 1
```
Assuming we've initialized our LLMMap ingredient via `LLMMap.from_args(batch_size=1, k=0)`, meaning we retrieve 0 few-shot examples per prompt (i.e. zero-shot learning), we then have 2 total values to map onto, since 2 parks meet our criterion `WHERE "Location" = 'Alaska'`.
With this update, we pass the two prompts below to our OpenAI or Anthropic endpoint asynchronously:
```text
Given a set of values from a database, answer the question row-by-row, in order.
Your outputs should be separated by ';'.
Question: Size in km2?
Source table: parks
Source column: Area
Values:
7,523,897.45 acres (30,448.1 km2)
```
```text
Given a set of values from a database, answer the question row-by-row, in order.
Your outputs should be separated by ';'.
Question: Size in km2?
Source table: parks
Source column: Area
Values:
3,674,529.33 acres (14,870.3 km2)
```
Of course, the effects of this async processing will be felt even more when we need to pass many values to the LLMMap function.
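To make the batching concrete, here is the general shape of the pattern: chunk values by batch_size, dispatch one prompt per chunk concurrently, and split each ';'-separated response back into per-value answers. Everything below (including fake_llm_call) is a hypothetical stand-in, not BlendSQL's internals:
```python
import asyncio

async def fake_llm_call(prompt: str) -> str:
    # Hypothetical stand-in for an async OpenAI/Anthropic request.
    await asyncio.sleep(0.1)
    return "30,448.1"  # dummy response for a single-value batch

async def answer_batch(question: str, values: list[str]) -> list[str]:
    prompt = (
        "Given a set of values from a database, answer the question "
        "row-by-row, in order.\n"
        "Your outputs should be separated by ';'.\n"
        f"Question: {question}\nValues:\n" + "\n".join(values)
    )
    response = await fake_llm_call(prompt)
    return [answer.strip() for answer in response.split(";")]

async def llm_map(question: str, values: list[str], batch_size: int = 1) -> list[str]:
    # One prompt per `batch_size` values, all dispatched concurrently.
    batches = [values[i : i + batch_size] for i in range(0, len(values), batch_size)]
    results = await asyncio.gather(*(answer_batch(question, b) for b in batches))
    return [answer for batch in results for answer in batch]

values = [
    "7,523,897.45 acres (30,448.1 km2)",
    "3,674,529.33 acres (14,870.3 km2)",
]
print(asyncio.run(llm_map("Size in km2?", values)))  # ['30,448.1', '30,448.1']
```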
Full Changelog: v0.0.27...v0.0.28
v0.0.27
Few-Shot Prompting + Retrieval for Ingredients
This release includes many new updates, most notably an interface allowing you to define custom few-shot examples for ingredient functions and dynamically retrieve the most relevant examples at runtime via a haystack-based retriever.
For example:
```python
from blendsql import blend, LLMQA
from blendsql.ingredients.builtin import DEFAULT_QA_FEW_SHOT

ingredients = {
    LLMQA.from_args(
        few_shot_examples=[
            *DEFAULT_QA_FEW_SHOT,
            {
                "question": "Which weighs the most?",
                "context": {
                    "Animal": ["Dog", "Gorilla", "Hamster"],
                    "Weight": ["20 pounds", "350 lbs", "100 grams"],
                },
                "answer": "Gorilla",
                # Below are optional
                "options": ["Dog", "Gorilla", "Hamster"],
            },
        ],
        # Will fetch `k` most relevant few-shot examples using embedding-based retriever
        k=2,
        # Lambda to turn the pd.DataFrame to a serialized string
        context_formatter=lambda df: df.to_markdown(index=False),
    )
}

# `blendsql` (the query string), `db`, and `model` are assumed to be defined as elsewhere in the README
smoothie = blend(
    query=blendsql,
    db=db,
    ingredients=ingredients,
    default_model=model,
)
```
See this section in the README for more information.
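For intuition on the retrieval step: picking the k most relevant examples usually means embedding the incoming question and ranking each stored example's question by cosine similarity. A toy sketch, where embed() is a hypothetical stand-in for a real sentence-embedding model:
```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in: a real retriever would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def top_k_examples(question: str, examples: list[dict], k: int = 2) -> list[dict]:
    q = embed(question)

    def cosine(example: dict) -> float:
        e = embed(example["question"])
        return float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e)))

    # The highest-similarity examples become the few-shot prompt prefix.
    return sorted(examples, key=cosine, reverse=True)[:k]
```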
Full Changelog: v0.0.26...v0.0.27
v0.0.26
Full Changelog: v0.0.25...v0.0.26
v0.0.25
Fixing `ModuleNotFoundError` on pypi install
Full Changelog: v0.0.23...v0.0.25
v0.0.23
Speeding Things Up ⚡
This release moves the underlying constrained decoding engine from outlines to guidance.
Compiling constraints and tokens into a FSM (finite-state machine), as the outlines approach does, turned out to be a bottleneck for many BlendSQL operations. The trie-based guidance approach instead runs quicker in settings where the constraints aren't known ahead of time, as is the case for many BlendSQL ingredients.
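To see why that matters when options only appear at runtime, consider a toy character-level version of the trie idea (guidance actually operates over tokens; this is only to illustrate the cost profile). Building the trie is linear in the total length of the options, and each decoding step just reads the current node's children, whereas an FSM over the constraint must be compiled up front:
```python
def build_trie(options: list[str]) -> dict:
    # Linear in the total length of the options: cheap to rebuild per query.
    trie: dict = {}
    for option in options:
        node = trie
        for ch in option:
            node = node.setdefault(ch, {})
        node["<eos>"] = {}  # marks a complete option
    return trie

def allowed_next(trie: dict, prefix: str) -> list[str]:
    # Characters (or end-of-string) the decoder may emit after `prefix`.
    node = trie
    for ch in prefix:
        node = node[ch]
    return list(node)

trie = build_trie(["alpha", "beta", "gamma"])
print(allowed_next(trie, ""))    # ['a', 'b', 'g']
print(allowed_next(trie, "be"))  # ['t']
```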
Below are the old/new runtimes for the benchmarks, using HuggingFaceTB/SmolLM-135M.
Before:
| Task | Average Runtime | # Unique Queries |
|---|---|---|
| financials | 0.0427749 | 7 |
| rugby | 3.54232 | 4 |
| national_parks | 2.63405 | 5 |
| 1966_nba_draft | 3.65771 | 2 |
After:
| Task | Average Runtime | # Unique Queries |
|---|---|---|
| financials | 0.0487881 | 7 |
| rugby | 0.909974 | 4 |
| national_parks | 2.13209 | 5 |
| 1966_nba_draft | 1.39948 | 2 |
Anthropic models are also now supported.
Full Changelog: v0.0.21...v0.0.22
v0.0.21
What's Changed
- unpack_options(), adding options argument to MapIngredient by @parkervg in #28
- With the release of skrub 0.2.0, we can now support Python 3.9
Full Changelog: v0.0.20...v0.0.21
v0.0.20
What's Changed
On queries like benchmarks/national_parks/q05.sql:
```sql
SELECT {{ImageCaption('parks::Image')}} as "Image Description" FROM parks LIMIT 1
```
We now apply the LIMIT clause (and any other surrounding filters) prior to calling the ImageCaption ingredient.
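Conceptually, this is equivalent to first materializing `SELECT * FROM parks LIMIT 1` and only then calling ImageCaption on the surviving rows, so the ingredient runs on 1 row instead of every row in the table (an illustrative rewrite, not necessarily the exact plan BlendSQL produces).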
Benchmark Results:
Before:
| Task | Average Runtime | # Unique Queries |
|---|---|---|
| financials | 0.0402491 | 7 |
| rugby | 0.323255 | 4 |
| national_parks | 2.07314 | 5 |
| 1966_nba_draft | 0.119926 | 2 |
After:
| Task | Average Runtime | # Unique Queries |
|---|---|---|
| financials | 0.040797 | 7 |
| rugby | 0.319473 | 4 |
| national_parks | 0.865904 | 5 |
| 1966_nba_draft | 0.115434 | 2 |
Full Changelog: v0.0.19...v0.0.20