
Releases: parkervg/blendsql

v0.0.31

26 Oct 21:37

Bugfixes from previous release.

Note that we need to install with --prerelease=allow to use the guidance AzurePhi server-side integration.
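
(For example, with uv: uv pip install blendsql --prerelease=allow, which should resolve the guidance pre-release as a dependency. The choice of uv here is an assumption; with plain pip the rough equivalent is the --pre flag.)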

Full Changelog: v0.0.30...v0.0.31

v0.0.30

26 Oct 21:17

🧠 Smarter LLMQA with modifier Arg

As described in blendsql-by-example.ipynb, LLMQA can now generate constrained lists. This means the following query is valid:

SELECT * FROM People
WHERE People.Name IN {{LLMQA('First 3 presidents of the U.S?')}}

Or even pseudo-agent-based processing like this:

WITH letter_agent_output AS (
    SELECT * FROM (VALUES {{LLMQA('List some greek letters', modifier='{3}')}})
) SELECT {{
    LLMQA(
        'What is the first letter of the alphabet?',
        options=(SELECT * FROM letter_agent_output)
    )
}}
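
To actually run one of these queries, the call mirrors the blend() usage shown in the v0.0.27 notes further down. A minimal sketch, assuming a local SQLite database and an OpenAI model (the database path and model name are illustrative):

from blendsql import blend, LLMQA
from blendsql.db import SQLite
from blendsql.models import OpenaiLLM

smoothie = blend(
    query="""
    SELECT * FROM People
    WHERE People.Name IN {{LLMQA('First 3 presidents of the U.S?')}}
    """,
    db=SQLite("people.db"),                  # illustrative path
    ingredients={LLMQA},
    default_model=OpenaiLLM("gpt-4o-mini"),  # illustrative model name
)
print(smoothie.df)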

Additionally, the AzurePhi model allows for easy constrained decoding with a larger model, powered by guidance's server-side Azure AI integration: https://github.com/guidance-ai/guidance?tab=readme-ov-file#azure-ai
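
A rough usage sketch follows; the exact class name and import path for the Azure-hosted Phi model are assumptions here (check blendsql.models for the real ones), and credentials are assumed to come from your Azure environment variables:

from blendsql import blend, LLMQA
from blendsql.models import AzurePhiModel  # assumption: illustrative name/path only

smoothie = blend(
    query="SELECT {{LLMQA('What is the capital of Alaska?')}}",
    db=db,                          # an existing blendsql database connection
    ingredients={LLMQA},
    default_model=AzurePhiModel(),  # assumption: endpoint/key read from env vars
)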

What's Changed

  • _dialect.py Re-Work, modifier Argument for LLMQA, Documentation updates by @parkervg in #35

Full Changelog: v0.0.29...v0.0.30

v0.0.29

18 Oct 14:01

Added the ability to configure the maximum number of concurrent async OpenAI/Anthropic calls:

import blendsql

# Optionally set how many async calls to allow concurrently
# This depends on your OpenAI/Anthropic/etc. rate limits
blendsql.config.set_async_limit(10)

The default is 10.

Full Changelog: v0.0.28...v0.0.29

v0.0.28

18 Oct 01:52

⚡ Async Batch Calls for LLMMap

This release adds async batch processing by default for the LLMMap ingredient. Currently, this means that using the OpenaiLLM and AnthropicLLM classes in an LLMMap call will be much quicker, especially when the database context is large or the batch_size is small.

For example, taking this query from the README:

SELECT "Name",
{{ImageCaption('parks::Image')}} as "Image Description", 
{{
    LLMMap(
        question='Size in km2?',
        context='parks::Area'
    )
}} as "Size in km" FROM parks
WHERE "Location" = 'Alaska'
ORDER BY "Size in km" DESC LIMIT 1

Assuming we've initialized our LLMMap ingredient via LLMMap.from_args(batch_size=1, k=0), we retrieve 0 few-shot examples per prompt (i.e. zero-shot) and put a single value in each prompt. Since 2 parks meet the "Location" = 'Alaska' criterion, that gives us 2 total values to map, and therefore 2 prompts.
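
A minimal sketch of that ingredient setup (the import mirrors the v0.0.27 example below):

from blendsql import LLMMap

# batch_size=1: one value per prompt; k=0: retrieve no few-shot examples
ingredients = {LLMMap.from_args(batch_size=1, k=0)}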

With this update, we pass the two prompts into our OpenAI or Anthropic endpoint asynchronously:

Given a set of values from a database, answer the question row-by-row, in order.
Your outputs should be separated by ';'.

Question: Size in km2?
Source table: parks
Source column: Area

Values:
7,523,897.45 acres (30,448.1 km2)

Given a set of values from a database, answer the question row-by-row, in order.
Your outputs should be separated by ';'.

Question: Size in km2?
Source table: parks
Source column: Area

Values:
3,674,529.33 acres (14,870.3 km2)

Of course, the effects of this async processing will be felt more when we need to pass many values to the LLMMap function.
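
Conceptually, the speedup comes from having every per-batch prompt in flight at the same time instead of awaiting them one by one. A rough sketch of the idea (not blendsql's actual internals; the model name is illustrative):

import asyncio

from openai import AsyncOpenAI  # assumes the openai>=1.x async client

client = AsyncOpenAI()

async def answer(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def map_values(prompts: list[str]) -> list[str]:
    # Dispatch all prompts concurrently and collect the answers in order
    return await asyncio.gather(*(answer(p) for p in prompts))

answers = asyncio.run(map_values(["<first prompt above>", "<second prompt above>"]))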

Full Changelog: v0.0.27...v0.0.28

v0.0.27

16 Oct 00:16

Few-Shot Prompting + Retrieval for Ingredients

This release includes many new updates, most notably an interface allowing you to define custom few-shot examples for ingredient functions and dynamically retrieve the most relevant examples at runtime via a haystack-based retriever.

For example:

from blendsql import blend, LLMQA
from blendsql.ingredients.builtin import DEFAULT_QA_FEW_SHOT

ingredients = {
    LLMQA.from_args(
        few_shot_examples=[
            *DEFAULT_QA_FEW_SHOT,
            {
                "question": "Which weighs the most?",
                "context": {
                    "Animal": ["Dog", "Gorilla", "Hamster"],
                    "Weight": ["20 pounds", "350 lbs", "100 grams"]
                },
                "answer": "Gorilla",
                # Below are optional
                "options": ["Dog", "Gorilla", "Hamster"]
            }
        ],
        # Will fetch `k` most relevant few-shot examples using embedding-based retriever
        k=2,
        # Lambda to turn the pd.DataFrame to a serialized string
        context_formatter=lambda df: df.to_markdown(
            index=False
        )
    )
}
smoothie = blend(
    query=blendsql,           # the BlendSQL query string, defined elsewhere
    db=db,                    # an existing blendsql database connection
    ingredients=ingredients,
    default_model=model,      # the default model, defined elsewhere
)

See this section in the README for more information.

Full Changelog: v0.0.26...v0.0.27

v0.0.26

26 Sep 20:59

Full Changelog: v0.0.25...v0.0.26

Fixing `ModuleNotFoundError` on pypi install

26 Sep 20:45

v0.0.23

01 Sep 22:36

Speeding Things Up ⚡

This release moves the underlying constrained decoding engine from outlines to guidance.
The compilation of constraints + tokens into an FSM (finite-state machine) used by outlines turned out to be a bottleneck for many BlendSQL operations. The trie-based guidance approach runs quicker in settings where the constraints aren't known ahead of time, as is the case for many BlendSQL ingredients.
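
For a feel of the guidance-style decoding blendsql now relies on, here is a minimal standalone sketch (not blendsql internals): select() constrains generation to one of the listed strings, enforced token-by-token against the trie of allowed continuations.

from guidance import models, select

lm = models.Transformers("HuggingFaceTB/SmolLM-135M")
lm += "The largest of these Alaskan parks is: " + select(
    ["Denali", "Katmai", "Glacier Bay"]
)
print(str(lm))  # prompt plus the constrained completion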

Below are the old/new runtimes for the benchmarks, using HuggingFaceTB/SmolLM-135M.

Before:

Task             Average Runtime    # Unique Queries
financials       0.0427749          7
rugby            3.54232            4
national_parks   2.63405            5
1966_nba_draft   3.65771            2

After:

Task             Average Runtime    # Unique Queries
financials       0.0487881          7
rugby            0.909974           4
national_parks   2.13209            5
1966_nba_draft   1.39948            2

Anthropic models are also now supported.

What's Changed

Full Changelog: v0.0.21...v0.0.22

v0.0.21

05 Jul 19:14

What's Changed

  • unpack_options(), adding options argument to MapIngredient by @parkervg in #28
  • With the release of skrub 0.2.0, we can now support Python 3.9

Full Changelog: v0.0.20...v0.0.21

v0.0.20

23 Jun 00:52

What's Changed

  • #27

On queries like benchmarks/national_parks/q05.sql:

SELECT {{ImageCaption('parks::Image')}} as "Image Description" FROM parks LIMIT 1

We now apply the LIMIT clause (and any other surrounding filters) prior to calling the ImageCaption ingredient.
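
Conceptually (a rough illustration, not the literal internal rewrite), the ingredient now only sees the rows that survive the surrounding clauses:

SELECT {{ImageCaption('parks::Image')}} as "Image Description"
FROM (SELECT * FROM parks LIMIT 1) AS parks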

Benchmark Results:

Before:

Task             Average Runtime    # Unique Queries
financials       0.0402491          7
rugby            0.323255           4
national_parks   2.07314            5
1966_nba_draft   0.119926           2

After:

Task             Average Runtime    # Unique Queries
financials       0.040797           7
rugby            0.319473           4
national_parks   0.865904           5
1966_nba_draft   0.115434           2

Full Changelog: v0.0.19...v0.0.20