From a35701436708c242d3e93539e204fac9c7c308f4 Mon Sep 17 00:00:00 2001
From: Chad Juliano
Date: Tue, 20 Feb 2024 20:54:38 -0600
Subject: [PATCH] Adding notebook for Kinetica chat.

---
 docs/docs/integrations/chat/kinetica.ipynb    | 633 ++++++++++++++++++
 docs/docs/integrations/providers/kinetica.mdx |  28 +
 .../integrations/vectorstores/kinetica.ipynb  |  11 +-
 .../chat_models/kinetica.py                   |   2 +-
 4 files changed, 672 insertions(+), 2 deletions(-)
 create mode 100644 docs/docs/integrations/chat/kinetica.ipynb
 create mode 100644 docs/docs/integrations/providers/kinetica.mdx

diff --git a/docs/docs/integrations/chat/kinetica.ipynb b/docs/docs/integrations/chat/kinetica.ipynb
new file mode 100644
index 0000000000000..685e4c5b826ee
--- /dev/null
+++ b/docs/docs/integrations/chat/kinetica.ipynb
@@ -0,0 +1,633 @@
+{
+ "cells": [
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "sidebar_label: Kinetica\n",
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Kinetica SqlAssist LLM Demo\n",
+    "\n",
+    "This notebook demonstrates how to use Kinetica to transform natural language into SQL\n",
+    "and simplify the process of data retrieval. This demo is intended to show the mechanics\n",
+    "of creating and using a chain as opposed to the capabilities of the LLM.\n",
+    "\n",
+    "## Overview\n",
+    "\n",
+    "With the Kinetica LLM workflow you create an LLM context in the database that provides\n",
+    "the information needed for inferencing, including tables, annotations, rules, and\n",
+    "samples. Invoking ``load_messages_from_context()`` will retrieve the context\n",
+    "information from the database so that it can be used to create a chat prompt.\n",
+    "\n",
+    "The chat prompt consists of a ``SystemMessage`` and pairs of\n",
+    "``HumanMessage``/``AIMessage`` that contain the samples, which are question/SQL\n",
+    "pairs. You can append sample pairs to this list, but it is not intended to\n",
+    "facilitate a typical natural language conversation.\n",
+    "\n",
+    "When you create a chain from the chat prompt and execute it, the Kinetica LLM will\n",
+    "generate SQL from the input. Optionally, you can use ``KineticaSqlOutputParser`` to\n",
+    "execute the SQL and return the result as a dataframe.\n",
+    "\n",
+    "Currently, two LLMs are supported for SQL generation:\n",
+    "\n",
+    "1. **Kinetica SQL-GPT**: This LLM is based on the OpenAI ChatGPT API.\n",
+    "2. **Kinetica SqlAssist**: This LLM is purpose built to integrate with the Kinetica\n",
+    "   database and it can run on secure customer premises.\n",
+    "\n",
+    "For this demo we will be using **SqlAssist**. See the [Kinetica Documentation\n",
+    "site](https://docs.kinetica.com/7.1/sql-gpt/concepts/) for more information.\n",
+    "\n",
+    "## Prerequisites\n",
+    "\n",
+    "To get started you will need a Kinetica DB instance. If you don't have one, you can\n",
+    "obtain a [free development instance](https://cloud.kinetica.com/trynow).\n",
+    "\n",
+    "You will need to install the following packages..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Note: you may need to restart the kernel to use updated packages.\n",
+      "Note: you may need to restart the kernel to use updated packages.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Install Langchain community and core packages\n",
+    "%pip install --upgrade --quiet langchain-core langchain-community\n",
+    "\n",
+    "# Install Kinetica DB connection package\n",
+    "%pip install --upgrade --quiet gpudb typeguard\n",
+    "\n",
+    "# Install packages needed for this tutorial\n",
+    "%pip install --upgrade --quiet faker"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Database Connection\n",
+    "\n",
+    "You must set the database connection in the following environment variables. If you are using a virtual environment, you can set them in the `.env` file of the project:\n",
+    "* `KINETICA_URL`: Database connection URL\n",
+    "* `KINETICA_USER`: Database user\n",
+    "* `KINETICA_PASSWD`: Secure password.\n",
+    "\n",
+    "If you can create an instance of `KineticaChatLLM`, then you are successfully connected."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_community.chat_models.kinetica import KineticaChatLLM\n",
+    "kinetica_llm = KineticaChatLLM()\n",
+    "\n",
+    "# Test table we will create\n",
+    "table_name = \"demo.test_profiles\"\n",
+    "\n",
+    "# LLM Context we will create\n",
+    "kinetica_ctx = 'demo.test_llm_ctx'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Create test data\n",
+    "\n",
+    "Before we can generate SQL we will need to create a Kinetica table and an LLM context that references the table.\n",
+    "\n",
+    "### Create some fake user profiles\n",
+    "\n",
+    "We will use the `faker` package to create a dataframe with 100 fake profiles."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
usernamenamesexaddressmailbirthdate
id
0eduardo69Haley BeckF59836 Carla Causeway Suite 939\\nPort Eugene, I...meltondenise@yahoo.com1997-09-08
1lbarreraJoshua StephensM3108 Christina Forges\\nPort Timothychester, KY...erica80@hotmail.com1924-05-04
2bburtonPaula KaiserFUnit 7405 Box 3052\\nDPO AE 09858timothypotts@gmail.com1933-09-05
3melissa49Wendy ReeseF6408 Christopher Hill Apt. 459\\nNew Benjamin, ...dadams@gmail.com1988-07-27
4melissacarterManuel RiosM2241 Bell Gardens Suite 723\\nScottside, CA 38463williamayala@gmail.com1930-12-18
\n", + "
" + ], + "text/plain": [ + " username name sex \\\n", + "id \n", + "0 eduardo69 Haley Beck F \n", + "1 lbarrera Joshua Stephens M \n", + "2 bburton Paula Kaiser F \n", + "3 melissa49 Wendy Reese F \n", + "4 melissacarter Manuel Rios M \n", + "\n", + " address mail \\\n", + "id \n", + "0 59836 Carla Causeway Suite 939\\nPort Eugene, I... meltondenise@yahoo.com \n", + "1 3108 Christina Forges\\nPort Timothychester, KY... erica80@hotmail.com \n", + "2 Unit 7405 Box 3052\\nDPO AE 09858 timothypotts@gmail.com \n", + "3 6408 Christopher Hill Apt. 459\\nNew Benjamin, ... dadams@gmail.com \n", + "4 2241 Bell Gardens Suite 723\\nScottside, CA 38463 williamayala@gmail.com \n", + "\n", + " birthdate \n", + "id \n", + "0 1997-09-08 \n", + "1 1924-05-04 \n", + "2 1933-09-05 \n", + "3 1988-07-27 \n", + "4 1930-12-18 " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "from faker import Faker\n", + "from typing import Generator\n", + "\n", + "Faker.seed(5467)\n", + "faker = Faker(locale='en-US')\n", + "\n", + "def profile_gen(count: int) -> Generator:\n", + " for id in range(0, count):\n", + " rec = dict(id=id, **faker.simple_profile())\n", + " rec['birthdate'] = pd.Timestamp(rec['birthdate'])\n", + " yield rec\n", + "\n", + "load_df = pd.DataFrame.from_records(data=profile_gen(100), index='id')\n", + "load_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a Kinetica table from the Dataframe" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
nametypeproperties
0usernamestring[char32]
1namestring[char32]
2sexstring[char1]
3addressstring[char64]
4mailstring[char32]
5birthdatelong[timestamp]
\n", + "
" + ], + "text/plain": [ + " name type properties\n", + "0 username string [char32]\n", + "1 name string [char32]\n", + "2 sex string [char1]\n", + "3 address string [char64]\n", + "4 mail string [char32]\n", + "5 birthdate long [timestamp]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from gpudb import GPUdbTable\n", + "\n", + "gpudb_table = GPUdbTable.from_df(load_df, db=kinetica_llm.kdbc, \n", + " table_name=table_name, \n", + " clear_table=True,\n", + " load_data=True)\n", + "\n", + "# See the Kinetica column types\n", + "gpudb_table.type_as_df()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create the LLM context\n", + "\n", + "You can create an LLM Context using the Kinetica Workbench UI or you can manually create it with the `CREATE OR REPLACE CONTEXT` syntax. \n", + "\n", + "Here we create a context from the SQL syntax referencing the table we created." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'status': 'OK',\n", + " 'message': '',\n", + " 'data_type': 'execute_sql_response',\n", + " 'response_time': 0.02299}" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# create an LLM context for the table.\n", + "\n", + "from gpudb import GPUdbException\n", + "\n", + "sql=f\"\"\"\n", + "CREATE OR REPLACE CONTEXT {kinetica_ctx}\n", + "(\n", + " TABLE = demo.test_profiles\n", + " COMMENT = 'Contains user profiles.'\n", + "),\n", + "(\n", + " SAMPLES = (\n", + " 'How many male users are there?' = \n", + " 'select count(1) as num_users\n", + " from demo.test_profiles\n", + " where sex = ''M'';')\n", + ")\n", + "\"\"\"\n", + "\n", + "def _check_error(response: dict) -> None:\n", + " status = response['status_info']['status']\n", + " if (status != 'OK'):\n", + " message = response['status_info']['message']\n", + " raise GPUdbException('[%s]: %s' % (status, message))\n", + "\n", + "response = kinetica_llm.kdbc.execute_sql(sql)\n", + "_check_error(response)\n", + "response['status_info']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use Langchain for inferencing\n", + "\n", + "In the example below we will create a chain from the previously created table and LLM context. This chain will generate SQL and return the resulting data as a dataframe.\n", + "\n", + "### Load the chat prompt from the Kinetica DB\n", + "\n", + "The `load_messages_from_context()` function will retrieve a context from the DB and convert it into a list of chat messages that we use to create a ``ChatPromptTemplate``." 
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "================================\u001b[1m System Message \u001b[0m================================\n",
+      "\n",
+      "CREATE TABLE demo.test_profiles AS\n",
+      "(\n",
+      " username VARCHAR (32) NOT NULL,\n",
+      " name VARCHAR (32) NOT NULL,\n",
+      " sex VARCHAR (1) NOT NULL,\n",
+      " address VARCHAR (64) NOT NULL,\n",
+      " mail VARCHAR (32) NOT NULL,\n",
+      " birthdate TIMESTAMP NOT NULL\n",
+      ");\n",
+      "COMMENT ON TABLE demo.test_profiles IS 'Contains user profiles.';\n",
+      "\n",
+      "================================\u001b[1m Human Message \u001b[0m=================================\n",
+      "\n",
+      "How many male users are there?\n",
+      "\n",
+      "==================================\u001b[1m Ai Message \u001b[0m==================================\n",
+      "\n",
+      "select count(1) as num_users\n",
+      " from demo.test_profiles\n",
+      " where sex = 'M';\n",
+      "\n",
+      "================================\u001b[1m Human Message \u001b[0m=================================\n",
+      "\n",
+      "\u001b[33;1m\u001b[1;3m{input}\u001b[0m\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_core.prompts import ChatPromptTemplate\n",
+    "\n",
+    "# Load the context from the database\n",
+    "ctx_messages = kinetica_llm.load_messages_from_context(kinetica_ctx)\n",
+    "\n",
+    "# Add the input prompt. This is where the input question will be substituted.\n",
+    "ctx_messages.append((\"human\", \"{input}\"))\n",
+    "\n",
+    "# Create the prompt template.\n",
+    "prompt_template = ChatPromptTemplate.from_messages(ctx_messages)\n",
+    "prompt_template.pretty_print()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create the chain\n",
+    "\n",
+    "The last element of this chain is `KineticaSqlOutputParser`, which will execute the SQL and return a dataframe. This is optional; if we left it out, only the SQL would be returned."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_community.chat_models.kinetica import KineticaSqlOutputParser, KineticaSqlResponse\n",
+    "chain = prompt_template | kinetica_llm | KineticaSqlOutputParser(kdbc=kinetica_llm.kdbc)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Generate the SQL\n",
+    "\n",
+    "The chain we created will take a question as input and return a `KineticaSqlResponse` containing the generated SQL and data. The question must be relevant to the LLM context we used to create the prompt."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "SQL: SELECT username\n",
+      " FROM demo.test_profiles\n",
+      " WHERE sex = 'F'\n",
+      " ORDER BY username;\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
username
0alexander40
1bburton
2brian12
3brownanna
4carl19
\n", + "
" + ], + "text/plain": [ + " username\n", + "0 alexander40\n", + "1 bburton\n", + "2 brian12\n", + "3 brownanna\n", + "4 carl19" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Here you must ask a question relevant to the LLM context provided in the prompt template.\n", + "response: KineticaSqlResponse = chain.invoke({\"input\": \"What are the female users ordered by username?\"})\n", + "\n", + "print(f\"SQL: {response.sql}\")\n", + "response.dataframe.head()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "langchain", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.18" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/docs/integrations/providers/kinetica.mdx b/docs/docs/integrations/providers/kinetica.mdx new file mode 100644 index 0000000000000..e9b4b532c3c99 --- /dev/null +++ b/docs/docs/integrations/providers/kinetica.mdx @@ -0,0 +1,28 @@ +# Kinetica + +[Kinetica](https://www.kinetica.com/) is a real-time database purpose built for enabling +analytics and generative AI on time-series & spatial data. + +## Chat Model + +The Kinetica LLM wrapper uses the [Kinetica SqlAssist +LLM](https://docs.kinetica.com/7.2/sql-gpt/concepts/) to transform natural language into +SQL to simplify the process of data retrieval. + +See [Kinetica SqlAssist LLM Demo](/docs/integrations/chat/kinetica) for usage. + +```python +from langchain_community.chat_models.kinetica import KineticaChatLLM +``` + +## Vector Store + +The Kinetca vectorstore wrapper leverages Kinetica's native support for [vector +similarity search](https://docs.kinetica.com/7.2/vector_search/). + +See [Kinetica Vectorsore API](/docs/integrations/vectorstores/kinetica) for usage. + +```python +from langchain_community.vectorstores import Kinetica +``` + diff --git a/docs/docs/integrations/vectorstores/kinetica.ipynb b/docs/docs/integrations/vectorstores/kinetica.ipynb index 3448588b53f7a..7859ee4c39e41 100644 --- a/docs/docs/integrations/vectorstores/kinetica.ipynb +++ b/docs/docs/integrations/vectorstores/kinetica.ipynb @@ -1,10 +1,19 @@ { "cells": [ + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "---\n", + "sidebar_label: Kinetica\n", + "---" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Kinetica\n", + "# Kinetica Vectorstore API\n", "\n", ">[Kinetica](https://www.kinetica.com/) is a database with integrated support for vector similarity search\n", "\n", diff --git a/libs/community/langchain_community/chat_models/kinetica.py b/libs/community/langchain_community/chat_models/kinetica.py index 1046fb210d14b..236bd638431c9 100644 --- a/libs/community/langchain_community/chat_models/kinetica.py +++ b/libs/community/langchain_community/chat_models/kinetica.py @@ -248,7 +248,7 @@ def create_kdbc( try: import gpudb - except ImportError: + except ModuleNotFoundError: raise ImportError( "Could not import Kinetica python package. " "Please install it with `pip install gpudb`."